Comparative analysis of different models for fishery CPUE standardization

Yang Shenglong; Zhang Yu; Zhang Heng; Fan Wei

doi:10.11975/j.issn.1002-6819.2015.21.034

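Summary: To improve the quality of standardized catch per unit effort (CPUE) data in fisheries and the ability of models to give continuous, stable predictions, this paper applied machine learning methods, namely artificial neural network (ANN), regression trees (RT), random forest (RF) and support vector machine (SVM), together with the traditional generalized linear model (GLM), to standardize longline CPUE data for Atlantic bigeye tuna (Thunnus obesus) from 2000 to 2013. The standardization results of the models were compared using mean absolute error, mean square error, 3 correlation coefficients (Pearson's, Kendall's and Spearman's) and normalized mean square error, in order to identify the better standardization methods. The results showed that, on the validation dataset, the 3 correlation coefficients obtained with SVM (0.596, 0.473 and 0.632) were similar to those of RF (0.623, 0.456 and 0.621) and higher than those of RT (0.516, 0.432 and 0.586), ANN (0.428, 0.249 and 0.365) and GLM (0.199, 0.106 and 0.159). The mean square error (11.25), mean absolute error (2.107) and normalized mean square error (0.652) of the SVM predictions were slightly lower than those of RF (11.655, 2.377 and 0.661) and clearly lower than those of RT (14.999, 2.434 and 0.801), ANN (16.692, 2.883 and 0.823) and GLM (16.517, 2.777 and 0.993). All indicators showed that SVM outperformed the other 4 methods, followed by RF, while GLM gave the worst results of all methods and is not suitable for CPUE standardization of fishery data. SVM and RF should be given priority for CPUE standardization of fishery data. The results provide better support for the management and conservation of fishery resources.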

Abstract: Catch per unit effort (CPUE) is often used as an index of relative abundance in fisheries stock assessments. However, trends in nominal CPUE can be influenced by many factors besides stock abundance, including the choice of fishing location and target species, and environmental conditions. CPUE standardization is therefore a fundamental task in stock assessment and management. It is a rapidly developing field in which many statistical models have been used. Improving data quality and continuing to evaluate model performance should be given priority so as to provide recommendations for management and conservation. In this paper, we evaluated the performance of 5 candidate methods (artificial neural network (ANN), regression trees (RT), random forest (RF), support vector machine (SVM) and generalized linear model (GLM)) using actual fishery data for bigeye tuna (Thunnus obesus) from the International Commission for the Conservation of Atlantic Tunas (ICCAT). The statistical performance of these 5 models was compared based on mean square error (MSE), mean absolute error (MAE), 3 kinds of correlation coefficients (Pearson's, Kendall's rank and Spearman's rank) and normalized mean square error (NMSE), all of which measure the difference between the observed and the corresponding predicted values. The results showed that the performance of the SVM was better than (or equivalent to) that of the RF, and their MSE, MAE, 3 correlation coefficients and NMSE were almost the same. These 2 algorithms were superior to the other methods on the training dataset, the testing dataset and all data, except for the NMSE on the training dataset, where the RT was better than the SVM and RF. The performance of the RT was better than that of the ANN, but otherwise inferior to that of the SVM and RF. The performance of the ANN was better than that of the GLM.
The performance of the GLM was almost the lowest among all the models, which suggests that the traditional statistical method (GLM) is inferior to the nonlinear statistical models for CPUE standardization of fishery data. The annual trends of the standardized CPUE from the ANN, RT, RF and SVM models were similar to those of the nominal CPUE from 2001 to 2013, whereas the annual trend from the GLM did not coincide with the nominal CPUE. The average CPUE from the SVM method was almost always lower than the nominal CPUE from 2001 to 2013. Because the decisive criterion was the comparison of the models on the testing (validation) data, it was concluded that the SVM and RF were the best methods for CPUE standardization of fishery data. The SVM and RF should be considered as potential statistical methods for CPUE standardization in fisheries stock assessment and management.
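
The evaluation metrics named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code. The abstract does not define NMSE, so the form used here (sum of squared errors divided by the sum of squared deviations of the observations from their mean) is an assumption, as is the choice of the tau-b variant for Kendall's coefficient. The `obs` and `pred` values are made-up illustrative numbers, not data from the study.

```python
from math import sqrt


def mse(obs, pred):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nmse(obs, pred):
    """Normalized MSE. Assumed definition: squared error divided by the
    squared deviation of the observations from their own mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return num / den


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's rank correlation (tau-b variant, which corrects for ties)."""
    conc = disc = tie_x = tie_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from all counts
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


# Made-up observed and predicted CPUE values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 1.5, 3.5, 3.5]
for name, fn in [("MSE", mse), ("MAE", mae), ("NMSE", nmse),
                 ("Pearson", pearson), ("Kendall", kendall_tau_b),
                 ("Spearman", spearman)]:
    print(f"{name}: {fn(obs, pred):.3f}")
```

Under these definitions, lower MSE, MAE and NMSE and higher correlation coefficients indicate a closer match between predicted and observed CPUE, which is the sense in which the abstract ranks the 5 models.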

Comparison and analysis of different model algorithms for CPUE standardization in fishery