
Detecting tomatoes in complex agricultural environments using improved YOLOv8n

  • Abstract: In complex agricultural scenes, tomato fruits frequently overlap, leaves and background cause severe occlusion and interference, and existing models suffer from limited robustness, accuracy, and overall detection quality. To address these problems, an improved YOLOv8n-based tomato target detection model, TOMO-YOLO, is proposed. First, a weight allocation feature fusion (WAFF) strategy is proposed to replace the multi-scale feature fusion of YOLOv8n; it optimizes the weight assignment across feature scales and strengthens the representational power of the feature maps. Second, to address the weak robustness, limited generalization, and insufficient accuracy of YOLOv8n in complex agricultural scenes, an adaptive weight detection head (AWD) is designed to improve robustness and detection performance in such scenes. Finally, comparative experiments are conducted on a public tomato dataset and a self-built tomato dataset. The results show that TOMO-YOLO outperforms the YOLO-series models YOLOv8n, YOLOv10n, YOLOv6n, and YOLOv5n overall. On the public and self-built datasets, TOMO-YOLO achieves bounding-box precision of 88.6% and 77.2%, recall of 85.0% and 59.1%, mAP0.5 of 90.9% and 66.7%, mAP0.5-0.95 of 55.0% and 37.9%, and F1 scores of 85.0% and 68.0%, respectively. Compared with YOLOv8n, precision improves by 0.7 and 3.9 percentage points, recall by 5.8 and 2.6 percentage points, mAP0.5 by 1.4 and 3.0 percentage points, mAP0.5-0.95 by 1.6 and 1.2 percentage points, and F1 by 2.0 and 4.0 percentage points, respectively, demonstrating the effectiveness of the improved model and providing technical support for the automated detection and recognition of tomatoes.
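The abstract describes WAFF only at a high level, so the exact formulation is not given here. Purely as an illustration of the general idea (learnable, normalized weights over multi-scale feature maps instead of fixed fusion), a minimal PyTorch sketch follows; the module name WAFFFusion and the softmax-normalized per-branch weights are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of weight-allocation feature fusion (not the authors' WAFF code).
# Assumption: each input scale has already been resized/projected to a common shape,
# then fused with learnable, softmax-normalized weights instead of a fixed sum/concat.
import torch
import torch.nn as nn


class WAFFFusion(nn.Module):
    """Fuse N same-shaped feature maps with learnable normalized weights."""

    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        # One learnable scalar weight per input branch.
        self.weights = nn.Parameter(torch.ones(num_inputs))
        # Light post-fusion convolution to mix channels.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, features):  # features: list of tensors [B, C, H, W]
        w = torch.softmax(self.weights, dim=0)           # normalize branch weights
        fused = sum(w[i] * f for i, f in enumerate(features))
        return self.act(self.bn(self.conv(fused)))


if __name__ == "__main__":
    # Three feature maps already aligned to the same resolution and channel count.
    feats = [torch.randn(1, 64, 40, 40) for _ in range(3)]
    out = WAFFFusion(num_inputs=3, channels=64)(feats)
    print(out.shape)  # torch.Size([1, 64, 40, 40])
```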

     

    Abstract: Tomato target detection currently faces severe challenges in complex agricultural scenes: the fruits often appear in dense, overlapping clusters, while leaves and cluttered backgrounds cause heavy occlusion and interference during detection. Existing detection models cannot fully meet the accuracy demands of such scenes, and their robustness in particular is limited. In this study, a tomato target detection model, TOMO-YOLO, was proposed on the basis of an improved YOLOv8n. Systematic optimization effectively enhanced detection performance under the diverse disturbance factors of complex agricultural scenes. The improvements are threefold. Firstly, a weight allocation feature fusion strategy (WAFF) was proposed to replace the conventional fixed feature fusion. The weight coefficients of multi-scale features were dynamically adjusted to enhance the semantic expression of the feature maps, allowing a more flexible and precise fusion of multi-scale features according to the characteristics of each image and the needs of detection. Secondly, an adaptive weight detection head (AWD) was designed. Since the contribution of each feature depends on the complexity of, and interference in, the image, the AWD module incorporates a feature quality assessment function that adjusts the feature selection weights of the detection head dynamically and in real time. This adaptive weighting allows the model to concentrate on the most informative features under complex interference, improving its robustness and generalization so that high detection performance is maintained in agricultural scenes of varying complexity. Finally, three strategies were adopted to refine the model structure in multiple dimensions. 1) The spatial pyramid pooling with ELAN (SPPELAN) module replaced the standard spatial pyramid pooling-fast (SPPF) structure. Its parallel pooling branches enlarge the receptive field so that image features are captured over a wider range, and feature information loss during convolution is reduced compared with SPPF, strengthening the model's ability to capture image details. 2) The partial self-attention (PSA) mechanism and the feature enhancement module (FEM) were embedded in the feature extraction network. PSA guides the model toward the key regions and important features of the image, while FEM further strengthens these features; operating synergistically, the two markedly improve the representation of occluded targets after multi-scale feature interaction, enabling accurate identification and localization of tomatoes under leaf and background occlusion. 3) Partial Ghost Conv convolution kernels were introduced to compress the model parameters. High detection accuracy was maintained while computational cost and storage requirements were reduced, making the model more lightweight, efficient, and flexible to deploy on different hardware platforms.
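The AWD head's weighting mechanism is described only qualitatively above. As a rough illustration of the general idea (a learned quality score that dynamically rescales each feature before the detection head), a minimal PyTorch sketch might look like the following; the module name AdaptiveWeightGate and the pooling-plus-MLP scorer are assumptions, not the paper's design.

```python
# Illustrative sketch of an adaptive-weight gating block in front of a detection head.
# The feature-quality score below (global pooling + small MLP per scale) is assumed
# for illustration only; the paper's AWD module may differ.
import torch
import torch.nn as nn


class AdaptiveWeightGate(nn.Module):
    """Score each input feature map and rescale it before the detection head."""

    def __init__(self, channels: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global context per channel
            nn.Conv2d(channels, channels // 4, 1),
            nn.SiLU(),
            nn.Conv2d(channels // 4, 1, 1),   # single quality score for this map
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: [B, C, H, W]
        weight = self.scorer(x)                # [B, 1, 1, 1], in (0, 1)
        return x * weight                      # emphasize or suppress this scale


if __name__ == "__main__":
    feat = torch.randn(2, 128, 20, 20)
    print(AdaptiveWeightGate(128)(feat).shape)  # torch.Size([2, 128, 20, 20])
```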
A series of experiments was conducted to verify the effectiveness of the improved model. The results show that the overall performance of TOMO-YOLO was superior to that of the YOLO-series models YOLOv8n, YOLOv10n, YOLOv6n, and YOLOv5n. On the public and the self-built tomato datasets, TOMO-YOLO achieved bounding-box precision of 88.6% and 77.2%, recall of 85.0% and 59.1%, mAP0.5 of 90.9% and 66.7%, mAP0.5-0.95 of 55.0% and 37.9%, and F1 scores of 85.0% and 68.0%, respectively. Compared with YOLOv8n, precision increased by 0.7 and 3.9 percentage points, recall by 5.8 and 2.6 percentage points, mAP0.5 by 1.4 and 3.0 percentage points, mAP0.5-0.95 by 1.6 and 1.2 percentage points, and F1 by 2.0 and 4.0 percentage points, respectively. These results confirm the effectiveness of the improved model and can provide technical support for the automatic detection and recognition of tomatoes in complex agricultural environments.
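The abstract attributes part of the parameter compression to Ghost-style convolution kernels. As a generic illustration of that technique (after GhostNet), a minimal PyTorch sketch is given below: a primary convolution produces half the output channels and a cheap depthwise convolution generates the rest. This shows the general idea only and is not the authors' exact Partial Ghost Conv module.

```python
# Rough sketch of a Ghost-style convolution, illustrating how the cheap depthwise
# branch cuts parameters; a generic example, not the authors' exact module.
import torch
import torch.nn as nn


class GhostConv(nn.Module):
    """Generate half the output channels with a normal conv and the rest with a
    cheap depthwise conv applied to those primary features."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 1):
        super().__init__()
        primary = out_channels // 2
        self.primary_conv = nn.Sequential(
            nn.Conv2d(in_channels, primary, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary),
            nn.SiLU(),
        )
        self.cheap_conv = nn.Sequential(
            nn.Conv2d(primary, out_channels - primary, 5,
                      padding=2, groups=primary, bias=False),
            nn.BatchNorm2d(out_channels - primary),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary_conv(x)
        return torch.cat([y, self.cheap_conv(y)], dim=1)


if __name__ == "__main__":
    x = torch.randn(1, 64, 40, 40)
    print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```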

     
