Abstract:
Accurate identification is often required to overcome occlusion by branches and leaves in orchards. In this study, an accurate and fast model based on an improved YOLOv11n was proposed to detect citrus fruits in complex environments. An IRSC (inverted residual-attention shiftwise convolution) module was used to improve the C3k2 (cross-stage partial with kernel-size 2) module of the original backbone network. A spatial weighting mechanism with a morphological-prior Gaussian bias was adopted to enhance the feature response of key regions: attention was concentrated on the center of the target region to fully exploit the approximately round shape of citrus targets, so that the critical parts of the fruit remained salient even when occluded by branches and leaves. Meanwhile, an inverted residual design was incorporated to expand the receptive field of weak features, capturing more contextual information for small targets and weak features under low-light conditions. The IRSC module also broke through the local receptive field of small convolution kernels, enabling the network to effectively approximate the global context modelling of large-kernel convolutions. Compared with the original network structure, detection performance improved significantly, and feature extraction became better suited to low-light images. The Retinexformer (one-stage Retinex-based transformer) module was utilized for low-light enhancement: illumination was decomposed at multiple scales, enabling strong end-to-end enhancement of dark areas, so that the dark regions where citrus fruits were located could be accurately brightened and feature extraction was greatly strengthened after optimization. Under underexposure, extracting citrus features in orchards is difficult because of low image contrast, indistinct contours, and uneven illumination; this integration effectively improved the contrast and illuminance of the image while introducing less noise, and the better feature extraction of citrus fruits in turn improved detection accuracy. Furthermore, the ADown module was employed to replace some standard convolutions during downsampling. Its multi-branch parallel processing extracts features from different branches and then fuses them effectively, reducing model complexity and parameter count while maintaining high accuracy. The results show that the improved YOLOv11n model reached an mAP@0.5 of 87.1% with a recall of 79.1% for citrus detection in complex environments. Compared with the original model, the mAP@0.5 and recall increased by 1.9 and 3.0 percentage points, respectively, while the number of parameters and the model size were reduced by 8.5% and 4.0%, respectively. Ablation studies demonstrate that each module contributes significantly to the overall improvement: the Retinexformer module alone increases mAP@0.5 by 1.1 percentage points, the C3k2-IRSC module contributes an additional 0.6 percentage points, and the combination of Retinexformer and C3k2-IRSC achieves a 1.5-percentage-point improvement.
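To make the morphological-prior weighting concrete, a minimal PyTorch sketch of a centered Gaussian spatial bias applied to a feature map is given below. This is an illustrative reading of the mechanism described above, not the paper's exact implementation: the module name `GaussianSpatialBias`, the fixed image-centered prior, and the learnable strength parameter are all assumptions.

```python
import torch
import torch.nn as nn

class GaussianSpatialBias(nn.Module):
    """Illustrative spatial weighting with a centered Gaussian prior.

    Assumption: the morphological prior is modeled as a 2-D Gaussian
    centered on the feature map, reflecting the roughly round shape of
    citrus targets; a learnable scale lets the network temper the bias.
    """

    def __init__(self, sigma: float = 0.5):
        super().__init__()
        self.sigma = sigma
        self.scale = nn.Parameter(torch.ones(1))  # learnable strength of the prior

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Normalized coordinates in [-1, 1] for every spatial position.
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        # Centered Gaussian: responses near the region center are emphasized.
        gauss = torch.exp(-(xx ** 2 + yy ** 2) / (2.0 * self.sigma ** 2))
        weight = 1.0 + self.scale * gauss  # soft bias, not a hard mask, so edges survive
        return x * weight.view(1, 1, h, w)

bias = GaussianSpatialBias(sigma=0.5)
feats = torch.randn(2, 64, 40, 40)
out = bias(feats)  # same shape; responses near the center are amplified
```

In the actual network, such a bias would sit inside the attention path of the C3k2-IRSC block and be applied to candidate target regions rather than to the whole feature map; the standalone form is shown only to illustrate how a round-shape prior re-weights feature responses.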
The ADown module reduces parameters by 17.4% when used alone, while the final model with all three modules achieves an 8.5% parameter reduction. An effective collaborative mechanism was formed: Retinexformer provided standardized input through image enhancement, IRSC extracted global contextual features from the enhanced input, and ADown balanced feature preservation and compression while controlling parameter growth. The proposed model achieves the best balance between accuracy and efficiency, with the collaborative mechanism yielding gains that no single module achieves alone. This collaborative design enables the detector to cope with underexposure, heavy occlusion, and complex backgrounds in orchards within a unified framework. In comparison with mainstream object detection models, including Faster R-CNN, SSD, EfficientDet, YOLOv5n, YOLOv8n, YOLOv10n, YOLOv12n, and RT-DETR-l, the proposed model attains an mAP@0.5 higher by 1.8 to 25.6 percentage points, with parameter reductions ranging from 5.6% to 92.6%, indicating clear advantages in both detection accuracy and model compactness. High accuracy was thus effectively achieved for citrus detection in complex environments, and the findings can provide a valuable technical reference for citrus picking in a more efficient and intelligent fruit industry.
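For reference, the sketch below follows the ADown block as introduced in YOLOv9, which matches the multi-branch downsampling described above: the input is lightly average-pooled, split channel-wise, and the two halves are downsampled by a strided 3x3 convolution and by a max-pool followed by a 1x1 convolution before being concatenated. The helper `conv_bn_act` and the SiLU activation are standard YOLO-family conventions assumed here; the paper's exact hyperparameters may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(c_in: int, c_out: int, k: int, s: int, p: int) -> nn.Sequential:
    """Conv -> BatchNorm -> SiLU, the standard block in YOLO-family models."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class ADown(nn.Module):
    """Multi-branch 2x downsampling in the style of YOLOv9's ADown block."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        half = c_out // 2
        self.cv1 = conv_bn_act(c_in // 2, half, k=3, s=2, p=1)  # strided-conv branch
        self.cv2 = conv_bn_act(c_in // 2, half, k=1, s=1, p=0)  # pooled branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Light smoothing before the split helps preserve fine detail.
        x = F.avg_pool2d(x, kernel_size=2, stride=1, padding=0)
        x1, x2 = x.chunk(2, dim=1)                # split channels (c_in assumed even)
        x1 = self.cv1(x1)                         # branch 1: learnable strided conv
        x2 = self.cv2(F.max_pool2d(x2, 3, 2, 1))  # branch 2: max-pool then 1x1 conv
        return torch.cat((x1, x2), dim=1)         # fuse branches -> c_out channels
```

Replacing a plain stride-2 convolution with this split-and-fuse form halves the channels each learnable branch must process, which is where the reported parameter savings come from.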