Abstract:
Accurate and rapid detection of mango fruits and peduncles is often required in natural environments. However, current detectors are hampered by similarity between target and background colors, foliage occlusion, and overlapping fruits. In this study, an improved model, MAL-YOLOv10n, was proposed to detect mango fruits and peduncles on the basis of the YOLOv10n framework, with particular attention to the detection accuracy of small peduncle targets. Several modifications were made to the architecture and key modules, yielding significant improvements in model performance. Firstly, the original C2f module in the backbone network was reengineered into an RFAM-C2f module. In the conventional C2f module, the Bottleneck relies mainly on standard 3×3 convolutions for feature extraction, which often fail to capture global contextual information in complex scenes with similar background hues, occlusion, or overlapping fruits. To overcome this limitation, the 3×3 convolution in the Bottleneck was replaced with receptive-field attention convolution (RFAConv), an attention-based convolution that expands the receptive field and captures global contextual cues, making the initial feature extraction stage more robust. In addition, a Convolutional Block Attention Module (CBAM) was appended after the modified Bottleneck to further refine feature selection; by applying channel and spatial attention sequentially, CBAM automatically focuses on target regions while suppressing background noise and interference. The resulting RFAM-C2f module accurately extracts the effective features of mango fruits and peduncles. Secondly, a bidirectional feature pyramid network (BiFPN) was introduced into the feature fusion network to improve the detection accuracy of small peduncle targets. Conventional unidirectional feature pyramids suffer from insufficient information flow during multi-scale feature fusion; BiFPN instead uses bidirectional transmission with learnable weighting coefficients to adaptively integrate features across scales, so that low-level detail is fused with high-level semantic information and the feature loss caused by small target size is effectively mitigated. Experimental results demonstrated that incorporating BiFPN significantly improved both recall and precision for small peduncle detection. Finally, partial convolution (PConv) was introduced into the neck network to form a lightweight PConv-C2f module; by convolving only part of the channels, PConv reduces unnecessary computation and memory access while maintaining effective feature extraction, thereby lowering the computational complexity and parameter count (minimal code sketches of the three modules follow the abstract). MAL-YOLOv10n significantly outperformed the original YOLOv10n model across multiple metrics: the improved model achieved a precision of 94.9%, a recall of 89.7%, and a mean average precision (mAP) of 95.5%, improvements of 3.1%, 3.3%, and 2.5% over YOLOv10n, respectively, at a detection speed of 119.6 frames per second. In terms of lightweight design, MAL-YOLOv10n reduced the floating-point operations, parameter count, and model size by 12%, 3.7%, and 8.6%, respectively.
Furthermore, MAL-YOLOv10n achieved superior performance in complex scenarios and small-target detection compared with mainstream object detection models, including Faster R-CNN, SSD, YOLOv5s, YOLOv7-tiny, YOLOv8n, YOLOv8s, YOLOv10n, YOLOv10s, YOLO11s, YOLOv12n, and RT-DETR. In summary, the proposed model strikes an optimal balance between detection speed and accuracy while remaining exceptionally robust under complex environmental conditions. These findings can provide valuable technical support for mango harvesting in challenging natural scenes.
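The abstract describes RFAM-C2f only at a high level. The following is a minimal PyTorch sketch of the attention refinement it names, using the standard CBAM formulation (channel attention followed by spatial attention); the RFAConv that replaces the Bottleneck's 3×3 convolution is stood in for by a plain 3×3 convolution, since the paper's exact RFAConv configuration is not given here, and the class names (e.g. `RFABottleneck`) are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: a shared 1x1-conv MLP over global avg- and max-pooled features."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)          # (B, C, 1, 1) channel weights

class SpatialAttention(nn.Module):
    """Spatial attention: a 7x7 conv over channel-wise avg and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (B, 1, H, W)

class CBAM(nn.Module):
    """CBAM: channel attention applied first, then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)

class RFABottleneck(nn.Module):
    """Illustrative Bottleneck per the abstract: a 3x3 conv as a hypothetical
    stand-in for RFAConv, followed by CBAM refinement and a residual add."""
    def __init__(self, channels: int):
        super().__init__()
        self.cv1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.rfa = nn.Conv2d(channels, channels, 3, padding=1, bias=False)  # placeholder for RFAConv
        self.cbam = CBAM(channels)

    def forward(self, x):
        return x + self.cbam(self.rfa(self.cv1(x)))
```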
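The "learnable weighted coefficients" used in BiFPN fusion correspond to the fast normalized fusion introduced with EfficientDet: each input feature map gets a non-negative learnable weight, and the weighted sum is normalized by the weight total. A minimal sketch follows, assuming the inputs have already been resized to a common resolution; the module name `WeightedFusion` is illustrative.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion of n same-shape feature maps:
    out = sum_i(w_i * x_i) / (eps + sum_j w_j), with w_i kept non-negative
    via ReLU so each learnable weight acts as a soft importance score."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, xs):
        w = torch.relu(self.weights)        # enforce non-negative weights
        w = w / (w.sum() + self.eps)        # normalize without a softmax
        return sum(wi * xi for wi, xi in zip(w, xs))

# Usage: fuse an upsampled high-level map with a low-level map.
fuse = WeightedFusion(num_inputs=2)
p_low = torch.randn(1, 64, 80, 80)          # low-level, detail-rich features
p_high = torch.randn(1, 64, 40, 40)         # high-level, semantic features
p_high_up = nn.functional.interpolate(p_high, scale_factor=2, mode="nearest")
out = fuse([p_low, p_high_up])              # shape: (1, 64, 80, 80)
```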
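Partial convolution (PConv), as defined in FasterNet, convolves only a fraction of the input channels and forwards the rest unchanged, which is the source of the computation and memory-access savings claimed for PConv-C2f. A minimal sketch follows; the 1/4 channel ratio is the common FasterNet default, not a figure taken from this abstract.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution (FasterNet-style): convolve only the first
    c_p = channels // divisor channels; the remaining channels pass
    through unchanged, cutting FLOPs and memory accesses."""
    def __init__(self, channels: int, divisor: int = 4, kernel_size: int = 3):
        super().__init__()
        self.c_p = channels // divisor
        self.conv = nn.Conv2d(self.c_p, self.c_p, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_p, x.size(1) - self.c_p], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

# A 64-channel map: only 16 channels are actually convolved.
y = PConv(64)(torch.randn(1, 64, 40, 40))   # shape preserved: (1, 64, 40, 40)
```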