Real-time detection method for potato surface defects based on YOLOv11-MML

Zhu Ranhui; Wang Xiangyou; Wu Haitao; Liu Shuwei; Huang Jie; Li Jihao; Wang Hengren

doi:10.11975/j.issn.1002-6819.202504149

Abstract

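Abstract: To address the insufficient accuracy, high model redundancy, and limited real-time deployment performance of existing potato surface defect detection methods, this study proposes a real-time detection method for potato surface defects based on YOLOv11-MML. First, a multiscale edge information select (MEIS) mechanism replaces the convolutional unit in the C3k2 structure and is combined with a dual-domain selection mechanism (DSM) to strengthen the extraction of edge and detail features and improve detection accuracy. Second, a multi-scale precision-efficient downsampling (MPED) structure is introduced to improve the model's perception of defects at different scales. Finally, a lightweight detail-enhanced convolutional head (LDECH) is designed to make the model lighter while maintaining detection accuracy. The improved YOLOv11-MML model achieves a precision, recall, and mean average precision of 96.5%, 91.3%, and 96.7%, respectively, which are 5.8, 5.8, and 4.2 percentage points higher than the original model. Its parameter count and weight size are 1.9 M and 4.6 MB, which are 26.9% and 13.2% lower than the original model. In practical deployment, the YOLOv11-MML model was applied to a dual-channel potato defect detection and sorting machine, where it reached an inference speed of 171.3 frames/s, meeting the real-time detection requirement of 12 potatoes/s, with an overall detection accuracy of 94.0%. These results verify its practicality and engineering adaptability under real operating conditions and provide an efficient and accurate reference solution for online detection of potato surface defects.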
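The accuracy gains above are stated in percentage points, while the size reductions are relative percentages, so the two need different arithmetic to recover the implied values for the original model. The following is a minimal sketch of that back-calculation; the helper names are hypothetical, and the derived baseline numbers are inferences from the figures in the abstract, not values reported by the authors.

```python
# Back-calculate the baseline (original model) values implied by the abstract.
# Helper names are hypothetical; the derived numbers are inferences, not reported results.

def baseline_from_points(improved: float, gain_points: float) -> float:
    """Baseline metric (%) when the gain is given in percentage points."""
    return improved - gain_points


def baseline_from_relative_reduction(reduced: float, reduction_pct: float) -> float:
    """Baseline size when the reduction is given as a relative percentage."""
    return reduced / (1.0 - reduction_pct / 100.0)


# (improved value in %, gain in percentage points), as reported in the abstract
reported = {
    "precision": (96.5, 5.8),
    "recall": (91.3, 5.8),
    "mAP": (96.7, 4.2),
}
implied_metrics = {
    name: round(baseline_from_points(value, gain), 1)
    for name, (value, gain) in reported.items()
}

# Reported sizes after improvement and their relative reductions
implied_params_m = baseline_from_relative_reduction(1.9, 26.9)    # million parameters
implied_weights_mb = baseline_from_relative_reduction(4.6, 13.2)  # MB

print(implied_metrics)
print(f"{implied_params_m:.2f} M parameters, {implied_weights_mb:.2f} MB weights")
```

Under these assumptions the implied baseline is roughly 90.7% precision, 85.5% recall, and 92.5% mean average precision, with about 2.6 M parameters and 5.3 MB of weights.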

Abstract: Surface defects in potatoes seriously threaten product quality, food safety, and commercial value during grading and sorting in modern agriculture. However, conventional deep learning approaches often struggle to balance accuracy, computational efficiency, and deployment flexibility, particularly under variable lighting and complex backgrounds with small or ambiguous defect features. Practical detection must therefore combine high accuracy, low model complexity, and real-time applicability. In this study, a real-time detection framework was proposed using an enhanced model, YOLOv11-MML (Multimodule Lightweight). The original YOLOv11n architecture was extended with multiple modules to improve multiscale feature extraction, fine-detail representation, and overall model compactness. The resulting model was more suitable for deployment on embedded or edge computing devices with limited computational resources, while maintaining high accuracy and robust performance in real-world industrial environments. Firstly, the Multiscale Edge Information Select (MEIS) module replaced the standard convolutional unit in the C3k2 structure. MEIS integrated multi-receptive-field aggregation with edge-aware attention to capture high-frequency and fine-grained texture features. In particular, subtle surface defects that are typically difficult to identify, such as wormholes, shallow scratches, and minor scabs, were detected effectively. Secondly, the Dual-Domain Selection Mechanism (DSM) was introduced to enhance spatial and channel attention simultaneously. This dual-domain attention helped the network distinguish defect-relevant regions from background noise and clutter. In particular, the detection of ambiguous defects, such as green skin and decay, was improved under non-uniform lighting or partial occlusion.
Thirdly, the Multi-Scale Precision-Efficient Downsampling (MPED) structure was implemented to improve sensitivity to defects at varying scales. Semantic consistency was preserved during resolution reduction, and critical multiscale contextual information was retained to reduce the loss of fine structural details. This module improved the recognition of defects with diverse sizes, shapes, and orientations. Furthermore, the Lightweight Detail-Enhanced Convolutional Head (LDECH) was designed to reduce computational overhead while maintaining high classification and localization precision. The classification and localization tasks were decoupled to optimize the network depth. As a result, LDECH improved boundary delineation and inference speed, enabling efficient deployment on edge computing platforms in high-throughput industrial systems. Experimental results showed that the improved YOLOv11-MML model achieved a precision of 96.5%, a recall of 91.3%, and a mean average precision (mAP) of 96.7%. Compared with the baseline YOLOv11n, these values were higher by 5.8, 5.8, and 4.2 percentage points, respectively. Additionally, the parameter count and weight size were 1.9 million and 4.6 MB, which were 26.9% and 13.2% lower than the baseline, respectively. In real-world deployment, the YOLOv11-MML model was integrated into a dual-channel potato defect detection and sorting system equipped with high-speed industrial cameras and a conveyor-based flipping mechanism. Multi-frame image acquisition was used to cover the potato surface. An inference speed of 171.3 frames per second (FPS) was achieved, meeting the real-time requirement of 12 potatoes per second, with an overall classification accuracy of 94.0%. The YOLOv11-MML model thus combined high detection accuracy, efficiency, and real-time applicability. The findings offer a scalable and reliable solution for intelligent surface defect inspection in modern agro-industrial production.
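As a rough check on the real-time claim, the reported inference speed can be compared with the sorting rate. The sketch below assumes, hypothetically, that inference throughput is shared evenly across all potatoes passing through the machine; the abstract does not state how many frames are captured per potato or how the two channels divide the workload, so the variable names and the even-split assumption are illustrative only.

```python
# Frame budget implied by the reported throughput figures.
# Assumption (not stated in the abstract): inference capacity is shared evenly
# across all potatoes, regardless of channel.

FPS = 171.3           # reported inference speed, frames per second
POTATOES_PER_S = 12   # reported real-time detection requirement, potatoes per second

frames_per_potato = FPS / POTATOES_PER_S   # frames the model can process per potato
ms_per_frame = 1000.0 / FPS                # mean inference time per frame
ms_per_potato = 1000.0 / POTATOES_PER_S    # time window available per potato

print(f"{frames_per_potato:.1f} frames per potato")
print(f"{ms_per_frame:.2f} ms per frame, {ms_per_potato:.1f} ms per potato")
```

Under this assumption the model can process about 14 frames within the time window of each potato, which leaves headroom for the multi-frame acquisition described above.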
