Abstract:
Surface defects in potatoes seriously threaten product quality, food safety, and commercial value during grading and sorting in modern agriculture. However, conventional deep learning approaches often struggle to balance accuracy, computational efficiency, and deployment flexibility, particularly under variable lighting and complex backgrounds where defect features are small or ambiguous. Accurate and rapid detection therefore requires a model that combines high accuracy, low complexity, and real-time applicability. In this study, a real-time detection framework was proposed using an enhanced model, YOLOv11-MML (Multimodule Lightweight). The original YOLOv11n architecture was extended with multiple modules to improve multiscale feature extraction, fine-detail representation, and overall model compactness. As a result, the model was more suitable for deployment on embedded or edge computing devices with limited computational resources, while high accuracy and robust performance were maintained in real-world industrial environments. Firstly, the Multiscale Edge Information Select (MEIS) module replaced the standard convolutional unit in the C3k2 blocks of the backbone. MEIS integrated multi-receptive-field aggregation with edge-aware attention in order to capture high-frequency, fine-grained texture features. In particular, subtle surface defects that are typically difficult to identify, such as wormholes, shallow scratches, and minor scabs, were detected effectively. Secondly, the Dual-Domain Selection Mechanism (DSM) was introduced to enhance spatial and channel attention simultaneously. This dual-domain attention allowed the network to better distinguish defect-relevant regions from background noise and clutter. In particular, the detection of ambiguous defects, such as green skin and decay, was improved under non-uniform lighting or partial occlusion.
Thirdly, the Multi-Scale Precision-Efficient Downsampling (MPED) structure was implemented to improve sensitivity to defects at varying scales. Semantic consistency was preserved during resolution reduction, and critical multiscale contextual information was retained to reduce the loss of fine structural details. This module improved the recognition of defects with diverse sizes, shapes, and orientations. Furthermore, the Lightweight Detail-Enhanced Convolutional Head (LDECH) was designed to reduce computational overhead while maintaining high classification and localization precision. The classification and localization tasks were decoupled to optimize the network depth. As such, the LDECH enhanced boundary delineation and inference speed, supporting efficient deployment on edge computing platforms in high-throughput industrial systems. Experimental results demonstrated that the YOLOv11-MML model achieved a precision of 96.5%, a recall of 91.3%, and a mean average precision (mAP) of 96.7%. Compared with the baseline YOLOv11n, these values were improved by 5.8, 5.8, and 4.2 percentage points, respectively. Additionally, the parameter count and model size were reduced to 1.9 million and 4.6 MB, which were 26.9% and 13.2% lower than the baseline, respectively. In real-world deployment, the YOLOv11-MML model was integrated into a dual-channel potato defect detection and sorting system equipped with high-speed industrial cameras and a conveyor-based flipping mechanism. Multi-frame image acquisition was used to ensure surface coverage. An inference speed of 171.3 frames per second (FPS) was achieved, which supported the real-time processing of 12 potatoes per second with an overall classification accuracy of 94.0%. The YOLOv11-MML model thus showed superior performance in detection accuracy, efficiency, and real-time applicability. The findings can offer a scalable and reliable solution for intelligent surface defect inspection in modern agro-industrial production.
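
The reported figures can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original study: it back-calculates the implied YOLOv11n baseline values from the reported improvements and reductions, and estimates an inference budget per potato. The function names are hypothetical, and the frames-per-potato figure assumes that inference is the only bottleneck and that the 12 potatoes per second refers to the whole system, neither of which is stated in the abstract.

```python
# Cross-check of the figures reported in the abstract.
# All function names are illustrative; nothing here comes from the authors' code.

def baseline_from_gain(improved_value, gain_points):
    """Baseline metric implied by an absolute gain in percentage points."""
    return improved_value - gain_points

def baseline_from_reduction(reduced_value, reduction_pct):
    """Baseline quantity implied by a relative reduction given in percent."""
    return reduced_value / (1.0 - reduction_pct / 100.0)

def frames_per_item(fps, items_per_second):
    """Upper bound on inferred frames per item (assumes inference is the only limit)."""
    return fps / items_per_second

# Values reported for YOLOv11-MML and its gains over YOLOv11n.
reported = {
    "precision": (96.5, 5.8),
    "recall": (91.3, 5.8),
    "mAP": (96.7, 4.2),
}

if __name__ == "__main__":
    for name, (value, gain) in reported.items():
        print(f"implied baseline {name}: {baseline_from_gain(value, gain):.1f}%")
    print(f"implied baseline parameters: {baseline_from_reduction(1.9, 26.9):.2f} M")
    print(f"implied baseline model size: {baseline_from_reduction(4.6, 13.2):.2f} MB")
    print(f"frames per potato (upper bound): {frames_per_item(171.3, 12):.1f}")
```

Under these assumptions, the implied baseline is roughly 2.6 million parameters and 5.3 MB, with precision, recall, and mAP of about 90.7%, 85.5%, and 92.5%. The throughput figures leave a budget of about 14 inferred frames per potato, which is consistent with the multi-frame acquisition described above.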