Abstract:
Potatoes are the fourth largest food crop in the world. However, conventional harvesting has not fully met the requirements of large-scale production in recent years. In particular, manual impurity sorting has severely constrained harvesting efficiency, so automated impurity detection is needed to raise the level of intelligence of the harvester. Existing detection methods commonly suffer from high computational complexity, excessive memory consumption, and low real-time performance, and the complex working environment of potato pickup harvesters makes detection even more difficult. In this study, an efficient impurity detection method was developed for unmanned impurity sorting in potato pickup harvesters. A lightweight model (named PLP-net) was proposed based on YOLOv8n. Firstly, the backbone network (P-Backbone) and the detection head (P-Head) were redesigned from the original model. The P-Backbone preserved the original semantic information by means of a down-sampling branch and integrated multi-scale features to enhance feature extraction. The P-Head removed the small-object detection head so that detection concentrated on medium and large targets, tailoring the model to the impurity scene. Secondly, the ECA attention mechanism was introduced into the C2f module of the model. Appropriate weights were assigned to the different features so that the model focused on critical information and suppressed irrelevant details. This enhanced the accuracy of impurity recognition and provided favorable conditions for the subsequent pruning. Additionally, the Focal-DIoU loss function was adopted to alleviate the imbalanced distribution of positive and negative samples in the impurity dataset. It combined Focal Loss and DIoU to reduce the loss contribution from easily classified samples and to optimize bounding box regression, which accelerated convergence.
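The DIoU term of the Focal-DIoU loss mentioned above can be illustrated with a minimal sketch. It uses the standard DIoU definition (1 - IoU plus the squared center distance divided by the squared diagonal of the smallest enclosing box). The abstract does not give the exact focal weighting used in PLP-net, so that part is left out; the function name `diou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration.

```python
def diou_loss(box_a, box_b):
    """Standard DIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: the focal weighting that PLP-net combines with DIoU
    is not specified in the abstract and is therefore not modeled here.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    c2 = enc_w ** 2 + enc_h ** 2

    penalty = rho2 / c2 if c2 > 0 else 0.0
    return 1.0 - iou + penalty
```

Unlike a plain IoU loss, the center-distance penalty stays informative when the predicted and ground-truth boxes do not overlap, which is the usual explanation for the faster convergence of DIoU-based regression.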
Finally, a structured pruning pipeline consisting of sparse training, channel pruning, and model fine-tuning was applied. Redundant channels were effectively eliminated to obtain a lightweight model, reducing the computational load and memory usage while maintaining high accuracy. A series of experiments was carried out to evaluate the performance of the improved model. Multiple metrics were employed, including precision, recall, mean average precision (mAP), floating-point operations (FLOPs), frames per second (FPS), and model size. Ablation tests demonstrated that the PLP-net model achieved superior overall performance, with a reduction of 7.2 GFLOPs in computational complexity, an improvement of 99.4 FPS in frame rate, a reduction of 2.1 MB in model size, and only marginal accuracy degradation. Its computational efficiency, real-time capability, and memory footprint were well suited to deployment on embedded devices. The TensorRT inference framework was also used to deploy the PLP-net model on an industrial computer, where the inference speed reached 52.7 FPS, 1.7 times faster than the version before optimization. An impurity detection system was developed with PyQt5 to support multiple input modalities, including images, videos, and camera feeds. Its real-time outputs, such as detection time, target counts, and positional coordinates, helped the operator monitor the detection results. In summary, the lightweight PLP-net model achieved robust impurity detection in the practical potato harvesting scene. It can offer a reliable technical solution for unmanned sorting in potato pickup harvesters, and can provide a practical reference and theoretical support for intelligent applications in the potato industry.
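The channel pruning step of the pipeline described above can be sketched as follows. The abstract does not state which importance criterion was used, so this sketch assumes a common one: after sparse training, channels are ranked globally by the magnitude of their batch-normalization scale factors, and the smallest fraction is removed. The function name `select_channels`, the dictionary input, and the `prune_ratio` parameter are hypothetical.

```python
def select_channels(bn_scales, prune_ratio):
    """Return, per layer, the indices of the channels to keep.

    bn_scales:   dict mapping a layer name to a list of BN scale factors
                 (assumed importance scores; not confirmed by the abstract).
    prune_ratio: fraction of all channels to remove, in [0, 1).
    """
    # Rank every channel in the network by the magnitude of its scale factor.
    magnitudes = sorted(abs(g) for scales in bn_scales.values() for g in scales)
    n_prune = int(len(magnitudes) * prune_ratio)

    # Channels at or below this global threshold are treated as redundant.
    threshold = magnitudes[n_prune - 1] if n_prune > 0 else float("-inf")

    keep = {}
    for name, scales in bn_scales.items():
        kept = [i for i, g in enumerate(scales) if abs(g) > threshold]
        if not kept:
            # Never remove a whole layer: keep its single strongest channel.
            kept = [max(range(len(scales)), key=lambda i: abs(scales[i]))]
        keep[name] = kept
    return keep
```

For example, `select_channels({"conv1": [0.9, 0.01, 0.5, 0.02]}, 0.5)` keeps channels 0 and 2. In a full pipeline, the kept indices would be used to rebuild slimmer convolution layers before fine-tuning restores the accuracy.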