Abstract:
Litchi is one of the most important fruits and is widely planted in southern China. However, manual variety identification is time-consuming and laborious, and it cannot meet the needs of large-scale production. In particular, litchi has abundant varietal resources, and many varieties closely resemble one another. Rapid and accurate identification of litchi varieties is also often required in intelligent production and sales. In this study, a recognition model for litchi fruit varieties (SCL-YOLO11) was proposed based on an improved YOLO11. Firstly, the SimAM attention mechanism was added to the C2f module (CSP2f, cross-stage partial fusion) of the YOLO11 model, and the resulting module replaced the original C2PSA module. The model could thus focus on different feature dimensions of litchi images and automatically adjust the features at each spatial position in key areas. Secondly, the CMUNeXt convolution, a large-kernel depthwise separable convolution, was applied to the C3K2 module to improve the perception of the geometric features of litchi under various occlusion conditions. The experimental results show that the recognition accuracy of the improved SCL-YOLO11 model was 99.61%, which was 27.10%, 15.06%, 12.47%, 7.05%, 4.83%, and 2.89% higher than that of the VGG-19, ViT, AlexNet, ResNet-50, YOLOv8, and YOLO11 models, respectively. The model had 1.27M parameters and a computational cost of 3.1G, which were 17.5% and 6.1% lower, respectively, than those of the YOLO11 model. The improved SCL-YOLO11 model was used for real-time and accurate variety recognition of litchi fruits with subtle differences in texture and shape. After data collection, the network scale was reduced. Furthermore, the attention mechanism was fused with the C2f module. The SimAM attention mechanism reweighted the features of the C2f module after concatenation, and the enhanced features were output after splitting.
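The SimAM reweighting mentioned above can be illustrated with a minimal sketch. It follows the commonly published SimAM formulation, in which each position in a channel is scaled by a sigmoid of its inverse energy. It is not the authors' implementation: the function names `simam_weights` and `simam`, the regularizer value `lam = 1e-4`, and the use of plain nested lists for a single channel are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def simam_weights(channel, lam=1e-4):
    """Return SimAM-style attention weights for one channel.

    `channel` is a list of rows (H x W) with at least two values in total.
    `lam` is an assumed regularizer that avoids division by zero.
    """
    h, w = len(channel), len(channel[0])
    n = h * w - 1  # number of "other" positions in the channel
    mean = sum(sum(row) for row in channel) / (h * w)
    # Squared deviation of every position from the channel mean
    dev = [[(v - mean) ** 2 for v in row] for row in channel]
    var = sum(sum(row) for row in dev) / n
    # Inverse energy: positions that stand out from the channel score higher
    return [[sigmoid(d / (4.0 * (var + lam)) + 0.5) for d in row] for row in dev]


def simam(channel, lam=1e-4):
    """Scale each position of the channel by its attention weight."""
    weights = simam_weights(channel, lam)
    return [[v * a for v, a in zip(row, wrow)]
            for row, wrow in zip(channel, weights)]


if __name__ == "__main__":
    feature_map = [[0.0, 0.0], [0.0, 4.0]]  # toy channel with one salient position
    print(simam_weights(feature_map))
    print(simam(feature_map))
```

The mechanism adds no learnable parameters, because the weights are computed directly from the channel statistics. This is consistent with the reduced parameter count reported for the model, although the abstract does not state how much of the reduction comes from this change.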
The accuracy and recall rate of the improved model were 1.29% and 1.09% higher, respectively, than those of the original model. In the convolution module, the CMUNeXt block with large-kernel depthwise separable convolution was applied in the parallel convolution layer of the C3K2 module. The diverse geometric features of litchi were thus extracted while the depth of feature extraction in the C3K2 module was maintained. A series of tests was conducted to verify the improved SCL-YOLO11 model in the natural environment, using data sets with different light intensities and occlusion conditions. The recognition accuracy of the SCL-YOLO11 model was 96.89%, 97.87%, and 98.48% under insufficient illumination, strong light, and shadow, respectively, and 98.18% and 96.89% under fruit occlusion and branch-leaf occlusion, respectively. The model showed better recognition accuracy and robustness than YOLOv8 and YOLO11. In a simulation, the inference time of the improved SCL-YOLO11 model was much shorter than that of YOLOv8 and YOLO11 under different detection quantities. In particular, when the detection quantity reached 1000, the inference time was 50.58% and 22.73% lower than that of YOLOv8 and YOLO11, respectively, indicating faster inference than both models. The reduction in both parameters and computation therefore supports real-time and accurate recognition. The findings can provide a technical reference for developing intelligent equipment for the quality detection of litchi fruits.
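Why a large-kernel depthwise separable convolution keeps the model small can be shown with a short parameter count. The sketch below compares a standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution. The channel width (64) and kernel size (7) are illustrative assumptions, not values reported in this study, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Parameter count of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dw_separable_params(c_in, c_out, k, bias=False):
    """Parameter count of a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)    # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed layer shape for illustration only
    c_in, c_out, k = 64, 64, 7
    standard = conv_params(c_in, c_out, k)
    separable = dw_separable_params(c_in, c_out, k)
    print(f"standard:  {standard} parameters")
    print(f"separable: {separable} parameters")
    print(f"ratio:     {separable / standard:.4f}")
```

In the separable form the kernel-size term scales with the number of input channels only, so the kernel can be enlarged to widen the receptive field at a small cost in parameters.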