Abstract:
To address the high missed-detection rates and large tracking and counting errors that dense fruit occlusion and small target size cause in UAV-based citrus yield estimation, this study constructs a citrus detection and tracking dataset from video collected by a DJI Phantom 4 UAV at a camera angle of approximately 45°, and proposes an automatic citrus counting method for UAV video streams that combines a lightweight YOLO11-PMSL detector with an improved ByteTrack algorithm. The detector builds on the YOLO11n architecture with four modifications. First, the feature pyramid is reconstructed so that the detection head uses a three-level structure (P2-P4); removing redundant deep modules and fusing high-resolution shallow features improves the model's ability to perceive small targets. Second, a C3k2-MSEIE multi-scale edge enhancement module is introduced; through adaptive scale fusion and contour enhancement, it strengthens the representation of local detail and fruit contours while preserving the overall morphology of the fruit, which gives better feature representation in densely fruited regions. Third, the loss function is replaced with SIoU, whose direction-sensitive constraint improves the localization accuracy of the predicted boxes and the stability of training. Finally, the model is pruned with the LAMP method, which removes redundant weights and reduces the parameter count, floating-point operations, and model size. On the tracking side, the IoU similarity metric in ByteTrack is replaced with DIoU to address the shortcomings of IoU in measuring spatial position and to improve the accuracy and stability of fruit tracking in complex orchard environments.
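To make the DIoU similarity concrete, the following is a minimal sketch of the standard DIoU definition (IoU minus the squared center distance divided by the squared diagonal of the smallest enclosing box). The function name `diou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; this is not the implementation used in the study, and how the score is wired into ByteTrack's association step is not shown.

```python
def diou(box_a, box_b):
    """Distance-IoU between two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative only: the box format and function name are assumptions.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2 if c2 > 0 else iou


# Non-overlapping boxes both have IoU = 0, but DIoU still ranks the nearer one higher.
near = diou((0, 0, 2, 2), (3, 0, 5, 2))
far = diou((0, 0, 2, 2), (8, 0, 10, 2))
```

Unlike plain IoU, the score stays informative when a detection and a predicted track box do not overlap, which is the spatial-position shortcoming described above.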
In addition, a region-counting anti-shake mechanism is embedded in the tracking algorithm, which addresses the target ID switches caused by occlusion and enables accurate counting of citrus fruits. Experimental results show that each improvement to YOLO11-PMSL improved model performance. Compared with the original YOLO11n detector, reconstructing the feature pyramid into the three-level P2-P4 detection structure reduced the number of parameters to 116 m, compressed the model size to 2.6 MB, and raised recall and mAP@0.5 by 7.9 and 5.1 percentage points, respectively, which shows that the reconstructed model is both lighter and more accurate on small targets. After the C3k2-MSEIE edge enhancement module was introduced, precision, recall, and mAP@0.5 improved by 2.2, 10.7, and 8.7 percentage points, respectively. Replacing the CIoU loss with SIoU accelerated convergence and further improved performance. To meet the lightweight deployment requirements of edge devices, the model was pruned with the LAMP algorithm; after pruning, detection performance remained at its pre-pruning level, while the parameter count, floating-point operations, and model size were substantially reduced. Overall, in the object detection task the improved model raised precision, recall, and mAP@0.5 by 3.3, 11.6, and 9.3 percentage points, respectively, over the original model. In terms of lightweighting, the parameter count, model size, and floating-point operations were reduced by 86.05%, 76.36%, and 26.98%, respectively, and detection speed improved by 65.18%, indicating that YOLO11-PMSL improves both detection accuracy and speed on the citrus dataset. In the object tracking task, the improved ByteTrack multi-object tracking algorithm achieved a tracking accuracy of 92.8% and a tracking precision of 81.7%. Compared with the SORT, DeepSORT, and BoT-SORT algorithms, tracking accuracy improved by 5.5, 5.7, and 4.3 percentage points, respectively, and tracking precision improved by 19.2, 19.4, and 10.8 percentage points, respectively. Comparing the counting results of the improved model with those of manual counting, the average accuracy of citrus counting reached 88.4%, and the counting error of the improved model was smaller than that of manual counting. The method can effectively count citrus fruits in farmland scenarios and provides a technical approach for citrus yield prediction.
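The abstract does not specify how the region-counting anti-shake mechanism works, so the following is only a hypothetical sketch of one common debouncing scheme: a track ID is counted once, and only after its center has stayed inside a counting region for several consecutive frames. The class name `RegionCounter`, the `min_frames` threshold, and the `(track_id, cx, cy)` input format are all assumptions, not the authors' design. The sketch filters brief in-and-out flicker at the region boundary; it does not by itself re-associate a fruit that returns under a new ID.

```python
from dataclasses import dataclass, field


@dataclass
class RegionCounter:
    """Hypothetical debounced region counter (not the method from the study)."""
    region: tuple              # counting region as (x1, y1, x2, y2), assumed format
    min_frames: int = 3        # consecutive in-region frames required before counting
    count: int = 0
    _streak: dict = field(default_factory=dict)   # track_id -> consecutive frames inside
    _counted: set = field(default_factory=set)    # track IDs already counted

    def update(self, tracks):
        """Process one frame; `tracks` is an iterable of (track_id, cx, cy)."""
        x1, y1, x2, y2 = self.region
        inside_now = set()
        for tid, cx, cy in tracks:
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                inside_now.add(tid)
                self._streak[tid] = self._streak.get(tid, 0) + 1
                # Count each ID at most once, and only after a stable stay.
                if self._streak[tid] >= self.min_frames and tid not in self._counted:
                    self._counted.add(tid)
                    self.count += 1
        # An ID that left the region (or vanished) this frame loses its streak.
        for tid in list(self._streak):
            if tid not in inside_now:
                del self._streak[tid]
        return self.count


counter = RegionCounter(region=(0, 0, 10, 10), min_frames=3)
frames = [
    [(1, 5, 5), (2, 9, 9)],    # both tracks inside
    [(1, 5, 5), (2, 11, 9)],   # track 2 flickers outside the region
    [(1, 5, 5), (2, 9, 9)],    # track 1 reaches 3 stable frames; track 2 restarts
    [(1, 5, 5)],               # track 1 is already counted and is not counted again
]
totals = [counter.update(f) for f in frames]
```

In this toy sequence, the stable track is counted exactly once and the flickering track is never counted; a larger `min_frames` trades counting latency for robustness to jitter.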