Abstract:
Citrus yield estimation from UAV remote sensing suffers from high missed-detection rates and from tracking and counting errors, particularly under dense fruit occlusion and small target size. This study proposes a citrus tracking and counting method for UAV remote sensing video based on an improved YOLO11. Video data were first collected with a DJI Phantom 4 UAV at an angle of approximately 45°, from which a citrus object detection and tracking dataset was constructed. Automatic citrus counting in UAV video streams was then realized by combining a lightweight YOLO11-PMSL model with an improved ByteTrack algorithm. First, starting from the YOLO11n network architecture, the feature pyramid was reconstructed and the detection head was simplified to a three-level structure (P2-P4); redundant deep modules were removed and high-resolution shallow features were fused to improve the perception of small targets. Second, a C3k2-MSEIE multi-scale edge module was introduced, which uses adaptive scale fusion and contour enhancement to strengthen the expression of local details and the extraction of fruit contours while preserving the overall morphological features of the fruit, giving better feature expression in densely populated fruit areas. Third, the loss function was replaced with SIoU, whose direction-sensitive constraint improves the localization accuracy of the detection boxes and the stability of training. Finally, the LAMP algorithm was used to prune redundant weights, reducing the number of parameters and floating-point operations and compressing the model size. To further enhance the accuracy and stability of fruit tracking in complex orchard environments, the ByteTrack framework was also improved: instead of IoU, DIoU was used as the similarity metric for spatial location, and a region-counting anti-shake mechanism was embedded in the algorithm. 
The target ID switching problem under occlusion was thereby effectively addressed, enabling accurate counting of citrus fruits. Experimental results showed that the YOLO11-PMSL model outperformed the original YOLO11n model. Specifically, reconstructing the feature pyramid into a P2-P4 three-level structure reduced the number of model parameters to 116 m, compressed the model size to 2.6 MB, and improved recall and mAP0.5 by 7.9 and 5.1 percentage points, respectively, compared with the original YOLO11n model, confirming that the structure is both lighter and more accurate for small targets. With the C3k2-MSEIE edge module, precision, recall, and mAP0.5 improved by 2.2, 10.7, and 8.7 percentage points, respectively. Replacing the CIoU loss function with SIoU accelerated convergence and further improved performance. After pruning with the LAMP algorithm, performance remained at the pre-pruning level while the number of parameters, floating-point operations, and model size were significantly reduced, meeting the requirements of lightweight deployment on edge terminals. Overall, in the object detection task, precision, recall, and mAP0.5 improved by 3.3, 11.6, and 9.3 percentage points, respectively, compared with the original model, while the number of parameters, model size, and floating-point operations were reduced by 86.05%, 76.36%, and 26.98%, respectively, and the detection speed improved by 65.18%. The YOLO11-PMSL model thus balanced detection accuracy and speed on the citrus dataset. In the object tracking task, the improved ByteTrack multi-object tracking algorithm achieved a tracking accuracy of 92.8% and a tracking precision of 81.7%. 
Compared with the SORT, DeepSORT, and BoT-SORT algorithms, the tracking accuracy was improved by 5.5, 5.7, and 4.3 percentage points, respectively, and the tracking precision was improved by 19.2, 19.4, and 10.8 percentage points, respectively. Relative to manual counting, the average accuracy of citrus counting reached 88.4%, indicating a small counting error. The method therefore counts citrus effectively in field scenarios and can provide a technical approach for citrus yield prediction.
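
As an illustration of the DIoU similarity used in place of IoU in the tracker's association step, the following minimal Python sketch computes DIoU for two axis-aligned boxes from its standard definition: IoU minus the squared distance between box centers, normalized by the squared diagonal of the smallest enclosing box. The function name `diou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration and are not taken from the authors' implementation.

```python
def diou(box_a, box_b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not the authors' code.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2 if c2 > 0 else iou


if __name__ == "__main__":
    fruit = (0.0, 0.0, 1.0, 1.0)
    near = (2.0, 0.0, 3.0, 1.0)   # no overlap, close by
    far = (4.0, 0.0, 5.0, 1.0)    # no overlap, farther away
    print(diou(fruit, fruit), diou(fruit, near), diou(fruit, far))
```

IoU is 0 for every pair of non-overlapping boxes, whereas DIoU still ranks the nearby box above the distant one. This property may help a tracker re-associate a fruit whose box shifts or shrinks under brief occlusion, although the abstract does not state the exact matching rule used.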