Research on citrus tracking and counting method based on UAV remote sensing video imagery combined with improved YOLO11

翁海勇; 杜璐; 张博昱; 苏磊磊; 许金钗; 肖桂淼; 孙大伟; 叶大鹏

doi:10.11975/j.issn.1002-6819.202508166

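Abstract

Abstract: To address the large tracking and counting errors that arise in citrus yield estimation at the UAV remote sensing scale, this study proposes a citrus fruit tracking and counting method for UAV video streams that combines a lightweight YOLO11-PMSL model with the ByteTrack-DIoU algorithm. First, building on the YOLO11n architecture, the feature pyramid is reconstructed and the detection head levels are reduced to a three-level P2-P4 structure, which markedly strengthens the perception of tiny targets. Second, a C3k2-MSEIE (C3k2-multi-scale edge information enhance) multi-scale edge enhancement module is introduced; through adaptive scale fusion and contour reinforcement, it improves the representation of fruit contours. Third, the SIoU (scylla-IoU) loss function replaces the CIoU (complete-IoU) loss function, introducing a direction-sensitive constraint intended to improve bounding-box localization quality and training stability. Finally, the model is pruned with the LAMP (layer adaptive magnitude-based pruning) method to remove redundant weights, reduce the parameter count and floating-point operations, and compress the model size. A region-counting anti-shake mechanism is embedded in the ByteTrack-DIoU algorithm to further address ID switches caused by occlusion. The results show that the improved YOLO11-PMSL detection model raises precision (P) and mean average precision (mAP_0.5) by 3.3 and 9.3 percentage points, respectively. After pruning, compared with the original YOLO11n and while retaining the overall accuracy gain, the parameter count, floating-point operations, and model size fall by 86.05%, 26.98%, and 76.36%, respectively, and the detection speed rises from 84.83 frames/s to 140.12 frames/s. Compared with the conventional SORT, DeepSORT, and BotSORT algorithms, ByteTrack-DIoU improves multi-object tracking accuracy by 5.5, 5.7, and 4.3 percentage points, respectively, and the average tracking-and-counting accuracy reaches 88.4%. The method can accurately track and count citrus fruits in orchards and provides an effective technical solution for citrus yield prediction.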
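
The abstract does not describe how the region-counting anti-shake mechanism works internally, so the following is only a minimal sketch of one plausible debouncing scheme, not the authors' implementation. It assumes a rectangular counting region and counts a track ID once, and only after its box center has stayed inside the region for `min_frames` consecutive frames. The class name `RegionCounter`, the parameter `min_frames`, and the `(track_id, cx, cy)` input format are all hypothetical.

```python
class RegionCounter:
    """Hypothetical debounced counter: an ID is counted once, after it has
    stayed inside the counting region for `min_frames` consecutive frames."""

    def __init__(self, region, min_frames=3):
        self.region = region          # (x1, y1, x2, y2) in pixels
        self.min_frames = min_frames  # assumed debounce length
        self.streak = {}              # track_id -> consecutive frames inside
        self.counted = set()          # IDs that have already been counted

    def _inside(self, cx, cy):
        x1, y1, x2, y2 = self.region
        return x1 <= cx <= x2 and y1 <= cy <= y2

    def update(self, tracks):
        """Process one frame of (track_id, cx, cy) tuples; return the total."""
        seen = set()
        for track_id, cx, cy in tracks:
            seen.add(track_id)
            if self._inside(cx, cy):
                self.streak[track_id] = self.streak.get(track_id, 0) + 1
            else:
                self.streak[track_id] = 0
            if self.streak[track_id] >= self.min_frames:
                self.counted.add(track_id)  # set membership prevents recounts
        # A track that is absent from this frame loses its streak.
        for track_id in self.streak:
            if track_id not in seen:
                self.streak[track_id] = 0
        return len(self.counted)


counter = RegionCounter(region=(0, 0, 10, 10), min_frames=3)
frames = [
    [(1, 5, 5)],    # inside, streak 1
    [(1, 50, 50)],  # jitters outside, streak resets
    [(1, 5, 5)],    # inside, streak 1
    [(1, 5, 5)],    # inside, streak 2
    [(1, 5, 5)],    # inside, streak 3 -> counted
    [(1, 5, 5)],    # already counted, total unchanged
]
totals = [counter.update(frame) for frame in frames]
```

In this sketch a brief excursion outside the region resets the streak, so a box that flickers across the region boundary is not counted early, and the `counted` set keeps a fruit from being counted twice under the same ID.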

Abstract: Citrus yield estimation from UAV remote sensing suffers from high missed-detection rates and large tracking and counting errors, caused by dense fruit occlusion and small target size. To address these issues, this study uses video collected by a DJI Phantom 4 UAV at an angle of approximately 45° to construct a citrus detection and tracking dataset, and proposes an automatic citrus counting method for UAV video streams that combines a lightweight YOLO11-PMSL model with an improved ByteTrack algorithm. First, building on the YOLO11n network architecture, the feature pyramid is reconstructed and the detection head is reduced to a three-level structure (P2-P4). Removing redundant deep modules and fusing high-resolution shallow features significantly improves the model's ability to perceive small targets. Second, the C3k2-MSEIE multi-scale edge enhancement module is introduced. Through adaptive scale fusion and contour enhancement, it strengthens the representation of local detail and the extraction of fruit contours while preserving the overall morphological features of the fruit, which yields better feature representation in regions with dense fruit. Third, the loss function is replaced with SIoU, which introduces a direction-sensitive constraint to improve the localization accuracy of the detection boxes and the stability of training. Finally, the LAMP method is used to prune the model, removing redundant weights, reducing the number of parameters and floating-point operations, and compressing the model size. In the ByteTrack framework, the similarity metric is replaced with DIoU to compensate for the shortcomings of IoU as a measure of spatial location and to improve the accuracy and stability of fruit tracking in complex orchard environments.
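
To make the IoU versus DIoU distinction concrete, here is a minimal sketch of the standard DIoU definition: IoU minus the squared distance between box centers, divided by the squared diagonal of the smallest enclosing box. It assumes boxes in `(x1, y1, x2, y2)` format. The function name `diou` is illustrative, and how the authors plug the metric into ByteTrack's association step is not stated in the abstract.

```python
def diou(box_a, box_b):
    """Distance-IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU from the intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    center_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both inputs.
    diag_sq = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2
    if diag_sq == 0:
        return iou
    return iou - center_dist_sq / diag_sq


track = (0.0, 0.0, 1.0, 1.0)
near = (2.0, 0.0, 3.0, 1.0)  # no overlap, close to the track
far = (4.0, 0.0, 5.0, 1.0)   # no overlap, farther away
near_score = diou(track, near)  # -0.4
far_score = diou(track, far)    # about -0.615
```

Both candidate boxes have an IoU of 0 with the track, so IoU alone cannot rank them. DIoU still prefers the nearer box, which is the property that matters when small fruits shift between frames and stop overlapping their previous position.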
In addition, a region-counting anti-shake mechanism is embedded in the algorithm to address the target ID switches caused by occlusion and to achieve accurate counting of citrus fruits. Experimental results show that each improvement to the YOLO11-PMSL model raised model performance. Specifically, compared with the original YOLO11n object detection model, reconstructing the feature pyramid into a three-level P2-P4 detection structure reduced the number of model parameters to 116 m, compressed the model size to 2.6 MB, and improved recall and mAP_0.5 by 7.9 and 5.1 percentage points, respectively. This indicates that the reconstructed model is lighter and detects small targets more accurately. After the C3k2-MSEIE edge enhancement module was introduced, the model's precision, recall, and mAP_0.5 improved by 2.2, 10.7, and 8.7 percentage points, respectively. Replacing the CIoU loss function with SIoU accelerated convergence and further improved performance. To meet the lightweight deployment requirements of edge terminals, the LAMP algorithm was used to prune the model. After pruning, performance remained at the pre-pruning level, while the number of parameters, floating-point operations, and model size were significantly reduced. Overall, in the object detection task, the improved model raised precision, recall, and mAP_0.5 by 3.3, 11.6, and 9.3 percentage points, respectively. In terms of model size, compared with the original model, the number of parameters, model size, and floating-point operations were reduced by 86.05%, 76.36%, and 26.98%, respectively.
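
The pruning step can be illustrated with a minimal sketch of the LAMP score from the original LAMP formulation: within each layer, a weight's score is its squared magnitude divided by the sum of squared magnitudes of all weights in that layer that are at least as large. Scores are then compared globally across layers. The sketch below works on flat lists of individual weights; the abstract does not say whether the authors prune individual weights or whole channels, and the names `lamp_scores`, `prune_masks`, and the toy layers are illustrative assumptions.

```python
def lamp_scores(weights):
    """LAMP score per weight of one layer: w_i^2 / sum of w_j^2 over all
    weights j whose magnitude is at least that of weight i."""
    squares = [w * w for w in weights]
    order = sorted(range(len(squares)), key=lambda i: squares[i])
    scores = [0.0] * len(squares)
    suffix_sum = 0.0
    for i in reversed(order):  # walk from largest to smallest magnitude
        suffix_sum += squares[i]
        scores[i] = squares[i] / suffix_sum if suffix_sum > 0 else 0.0
    return scores


def prune_masks(layers, sparsity):
    """Globally prune the `sparsity` fraction of weights with lowest scores.
    `layers` maps a layer name to a flat list of weights."""
    ranked = sorted(
        (score, name, idx)
        for name, ws in layers.items()
        for idx, score in enumerate(lamp_scores(ws))
    )
    n_prune = int(len(ranked) * sparsity)
    masks = {name: [True] * len(ws) for name, ws in layers.items()}
    for _, name, idx in ranked[:n_prune]:
        masks[name][idx] = False  # False marks a pruned weight
    return masks


toy_layers = {"a": [3.0, -1.0, 2.0], "b": [0.1, 0.2]}
masks = prune_masks(toy_layers, sparsity=0.4)
```

In this toy example the weight 0.2 in layer `b` survives while the larger-magnitude weight -1.0 in layer `a` is pruned, because the largest weight of every layer always scores 1. That per-layer normalization is what makes the method layer-adaptive and prevents a layer with small weights from being removed entirely.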
Detection speed improved by 65.18% over the original model, indicating that the improved YOLO11-PMSL model proposed in this study improves both detection accuracy and speed on the citrus dataset. In the object tracking task, the improved ByteTrack multi-object tracking algorithm achieved a tracking accuracy of 92.8% and a tracking precision of 81.7%. Compared with the SORT, DeepSORT, and BotSORT algorithms, tracking accuracy improved by 5.5, 5.7, and 4.3 percentage points, respectively, and tracking precision improved by 19.2, 19.4, and 10.8 percentage points, respectively. Measured against manual counts, the average citrus counting accuracy reached 88.4%, indicating a small counting error relative to manual counting. This method can effectively count citrus fruits in orchard scenarios and provides a technical approach for citrus yield prediction.
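
As a quick consistency check, the 65.18% speed gain follows from the two frame rates reported in the shorter abstract (84.83 and 140.12 frames/s). The helper name `percent_change` is illustrative, and the "remaining" fractions are derived here from the reported reductions, not stated in the source.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Frame rates reported for the original YOLO11n and the pruned YOLO11-PMSL.
fps_gain = round(percent_change(84.83, 140.12), 2)  # 65.18

# Reported reductions relative to YOLO11n, and the fraction that remains.
reductions = {"parameters": 86.05, "model size": 76.36, "FLOPs": 26.98}
remaining = {name: round(100.0 - pct, 2) for name, pct in reductions.items()}
```

Under these figures the pruned model keeps roughly 14% of the original parameters but about 73% of the floating-point operations, so the parameter count shrinks far more than the compute cost.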
