Abstract:
Accurate real-time counting of bagged grape clusters is vital for the subsequent estimation of orchard yield. However, current real-time fruit counting still faces several challenges: tracking targets can be lost among bagged grape clusters because of their dense distribution, mutual occlusion, and unstable camera movement. In this study, real-time detection and counting of bagged grape clusters were combined using EMO-YOLOv5s, dual-association tracking based on buffered intersection over union (BIoU) and Euclidean distance, and rectangular-region counting after target tracking. Firstly, in the detection stage, the efficient model (EMO) was introduced as the backbone network of YOLOv5s. The window multi-head self-attention (W-MHSA) mechanism from the Swin Transformer and depthwise separable convolution (DSConv) were used to make full use of fewer parameters, so that EMO-YOLOv5s improved the inference speed. Secondly, in the tracking stage, dual association was realized using ByteTrack and BIoU, and Euclidean distance was introduced to reduce target loss when tracking bagged grape clusters. Conducting the association between the detection and prediction boxes of bagged grape clusters twice enhanced the association-matching performance of the tracking stage. Finally, in the counting stage, a rectangular counting region was designed to improve the counting accuracy of bagged grape clusters, increasing the probability that clusters were counted effectively and enlarging the countable range of fruits, so that automatic counting was realized. The experimental dataset was collected from the Agricultural Science and Technology Demonstration Park in Bishan District, Chongqing, China (longitude 106.221°, latitude 29.753°, altitude 353 m). The grapes were planted row by row with equal row spacing, and the bagged grape clusters were relatively uniformly distributed. OPPO Reno6 Pro+ and Redmi K40 mobile phones were used to capture images of the bagged grape clusters from four angles (front, top, upward, and side) in three periods (8:00, 12:00, and 18:00) on July 22, 2022, so that the growth of the bagged grape clusters could be evaluated at different angles and periods. The total shooting time was about 6 h, at heights of 1.0-1.8 m above the ground and horizontal distances of 0.1-1.0 m. A total of 500 original images of the bagged grape clusters were captured row by row, with image resolutions of 4000×3000 and 4096×3072 pixels. Six valid videos of the bagged grape clusters were also recorded, each with a resolution of 1920×1080 pixels, MP4 format, a frame rate of 30 frames per second, and a duration of about 20 s. In addition, 200 images of the bagged grape clusters were taken to supplement the original dataset from 9:00 to 11:00 on September 1, 2023, with an image resolution of 4096×3072 pixels. Experiments were conducted on this self-built dataset of bagged grape clusters. The results showed that: 1) In terms of detection performance, the parameters and floating-point operations decreased by 38.6% and 39.0%, respectively, compared with YOLOv5s, while the average precision and detection speed reached 96.5% and 77 frames per second, respectively.
2) In terms of tracking performance, the higher order tracking accuracy, multiple object tracking accuracy, and identification F1-score were 58.6%, 64.7%, and 80.0%, respectively, which were 3.6%, 4.1%, and 6.0% higher than those of ByteTrack. 3) In terms of counting performance, the average counting precision reached 93.1%, and the mean absolute error was 1.3%. Therefore, the improved model effectively solved the problems of tracking and counting bagged grape clusters. The findings can provide a reliable basis for predicting orchard yield.
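To make the parameter-saving components of the detection stage more concrete, the following is a minimal PyTorch sketch of a depthwise separable convolution (DSConv) block of the kind named above; the specific layer choices (3×3 depthwise kernel, BatchNorm, SiLU activation) are illustrative assumptions rather than the exact EMO-YOLOv5s configuration.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: a depthwise 3x3 conv followed by a
    pointwise 1x1 conv, which cuts parameters relative to a standard conv.
    Layer choices here are illustrative, not the paper's exact block."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, stride=stride,
                                   padding=1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a 64-channel feature map reduced-cost projection to 128 channels.
y = DSConv(64, 128)(torch.randn(1, 64, 80, 80))
```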
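The dual-association idea of the tracking stage (a first match by BIoU, then a second match by Euclidean distance between box centres) can likewise be sketched as follows. The buffer ratio, matching thresholds, Hungarian assignment, and greedy distance fallback are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def buffered_iou(boxes_a, boxes_b, buffer=0.3):
    """Pairwise IoU after expanding both sets of [x1, y1, x2, y2] boxes by a
    buffer ratio, so nearby but non-overlapping boxes can still be matched."""
    def expand(b):
        w, h = b[:, 2] - b[:, 0], b[:, 3] - b[:, 1]
        dx, dy = buffer * w / 2, buffer * h / 2
        return np.stack([b[:, 0] - dx, b[:, 1] - dy, b[:, 2] + dx, b[:, 3] + dy], axis=1)

    a, b = expand(boxes_a)[:, None, :], expand(boxes_b)[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + 1e-9)

def dual_associate(track_boxes, det_boxes, biou_thresh=0.3, dist_thresh=80.0):
    """First association by BIoU with Hungarian matching; leftover tracks get a
    second chance based on Euclidean centre distance (hypothetical thresholds)."""
    cost = 1.0 - buffered_iou(track_boxes, det_boxes)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - biou_thresh]
    u_tracks = set(range(len(track_boxes))) - {r for r, _ in matches}
    u_dets = set(range(len(det_boxes))) - {c for _, c in matches}

    # Second association: Euclidean distance between box centres.
    for r in sorted(u_tracks):
        if not u_dets:
            break
        tc = (track_boxes[r, :2] + track_boxes[r, 2:]) / 2
        dists = {c: np.linalg.norm(tc - (det_boxes[c, :2] + det_boxes[c, 2:]) / 2) for c in u_dets}
        c_best = min(dists, key=dists.get)
        if dists[c_best] <= dist_thresh:
            matches.append((r, c_best))
            u_dets.remove(c_best)
    return matches
```

The second pass is what keeps a densely packed or briefly occluded cluster attached to its track when the buffered overlap alone is insufficient.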
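Finally, the rectangular-region counting step can be illustrated by counting each track identity once when its box centre first enters a predefined rectangle; the region coordinates and the once-per-identity rule below are assumptions for illustration, not the paper's exact procedure.

```python
class RegionCounter:
    """Minimal sketch of rectangular-region counting: a track ID is counted
    once when its box centre first falls inside the counting rectangle."""
    def __init__(self, region):
        # region = (x1, y1, x2, y2) in pixel coordinates (assumed layout)
        self.region = region
        self.counted_ids = set()

    def update(self, tracks):
        """tracks: iterable of (track_id, x1, y1, x2, y2) for the current frame."""
        rx1, ry1, rx2, ry2 = self.region
        for tid, x1, y1, x2, y2 in tracks:
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            if tid not in self.counted_ids and rx1 <= cx <= rx2 and ry1 <= cy <= ry2:
                self.counted_ids.add(tid)
        return len(self.counted_ids)

# Example: a counting rectangle spanning the central band of a 1920x1080 frame.
counter = RegionCounter(region=(480, 0, 1440, 1080))
total = counter.update([(1, 500, 300, 620, 520), (2, 60, 100, 180, 260)])
```

Counting by identity within a fixed region limits duplicate counts when the camera moves and the same cluster reappears at the frame edge.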