Abstract:
Accurate real-time counting of bagged grape clusters is vital for the subsequent estimation of orchard yield. However, current real-time fruit counting still faces several challenges: tracking targets can be lost among bagged grape clusters because of their dense distribution, mutual occlusion, and unstable camera movement. In this study, real-time detection and counting of bagged grape clusters were combined using EMO-YOLOv5s, dual-association tracking based on buffered intersection over union (BIoU) and Euclidean distance, and rectangular-region counting after target tracking. Firstly, in the detection stage, the efficient model (EMO) was introduced as the backbone network of YOLOv5s. The window multi-head self-attention (W-MHSA) mechanism from the Swin Transformer and depthwise separable convolution (DSConv) were used to make full use of fewer parameters, so that EMO-YOLOv5s improved the inference speed. Secondly, in the tracking stage, dual association was realized using ByteTrack and BIoU, and Euclidean distance was introduced to reduce target loss when tracking bagged grape clusters. Conducting the association between the detection and prediction boxes of bagged grape clusters twice enhanced the association-matching performance of the tracking stage. Finally, in the counting stage, a rectangular counting region was designed to improve the counting accuracy of bagged grape clusters, increasing the probability that clusters were counted effectively and enlarging the countable range of fruits, so that automatic counting was realized. The experimental dataset was collected from the Agricultural Science and Technology Demonstration Park in Bishan District, Chongqing, China (longitude 106.221°, latitude 29.753°, altitude 353 m). The grapes were planted row by row with equal row spacing, and the bagged grape clusters were relatively uniformly distributed. OPPO Reno6 Pro+ and Redmi K40 mobile phones were used to capture images of the bagged grape clusters from four angles (front, top, upward, and side) in three periods (8:00, 12:00, and 18:00) on July 22, 2022, so that the growth of the bagged grape clusters could be evaluated at different angles and periods. The total shooting time was about 6 h, at heights of 1.0-1.8 m above the ground and horizontal distances of 0.1-1.0 m. A total of 500 original images of the bagged grape clusters were captured row by row, with image resolutions of 4000×3000 and 4096×3072 pixels. Six valid videos of the bagged grape clusters were also recorded, each with a resolution of 1920×1080 pixels, MP4 format, a frame rate of 30 frames per second, and a duration of about 20 s. In addition, 200 images of the bagged grape clusters were taken to supplement the original dataset from 9:00 to 11:00 on September 1, 2023, with an image resolution of 4096×3072 pixels. Experiments were conducted on this self-built dataset of bagged grape clusters. The results showed that: 1) In terms of detection performance, the parameters and floating-point operations decreased by 38.6% and 39.0%, respectively, compared with YOLOv5s, while the average precision and detection speed reached 96.5% and 77 frames per second, respectively.
2) In terms of tracking performance, the higher order tracking accuracy, multiple object tracking accuracy, and identification F1-score were 58.6%, 64.7%, and 80.0%, respectively, which were 3.6%, 4.1%, and 6.0% higher than those of ByteTrack. 3) In terms of counting performance, the average counting precision reached 93.1%, and the mean absolute error was 1.3%. Therefore, the improved model effectively solved the problems of tracking and counting bagged grape clusters. The findings can provide a reliable basis for predicting orchard yield.
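To make the parameter-saving components of the detection stage more concrete, the following is a minimal PyTorch sketch of a depthwise separable convolution (DSConv) block of the kind named above; the specific layer choices (3×3 depthwise kernel, BatchNorm, SiLU activation) are illustrative assumptions rather than the exact EMO-YOLOv5s configuration.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: a depthwise 3x3 conv followed by a
    pointwise 1x1 conv, which cuts parameters relative to a standard conv.
    Layer choices here are illustrative, not the paper's exact block."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, stride=stride,
                                   padding=1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a 64-channel feature map reduced-cost projection to 128 channels.
y = DSConv(64, 128)(torch.randn(1, 64, 80, 80))
```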
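The dual-association idea of the tracking stage (a first match by BIoU, then a second match by Euclidean distance between box centres) can likewise be sketched as follows. The buffer ratio, matching thresholds, Hungarian assignment, and greedy distance fallback are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def buffered_iou(boxes_a, boxes_b, buffer=0.3):
    """Pairwise IoU after expanding both sets of [x1, y1, x2, y2] boxes by a
    buffer ratio, so nearby but non-overlapping boxes can still be matched."""
    def expand(b):
        w, h = b[:, 2] - b[:, 0], b[:, 3] - b[:, 1]
        dx, dy = buffer * w / 2, buffer * h / 2
        return np.stack([b[:, 0] - dx, b[:, 1] - dy, b[:, 2] + dx, b[:, 3] + dy], axis=1)

    a, b = expand(boxes_a)[:, None, :], expand(boxes_b)[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + 1e-9)

def dual_associate(track_boxes, det_boxes, biou_thresh=0.3, dist_thresh=80.0):
    """First association by BIoU with Hungarian matching; leftover tracks get a
    second chance based on Euclidean centre distance (hypothetical thresholds)."""
    cost = 1.0 - buffered_iou(track_boxes, det_boxes)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - biou_thresh]
    u_tracks = set(range(len(track_boxes))) - {r for r, _ in matches}
    u_dets = set(range(len(det_boxes))) - {c for _, c in matches}

    # Second association: Euclidean distance between box centres.
    for r in sorted(u_tracks):
        if not u_dets:
            break
        tc = (track_boxes[r, :2] + track_boxes[r, 2:]) / 2
        dists = {c: np.linalg.norm(tc - (det_boxes[c, :2] + det_boxes[c, 2:]) / 2) for c in u_dets}
        c_best = min(dists, key=dists.get)
        if dists[c_best] <= dist_thresh:
            matches.append((r, c_best))
            u_dets.remove(c_best)
    return matches
```

The second pass is what keeps a densely packed or briefly occluded cluster attached to its track when the buffered overlap alone is insufficient.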
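Finally, the rectangular-region counting step can be illustrated by counting each track identity once when its box centre first enters a predefined rectangle; the region coordinates and the once-per-identity rule below are assumptions for illustration, not the paper's exact procedure.

```python
class RegionCounter:
    """Minimal sketch of rectangular-region counting: a track ID is counted
    once when its box centre first falls inside the counting rectangle."""
    def __init__(self, region):
        # region = (x1, y1, x2, y2) in pixel coordinates (assumed layout)
        self.region = region
        self.counted_ids = set()

    def update(self, tracks):
        """tracks: iterable of (track_id, x1, y1, x2, y2) for the current frame."""
        rx1, ry1, rx2, ry2 = self.region
        for tid, x1, y1, x2, y2 in tracks:
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            if tid not in self.counted_ids and rx1 <= cx <= rx2 and ry1 <= cy <= ry2:
                self.counted_ids.add(tid)
        return len(self.counted_ids)

# Example: a counting rectangle spanning the central band of a 1920x1080 frame.
counter = RegionCounter(region=(480, 0, 1440, 1080))
total = counter.update([(1, 500, 300, 620, 520), (2, 60, 100, 180, 260)])
```

Counting by identity within a fixed region limits duplicate counts when the camera moves and the same cluster reappears at the frame edge.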