Abstract:
The rose is one of the most popular flowers in modern floriculture worldwide. However, rose cultivation in China faces significant challenges. In particular, manual harvesting can no longer meet the demands of large-scale, intensive production, mainly because of rising labor costs, and accurate, efficient machine vision for reliably identifying rose flowering stages is still lacking. In this study, an accurate model for detecting rose flowering stages, designated YOLOv8n-dy, was proposed on the basis of the YOLOv8n framework. In the backbone network, the C2f module was enhanced with a more efficient multi-branch convolutional block to improve the recognition of flowering targets at multiple scales and with diverse morphologies. The Efficient Local Attention (ELA) mechanism was introduced, and an additional detection layer was added, to improve the detection of small targets. The loss function was replaced with Wise-IoU version 3 (WIoU v3) to locate small targets, such as bud-stage roses, more accurately. The Adamax optimizer was employed to reduce the risk of converging to local optima. Experimental results demonstrated that YOLOv8n-dy significantly outperformed the original YOLOv8n, with gains of 4.4 percentage points in precision, 6.7 percentage points in recall, and 6.1 percentage points in mAP@0.5. In a comparison with state-of-the-art detectors, YOLOv9 achieved the highest raw detection accuracy, but YOLOv8n-dy remained highly competitive, with an mAP@0.5 of 77.3%, while requiring only 9.8 Giga Floating Point Operations (GFLOPs). In addition, the model size of YOLOv8n-dy was only 4.22% of that of YOLOv9, which greatly facilitates deployment. This efficiency carried over to the field: all three flowering stages were consistently and accurately identified under diverse and challenging conditions, including varying lighting and occlusion. In conclusion, YOLOv8n-dy strikes a balance between high detection accuracy and low computational complexity. Its robust performance can be attributed to the combined architectural improvements in the feature-extraction blocks, the attention mechanism, and the loss function. With its low computational cost, the improved model is suitable for real-time deployment on embedded and mobile platforms, which is critical for practical applications. Consequently, this work can provide a reliable, efficient, and scalable technological solution for rose harvesting, helping to alleviate labor shortages and reduce operational costs in sustainable precision agriculture.
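The abstract names the WIoU v3 loss but does not give its formula. The following is a minimal, forward-only sketch of the Wise-IoU v3 formulation as published by its original authors, not the study's code. The v1 loss is L_v1 = R_WIoU * L_IoU, with R_WIoU = exp(centre distance^2 / (W_g^2 + H_g^2)), where W_g and H_g are the width and height of the smallest enclosing box. v3 then scales this by a non-monotonic focusing coefficient r = beta / (delta * alpha^(beta - delta)), where beta is the box's L_IoU divided by a running mean. The hyperparameters alpha=1.9, delta=3.0, momentum=0.01, and the initial running mean of 1.0 are assumptions, not values reported in this study. Gradient detachment, which matters during training, is not modeled.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Plain intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box) -> Tuple[float, float]:
    """Return (L_IoU, L_WIoUv1), where L_WIoUv1 = R_WIoU * L_IoU."""
    l_iou = 1.0 - iou(pred, gt)
    # Offset between the centre points of prediction and ground truth.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return l_iou, r_wiou * l_iou


def focusing_coefficient(beta: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    """Non-monotonic focusing coefficient r = beta / (delta * alpha**(beta - delta))."""
    return beta / (delta * alpha ** (beta - delta))


class WIoUv3:
    """WIoU v3 loss for single box pairs; forward value only (no autograd)."""

    def __init__(self, alpha=1.9, delta=3.0, momentum=0.01, init_mean=1.0):
        # All four defaults are assumptions, not settings from the study.
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.iou_mean = init_mean  # running mean of L_IoU

    def __call__(self, pred: Box, gt: Box) -> float:
        l_iou, l_v1 = wiou_v1(pred, gt)
        # Outlier degree: this box's L_IoU relative to the running average.
        beta = l_iou / max(self.iou_mean, 1e-7)
        r = focusing_coefficient(beta, self.alpha, self.delta)
        # Exponential moving average of L_IoU, updated after it is used.
        self.iou_mean = (1 - self.momentum) * self.iou_mean + self.momentum * l_iou
        return r * l_v1
```

A box whose outlier degree equals delta gets r = 1. Very good boxes (beta near 0) and very poor boxes (large beta, often low-quality or mislabeled samples) both receive smaller gains. This is how WIoU v3 shifts the regression effort toward ordinary-quality boxes.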
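The abstract states only that Adamax was used. As a hedged illustration, the sketch below implements the standard Adamax update rule, the infinity-norm variant of Adam, in pure Python. It uses the commonly cited defaults lr=0.002, beta1=0.9, and beta2=0.999. These defaults and the small eps term are assumptions; the study's actual training settings are not given here.

```python
from typing import List, Tuple


def adamax_step(
    theta: List[float],
    grad: List[float],
    m: List[float],
    u: List[float],
    t: int,
    lr: float = 0.002,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adamax update; t is the 1-based step count."""
    # First moment: exponential moving average of the gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # Infinity norm: exponentially weighted maximum of |gradient|.
    # eps guards against division by zero (assumed, as in common implementations).
    u = [max(beta2 * ui, abs(gi) + eps) for ui, gi in zip(u, grad)]
    # Bias correction is applied only to the first moment.
    step = lr / (1 - beta1 ** t)
    theta = [th - step * mi / ui for th, mi, ui in zip(theta, m, u)]
    return theta, m, u
```

For example, minimizing f(x) = (x - 3)^2 from x = 0 means calling `adamax_step` repeatedly with the gradient `[2 * (x[0] - 3.0)]`. Because u is a running maximum rather than an average of squared gradients, the bias-corrected step for each parameter stays roughly bounded by lr.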