Abstract:
This study addresses major challenges in the rose cultivation industry, including the intense physical demands of manual harvesting, an aging farmer demographic, and rising labor costs, compounded by the lack of accurate and efficient automated vision systems for reliably identifying rose flowering stages. To tackle this, a novel model based on the YOLOv8n framework, designated YOLOv8n-dy, was proposed for rose flowering stage detection. The model's backbone network was optimized by enhancing the C2f module into a more efficient multi-branch convolutional block, improving recognition of multi-scale, multi-morphology flowering targets. The Efficient Local Attention (ELA) mechanism was introduced to strengthen small-target detection, and a dedicated small-target detection layer was added. The loss function was replaced with Wise-IoU version 3 (WIoUv3) to improve the localization of small targets such as bud-stage roses, and the Adamax optimizer was employed to reduce the risk of the model becoming trapped in local optima. Experimental results demonstrated that the YOLOv8n-dy model achieved significant improvements over the original YOLOv8n, with gains of 4.4% in precision, 6.7% in recall, and 6.1% in mAP@0.5. In a comparative analysis against other state-of-the-art detectors, YOLOv9 achieved the highest raw detection accuracy, while YOLOv8n-dy maintained a highly competitive mAP@0.5 of 0.773. The decisive advantage of the proposed model is its dramatically reduced computational footprint: it requires only 9.8 Giga Floating Point Operations (GFLOPs), a mere 3.67% of the computational demand of YOLOv9, and its weight file is only 4.22% the size of YOLOv9's, greatly easing deployment. This efficiency did not come at the cost of field performance, as the model consistently and accurately identified all three flowering stages under diverse and challenging real-world field conditions, including varying lighting and occlusions. In conclusion, the YOLOv8n-dy model presented in this work fulfills its design objectives by achieving a practical balance between high detection accuracy and low computational complexity. The architectural innovations, including the optimized feature extraction blocks, attention mechanism, and advanced loss function, collectively contributed to its robust performance. The model's markedly low computational requirement makes it well suited to real-time deployment on embedded systems and mobile platforms, which is critical for practical agricultural applications. Consequently, this research provides a reliable, efficient, and scalable technological solution for automating rose harvesting, with the potential to mitigate labor shortages, lower operational costs, and promote greater sustainability in modern floriculture and precision agriculture.