Abstract:
This study aimed to improve the detection accuracy and real-time performance of safflower detection at the full bloom stage in complex field environments, where traditional models are constrained by high computational complexity and poor adaptability to edge deployment. A lightweight detection method for safflower at the full bloom stage, named YOLO-GESCW, was proposed based on YOLOv8n. A self-built dataset of safflower at the full bloom stage was constructed, covering scenarios with different lighting conditions, weather, target scales, and occlusion levels. Five targeted improvements were systematically implemented. Firstly, Ghost Convolution (Ghost Conv) replaced all standard convolutions in the backbone network and the standard convolutions in the 16th and 19th layers of the neck network, reducing the computational overhead of the convolution layers. Secondly, the Efficient Channel Attention (ECA) module replaced all C2f modules in the backbone network and the C2f modules in the 12th and 21st layers of the neck network, decreasing the number of parameters and the computational cost for a more lightweight model. Thirdly, the Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module replaced the original SPPF (Spatial Pyramid Pooling - Fast) module to fuse multi-scale features, enhancing the detection of irregular, small-scale targets. Fourthly, the C3k2 module was added to the neck network to better capture object contours and fine details. Finally, the Wise-IoU v3 loss function replaced the original loss function to reduce the error between predicted and ground-truth boxes. Comprehensive experiments were conducted, including a comparative analysis against nine relevant models (among them the mainstream YOLOv9t, YOLOv10n, YOLOv11n, and YOLOv12n) and Gradient-weighted Class Activation Mapping (Grad-CAM) visualization of the 7th–9th backbone layers of YOLOv8n and YOLO-GESCW. Deployment was verified on an NVIDIA Jetson Xavier NX edge device. Results showed that YOLO-GESCW achieved a parameter count of 0.98 M, a model size of 2.09 MB, and 3.4 G FLOPs, which were 67.39%, 65.10%, and 58.02% lower than those of YOLOv8n, respectively. On the test set, it reached 93.6% precision, 93.0% recall, and 98.1% mean average precision (mAP@0.5), with an average detection speed of 106.79 frames per second. Compared with the four mainstream models, its parameters were reduced by 50.25%, 56.83%, 62.02%, and 61.72%, and its FLOPs by 55.3%, 47.7%, 46.0%, and 46.0%, respectively. Among the nine comparative models, YOLO-GESCW had the highest mAP@0.5 (98.15%); its precision was 0.5 percentage points lower than that of SF-YOLO but higher than that of the others, and its recall was 0.9 percentage points lower than that of the improved YOLOv7 but higher than that of the rest. Notably, it was the only model with fewer than one million parameters (0.98 M), consuming fewer computational resources and thus better meeting the lightweight requirements of edge deployment. Grad-CAM results confirmed that the improved model focused more strongly on the targets, consistent with the expected optimizations. Deployed on the edge device, the 3.48 MB best.engine model achieved 52.36 frames per second and a single-image inference time of 0.0072 s on 757 unseen images from the self-built dataset. The YOLO-GESCW model effectively balanced lightweight design, high detection accuracy, and real-time responsiveness, overcoming the key limitations of traditional models in complex field scenarios. It provides a reliable technical reference for real-time detection of safflower at the full bloom stage and for practical deployment on resource-constrained edge devices, laying a solid foundation for the development of intelligent safflower harvesting technology.
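As an illustration of the channel-attention component named in the abstract, the following is a minimal PyTorch sketch of an ECA block in the style of ECA-Net (global average pooling, a 1-D convolution across channels, and a sigmoid gate). The class name, kernel size, and shapes are assumptions for illustration only and do not reproduce the authors' exact implementation inside YOLO-GESCW.

```python
import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient Channel Attention (ECA-Net style): a parameter-light
    channel gate using a 1-D convolution over the pooled channel descriptor."""

    def __init__(self, channels: int, k_size: int = 3):
        super().__init__()
        # Squeeze spatial dimensions to a per-channel descriptor.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Local cross-channel interaction without dimensionality reduction.
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel descriptor (B, C, 1, 1)
        y = self.avg_pool(x)
        # Treat channels as a 1-D sequence: (B, 1, C) -> conv -> (B, 1, C)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))
        # Restore (B, C, 1, 1) and gate the input feature map.
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y.expand_as(x)


if __name__ == "__main__":
    # Example: gate a hypothetical 256-channel feature map from a backbone stage.
    eca = ECA(channels=256)
    out = eca(torch.randn(1, 256, 40, 40))
    print(out.shape)  # torch.Size([1, 256, 40, 40])
```

Because the gate adds only a single small 1-D convolution, it contributes almost no parameters or FLOPs, which is consistent with the lightweight motivation stated in the abstract for substituting the heavier C2f modules.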