Abstract:
High accuracy and real-time performance are often required to detect safflower at the full bloom stage in complex field environments. However, conventional models are constrained by high computational complexity and poor adaptability to edge deployment. In this study, a lightweight detection model (named YOLO-GESCW) was proposed for safflower at the full bloom stage based on YOLOv8n. A self-built dataset of safflower at full bloom was constructed to cover scenarios with different lighting conditions, weather, target scales, and occlusion levels. Five improvements were systematically implemented. Firstly, Ghost Convolution (Ghost Conv) was used to replace all standard convolutions in the backbone network and the 16th- and 19th-layer standard convolutions in the neck network, in order to reduce the computational overhead of the convolution operations. Secondly, the Efficient Channel Attention (ECA) module was substituted for all C2f modules in the backbone network and the 12th- and 21st-layer C2f modules in the neck network, reducing model parameters and computational cost for better lightweight performance. Thirdly, the Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module replaced the original SPPF (Spatial Pyramid Pooling - Fast) module to fuse multi-scale features, thus enhancing the detection of irregular small-scale targets. Fourthly, the C3k2 module was added to the neck network to capture object contours and fine details. Finally, the Wise-IoUv3 loss function replaced the original loss function to reduce the localization error between predicted and ground-truth boxes. A comparative analysis was conducted against nine models in the field, including the mainstream models YOLOv9t, YOLOv10n, YOLOv11n, and YOLOv12n, and Gradient-weighted Class Activation Mapping (Grad-CAM) visualization was performed for the 7th–9th backbone layers of YOLOv8n and YOLO-GESCW. Deployment verification was performed on the NVIDIA Jetson Xavier NX edge device.
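As a rough illustration of the box-regression quantity that this family of loss functions penalizes, the plain Intersection-over-Union (IoU) between a predicted box and a ground-truth box can be computed as follows. This is a minimal pure-Python sketch, not the authors' implementation; Wise-IoUv3 additionally applies a dynamic, non-monotonic focusing weight on top of this overlap term.

```python
def box_iou(pred, gt):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction shifted one unit from the ground truth
print(box_iou((0, 0, 2, 2), (1, 0, 3, 2)))  # -> 0.333...
```

An IoU-based loss is then typically formed as 1 − IoU (possibly with extra penalty and weighting terms, as in Wise-IoUv3), so that perfectly aligned boxes incur zero loss.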
Results showed that YOLO-GESCW achieved a parameter count of 0.98 M, a model size of 2.09 MB, and 3.4 G FLOPs, which were 67.44%, 65.10%, and 58.02% lower than those of YOLOv8n, respectively. On the test set, it reached 93.6% precision, 93.0% recall, and 98.1% mean average precision (mAP@0.5), with an average detection speed of 106.79 frames per second. Compared with the four mainstream models, its parameters were reduced by 50.25%, 56.83%, 62.02%, and 61.72%, while its FLOPs were reduced by 55.3%, 47.7%, 46.0%, and 46.0%, respectively. Among the nine comparative models, YOLO-GESCW achieved the highest mAP@0.5 (98.15%); its precision was 0.5 percentage points lower than that of SF-YOLO but higher than that of the rest, and its recall was the highest among the nine models. Notably, it was the only model with fewer than one million parameters (0.98 M), thus consuming fewer computational resources and better meeting the lightweight requirements of edge deployment. The Grad-CAM results confirmed that the improved model focused more on the targets, consistent with the expected optimizations. The 3.48 MB best engine model was deployed on the edge device and achieved 52.36 frames per second, with a single-image inference time of 0.0072 s, on 757 unseen images from the self-built dataset. The YOLO-GESCW model effectively balanced lightweight performance, high detection accuracy, and real-time responsiveness, thereby overcoming the key limitations of conventional models in complex field scenarios. The findings can provide a reliable technical reference for the real-time detection of safflower at the full bloom stage and its practical deployment on resource-constrained edge devices, laying a solid foundation for intelligent safflower harvesting.