A lightweight method for detecting safflower at the full bloom stage based on YOLO-GESCW

Detecting safflower at the full bloom stage using lightweight YOLO-GESCW

  • Abstract: To improve the detection accuracy and real-time performance for safflower at the full bloom stage, this study proposes YOLO-GESCW, a lightweight detection method based on YOLOv8n. Ghost convolution (Ghost Conv) is introduced to reduce the computational overhead of the convolution operations; the efficient channel attention (ECA) module replaces part of the C2f modules in the backbone and neck networks, reducing model parameters and computation; the spatial pyramid pooling with efficient layer aggregation network (SPPELAN) module is introduced to strengthen multi-scale feature extraction; the C3k2 module is added to the neck network to improve the capture of contours and fine details; and the Wise-IoUv3 function is used to optimize the bounding-box loss. Results show that the improved model has about 0.98 M parameters, a model size of 2.09 MB, and 3.4 G FLOPs, which are 67.44%, 65.10%, and 58.02% lower than the original YOLOv8n, respectively. On the test set it achieves 93.6% precision, 93.0% recall, and 98.1% mean average precision (mAP@0.5), with an average detection speed of 106.79 frames/s. In deployment tests on an NVIDIA Jetson Xavier NX edge device, the exported best.engine model is 3.48 MB, with an average frame rate of 52.36 frames/s and an average per-image inference time of 0.0072 s. The study balances model lightweighting against detection accuracy and offers a solution for deploying detection models on edge devices for safflower harvesting at the full bloom stage.

    Abstract: High accuracy and real-time performance are required to detect safflower at the full bloom stage in complex field environments, yet conventional models are constrained by high computational complexity and poor adaptability to edge deployment. In this study, a lightweight detection model, named YOLO-GESCW, was proposed for safflower at the full bloom stage based on YOLOv8n. A self-built dataset of safflower at full bloom was constructed to cover scenarios with different lighting, weather, target scales, and occlusion levels. Five improvements were systematically implemented. Firstly, Ghost convolution (Ghost Conv) replaced all standard convolutions in the backbone network and the 16th- and 19th-layer standard convolutions in the neck network, reducing convolutional computation. Secondly, the efficient channel attention (ECA) module substituted for all C2f modules in the backbone network and the 12th- and 21st-layer C2f modules in the neck network, reducing model parameters and computational cost for a more lightweight design. Thirdly, the spatial pyramid pooling with efficient layer aggregation network (SPPELAN) module replaced the original SPPF (spatial pyramid pooling - fast) module to fuse multi-scale features, enhancing the detection of irregular small-scale targets. Fourthly, the C3k2 module was added to the neck network to better capture object contours and fine details. Finally, the Wise-IoUv3 loss function replaced the original bounding-box loss to reduce the error between predicted and ground-truth boxes. A comparative analysis was conducted against nine models, including the mainstream YOLOv9t, YOLOv10n, YOLOv11n, and YOLOv12n; gradient-weighted class activation mapping (Grad-CAM) visualization was applied to the 7th–9th layers of the backbone networks of YOLOv8n and YOLO-GESCW; and deployment verification was performed on an NVIDIA Jetson Xavier NX edge device. Results showed that YOLO-GESCW achieved a parameter count of 0.98 M, a model size of 2.09 MB, and 3.4 G FLOPs, which were 67.44%, 65.10%, and 58.02% lower than YOLOv8n, respectively. On the test set, it reached 93.6% precision, 93.0% recall, and 98.1% mean average precision (mAP@0.5), with an average detection speed of 106.79 frames per second. Compared with the four mainstream models, its parameter count was reduced by 50.25%, 56.83%, 62.02%, and 61.72%, and its FLOPs by 55.3%, 47.7%, 46.0%, and 46.0%, respectively. Among the nine comparative models, YOLO-GESCW achieved the highest mAP@0.5 (98.15%) and the highest recall; its precision was 0.5 percentage points lower than that of SF-YOLO but higher than that of the rest. Notably, it was the only model with fewer than one million parameters (0.98 M), consuming fewer computational resources and better meeting the lightweight requirements of edge deployment. The Grad-CAM results confirmed that the improved model focused more closely on the targets, consistent with the intended optimizations. The 3.48 MB best.engine model deployed on the edge device achieved 52.36 frames per second with a single-image inference time of 0.0072 s on 757 unseen images from the self-built dataset. The YOLO-GESCW model effectively balanced lightweight design, high detection accuracy, and real-time responsiveness, overcoming the key limitations of conventional models in complex field scenarios. These findings provide a reliable technical reference for real-time detection of safflower at the full bloom stage and its practical deployment on resource-constrained edge devices, laying a solid foundation for intelligent safflower harvesting.
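To make the first modification concrete, below is a minimal PyTorch sketch of a Ghost convolution in the standard GhostNet formulation with a ratio of 2. It illustrates the idea rather than reproducing the exact module used in YOLO-GESCW; the layer sizes and the SiLU activation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Minimal Ghost convolution sketch (GhostNet-style, ratio 2).

    Half of the output channels come from an ordinary convolution; the
    other half are 'ghost' features generated by a cheap 5x5 depthwise
    convolution, roughly halving the FLOPs of a standard convolution.
    Assumes an even number of output channels.
    """
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(  # depthwise: one filter per channel
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```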
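The ECA substitution relies on a very small attention operator. The sketch below follows the ECA-Net formulation (global average pooling plus a 1-D convolution whose kernel size adapts to the channel count); it is an assumption-based illustration, not the authors' exact implementation.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention (ECA-Net-style sketch).

    A channel descriptor from global average pooling is passed through
    a single 1-D convolution across channels; the kernel size k is
    derived from the channel count with the gamma/b rule from ECA-Net.
    """
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1  # force an odd kernel size
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        # x: (N, C, H, W) -> channel descriptor (N, 1, C)
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        # attention weights back to (N, C, 1, 1), then reweight channels
        y = torch.sigmoid(self.conv(y)).transpose(1, 2).unsqueeze(-1)
        return x * y
```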
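SPPELAN, taken from the YOLOv9 family, cascades max-pooling branches and aggregates all of them, which is what gives the multi-scale fusion described above. A simplified sketch follows, with BatchNorm and activations omitted for brevity and the channel widths left as hypothetical parameters.

```python
import torch
import torch.nn as nn

class SPPELAN(nn.Module):
    """Sketch of SPPELAN: a 1x1 reduction conv, three cascaded 5x5
    max-pool branches, and a 1x1 conv fusing the concatenation of all
    four feature maps to capture multi-scale context."""
    def __init__(self, c_in, c_out, c_mid):
        super().__init__()
        self.cv1 = nn.Conv2d(c_in, c_mid, 1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.cv2 = nn.Conv2d(4 * c_mid, c_out, 1, bias=False)

    def forward(self, x):
        y = [self.cv1(x)]
        for _ in range(3):           # each pool widens the receptive field
            y.append(self.pool(y[-1]))
        return self.cv2(torch.cat(y, dim=1))
```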
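The Wise-IoU v3 loss combines a distance-based attention term over the smallest enclosing box with a non-monotonic focusing coefficient driven by the outlier degree β. The sketch below follows the published formulation (Tong et al., 2023), not this paper's code: the running mean `iou_mean` is the caller's responsibility, and the default `alpha` and `delta` values are taken from that paper.

```python
import torch

def wise_iou_v3(pred, target, iou_mean, alpha=1.9, delta=3.0, eps=1e-7):
    """Sketch of the Wise-IoU v3 bounding-box loss.

    pred, target: (N, 4) boxes in (x1, y1, x2, y2) format.
    iou_mean: running mean of the IoU loss, maintained by the caller.
    """
    # plain IoU
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou

    # WIoU v1: center-distance attention over the smallest enclosing box;
    # the denominator is detached so it contributes no gradient of its own
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    dx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2
    r_wiou = torch.exp((dx**2 + dy**2) / (cw**2 + ch**2 + eps).detach())

    # v3: non-monotonic focusing via the outlier degree beta
    beta = l_iou.detach() / (iou_mean + eps)
    gain = beta / (delta * alpha ** (beta - delta))
    return (gain * r_wiou * l_iou).mean()
```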
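For the Jetson deployment step, a typical Ultralytics-based workflow exports the trained weights to a TensorRT engine on the target device (engines are device-specific) and then runs inference directly from the engine file. The following is a plausible sketch of such a pipeline, assuming the Ultralytics API; the file names, FP16 setting, and thresholds are hypothetical, as the abstract does not state the exact export configuration.

```python
from ultralytics import YOLO  # assumes the Ultralytics toolchain used for YOLOv8n

# Export the trained PyTorch weights to a TensorRT engine on the Jetson
# itself; "best.pt" is a hypothetical path to the trained weights.
model = YOLO("best.pt")
model.export(format="engine", half=True)  # FP16 build -> best.engine

# Run inference with the exported engine; image path and thresholds
# are illustrative, not the paper's settings.
engine = YOLO("best.engine")
results = engine.predict("safflower.jpg", imgsz=640, conf=0.25)
for r in results:
    print(r.boxes.xyxy, r.boxes.conf)  # detected boxes and confidences
```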

