
Detecting and counting wheat seedlings under complex field conditions using improved YOLOv10n

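  • Abstract: To achieve accurate identification of wheat seedlings in complex field environments, this study proposes DSM-YOLOv10, an improved detection algorithm based on YOLOv10n. First, to reduce the algorithm's computational complexity, a DCC2f module is built by introducing dual convolution (DualConv) and replaces the C2f structure in the original backbone network, effectively reducing redundant computation and strengthening feature fusion. Second, to address detection difficulties such as the dense distribution and mutual occlusion of wheat seedlings, a semantics and detail infusion (SDI) module replaces the Concat operation, achieving detail enhancement under semantic guidance and improving the model's ability to decouple and reuse features in overlapping seedling regions. Finally, a multi-scale dilated attention (MSDA) mechanism is introduced to strengthen multi-scale feature representation and improve the model's understanding of intersecting and overlapping seedlings. Experimental results show that DSM-YOLOv10 detects wheat seedlings accurately across a variety of complex scenes, with mean average precision, precision, recall, and F1 score reaching 91.4%, 85.2%, 81.7%, and 83.4%, respectively, which are 5.0, 2.6, 5.4, and 4.1 percentage points higher than the original YOLOv10n model. Model parameters and floating-point operations are reduced by 4.7% and 10.7%, respectively, and inference time falls to 13.2 ms. In the seedling counting task, the coefficient of determination, root mean square error, and mean absolute error against measured data are 0.92, 5.68, and 4.33, respectively, all better than mainstream models such as YOLOv10n. This study effectively improves the detection and counting of wheat seedlings in complex scenes and provides strong support for field data acquisition in agricultural practice.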
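
The claim that dual convolution reduces redundant computation can be illustrated with a rough weight count. The sketch below assumes one common DualConv layout, in which a grouped 3 x 3 branch and a 1 x 1 pointwise branch read the same input and their outputs are summed; the channel sizes and group count are hypothetical and are not taken from the paper, whose DCC2f module may differ in detail.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a 2-D convolution layer (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def dual_conv_params(c_in, c_out, k=3, groups=4):
    """Weights in the assumed DualConv layout: a grouped k x k branch plus a
    1 x 1 pointwise branch over the same input, outputs summed."""
    grouped = conv_params(c_in, c_out, k, groups)
    pointwise = conv_params(c_in, c_out, 1)
    return grouped + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(64, 128, 3)
dual = dual_conv_params(64, 128, k=3, groups=4)
print(f"standard 3x3: {standard} weights")
print(f"dual conv:    {dual} weights ({dual / standard:.1%} of standard)")
```

Under these assumptions the grouped branch keeps the 3 x 3 receptive field at a fraction of the cost, while the pointwise branch restores mixing across all input channels.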

     

    Abstract: Counting wheat seedlings is a crucial field survey task during the emergence stage. To achieve accurate detection and counting of wheat seedlings in complex field environments, this study proposed an improved detection algorithm, DSM-YOLOv10, based on YOLOv10n, addressing challenges such as the small size, dense distribution, and mutual occlusion of wheat seedlings. Firstly, global annotation was adopted during data labeling instead of focusing only on local feature information, which enhanced the network's ability to learn and extract the overall features of wheat seedlings and effectively avoided feature loss caused by overlapping and occlusion. Secondly, a DCC2f module was constructed by introducing dual convolution (DualConv) to replace the C2f module in the backbone network. By deploying grouped convolution and pointwise convolution in parallel, combined with residual connections, the module alleviated the vanishing-gradient problem in deep-layer training, enabled more complete transmission of cross-layer features, minimized redundant convolution operations, and improved computational efficiency, which facilitates subsequent real-time deployment on mobile edge devices and provides more efficient and clearer foundational features for subsequent processing. Furthermore, a semantics and detail infusion (SDI) module was introduced. It explicitly injected high-level semantic information into low-level detail features through cross-layer attention guidance and Hadamard product operations. This achieved detail enhancement under semantic guidance and overcame the limitation of conventional concatenation modules, which merely stack features without deep information interaction. Consequently, it improved the model's capability to decouple and reuse features in overlapping regions of wheat seedlings. Finally, a multi-scale dilated attention (MSDA) mechanism was adopted. 
By incorporating multiple dilation rates, it effectively aggregated multi-scale semantic information, strengthened the network's ability to integrate local details with contextual information, and further enhanced the model's understanding of intersecting and overlapping wheat seedlings. Moreover, without complex operations or additional computational cost, MSDA efficiently reduced the redundancy inherent in the self-attention mechanism. Experimental results showed that the DSM-YOLOv10 model achieved precise detection of wheat seedlings across multiple complex scenarios. It attained a mean average precision (mAP), precision, recall, and F1 score of 91.4%, 85.2%, 81.7%, and 83.4%, respectively. Compared with the original YOLOv10n model, these metrics represent improvements of 5.0, 2.6, 5.4, and 4.1 percentage points, respectively. Furthermore, the model's parameter count and floating-point operations (FLOPs) were reduced by 4.7% and 10.7%, respectively. With an inference time of 13.2 ms (approximately 76 frames per second), the model demonstrated real-time detection capability. Compared with the RetinaNet, Faster R-CNN, SSD, and YOLOv8n models, the DSM-YOLOv10 model exhibited the best detection performance: its mAP was 35.8, 24.5, 30.7, and 8.6 percentage points higher, respectively; its number of parameters was 87.4%, 96.2%, 90.0%, and 15.2% lower, respectively; and its FLOPs were 86.3%, 96.0%, 88.1%, and 7.4% lower, respectively. In the supplementary experiment on small object detection, the YOLOv10n model performed poorly, particularly in overlapping seedling scenarios, exhibiting a high number of missed and false detections. In contrast, the DSM-YOLOv10 model demonstrated excellent small object recognition capabilities, achieving an mAP of 86.3%. Both its miss rate (MR) and false positive rate (FPR) remained at relatively low levels, at 9.1% and 3.8%, respectively. 
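
The F1 score and frame rate quoted above follow directly from the reported precision, recall, and inference time. The short check below is a sketch using the standard harmonic-mean definition of F1 and a simple latency-to-throughput conversion; the function names are illustrative, and it assumes the 13.2 ms figure is the per-image latency.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frames_per_second(latency_ms):
    """Throughput implied by a per-image latency given in milliseconds."""
    return 1000.0 / latency_ms


# Values reported in the abstract for DSM-YOLOv10.
f1 = f1_score(0.852, 0.817)
fps = frames_per_second(13.2)
print(f"F1 = {f1:.1%}, throughput = {fps:.1f} FPS")
```

The result reproduces the stated 83.4% F1 score and roughly 76 frames per second.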
In the seedling counting task, when the values detected by the DSM-YOLOv10 model were compared with the measured values, the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) were 0.92, 5.68, and 4.33, respectively. Compared with YOLOv10n, which performed best among the comparison models, R² increased by 6.98%, while RMSE and MAE decreased by 35.23% and 40.44%, respectively, demonstrating higher counting accuracy and robustness. This study effectively improved the detection and counting of wheat seedlings in complex scenarios, and the proposed model has fewer parameters and lower FLOPs, providing strong support for field data acquisition in agricultural practice.
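
The three counting metrics above can be computed from paired per-image counts as in the following sketch. It assumes R² is defined as 1 - SS_res / SS_tot against the measured counts (the abstract does not state which definition was used), and the sample counts are hypothetical, not data from the study.

```python
import math


def counting_metrics(measured, detected):
    """Return (R^2, RMSE, MAE) for detected counts against measured counts."""
    n = len(measured)
    if n == 0 or n != len(detected):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_measured = sum(measured) / n
    ss_res = sum((m - d) ** 2 for m, d in zip(measured, detected))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured counts must not all be identical")
    r2 = 1 - ss_res / ss_tot                 # assumed definition of R^2
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(m - d) for m, d in zip(measured, detected)) / n
    return r2, rmse, mae


# Hypothetical seedling counts per image, for illustration only.
measured = [40, 55, 62, 78, 90]
detected = [38, 57, 60, 75, 94]
r2, rmse, mae = counting_metrics(measured, detected)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

RMSE penalizes large per-image errors more heavily than MAE, which is why the two are usually reported together.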

     
