Abstract:
Counting wheat seedlings is a crucial field survey task during the emergence stage. To achieve accurate detection and counting of wheat seedlings in complex field environments, this study proposed DSM-YOLOv10, an improved detection algorithm based on YOLOv10n that addresses the small size, dense distribution, and mutual occlusion of wheat seedlings. First, global annotation was adopted during the data labeling phase instead of focusing only on local feature information, which enhanced the network's ability to learn and extract the overall features of wheat seedlings and effectively avoided feature loss caused by overlap and occlusion. Second, a DCC2f module was constructed by introducing dual convolution (DualConv) and was used to replace the C2f module in the backbone network. By deploying grouped convolution and pointwise convolution in parallel and combining them with residual connections, the module alleviated the vanishing gradient problem in deep-layer training, enabled more complete transmission of cross-layer features, and minimized redundant convolution operations. This improved computational efficiency, facilitated subsequent real-time deployment on mobile edge devices, and provided more efficient and clearer foundational features for subsequent processing. Furthermore, a semantics and detail infusion (SDI) module was introduced, which explicitly injected high-level semantic information into low-level detail features through cross-layer attention guidance and Hadamard product operations. This achieved detail enhancement under semantic guidance and overcame the limitation of conventional concatenation modules, which merely stack features without deep information interaction. Consequently, it improved the model's capability to decouple and reuse features in overlapping regions of wheat seedlings. Finally, a multi-scale dilated attention (MSDA) mechanism was adopted.
By incorporating multiple dilation rates, it effectively aggregated multi-scale semantic information, strengthened the network's ability to integrate local details with contextual information, and further enhanced the model's understanding of intersecting and overlapping wheat seedlings. Moreover, without complex operations or additional computational cost, MSDA efficiently reduced the redundancy inherent in the self-attention mechanism. Experimental results demonstrated that the DSM-YOLOv10 model achieved precise detection of wheat seedlings across multiple complex scenarios. It attained a mean average precision (mAP), precision, recall, and F1 score of 91.4%, 85.2%, 81.7%, and 83.4%, respectively. Compared with the original YOLOv10n model, these metrics improved by 5.0, 2.6, 5.4, and 4.1 percentage points, respectively. Furthermore, the model's parameter count and floating point operations (FLOPs) were reduced by 4.7% and 10.7%, respectively. With an inference time of 13.2 ms (approximately 76 frames per second), the model demonstrated real-time detection capability. Compared with the RetinaNet, Faster R-CNN, SSD, and YOLOv8n models, the DSM-YOLOv10 model exhibited the best detection performance: its mAP was 35.8, 24.5, 30.7, and 8.6 percentage points higher, respectively; its number of parameters was 87.4%, 96.2%, 90.0%, and 15.2% lower, respectively; and its FLOPs were 86.3%, 96.0%, 88.1%, and 7.4% lower, respectively. In the supplementary experiment on small object detection, the YOLOv10n model performed poorly, particularly in overlapping seedling scenarios, exhibiting a high number of missed and false detections. In contrast, the DSM-YOLOv10 model demonstrated excellent small object recognition capabilities, achieving an mAP of 86.3%. Both its missed detection rate (MR) and false positive rate (FPR) remained relatively low, at 9.1% and 3.8%, respectively.
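The relationship between the detection metrics reported above can be checked with a short calculation. The sketch below assumes that the F1 score is the usual harmonic mean of precision and recall and that frames per second is the reciprocal of the per-image inference time; the function names are illustrative and are not taken from the study. The baseline values are only those implied by subtracting the reported percentage-point gains.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-image inference time in milliseconds."""
    return 1000.0 / latency_ms


# Values reported for DSM-YOLOv10 in the abstract.
precision, recall = 85.2, 81.7
print(f"F1  = {f1_score(precision, recall):.1f}%")   # F1  = 83.4%
print(f"FPS = {frames_per_second(13.2):.0f}")        # FPS = 76

# Baseline precision and recall implied by the reported gains of
# 2.6 and 5.4 percentage points (derived here, not stated in the text).
base_f1 = f1_score(precision - 2.6, recall - 5.4)
print(f"Implied YOLOv10n F1 = {base_f1:.1f}%")       # Implied YOLOv10n F1 = 79.3%
```

Under these assumptions the implied baseline F1 of 79.3% matches the reported gain of 4.1 percentage points (83.4 minus 4.1), so the four detection metrics are mutually consistent.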
In the seedling counting task, comparing the values detected by the DSM-YOLOv10 model with the measured values gave a coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) of 0.92, 5.68, and 4.33, respectively. Compared with YOLOv10n, which performed best among the comparative models, the R² increased by 6.98%, while the RMSE and MAE decreased by 35.23% and 40.44%, respectively, demonstrating higher counting accuracy and robustness. This study effectively improved the detection and counting of wheat seedlings in complex scenarios, and the proposed model featured fewer parameters and lower FLOPs, providing strong support for field data acquisition in agricultural practice.
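The three counting metrics can be computed from paired per-image counts as sketched below. This is a minimal illustration rather than the evaluation code of the study: it assumes the coefficient of determination is defined as 1 minus the ratio of residual to total sum of squares (a study may instead report the squared correlation of a fitted regression line), and the seedling counts are hypothetical values chosen only to exercise the function.

```python
import math


def counting_metrics(measured, detected):
    """Return (R2, RMSE, MAE) for paired measured and detected counts."""
    if len(measured) != len(detected) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    errors = [d - m for m, d in zip(measured, detected)]

    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean square error

    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured counts are equal")
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot                          # assumed definition of R2
    return r2, rmse, mae


# Hypothetical counts for five images (not data from the study).
measured = [40, 55, 62, 48, 70]
detected = [38, 57, 60, 50, 66]
r2, rmse, mae = counting_metrics(measured, detected)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

RMSE and MAE are in units of seedlings per image, whereas R² is dimensionless, which is why the abstract reports changes in all three as relative percentages rather than percentage points.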