Abstract:
Wheat seedling counting is one of the most crucial field surveys at the emergence stage. However, the small size, dense distribution, and mutual occlusion of wheat seedlings still make counting difficult. This study aims to accurately detect and count wheat seedlings in complex field environments. An improved algorithm, named DSM-YOLOv10, was proposed based on YOLOv10n. First, global annotation of whole seedlings was adopted during data labeling, rather than annotation of local features only. The overall features of wheat seedlings were thus learned and extracted, which effectively avoided the feature loss caused by overlapping and occlusion. Second, a DCC2f module built on dual convolution (DualConv) was constructed to replace the C2f module in the backbone network. Grouped and pointwise convolutions were applied in parallel and combined with residual connections. This alleviated gradient vanishing in deep-layer training, transmitted cross-layer features more completely, and minimized redundant convolution. The improved computational efficiency facilitates subsequent real-time deployment on mobile edge devices and provides more efficient features for subsequent processing. Third, a semantics and detail infusion (SDI) module was introduced to explicitly inject high-level semantic information into low-level detail features using cross-layer attention and Hadamard products. Detail enhancement was thus obtained under semantic guidance, avoiding the simple feature stacking without deep information interaction that occurs in conventional concatenation modules. Consequently, features in the overlapping regions of wheat seedlings were better decoupled and reused. Finally, a multi-scale dilated attention (MSDA) mechanism with multiple dilation rates was adopted. Multi-scale semantic information was effectively aggregated to integrate local details with contextual information.
The MSDA mechanism also enhanced feature extraction for intersecting and overlapping wheat seedlings, while reducing the redundancy and additional computational cost of the self-attention mechanism. Experimental results demonstrate that the DSM-YOLOv10 model achieved precise detection of wheat seedlings under multiple complex scenarios. The mean average precision (mAP), precision (P), recall (R), and F1 score were 91.4%, 85.2%, 81.7%, and 83.4%, respectively, improvements of 5.0, 2.6, 5.4, and 4.1 percentage points over the original YOLOv10n model. Furthermore, the parameter count and floating point operations (FLOPs) were reduced by 4.7% and 10.7%, respectively. Real-time detection was also achieved, with an inference time of 13.2 ms (approximately 76 frames per second). Compared with the RetinaNet, Faster-RCNN, SSD, and YOLOv8n models, the DSM-YOLOv10 model exhibited the best detection performance. Its mAP was 35.8, 24.5, 30.7, and 8.6 percentage points higher, respectively; its number of parameters was 87.4%, 96.2%, 90.0%, and 15.2% lower, respectively; and its FLOPs were 86.3%, 96.0%, 88.1%, and 7.4% lower, respectively. The DSM-YOLOv10 model also demonstrated excellent recognition of small objects, with an mAP of 86.3%, particularly in overlapping seedling scenarios. Both its missed detection rate and false detection rate remained low, at 9.1% and 3.8%, respectively. In the seedling counting task, the DSM-YOLOv10 model achieved a coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) of 0.92, 5.68 plants, and 4.33 plants, respectively. Compared with YOLOv10n, the R² increased by 6.98%, while the RMSE and MAE decreased by 35.23% and 40.44%, respectively, indicating higher counting accuracy and robustness. The improved model can therefore effectively improve the detection and counting of wheat seedlings in complex scenarios, with fewer parameters and lower FLOPs. The findings can provide support for field data acquisition in smart agriculture.
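
For readers who wish to reproduce the reported evaluation metrics, the following is a minimal sketch of how the F1 score, the frame rate, and the counting metrics (R², RMSE, MAE) could be computed. It is an illustration, not the authors' evaluation code. It assumes that R² is defined as 1 - SS_res/SS_tot between manual and predicted counts (the abstract does not state which definition was used), and the function names and the per-image counts in the usage lines are hypothetical.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    return 2 * precision * recall / (precision + recall)


def frames_per_second(inference_ms):
    """Convert a per-image inference time in milliseconds to frames per second."""
    return 1000.0 / inference_ms


def counting_metrics(true_counts, predicted_counts):
    """Return (R^2, RMSE, MAE) for per-image seedling counts.

    Assumption: R^2 = 1 - SS_res / SS_tot, where SS_tot is taken around the
    mean of the manual (true) counts.
    """
    n = len(true_counts)
    errors = [p - t for t, p in zip(true_counts, predicted_counts)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(true_counts) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_counts)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae


# Consistency check against the values reported in the abstract.
print(f"F1  = {f1_score(0.852, 0.817):.3f}")      # P = 85.2%, R = 81.7%
print(f"FPS = {frames_per_second(13.2):.1f}")     # inference time of 13.2 ms

# Hypothetical manual and predicted counts for four images (illustration only).
manual = [100, 120, 80, 150]
predicted = [96, 125, 78, 143]
r2, rmse, mae = counting_metrics(manual, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f} plants, MAE = {mae:.2f} plants")
```

With the reported precision and recall, the F1 computation agrees with the stated 83.4% after rounding, and the 13.2 ms inference time corresponds to the stated frame rate of approximately 76 frames per second.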