Abstract:
Maize is one of the most significant cereal crops worldwide, and modern production requires accurate seed counting and purity assessment for high-quality control. However, conventional computer vision methods cannot fully meet the demands of large-scale conveyor-belt inspection, which involves spatially dense distributions of multiple seed varieties, subtle morphological differences among categories, and motion-induced blur caused by the relative movement between the camera and the seeds. In this study, an improved YOLOv10n architecture, YOLO-corn, was proposed to detect and then count maize seeds dynamically and in real time, with the framework optimized for high-performance detection on moving conveyor belts. The YOLO-corn detection model significantly enhanced the baseline YOLOv10n architecture through four structural improvements. 1) RFAConv (Receptive Field Attention Convolution) was integrated into the redesigned C2fE modules to mitigate the feature smearing that motion blur induces in standard convolutions with fixed parameters; spatial features within the receptive field were adaptively re-weighted to concentrate on discriminative micro-textures such as seed embryos and grain contours. 2) The Diverse Branch Block (DBB) was incorporated for shallow feature extraction; its multi-branch topology captured diverse scale-space information during training and was then fused into a single inference kernel via structural re-parameterization, enhancing local edge perception with little computational overhead. 3) The lightweight ADown down-sampling module, which combines parallel average pooling and strided convolution, was adopted to avoid the information loss typical of conventional pooling, preserving spatial fidelity and structural cues throughout the feature hierarchy. 4) A composite FPIoU-v2 loss function was proposed to accelerate convergence. 
It coupled the segmented linear re-weighting of Focal-IoU with the pixel-level boundary sensitivity of PIoU, recalibrating the loss toward hard-to-detect overlapping samples during training. The BoT-SORT algorithm was also implemented for the temporal tracking task: camera motion compensation filtered out mechanical vibrations, while an improved Kalman filter produced smoother trajectory estimates. A virtual line-crossing logic was further integrated to map trajectories into discrete counts, effectively neutralizing redundant counts caused by tracking-ID instability. Experimental evaluations demonstrated that the YOLO-corn framework substantially outperformed existing benchmarks. On a self-curated dynamic dataset covering five maize seed varieties, the model achieved a Precision of 89.2%, a Recall of 88.4%, and an mAP@0.5 of 94.0%, improvements of 2.1, 2.1, and 1.6 percentage points over the original YOLOv10n, respectively, while the number of parameters increased by only 0.17 M. In terms of throughput, the model reached 110.5 FPS on a high-performance workstation and 49.5 FPS on an NVIDIA Jetson Nano after TensorRT optimization, indicating its readiness for edge deployment. Ablation studies confirmed that the synergistic interaction between RFAConv and DBB was crucial to mitigating motion-induced blur. Furthermore, counting experiments revealed that average counting accuracy exceeded 89.3% at low-to-medium belt speeds (0.1-0.3 m/s) and stayed above 81.5% even at higher speeds (0.4-0.5 m/s), indicating strong operational robustness. The YOLO-corn framework thus offers a robust, high-accuracy, real-time solution for the online monitoring of moving maize seeds, providing a technical foundation for agricultural inspection systems that balance architectural complexity with detection fidelity. 
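The exact FPIoU-v2 formulation is not given in the abstract; as a minimal illustrative sketch, the pure-Python function below (the name `focal_iou_loss`, the threshold, and the weight values are all hypothetical placeholders) applies a segmented linear re-weighting to a plain IoU loss, assigning a larger weight to hard samples whose IoU falls below a threshold:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def focal_iou_loss(pred, target, threshold=0.5, w_hard=2.0, w_easy=1.0):
    """Segmented linear re-weighting of an IoU loss: predictions whose
    IoU with the target falls below `threshold` (hard, poorly aligned
    samples) receive a larger weight, steepening their contribution.
    Threshold and weights here are illustrative, not the paper's values."""
    i = iou(pred, target)
    weight = w_hard if i < threshold else w_easy
    return weight * (1.0 - i)
```

With these placeholder weights, a perfectly aligned box incurs zero loss, while a poorly aligned box is penalized twice as strongly per unit of (1 - IoU) as an easy one; the actual FPIoU-v2 loss additionally couples this re-weighting with PIoU's pixel-level boundary sensitivity.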
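The virtual line-crossing logic can also be sketched minimally. In the sketch below, the function name, the `tracks` data structure (a track ID, e.g. from BoT-SORT, mapped to its per-frame centroids), and the assumption that seeds move toward increasing y along the belt are illustrative assumptions, not the paper's implementation:

```python
def count_line_crossings(tracks, line_y):
    """Map tracked trajectories to discrete counts via a virtual line.

    `tracks` maps a track ID to its per-frame (x, y) centroids. A seed
    is counted exactly once, the first time its centroid crosses the
    horizontal line y = line_y; seeds are assumed to move toward
    increasing y. Counting crossing events rather than raw detections
    helps suppress redundant counts from unstable tracker IDs, since
    each trajectory contributes at most one count at the line.
    """
    count = 0
    for centroids in tracks.values():
        for (_, y0), (_, y1) in zip(centroids, centroids[1:]):
            if y0 < line_y <= y1:
                count += 1
                break  # at most one count per trajectory
    return count
```

For example, with the line at y = 10, a trajectory [(0, 2), (0, 8), (0, 15)] is counted once, while a trajectory that never reaches the line contributes nothing.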
In future work, physics-based deblurring pre-processing and 3D point-cloud data are expected to help resolve extreme occlusion and seed stacking in high-speed industrial environments.