Abstract:
As one of the most important food and feed crops in China, corn plays a vital role in ensuring national food security. Weeds are a major biological stress factor affecting corn growth. They not only compete with corn seedlings for water, nutrients, and light, but may also serve as hosts for pests and pathogens, thereby reducing both crop yield and quality. Traditional weed control strategies typically rely on manual identification or extensive herbicide application, both of which suffer from low efficiency, resource waste, and environmental pollution. Existing detection models for corn seedlings and weeds often have large parameter counts, high computational costs, and insufficient accuracy in complex agricultural environments, where false positives and missed detections are common. To address these limitations, this study proposes a lightweight detection framework named YOLO11-SAW, developed on the basis of YOLO11n. The model is designed to improve detection accuracy and inference efficiency while remaining suitable for deployment on edge devices. Specifically, this study improves YOLO11n in three aspects: backbone feature extraction, neck feature fusion, and the bounding box regression loss function. First, the C3STR module, constructed by combining the C3 module with the Swin Transformer, is introduced at the end of the backbone network. This improvement enables the model to better capture global contextual dependencies and enhances its ability to distinguish corn seedlings from weeds in scenarios involving overlapping leaves, target occlusion, dense target distribution, and complex soil backgrounds. Second, the Neck structure is augmented with the Alterable Kernel Convolution (AKConv) module, which introduces learnable offsets to replace conventional convolution operations.
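The core idea behind offset-based convolution can be illustrated with a minimal NumPy sketch: instead of sampling a fixed square grid, each tap position is shifted by a learnable offset and read out via bilinear interpolation. The function names and this single-point, pure-NumPy formulation are illustrative assumptions, not the paper's implementation; AKConv additionally supports an arbitrary number of sampling points and learns the offsets end-to-end.

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2-D feature map at fractional (y, x),
    with zero padding outside the map."""
    H, W = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    def at(r, c):
        return fmap[r, c] if 0 <= r < H and 0 <= c < W else 0.0
    return ((1 - wy) * (1 - wx) * at(y0, x0) + (1 - wy) * wx * at(y0, x0 + 1)
            + wy * (1 - wx) * at(y0 + 1, x0) + wy * wx * at(y0 + 1, x0 + 1))

def akconv_point(fmap, center, base_coords, offsets, weights, bias=0.0):
    """One output activation of an offset-based convolution: sample N
    arbitrary points around `center`, each shifted by its (learned)
    offset, then take the weighted sum (the conv response)."""
    cy, cx = center
    out = bias
    for (dy, dx), (oy, ox), w in zip(base_coords, offsets, weights):
        out += w * bilinear_sample(fmap, cy + dy + oy, cx + dx + ox)
    return out
```

With all offsets at zero and a 3x3 base grid, this reduces exactly to a standard 3x3 convolution tap; non-zero fractional offsets let the sampling pattern deform to match elongated or irregular weed shapes.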
The AKConv modification enhances the model's adaptability to multi-scale and deformed weed targets while reducing both floating-point operations (FLOPs) and parameter count. Finally, the bounding box regression strategy is refined by introducing the WIoUv3 loss function. This adjustment alleviates the training instability caused by fixed gradient magnitudes, suppresses the contribution of low-quality samples, makes better use of high-quality samples, and consequently improves convergence stability and localization accuracy. To evaluate the proposed method comprehensively, comparative experiments were conducted on a self-constructed corn seedling and weed dataset. Under unified experimental settings, YOLO11-SAW was compared with several representative detection models, including Faster R-CNN, Swin Transformer, RT-DETR-L, and multiple YOLO-series models. The results demonstrate that YOLO11-SAW outperforms these methods in both detection accuracy and computational efficiency. Compared with the baseline YOLO11n, YOLO11-SAW improves precision by 1.71% and recall by 2.18%, with mAP@0.5 reaching 96.40% and mAP@0.5:0.95 reaching 76.17%, indicating strong detection and localization capability. In terms of model complexity, YOLO11-SAW contains only 1.97 million parameters and requires 8.0 GFLOPs, with a model size of 4.3 MB, showing a clear advantage in lightweight design. In addition, the model achieves a real-time inference speed of 202.77 FPS, meeting the real-time requirements of intelligent agricultural applications. In summary, the proposed YOLO11-SAW model enables real-time, accurate detection of corn seedlings and weeds in complex agricultural environments and can provide practical technical support for deploying agricultural mobile robots and intelligent variable-rate spraying systems on edge devices.
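The WIoUv3 regression loss mentioned above can be sketched for a single box pair as follows. This follows the published Wise-IoU formulation: a distance-attention term R_WIoU scales the IoU loss (WIoUv1), and a non-monotonic focusing factor r, computed from the outlier degree beta = L_IoU / mean L_IoU, re-weights each sample. The function names are ours, the alpha and delta defaults are illustrative assumptions, and in a real implementation the mean L_IoU is a running batch average with gradients through r and R_WIoU detached.

```python
import math

def iou_and_enclosure(box_p, box_g):
    """IoU of two (x1, y1, x2, y2) boxes, plus width/height of the
    smallest enclosing box (used by the WIoU distance penalty)."""
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    wg = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    hg = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    return iou, wg, hg

def wiou_v3_loss(box_p, box_g, mean_liou, alpha=1.9, delta=3.0):
    """WIoUv3 for one predicted/ground-truth pair: distance-attention
    WIoUv1 term scaled by the non-monotonic focusing factor r."""
    iou, wg, hg = iou_and_enclosure(box_p, box_g)
    l_iou = 1.0 - iou
    # center-distance attention (detached denominator in training)
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxg, cyg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    r_wiou = math.exp(((cxp - cxg) ** 2 + (cyp - cyg) ** 2) / (wg ** 2 + hg ** 2))
    beta = l_iou / max(mean_liou, 1e-7)        # outlier degree
    r = beta / (delta * alpha ** (beta - delta))  # focusing factor
    return r * r_wiou * l_iou
```

Samples with beta near delta receive the largest gradient weight, while very low-quality outliers (large beta) and already well-regressed boxes (small beta) are both down-weighted, which is the mechanism behind the convergence-stability claim in the abstract.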