Abstract:
To address the low single-seeding rate and high missed-seeding rate in plug seedling seeding of leafy vegetables, this study proposes Seed-YOLO, an improved lightweight model based on You Only Look Once 11 nano (YOLO11n), for detecting the seeding performance of three types of leafy vegetable seeds in plug seedling trays. The model was deployed on the NVIDIA Jetson Xavier NX edge computing device, and an efficient detection system for plug seedling seeding performance was developed. YOLO11 is an advanced object detection framework whose sophisticated architecture and large parameter count enable high accuracy on diverse targets in high-resolution, complex scenes. This accuracy, however, comes at the cost of heavy computational demands and long inference times, which restricts deployment in resource-constrained scenarios. YOLO11n, in contrast, is substantially lighter: architectural optimizations reduce computational complexity and parameter redundancy while maintaining competitive performance. Seed-YOLO makes four core improvements. First, a Context Anchor Attention (CAA) module is introduced into the backbone network to construct the C2PSA_CAA module, which enhances the feature representation of the seed center region and improves the model's ability to capture seed characteristics. CAA is a network structure designed to capture long-range contextual information. It extracts statistical features of local regions through average pooling and strengthens them with a 1×1 convolution.
The module uses horizontal (1×11) and vertical (11×1) depth-wise separable strip convolutions to expand the receptive field at low computational cost, achieving an effect similar to that of a large convolution kernel. It then generates an attention weight map with a Sigmoid function and applies this map to the original feature map to weight and enhance the features. Second, Group Shuffle Convolution (GSConv) and GSBottleneck modules are incorporated into the neck network to build the C3K2_GS module, which accelerates the fusion of seed features while maintaining detection accuracy. GSConv is a lightweight convolution technique: its shuffle operation distributes the features produced by standard convolution evenly, across channels, into every part of the features produced by Depthwise Separable Convolution (DSC). This reduces the model's computational complexity and parameter count while preserving its performance. Third, the Wise Intersection over Union version 3 (WIoU v3) loss is adopted as the bounding box loss function. Its dynamic non-monotonic focusing mechanism adjusts the gradient gain of samples of different quality through a weight factor: it reduces the focus on high-quality samples while also mitigating the harmful gradients generated by low-quality samples. This directs the model's attention to anchor boxes of average quality and improves detection performance. Finally, an XSmall detection head is added to boost detection accuracy for small leafy vegetable seeds, while the original Medium and Large detection heads are removed to reduce the model's parameter count and size.
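The dynamic non-monotonic focusing mechanism of WIoU v3 can be illustrated with a minimal single-box sketch. It follows the commonly cited WIoU v3 formulation rather than the exact training code of this study: the hyperparameters alpha = 1.9 and delta = 3, the function names, and the externally supplied mean IoU loss are all assumptions made for illustration. In actual training the mean would be a running average over batches, and the enclosing-box term and outlier degree would be excluded from gradient computation.

```python
import math


def iou_and_geometry(pred, gt):
    """Boxes are (x1, y1, x2, y2) with positive area.

    Returns the IoU, the squared distance between box centres, and the
    squared diagonal of the smallest box enclosing both.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    enc_w = max(px2, gx2) - min(px1, gx1)
    enc_h = max(py2, gy2) - min(py1, gy1)
    return inter / union, dx * dx + dy * dy, enc_w * enc_w + enc_h * enc_h


def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gradient gain r = beta / (delta * alpha**(beta - delta)).

    beta is the outlier degree (IoU loss of this box divided by the mean IoU
    loss). The gain is small for very low beta (high-quality boxes) and for
    very high beta (low-quality boxes), and peaks at beta = 1 / ln(alpha).
    alpha and delta are assumed values, not taken from this study.
    """
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(pred, gt, mean_iou_loss, alpha=1.9, delta=3.0):
    """WIoU v3 loss for one predicted box (illustrative, no autograd)."""
    iou, centre_sq, diag_sq = iou_and_geometry(pred, gt)
    iou_loss = 1.0 - iou
    distance_attention = math.exp(centre_sq / diag_sq)  # WIoU v1 factor
    beta = iou_loss / mean_iou_loss                      # outlier degree
    return focusing_gain(beta, alpha, delta) * distance_attention * iou_loss


if __name__ == "__main__":
    for beta in (0.1, 1.0, 1.5, 3.0, 6.0):
        print(f"beta={beta:>4}: gain={focusing_gain(beta):.3f}")
    print(wiou_v3((0, 0, 2, 2), (1, 0, 3, 2), mean_iou_loss=0.5))
```

Under these assumed hyperparameters the gain rises and then falls as the outlier degree grows, which is the sense in which boxes of average quality receive the most attention.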
Experimental results show that Seed-YOLO achieves a mean average precision at an IoU threshold of 0.5 (mAP@0.5) of 96.7% and an F1 score of 93.79% for seeding performance detection of the three leafy vegetable seeds, improvements of 5.4 and 8.87 percentage points over YOLO11n's 91.3% and 84.92%, respectively. The parameter count is reduced to 1.58 million, a 38.7% decrease from YOLO11n's 2.58 million. On the NVIDIA Jetson platform, a graphical user interface was developed to build a real-time detection system for plug seedling seeding performance. At a seeding rate of 120 trays per hour, the system achieved accuracies of 99.19% for single-seeding prediction, 94.79% for reseeding prediction, and 93.43% for missed-seeding prediction, with an average computation time of 121 milliseconds per tray. This study provides support for the development of seeding performance detection systems for leafy vegetable plug seedlings.
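The reported gains can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative, and the per-tray time budget assumes trays arrive at a uniform interval, which the abstract does not state. Computed from the rounded parameter counts, the reduction comes out near 38.8%, so the reported 38.7% presumably reflects the unrounded counts.

```python
# Figures as reported in the abstract.
MAP50 = {"Seed-YOLO": 96.7, "YOLO11n": 91.3}    # mAP@0.5, percent
F1 = {"Seed-YOLO": 93.79, "YOLO11n": 84.92}     # F1 score, percent
PARAMS_M = {"Seed-YOLO": 1.58, "YOLO11n": 2.58}  # parameters, millions

TRAYS_PER_HOUR = 120
COMPUTE_S_PER_TRAY = 0.121  # 121 ms average computation time

# Accuracy gains are differences of percentages, i.e. percentage points.
map_gain_pp = MAP50["Seed-YOLO"] - MAP50["YOLO11n"]
f1_gain_pp = F1["Seed-YOLO"] - F1["YOLO11n"]

# Parameter reduction is a relative change with respect to the baseline.
param_reduction_pct = (
    (PARAMS_M["YOLO11n"] - PARAMS_M["Seed-YOLO"]) / PARAMS_M["YOLO11n"] * 100
)

# Assumption: uniform tray spacing, so each tray has 3600 / 120 seconds.
seconds_per_tray = 3600 / TRAYS_PER_HOUR
budget_used_pct = COMPUTE_S_PER_TRAY / seconds_per_tray * 100

if __name__ == "__main__":
    print(f"mAP@0.5 gain: {map_gain_pp:.2f} percentage points")
    print(f"F1 gain: {f1_gain_pp:.2f} percentage points")
    print(f"Parameter reduction: {param_reduction_pct:.2f}%")
    print(f"Time available per tray: {seconds_per_tray:.1f} s")
    print(f"Share used by detection: {budget_used_pct:.2f}%")
```

Under the uniform-spacing assumption, detection occupies well under 1% of the interval between trays, which is consistent with the claim of real-time operation.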