Abstract:
Xinjiang, China's largest tomato-producing region, makes extensive use of plug seedling transplantation in modern agricultural practice. Grading and screening tomato plug seedlings to retain qualified plants markedly improves transplantation survival rates and yield consistency, so accurate pre-transplantation grading is critical to transplantation quality, operational efficiency, and economic return. Manual grading is labor-intensive, time-consuming, and subjective, which leads to inconsistent results and lower production efficiency. With growing demand for agricultural automation and precision agriculture, automated methods that assess seedling quality in real time with high accuracy and reliability are needed. This study proposes FAN-DETR (feature-adaptive network detection transformer), a real-time grading detection model built on the RT-DETR (real-time detection transformer) architecture, to improve the accuracy and speed of tomato plug seedling grading. To reduce computational redundancy, a lightweight hybrid backbone with dynamic feature enhancement, DFE-LHBNet (dynamic feature-enhanced lightweight hybrid backbone network), was designed by combining the complementary strengths of the HGNet and CSPNet architectures. The HGStem module diverts and distributes features, and together with CSPNet's lightweight modular components it substantially reduces the parameter count without compromising feature extraction capability. A hierarchical gating mechanism in HGBlock filters and prioritizes high-value features while suppressing redundant information. Depthwise separable convolutions (DWConv) are used throughout the network to reduce computational complexity while maintaining feature representation quality. 
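The saving that depthwise separable convolutions bring can be checked with simple arithmetic. The sketch below counts weights (bias ignored) for a dense k×k convolution versus a k×k depthwise convolution followed by a 1×1 pointwise convolution. The function names and channel sizes are illustrative assumptions and are not taken from DFE-LHBNet, whose layer configuration is not given in this abstract.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def macs(params: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates for a stride-1 conv layer: each weight is used once per output pixel."""
    return params * h_out * w_out


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3  # hypothetical layer sizes, not DFE-LHBNet's
    std = standard_conv_params(c_in, c_out, k)
    dws = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std} weights, {macs(std, 40, 40)} MACs at 40x40")
    print(f"depthwise separable: {dws} weights ({dws / std:.3f} of standard)")
    # The ratio equals 1/c_out + 1/k**2, independent of c_in.
```

With 256 input and output channels and a 3×3 kernel, the separable form needs 67,840 weights instead of 589,824, about 11.5% of the dense cost. Because the ratio is 1/c_out + 1/k², the relative saving grows with the number of output channels.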
A DHAM (dual hierarchical attention mechanism) module performs feature fusion, balancing local detail preservation against global context while reducing computational overhead. The model also incorporates a DTAB (dilated transformer attention block) attention mechanism that enlarges the effective receptive field and captures the global contextual information needed for precise seedling classification. A hierarchical processing pipeline handles complex cross-channel and spatial interactions through three complementary parallel branches that extract diverse feature representations and improve robustness. To improve training efficiency and convergence, the PIoU (Powerful-IoU) loss was adopted as the main optimization criterion; it avoids the anchor box enlargement problem and makes convergence faster and more stable. Experiments show that FAN-DETR achieves a mean average precision of 92.62% at an IoU threshold of 0.5 (mAP@0.5) and a detection speed of 96.67 frames per second (FPS). Compared with the baseline RT-DETR model, this is an improvement of 2.2 percentage points in accuracy and of 22.82% in processing speed, while the parameter count and computational complexity are reduced by 29.59% and 44.39%, respectively. In comparisons with mainstream YOLO models (YOLOv5-m, YOLOv8-m, YOLOv10-m, and YOLOv12-m), FAN-DETR performed better across multiple evaluation metrics. 
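The PIoU loss named above can be sketched as follows. This follows our reading of the first Powerful-IoU formulation (PIoU, not the attention-weighted PIoU v2): a penalty P averages the four edge offsets between the predicted and target boxes, each normalized by the target box's width or height, and the loss is (1 − IoU) + (1 − exp(−P²)). Which PIoU variant and hyperparameters FAN-DETR actually uses is not stated in this abstract, and the (x1, y1, x2, y2) box format and function names are assumptions, so treat this as an illustration rather than the authors' implementation.

```python
import math
from typing import Tuple

# Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
Box = Tuple[float, float, float, float]


def iou(pred: Box, target: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    return inter / (area_p + area_t - inter)


def piou_loss(pred: Box, target: Box) -> float:
    """Sketch of PIoU (v1): L = (1 - IoU) + (1 - exp(-P^2)), so 0 <= L <= 2."""
    w_gt = target[2] - target[0]
    h_gt = target[3] - target[1]
    # P averages the four edge offsets, each normalized by the TARGET box size.
    # No enclosing box is involved, so enlarging the predicted box past the
    # target only increases P; this is how PIoU avoids anchor box enlargement.
    p = (abs(pred[0] - target[0]) / w_gt
         + abs(pred[2] - target[2]) / w_gt
         + abs(pred[1] - target[1]) / h_gt
         + abs(pred[3] - target[3]) / h_gt) / 4.0
    return (1.0 - iou(pred, target)) + (1.0 - math.exp(-p * p))


if __name__ == "__main__":
    gt = (1.0, 1.0, 3.0, 3.0)
    print(piou_loss(gt, gt))                    # identical boxes give zero loss
    print(piou_loss((0.0, 0.0, 2.0, 2.0), gt))  # prediction shifted by one unit
```

The penalty term is bounded in [0, 1) by the 1 − exp(−x²) mapping, which keeps the loss well behaved even when the predicted box is far from the target.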
Averaged over these YOLO models, FAN-DETR improved precision by 6.77 percentage points, recall by 3.31 percentage points, and mean average precision at an IoU threshold of 0.5 by 1.50 percentage points. Its 96.67 FPS is on average 36.99 FPS faster than the competing models, and its model size is only 27.70 MB, which indicates effective parameter compression and suits deployment in resource-constrained agricultural settings. FAN-DETR therefore provides an accurate and reliable automated solution for real-time tomato plug seedling grading that is suitable for industrial deployment. This research supports the feasibility and effectiveness of real-time plug seedling grading detection systems and contributes to agricultural automation that improves production efficiency, reduces labor costs, and improves crop quality consistency in tomato production.
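Because the comparisons above are reported as mean average precision at an IoU threshold of 0.5, a small sketch of how that metric is typically computed may help. Detections for one class are sorted by confidence, each is greedily matched to the unmatched ground-truth box with the highest IoU, a match with IoU ≥ 0.5 counts as a true positive, and AP is the area under the interpolated precision-recall curve; mAP@0.5 is the mean of AP over classes. This uses all-point interpolation on a single image with hypothetical function names; evaluation toolkits differ in details (COCO, for example, uses 101-point interpolation), and the study's exact evaluation protocol is not stated here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Detection = Tuple[float, Box]            # (confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Sequence[Box],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for one class at a single IoU threshold."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags: List[bool] = []
    # Greedy matching in descending confidence order.
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                o = iou(box, gt)
                if o > best_iou:
                    best_iou, best_j = o, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Cumulative precision and recall after each detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Precision envelope: precision at recall r is the max precision at recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment (false positives add zero width).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(per_class: Sequence[Tuple[Sequence[Detection], Sequence[Box]]]) -> float:
    """mAP@0.5: mean of per-class AP values."""
    aps = [average_precision(d, g) for d, g in per_class]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
    dets = [(0.9, (0.0, 0.0, 10.0, 10.0)),
            (0.8, (50.0, 50.0, 60.0, 60.0)),   # false positive
            (0.7, (20.0, 20.0, 30.0, 30.0))]
    print(average_precision(dets, gts))
```

In the usage example, the false positive ranked second lowers the interpolated precision for the second half of the recall range, giving an AP of 5/6 instead of 1.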