Real-time Detection and Grading of Tomato Plug Seedlings Based on Improved RT-DETR

朱祥建; 任玲; 张玉泉; 杨苗; 崔建谱; 张聪华; 刘军平

doi:10.11975/j.issn.1002-6819.202504050

Real-time Detection and Grading of Tomato Plug Seedlings Based on Improved RT-DETR

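Abstract: To improve the accuracy and speed of real-time grading detection of tomato plug seedlings, this study takes RT-DETR (real-time detection transformer) as the baseline and proposes an improved real-time grading detection model, FAN-DETR (feature-adaptive network detection transformer). First, a lightweight hybrid backbone with dynamic feature enhancement, DFE-LHBNet (dynamic feature-enhanced lightweight hybrid backbone network), is designed; its optimized structure extracts features efficiently while reducing the parameter count and computational cost, which raises detection speed. Second, the DTAB (dilated transformer attention block) attention mechanism is added to enlarge the receptive field and capture fine seedling details, which improves detection accuracy. Finally, the PIoU (powerful-IoU) loss function is used to avoid anchor box enlargement and speed up training convergence. Experiments show that FAN-DETR reaches a mean average precision of 92.62% at an IoU (intersection over union) threshold of 0.5 and a detection speed of 96.67 frames per second (FPS), improvements over the original model of 2.2 percentage points and 22.82%, respectively. The parameter count and computational cost are 29.59% and 44.39% lower than those of the original model. Compared with the YOLO-series models, precision is higher by 6.77 percentage points on average, recall by 3.31 percentage points, and mean average precision at the same IoU threshold by 1.50 percentage points, while detection speed is 62% higher on average. These results indicate that FAN-DETR performs well in real-time grading detection of tomato plug seedlings and can provide technical support for real-time grading of plug seedlings.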
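The mean average precision figure above depends on deciding when a predicted box counts as a correct detection. The following is a minimal sketch of that matching rule, assuming axis-aligned boxes in (x1, y1, x2, y2) form and integer grade labels; the function names and the box format are illustrative assumptions, not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_grade, gt_box, gt_grade, threshold=0.5):
    """A prediction matches a ground-truth seedling if the grade agrees
    and the boxes overlap with IoU at or above the threshold."""
    return pred_grade == gt_grade and iou(pred_box, gt_box) >= threshold
```

Average precision is then computed per grade from the ranked true and false positives, and the mean over grades gives the reported metric; that aggregation step is omitted here.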

Abstract: Xinjiang, China's largest tomato-producing region, makes extensive use of plug seedling transplanting. Grading and screening tomato plug seedlings before transplanting, so that only qualified plants are used, improves transplant survival rates and yield consistency; accurate grading is therefore critical to transplanting quality, operational efficiency, and economic viability. Manual grading is labor-intensive, slow, and subjective, which leads to inconsistent results and lower production efficiency. As demand grows for agricultural automation and precision agriculture, automated methods are needed that assess seedling quality in real time with high precision and reliability. This study proposes FAN-DETR (feature-adaptive network detection transformer), a real-time grading detection model built on RT-DETR (real-time detection transformer), to improve the accuracy and speed of real-time tomato plug seedling grading detection. To reduce computational redundancy, a lightweight hybrid backbone with dynamic feature enhancement, DFE-LHBNet (dynamic feature-enhanced lightweight hybrid backbone network), was designed by combining the complementary strengths of HGNet and CSPNet. The HGStem module splits and distributes features and is combined with lightweight CSPNet components, which substantially reduces the parameter count without compromising feature extraction capability. A hierarchical gating mechanism implemented through HGBlock filters and prioritizes high-value features while suppressing redundant information. Depthwise separable convolutions (DWConv) are used throughout the network to reduce computational complexity while maintaining feature representation quality.
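To see why depthwise separable convolutions reduce model size, it helps to count weights. The sketch below compares a standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution, ignoring biases and normalization layers. The channel widths and kernel size are illustrative assumptions, not values taken from DFE-LHBNet, and the result is not meant to reproduce the reductions reported for the full model.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (no bias)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)          # 73728 weights
separable = depthwise_separable_params(64, 128, 3)   # 8768 weights
ratio = separable / standard                         # equals 1/c_out + 1/k**2
```

For this hypothetical layer the separable form needs about 12% of the weights of the standard convolution. The saving for a whole network is smaller, because not every layer is replaced and other components contribute parameters as well.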
A DHAM (dual hierarchical attention mechanism) module performs feature fusion, balancing local detail preservation against global context while reducing computational overhead. The model also incorporates a DTAB (dilated transformer attention block) attention mechanism to expand the effective receptive field and capture the global context needed for precise seedling classification. A hierarchical processing pipeline handles cross-channel and spatial interactions through three parallel, complementary branches that extract diverse feature representations and improve robustness. The PIoU (powerful IoU) loss function was adopted to prevent anchor box enlargement and to improve the speed and stability of training convergence. In experiments, FAN-DETR achieved a mean average precision of 92.62% at an intersection over union (IoU) threshold of 0.5 and a detection speed of 96.67 frames per second (FPS). Relative to the baseline RT-DETR model, this is an improvement of 2.2 percentage points in mean average precision and 22.82% in detection speed. The parameter count and computational cost were reduced by 29.59% and 44.39%, respectively. FAN-DETR was also compared with mainstream YOLO-series models (YOLOv5-m, YOLOv8-m, YOLOv10-m, and YOLOv12-m) and outperformed them across multiple evaluation metrics.
Compared with the YOLO models, FAN-DETR improved precision by 6.77 percentage points, recall by 3.31 percentage points, and mean average precision at an IoU threshold of 0.5 by 1.50 percentage points on average. Its 96.67 FPS is on average 36.99 FPS faster than the competing models, and its model size is only 27.70 MB, which makes it suitable for deployment in resource-constrained agricultural environments. These results indicate that FAN-DETR performs well in real-time grading detection of tomato plug seedlings and offers a reliable, accurate automated grading solution for industrial deployment. The study supports the feasibility of real-time plug seedling grading detection and contributes to agricultural automation aimed at improving production efficiency, reducing labor costs, and improving crop quality consistency in tomato production.
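The abstract names the PIoU loss but does not state its formula. The sketch below assumes the commonly cited Powerful-IoU formulation, in which the IoU loss is augmented by a penalty built from the edge distances between the predicted and target boxes, normalized by the target box size only. Whether FAN-DETR uses exactly this variant, or the later version with an added focusing term, is an assumption to check against the full paper.

```python
import math


def piou_loss(pred, target):
    """Sketch of a PIoU-style loss for boxes given as (x1, y1, x2, y2).

    Assumed form: L = (1 - IoU) + (1 - exp(-P**2)), where P is the mean
    edge distance normalized by the target width and height.
    """
    # Plain IoU term.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Penalty: distances between corresponding edges, scaled by the
    # target size rather than by an enclosing box.
    w_gt = target[2] - target[0]
    h_gt = target[3] - target[1]
    dw1, dw2 = abs(pred[0] - target[0]), abs(pred[2] - target[2])
    dh1, dh2 = abs(pred[1] - target[1]), abs(pred[3] - target[3])
    p = (dw1 / w_gt + dw2 / w_gt + dh1 / h_gt + dh2 / h_gt) / 4.0

    return (1.0 - iou) + (1.0 - math.exp(-p * p))
```

Because the penalty is normalized by the target dimensions, enlarging the predicted box cannot shrink the penalty term, which is the usual explanation for how this loss discourages box enlargement during regression.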
