Abstract:
Grapes are among the most widely cultivated fruit crops in the world. However, leaf diseases significantly reduce vine health and productivity. Traditional detection methods also struggle with the small size of lesions, their weak edge features, and interference from complex backgrounds in natural scenes, such as sunlight, shadows, and leaf texture. In this study, an improved RT-DETR (Real-Time Detection Transformer) model was proposed to detect grape leaf diseases accurately and rapidly. Because RT-DETR was originally designed for general object detection, several key enhancements were introduced. First, a Multi-Level Margin Feature Information Enhancer (MLMFIE) based on Sobel operator convolution was introduced to extract edge features from grape leaf lesions. The Sobel convolution highlights lesion boundaries so that affected regions can be more accurately distinguished from healthy leaf tissue, and the resulting edge information is enhanced across multiple scales. This allows MLMFIE to detect lesions with subtle or unclear boundaries, which are common in natural environments where lesions are not sharply defined. Second, an Adaptive Sparse Self-Attention (ASSA) mechanism was introduced to reduce distraction from complex natural backgrounds and focus on the lesions themselves. ASSA filters out irrelevant background features and dynamically adjusts the weights of its sparse and dense attention branches, concentrating computation on the most critical regions. As a result, lesions can still be detected under limiting environmental conditions such as uneven lighting or overlapping leaves. This adaptive mechanism significantly improved detection accuracy, particularly against cluttered or noisy backgrounds. 
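The two mechanisms above can be illustrated with a small, dependency-free Python sketch. It is not the authors' MLMFIE or ASSA code. `sobel_magnitude` computes a standard Sobel gradient magnitude on a single 2-D feature map, and `edge_enhance` adds it back as a residual with a hypothetical weight `alpha`; the abstract does not specify how edges are fused or at which scales, so only one scale is shown. `assa_weights` assumes a common reading of ASSA: a squared-ReLU sparse branch and a softmax dense branch for one query, mixed by softmax-normalized learnable weights. Normalizing the sparse branch to sum to one is our own simplification. All function names and parameters are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def convolve3x3(img: Matrix, kernel: List[List[int]]) -> Matrix:
    """3x3 cross-correlation with zero padding; output has the same size as the input."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    r, c = i + di, j + dj
                    if 0 <= r < h and 0 <= c < w:  # out-of-bounds pixels count as zero
                        acc += kernel[di + 1][dj + 1] * img[r][c]
            out[i][j] = acc
    return out


def sobel_magnitude(img: Matrix) -> Matrix:
    """Per-pixel gradient magnitude sqrt(gx^2 + gy^2); large values mark boundaries."""
    gx = convolve3x3(img, SOBEL_X)
    gy = convolve3x3(img, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


def edge_enhance(feat: Matrix, alpha: float = 1.0) -> Matrix:
    """Hypothetical residual fusion: feature + alpha * Sobel edge magnitude."""
    edges = sobel_magnitude(feat)
    return [[f + alpha * e for f, e in zip(rf, re)] for rf, re in zip(feat, edges)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def assa_weights(scores: List[float], w_sparse: float, w_dense: float) -> List[float]:
    """Mix sparse (squared-ReLU) and dense (softmax) attention for one query's scores."""
    dense = softmax(scores)
    relu2 = [max(0.0, s) ** 2 for s in scores]  # negative scores get exactly zero weight
    total = sum(relu2)
    sparse = [r / total for r in relu2] if total > 0 else [0.0] * len(scores)
    # Learnable branch weights (assumed), normalized so the two coefficients sum to 1.
    a_sparse, a_dense = softmax([w_sparse, w_dense])
    return [a_sparse * sp + a_dense * de for sp, de in zip(sparse, dense)]
```

Setting `w_sparse` well above `w_dense` drives the attention on negatively scored keys (for example, background tokens) toward zero, which matches the filtering behavior described in the text; a dense-dominated mix keeps some attention on every key.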
Third, a Small Lesion Feature Focusing Module (SLFFM) was introduced to detect small lesions, which are often difficult to distinguish because of their size and their proximity to other features on the leaf. SLFFM adds extraction of smaller-scale features, preserving the fine details of small lesions that are lost during downsampling in many models, so that small lesions can be detected with high accuracy even in these challenging cases. The experimental results demonstrate that the improved model outperformed the baseline: it achieved an average precision of 92.7%, a 4.5 percentage point improvement over the baseline RT-DETR. Compared with mainstream object detection models, including Faster R-CNN, YOLOv5L, YOLOv8L, SSD, Deformable-DETR, Efficient-DETR, and MS-DETR, the improved model achieved gains ranging from 0.7 to 14.7 percentage points, with the largest gains in small lesion detection. In particular, the primary challenges addressed were real background interference in natural environments and the small size of the lesions. Furthermore, strong generalization was demonstrated on datasets from other crops, including apples, corn, tomatoes, and strawberries, where the model maintained superior detection performance across agricultural contexts. The model thus provides a versatile tool for automated crop disease detection in diverse environments. In conclusion, the improved model, MSA-DETR, significantly advances grape leaf disease detection: it captures fine-grained edge features, focuses on small lesions, and reduces background interference, yielding a highly accurate and efficient solution. Its strong generalization also makes it a promising tool for disease detection in a variety of agricultural settings. 
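The downsampling problem that SLFFM targets can be seen with a toy calculation. The sketch below does not reproduce SLFFM; it places a hypothetical 2x2 lesion of intensity 1.0 on a 32x32 map and applies non-overlapping average pooling at increasing strides. The peak response shrinks by a factor of four each time the stride doubles, so at strides of 16 or 32 the lesion is barely distinguishable from the background, which is why retaining higher-resolution (smaller-stride) features helps with small targets. Average pooling is our stand-in for downsampling; real backbones mostly use strided convolutions, which dilute small signals in a similar but not identical way.

```python
from typing import List

Matrix = List[List[float]]


def avg_pool(img: Matrix, stride: int) -> Matrix:
    """Non-overlapping average pooling with a stride x stride window."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h - stride + 1, stride):
        row = []
        for j in range(0, w - stride + 1, stride):
            block = [img[r][c] for r in range(i, i + stride) for c in range(j, j + stride)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def peak_response(img: Matrix) -> float:
    """Strongest activation anywhere in the map."""
    return max(max(row) for row in img)


# Hypothetical toy input: a 32x32 map containing a single 2x2 "lesion" of intensity 1.0.
size = 32
feature = [[0.0] * size for _ in range(size)]
for r in (8, 9):
    for c in (8, 9):
        feature[r][c] = 1.0

for stride in (2, 4, 8, 16, 32):
    print(stride, peak_response(avg_pool(feature, stride)))
# 2 1.0
# 4 0.25
# 8 0.0625
# 16 0.015625
# 32 0.00390625
```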
The findings can offer valuable insights for future research and practical applications in precision agriculture. Owing to its compact architecture and strong performance, MSA-DETR is well suited to real-time field deployment, and it offers substantial improvements over previous models in both accuracy and computational efficiency.