
Lightweight method for identifying tea pests and diseases based on an improved YOLOv11n

  • Abstract: To address the low accuracy and high computational complexity of existing tea pest and disease detection models, a lightweight, high-precision model based on an improved YOLOv11n, named WGSE-YOLOv11n, was proposed. First, to improve the capture of multi-scale lesion features, a wavelet-transform dynamic convolution architecture, WGD (wavelet transform GhostDynamic convolution architecture), was designed in the backbone network; it combines the frequency-domain decomposition capability of the WaveletPool module with the lightweight dynamic convolution (C3k2-GhostDynamic) module, enhancing feature representation while reducing the number of parameters. Second, to strengthen recognition of minute lesions and complex boundaries, a wavelet-unpooling star-fusion network, WSF (wavelet unpooling StarFusion network), was designed in the neck; it combines WaveletUnPool upsampling with the StarFusion feature-enhancement module to effectively reconstruct the structural information and texture details of lesions. Finally, a lightweight, efficient detection head (EfficientHead) was introduced, which groups channels through GroupConv grouped convolution to further reduce the parameter count and computational load. Experimental results show that the model achieved 97.64% precision, 97.87% recall, and 99.08% mean average precision on a self-built dataset, improvements of 0.24, 3.62, and 0.77 percentage points over the baseline YOLOv11n; meanwhile, the parameter count, computational load, and model size were compressed to 1.51 M, 3.3 GFLOPs, and 3.2 MB, reductions of 41.5%, 47.6%, and 39.6% relative to the baseline. On an embedded device the detection frame rate reached 26.14 frames/s, with an average inference time of 14 ms per image. These results demonstrate that the model maintains high recognition accuracy while greatly reducing computation, and meets the requirements for deployment on mobile terminals and embedded devices.
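As a point of reference for the WaveletPool idea mentioned above, the sketch below shows one common way a wavelet-pooling layer can be written in PyTorch: a single-level 2×2 Haar transform applied as a fixed, depthwise, stride-2 convolution, so the spatial resolution halves while the low- and high-frequency sub-bands are preserved as additional channels. It is an illustrative sketch of the general technique, not the paper's exact WaveletPool module.

```python
# Minimal Haar wavelet-pooling sketch (assumption, not the paper's exact module).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarWaveletPool(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])     # low-low (local average)
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])   # horizontal detail
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])   # vertical detail
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])   # diagonal detail
        filters = torch.stack([ll, lh, hl, hh]).unsqueeze(1)   # (4, 1, 2, 2)
        filters = filters.repeat(channels, 1, 1, 1)            # one filter bank per channel
        self.register_buffer("weight", filters)
        self.channels = channels

    def forward(self, x):
        # Depthwise (grouped) stride-2 conv: each input channel yields 4 sub-band channels.
        return F.conv2d(x, self.weight, stride=2, groups=self.channels)

x = torch.randn(1, 16, 64, 64)
print(HaarWaveletPool(16)(x).shape)   # torch.Size([1, 64, 32, 32])
```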

     

    Abstract: Detecting tea pests and diseases in modern agriculture requires both high accuracy and low computational complexity. In this study, a lightweight, high-precision model, WGSE-YOLOv11n, was proposed based on an improved YOLOv11n architecture, and a tea pest and disease dataset was constructed to support model training and validation. The dataset comprised 9 categories (healthy leaves, tea anthracnose, tea leaf spot disease, tea black rot, tea leaf rust, tea leaf blight, tea white spot disease, tea aphids, and tea spider mites), totaling 2,496 sample images drawn from two sources: 746 valid images captured at the Yunfeng Ecological Tea Garden in Hanzhong, Shaanxi Province, China, in 2025, and 1,750 images taken from the Roboflow public dataset. All images were uniformly resized to a resolution of 640×640. Label Studio was used to annotate the disease regions, and the annotations were converted to YOLO format. Mosaic data augmentation (including translation, rotation, scaling, brightness adjustment, and noise injection) was applied to expand the sample size to 6,150 images, which were divided into training (4,296 images), validation (1,235 images), and test (619 images) sets at a ratio of 7:2:1.

In the WGSE-YOLOv11n design, a Wavelet GhostDynamic convolution (WGD) architecture was first developed in the backbone network. It integrates the frequency-domain decomposition of the WaveletPool module with the lightweight C3k2-GhostDynamic module, enhancing feature representation while reducing the number of parameters and thereby improving the capture of multi-scale lesion features. Secondly, a wavelet-unpooling StarFusion (WSF) network was incorporated into the neck, combining WaveletUnPool upsampling with a StarFusion feature-enhancement module to effectively reconstruct the structural and texture information of lesions and to improve recognition accuracy for minute lesions and complex boundaries. Finally, a lightweight EfficientHead detection head was introduced, in which channel grouping via GroupConv further reduces the parameter count and computational complexity.

Grad-CAM was employed to generate heatmaps for tea leaf pest and disease detection in order to examine the behavior of the detector. The results show that the WGSE-YOLOv11n model located lesions accurately and rapidly: its heatmaps exhibited high-response zones along pathological tissue boundaries, indicating strong spatial coupling with lesion morphology. In contrast, the heatmaps of the YOLOv11n baseline showed pronounced spatial diffusion, with weak responses in diseased areas, edge attenuation, and false activation on non-pathological tissue. The improved model substantially enhanced edge response intensity, greatly reduced false-positive activations, and showed no feature confusion in multi-leaf scenes. It also outperformed the baseline in response intensity toward minute targets, such as the ring lesions of tea cake disease and tea aphids, with clear gains in pathological feature focus, edge precision, and multi-class robustness.
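The Grad-CAM visualisation described above can in principle be reproduced with a short hook-based routine. The sketch below is a generic version under stated assumptions: the choice of target layer and of the scalar score that is backpropagated (here simply the strongest raw prediction) are illustrative, not the configuration used in the paper.

```python
# Hook-based Grad-CAM sketch for a convolutional detector layer (illustrative only).
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    out = model(image)                          # raw detector output
    if isinstance(out, (tuple, list)):          # some detectors return several tensors
        out = out[0]
    model.zero_grad()
    out.max().backward()                        # crude scalar target: strongest prediction
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)             # channel importance weights
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))   # weighted activation map
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).detach()
```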
The improved model was also deployed on a Jetson Orin NX development board connected to a D455 camera for real-time image capture, so that its practical performance on mobile devices could be verified. TensorRT was used for operator acceleration and INT8 quantization, and CUDA enabled multithreaded parallel preprocessing, improving computational efficiency and detection speed within the constraints of the embedded platform. Experimental results demonstrate that the WGSE-YOLOv11n model achieved 97.64% precision, 97.87% recall, and 99.08% mean average precision on the self-built dataset, improvements of 0.24, 3.62, and 0.77 percentage points, respectively, over the baseline YOLOv11n model. The parameter count, computational load, and model size were compressed to 1.51 million parameters, 3.3 GFLOPs, and 3.2 MB, respectively, reductions of 41.5%, 47.6%, and 39.6% compared with the baseline. On the Jetson Orin NX, the detection frame rate reached 26.14 frames per second, with an average inference time of 14 ms per image. The model therefore maintains high recognition accuracy while substantially reducing computational load, making it suitable for deployment on mobile and embedded devices and for real-time detection in field environments.
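The deployment path described above (TensorRT engine, INT8 quantization, per-image timing) can be approximated with the Ultralytics export API if the model was trained in that framework. The sketch below is an assumption-laden illustration, not the authors' deployment code: the weight, dataset-YAML, and image file names are hypothetical, and the custom WGD/WSF/EfficientHead modules would have to be registered in the framework for the checkpoint to load.

```python
# Sketch: export a trained model to a TensorRT INT8 engine and time inference.
# File names are hypothetical; this is not the authors' deployment script.
import time
from ultralytics import YOLO

model = YOLO("wgse_yolov11n.pt")                        # hypothetical trained weights
model.export(format="engine", int8=True, imgsz=640,     # TensorRT engine with INT8
             data="tea.yaml", device=0)                 # YAML supplies calibration images

engine = YOLO("wgse_yolov11n.engine")                   # load the exported engine
engine.predict("leaf.jpg", imgsz=640, device=0)         # warm-up run

n = 100
t0 = time.perf_counter()
for _ in range(n):                                      # average over repeated runs
    engine.predict("leaf.jpg", imgsz=640, device=0, verbose=False)
dt = (time.perf_counter() - t0) / n
print(f"avg end-to-end time: {dt * 1e3:.1f} ms  ({1 / dt:.1f} FPS)")
```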

     
