Detection of cucumber downy mildew sporangia in low-magnification microscopic images based on improved YOLOv8n

张志军; 李明; 刘凯歌; 查志华

doi:10.11975/j.issn.1002-6819.202503142


Improved YOLOv8n-based model for low-magnification microscopic detection of cucumber downy mildew sporangia

Abstract

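Abstract: To address the low efficiency of manual inspection and the limited robustness of traditional algorithms in detecting cucumber downy mildew sporangia in low-magnification microscopic images, an improved YOLOv8n model is proposed. Detection performance is improved through the coordinated optimization of several modules: 1) a WhirlConv (whirl convolution) module is designed that uses four-branch reflection padding and independent convolution kernels to capture multi-directional features, combined with a channel attention mechanism to suppress redundant information; 2) a P2-level high-resolution feature map is introduced to build multi-scale detection heads, extending coverage to extremely small targets; 3) an LSKA (large separable kernel attention) mechanism is embedded in the SPPF (spatial pyramid pooling-fast) module, capturing long-range dependencies through large separable convolution kernels and improving performance while keeping the module lightweight. Experiments show that, on the self-built dataset, the improved model reaches a precision of 94.2%, a recall of 90.1%, and a mean average precision (mAP_0.5) of 86.9%, which are 10, 7.2, and 7.8 percentage points higher than the baseline YOLOv8n, respectively; its parameter count (17.7 M) and floating-point operations (56.9 G) are 25.1 M and 77.5 G lower than those of RT-DETR-R50, respectively. The model provides a method for detecting sporangia in low-magnification microscopic images.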
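The reflection padding used in the WhirlConv branches can be contrasted with zero padding on a toy one-dimensional signal. The sketch below is only an illustration of the padding idea, not the authors' module: the helper names `pad_1d` and `box_filter` are hypothetical, and a plain moving average stands in for a learned convolution kernel.

```python
def pad_1d(signal, pad, mode):
    """Pad a 1-D list on both sides, with zeros or by reflection.

    Reflection mirrors the signal about its first and last samples
    without repeating the edge sample itself.
    """
    if mode == "zero":
        return [0.0] * pad + list(signal) + [0.0] * pad
    if mode == "reflect":
        if pad >= len(signal):
            raise ValueError("reflect padding needs pad < len(signal)")
        left = [signal[i] for i in range(pad, 0, -1)]
        right = [signal[-1 - i] for i in range(1, pad + 1)]
        return left + list(signal) + right
    raise ValueError(f"unknown mode: {mode}")


def box_filter(signal, width, mode):
    """Moving average of odd width that keeps the output length unchanged."""
    pad = width // 2
    padded = pad_1d(signal, pad, mode)
    return [sum(padded[i:i + width]) / width for i in range(len(signal))]


# A perfectly flat signal: any change in the output is a padding artifact.
flat = [5.0, 5.0, 5.0, 5.0]
zero_out = box_filter(flat, 3, "zero")
reflect_out = box_filter(flat, 3, "reflect")

print("zero padding:   ", zero_out)
print("reflect padding:", reflect_out)
```

With zero padding the two border outputs are pulled below the true value, while reflection padding leaves the flat signal unchanged, which is the kind of border distortion the padding choice is meant to avoid.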

Abstract: High-magnification (≥200×) microscopic imaging has inherent limitations for detecting cucumber downy mildew sporangia, including excessive storage demands, incompatibility with portable field devices, and discrepancies between laboratory-induced samples and natural agricultural environments. To address these limitations, this study developed an enhanced You Only Look Once version 8n (YOLOv8n) model optimized for practical low-magnification (100×) microscopy. Three synergistic architectural innovations were introduced to improve detection robustness in complex field scenarios: (1) A Whirl Convolution (WhirlConv) module replaced standard convolutions, employing a four-branch architecture with reflection padding and independent convolutional kernels to capture multi-directional edge features while suppressing background noise via channel attention. This design mitigated the boundary distortion caused by traditional zero-padding and enhanced adaptability to the random orientation of sporangia. (2) High-resolution P2-layer features (spatial resolution: 160×160) were fused into the feature pyramid network, enabling multi-scale detection heads to improve localization accuracy for extremely small targets (average size: 31×27 pixels, occupying 0.02% of the 2560×1920-pixel input image). (3) The Spatial Pyramid Pooling-Fast (SPPF) module was augmented with a Large Separable Kernel Attention (LSKA) mechanism, leveraging large separable kernels (e.g., 15×15) to capture long-range dependencies and global contextual information, complementing the localized directional features extracted by WhirlConv. The dataset was constructed under authentic field conditions at the Xiaotangshan National Precision Agriculture Research Base (Beijing, China) using a volumetric spore sampler and a LEICA DM3000 LED microscope at 100× magnification (10× eyepiece, 10× objective, 1× zoom).
This approach captured natural challenges, including dense sporangia clusters, overlapping structures, and interference from field impurities (pollen, dust). The dataset comprised 300 raw images (expanded to 1,200 via rotation, hue shifts, and saturation adjustments), ensuring alignment with real-world agricultural scenarios. Rigorous annotation protocols were implemented under the supervision of plant pathologists, excluding ambiguous targets through morphological validation. Trained on this dataset, the optimized model achieved a precision of 94.2%, a recall of 90.1%, and a mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 86.9%, outperforming the baseline YOLOv8n by 10.0, 7.2, and 7.8 percentage points, respectively. Comparative evaluations against high-magnification models revealed significant advantages: the proposed model surpassed YOLOv8x (258.1 giga floating-point operations (GFLOPs), 79.1% mAP@0.5) by 7.8 percentage points in mAP@0.5 while reducing storage requirements by 75% and computational complexity by 78% (56.9 vs. 258.1 GFLOPs). Ablation studies confirmed the contributions of each module: WhirlConv alone improved mAP@0.5 by 3.3 percentage points, while integrating P2 features and LSKA further enhanced performance synergistically. Visualization analyses demonstrated the model’s robustness in field-relevant scenarios: in dense clusters (174 targets per region), the model reduced the false-negative rate by 14.4 percentage points (1.7% vs. 16.1% for YOLOv8n), and under impurity interference, false positives were limited to 1.0%. Heatmap visualizations further validated the model’s ability to focus on densely packed sporangia, with activation regions aligning closely with ground-truth annotations. The model maintained practical deployability with 17.7 million parameters and 56.9 GFLOPs of computational complexity, outperforming mainstream detectors such as RT-DETR-R18 (20.1 million parameters, 78.1% mAP@0.5) by 8.8 percentage points in mAP@0.5.
Despite increased computational demands compared to the baseline YOLOv8n (8.2 GFLOPs), the proposed architecture achieved a superior accuracy-efficiency trade-off, making it suitable for resource-constrained agricultural systems. Future efforts will prioritize lightweight adaptations, including depth-wise separable convolutions and quantization techniques, to optimize the model for embedded deployment without compromising detection fidelity. This work bridges the gap between laboratory research and practical agricultural needs, providing a scalable solution for early disease monitoring in low-magnification field microscopy.
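The scale and comparison figures quoted in the abstract can be re-derived with a few lines of arithmetic. The sketch below is a hedged sanity check rather than part of the study: all variable and function names are illustrative, and the 640-pixel network input (with the whole image resized to fit it) is an assumption that the abstract does not state.

```python
# Figures quoted in the abstract.
TARGET_W, TARGET_H = 31, 27      # average sporangium size in pixels
IMAGE_W, IMAGE_H = 2560, 1920    # raw image size in pixels

# Share of the raw image covered by one average target, in percent.
area_share_pct = 100.0 * (TARGET_W * TARGET_H) / (IMAGE_W * IMAGE_H)

# ASSUMPTION (not stated in the abstract): the image is resized so that
# its longer side is 640 pixels before entering the network.
NET_INPUT = 640
scale = NET_INPUT / max(IMAGE_W, IMAGE_H)


def cells_covered(size_px, stride):
    """Extent of a target, in feature-map cells, at a given stride."""
    return size_px * scale / stride


p2_grid = NET_INPUT // 4         # side length of the stride-4 (P2) grid
p2_cells = (cells_covered(TARGET_W, 4), cells_covered(TARGET_H, 4))
p3_cells = (cells_covered(TARGET_W, 8), cells_covered(TARGET_H, 8))


def point_gain(a_pct, b_pct):
    """Difference between two percentages, in percentage points."""
    return round(a_pct - b_pct, 1)


def relative_cut_pct(ours, other):
    """Relative reduction of `ours` with respect to `other`, in percent."""
    return round(100.0 * (1.0 - ours / other), 1)


map_gain_vs_yolov8x = point_gain(86.9, 79.1)
map_gain_vs_rtdetr_r18 = point_gain(86.9, 78.1)
false_negative_drop = point_gain(16.1, 1.7)
flops_cut_vs_yolov8x = relative_cut_pct(56.9, 258.1)

print(f"target area share: {area_share_pct:.3f}% of the raw image")
print(f"P2 grid: {p2_grid}x{p2_grid}, target spans {p2_cells} cells")
print(f"P3 (stride 8): target spans {p3_cells} cells")
print(f"mAP@0.5 gain vs YOLOv8x: {map_gain_vs_yolov8x} points")
print(f"mAP@0.5 gain vs RT-DETR-R18: {map_gain_vs_rtdetr_r18} points")
print(f"false-negative drop vs YOLOv8n: {false_negative_drop} points")
print(f"GFLOPs reduction vs YOLOv8x: {flops_cut_vs_yolov8x}%")
```

Under these assumptions an average target spans roughly two cells per side on the stride-4 grid but less than one cell per side at stride 8, which is consistent with the motivation given for adding a P2 detection head. The metric gains are differences in percentage points, whereas the GFLOPs figure is a relative reduction.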
