Abstract:
To address the inherent limitations of high-magnification (≥200×) microscopic imaging in cucumber downy mildew sporangia detection (excessive storage demands, incompatibility with portable field devices, and discrepancies between laboratory-induced samples and natural agricultural environments), this study developed an enhanced You Only Look Once version 8n (YOLOv8n) model optimized for practical low-magnification (100×) microscopy. Three synergistic architectural innovations were introduced to improve detection robustness in complex field scenarios: (1) A Whirl Convolution (WhirlConv) module replaced standard convolutions, employing a four-branch architecture with reflection padding and independent convolutional kernels to capture multi-directional edge features while suppressing background noise via channel attention. This design mitigated boundary distortion caused by traditional zero-padding and enhanced adaptability to the random orientation of sporangia. (2) High-resolution P2-layer features (spatial resolution: 160×160) were fused into the feature pyramid network, enabling multi-scale detection heads to improve localization accuracy for extremely small targets (average size: 31×27 pixels, occupying 0.02% of the 2560×1920-pixel input image). (3) The Spatial Pyramid Pooling-Fast (SPPF) module was augmented with a Large Separable Kernel Attention (LSKA) mechanism, leveraging large separable kernels (e.g., 15×15) to capture long-range dependencies and global contextual information, complementing the localized directional features extracted by WhirlConv. The dataset was constructed under authentic field conditions at the Xiaotangshan National Precision Agriculture Research Base (Beijing, China) using a volumetric spore sampler and a LEICA DM3000 LED microscope at 100× magnification (10× eyepiece, 10× objective, 1× zoom). This approach captured natural challenges, including dense sporangia clusters, overlapping structures, and interference from field impurities (pollen, dust). The dataset comprised 300 raw images (expanded to 1,200 via rotation, hue shifts, and saturation adjustments), ensuring alignment with real-world agricultural scenarios. Rigorous annotation protocols were implemented under the supervision of plant pathologists, excluding ambiguous targets through morphological validation. Trained on this dataset, the optimized model achieved a precision of 94.2%, a recall of 90.1%, and a mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 86.9%, outperforming the baseline YOLOv8n by 10.0%, 7.2%, and 7.8%, respectively. Comparative evaluations against high-magnification models revealed significant advantages: the proposed model surpassed YOLOv8x (258.1 giga floating-point operations (GFLOPs), 79.1% mAP@0.5) by 7.8 percentage points in mAP@0.5 while reducing storage requirements by 75% and computational complexity by 78% (56.9 vs. 258.1 GFLOPs). Ablation studies confirmed the contributions of each module: WhirlConv alone improved mAP@0.5 by 3.3%, while integrating P2 features and LSKA further enhanced performance synergistically.
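The ratios quoted above can be reproduced directly from the reported figures. The following is a minimal arithmetic sketch in Python; the variable names are illustrative and are not taken from the study's code.

```python
# Figures quoted in the abstract; variable names are illustrative only.
target_w, target_h = 31, 27        # average sporangium size in pixels
image_w, image_h = 2560, 1920      # input image size in pixels

# Fraction of the image area covered by one average target.
area_fraction = (target_w * target_h) / (image_w * image_h)
print(f"target area fraction: {area_fraction:.4%}")

# Relative reduction in computational complexity versus YOLOv8x.
proposed_gflops, yolov8x_gflops = 56.9, 258.1
gflops_reduction = 1 - proposed_gflops / yolov8x_gflops
print(f"GFLOPs reduction vs. YOLOv8x: {gflops_reduction:.1%}")

# Absolute mAP@0.5 gain over YOLOv8x, in percentage points.
proposed_map, yolov8x_map = 86.9, 79.1
map_gain_points = round(proposed_map - yolov8x_map, 1)
print(f"mAP@0.5 gain vs. YOLOv8x: {map_gain_points} points")
```

The area fraction evaluates to roughly 0.017%, which rounds to the reported 0.02%, and the GFLOPs reduction rounds to the reported 78%.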
Visualization analyses demonstrated the model’s robustness in field-relevant scenarios: in dense clusters (174 targets per region), the model reduced the false-negative rate by 14.4 percentage points (1.7% vs. 16.1% for YOLOv8n), and under impurity interference, false positives were limited to 1.0%. Heatmap visualizations further validated the model’s ability to focus on densely packed sporangia, with activation regions aligning closely with ground-truth annotations. The model maintained practical deployability with 17.7 million parameters and a computational complexity of 56.9 GFLOPs, outperforming mainstream detectors such as RT-DETR-R18 (20.1 million parameters, 78.1% mAP@0.5) by 8.8 percentage points in mAP@0.5. Despite increased computational demands compared to the baseline YOLOv8n (8.2 GFLOPs), the proposed architecture achieved a superior accuracy-efficiency trade-off, making it suitable for resource-constrained agricultural systems. Future efforts will prioritize lightweight adaptations, including depth-wise separable convolutions and quantization techniques, to optimize the model for embedded deployment without compromising detection fidelity. This work bridges the gap between laboratory research and practical agricultural needs, providing a scalable solution for early disease monitoring in low-magnification field microscopy.
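As a rough illustration of why depth-wise separable convolutions are a candidate for the lightweight adaptation mentioned above, the sketch below compares weight counts for a standard convolution and its depth-wise separable counterpart. The layer shape (128 input channels, 128 output channels, 3×3 kernel) is hypothetical and is not taken from the proposed model; biases are omitted.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv (bias omitted)."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer shape, chosen only for illustration.
c_in, c_out, k = 128, 128, 3

std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
ratio = dws / std  # equals 1/c_out + 1/k**2

print(f"standard: {std} weights")
print(f"depth-wise separable: {dws} weights")
print(f"ratio: {ratio:.3f}")
```

For this assumed shape the separable variant needs about 12% of the weights of the standard layer; the saving in the actual model would depend on which layers are replaced.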