Detecting cucumber downy mildew sporangia from low-magnification microscopic images using an improved YOLOv8n model
Abstract
High-magnification (≥200×) microscopic imaging has clear limitations for detecting cucumber downy mildew sporangia: excessive storage demands, incompatibility with portable field devices, and discrepancies between laboratory-induced samples and natural agricultural environments. In this study, an enhanced You Only Look Once version 8n (YOLOv8n) model was developed and optimized for practical low-magnification (100×) microscopy. Three synergistic architectural improvements were introduced to make detection more robust in complex field scenarios. (1) A Whirl Convolution (WhirlConv) module replaced the standard convolutions. Its four-branch architecture uses reflection padding and independent convolutional kernels to capture multi-directional edge features, and channel attention to suppress background noise. This design mitigates the boundary distortion that traditional zero-padding causes for randomly oriented sporangia. (2) High-resolution P2-layer features (spatial resolution: 160×160) were fused into the feature pyramid network, and the resulting multi-scale detection heads improved localization accuracy for extremely small targets (average size: 31×27 pixels, occupying about 0.02% of the 2560×1920-pixel input image). (3) The Spatial Pyramid Pooling-Fast (SPPF) module was augmented with a Large Separable Kernel Attention (LSKA) mechanism, whose large separable kernels (e.g., 15×15) capture long-range dependencies and global contextual information, complementing the localized directional features extracted by WhirlConv. The dataset was collected under authentic field conditions at the Xiaotangshan National Precision Agriculture Research Base (Beijing, China) using a volumetric spore sampler and a LEICA DM3000 LED microscope at 100× magnification (10× eyepiece, 10× objective, 1× zoom).
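To make the small-target figures above concrete, the following minimal Python sketch recomputes the area fraction of an average sporangium and estimates how many feature-map cells it spans at each pyramid level. The 640×640 network input and the strides 4, 8, 16, and 32 for P2 to P5 are assumptions (they match the usual YOLOv8 configuration and the stated 160×160 P2 resolution), not values confirmed by this abstract; the helper names are illustrative.

```python
# Figures taken from the abstract.
TARGET_WH = (31, 27)      # average sporangium size in pixels
IMAGE_WH = (2560, 1920)   # raw microscope image size in pixels

# Assumption: the network sees a 640x640 input, as is typical for YOLOv8.
NET_SIZE = 640


def area_fraction(target_wh, image_wh):
    """Fraction of the image area covered by one target."""
    return (target_wh[0] * target_wh[1]) / (image_wh[0] * image_wh[1])


def cells_spanned(target_wh, image_wh, net_size, stride):
    """Target width and height measured in feature-map cells.

    Assumes an aspect-preserving resize that maps the longest image
    side to `net_size`, followed by a feature map with the given stride.
    """
    scale = net_size / max(image_wh)
    return tuple(side * scale / stride for side in target_wh)


fraction = area_fraction(TARGET_WH, IMAGE_WH)
print(f"area fraction of one target: {fraction:.4%}")

# Assumed strides: P2=4, P3=8, P4=16, P5=32.
for level, stride in (("P2", 4), ("P3", 8), ("P4", 16), ("P5", 32)):
    grid = NET_SIZE // stride
    w, h = cells_spanned(TARGET_WH, IMAGE_WH, NET_SIZE, stride)
    print(f"{level}: grid {grid}x{grid}, target spans {w:.2f} x {h:.2f} cells")
```

Under these assumptions, an average target spans roughly two P2 cells per side but less than one P3 cell, which is consistent with the motivation for adding a P2 detection head.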
The images captured the challenges of natural conditions, including dense sporangia clusters, overlapping structures, and interference from field impurities (pollen and dust). The dataset comprised 300 raw images, expanded to 1,200 through rotation, hue shifts, and saturation adjustments, so that it reflected real-world agricultural scenarios. Annotation was carried out under the supervision of plant pathologists, and ambiguous targets were excluded after morphological validation and dataset training. The optimal model achieved a precision of 94.2%, a recall of 90.1%, and a mean average precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 86.9%, outperforming the baseline YOLOv8n by 10.0%, 7.2%, and 7.8%, respectively. The improved model also surpassed the larger YOLOv8x (258.1 giga floating-point operations, GFLOPs; 79.1% mAP@0.5) by 7.8 percentage points in mAP@0.5, while reducing storage requirements by 75% and computational complexity by 78% (56.9 vs. 258.1 GFLOPs). Ablation studies confirmed the contribution of each module: WhirlConv alone improved mAP@0.5 by 3.3%, while integrating the P2 features and LSKA synergistically enhanced performance further. Visualization analysis demonstrated the greater robustness of the improved model in field-relevant scenarios: in dense clusters (174 targets per region), the model reduced the false-negative rate by 14.4 percentage points (1.7% vs. 16.1% for YOLOv8n), and under impurity interference, false positives were limited to 1.0%. Heatmap visualizations confirmed that the model focused on the densely packed sporangia, with activation regions aligning closely with the ground-truth annotations.
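The reported precision, recall, and mAP@0.5 all rest on matching predictions to annotations at an IoU threshold of 0.5. The sketch below shows one common way to do that matching (greedy, highest confidence first). It is not the evaluation code used in this study, and the boxes are hypothetical, sized like the 31×27-pixel targets described above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thr=0.5):
    """Greedy matching of predictions to ground truth.

    preds:  list of (confidence, box) tuples
    truths: list of boxes
    Each ground-truth box can be matched at most once.
    """
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp      # predictions with no matching annotation
    fn = len(truths) - tp     # annotations that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Hypothetical annotations and predictions (not data from the study).
truths = [(100, 100, 131, 127), (200, 200, 231, 227), (300, 300, 331, 327)]
preds = [
    (0.9, (102, 101, 133, 128)),  # slightly shifted hit
    (0.8, (201, 199, 232, 226)),  # slightly shifted hit
    (0.6, (500, 500, 531, 527)),  # false positive, e.g. an impurity
]
precision, recall = precision_recall(preds, truths)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

For targets this small, a shift of only a few pixels changes the IoU noticeably, which is one reason localization accuracy matters so much at 100× magnification.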
The improved model remained practical to deploy, with 17.7 million parameters and a computational complexity of 56.9 GFLOPs, and it outperformed mainstream detectors such as RT-DETR-R18 (20.1 million parameters, 78.1% mAP@0.5) by 8.8 percentage points in mAP@0.5. Compared with the baseline YOLOv8n (8.2 GFLOPs), the improved architecture achieved a superior accuracy-efficiency trade-off, making it suitable for resource-constrained agricultural systems. Future work can prioritize lightweight adaptations, including depth-wise separable convolutions and quantization, for embedded deployment without compromising detection fidelity. This work bridges the gap between laboratory research and practical agricultural needs, and its findings provide a scalable solution for early disease monitoring with low-magnification field microscopy.
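The efficiency comparisons can be checked with simple arithmetic. The sketch below uses only figures quoted in the abstract; the function names are illustrative, and reading the mAP@0.5 gains as percentage-point differences is an assumption.

```python
# Figures quoted in the abstract (mAP@0.5 in %, complexity in GFLOPs,
# parameters in millions).
IMPROVED = {"map50": 86.9, "gflops": 56.9, "params_m": 17.7}
YOLOV8X = {"map50": 79.1, "gflops": 258.1}
RTDETR_R18 = {"map50": 78.1, "params_m": 20.1}
YOLOV8N_GFLOPS = 8.2


def relative_reduction(ours, theirs):
    """Fraction by which `ours` is smaller than `theirs`."""
    return 1.0 - ours / theirs


def point_gain(ours, theirs):
    """Difference in percentage points, rounded to one decimal."""
    return round(ours - theirs, 1)


gflops_cut_vs_x = relative_reduction(IMPROVED["gflops"], YOLOV8X["gflops"])
map_gain_vs_x = point_gain(IMPROVED["map50"], YOLOV8X["map50"])
map_gain_vs_rtdetr = point_gain(IMPROVED["map50"], RTDETR_R18["map50"])
params_cut_vs_rtdetr = relative_reduction(
    IMPROVED["params_m"], RTDETR_R18["params_m"]
)
cost_vs_baseline = IMPROVED["gflops"] / YOLOV8N_GFLOPS

print(f"GFLOPs reduction vs YOLOv8x:     {gflops_cut_vs_x:.0%}")
print(f"mAP@0.5 gain vs YOLOv8x:         {map_gain_vs_x} points")
print(f"mAP@0.5 gain vs RT-DETR-R18:     {map_gain_vs_rtdetr} points")
print(f"Parameter cut vs RT-DETR-R18:    {params_cut_vs_rtdetr:.1%}")
print(f"GFLOPs relative to YOLOv8n:      {cost_vs_baseline:.1f}x")
```

The last ratio shows that the improved model needs roughly seven times the GFLOPs of the baseline YOLOv8n, which is useful context for the lightweight adaptations mentioned as future work.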