
Detection method for navel orange trees in multispectral remote sensing images based on an improved YOLOv8

Detecting citrus from multispectral remote sensing images using an improved YOLOv8 model

  • Abstract: To achieve high-precision, real-time detection of individual navel orange trees under different spectral environments, a multispectral image detection model based on an improved YOLOv8, named YOLO-DBME, is proposed. First, an improved frequency-aware multispectral attention module (FAMA) is introduced, which maps multispectral image features into the frequency domain and extracts discriminative frequency information, thereby strengthening the correlations among feature channels. Second, the backbone network is restructured into a dual-backbone architecture, in which a reversible design preserves complete information transmission and an auxiliary branch generates reliable gradients to promote stable training of the main branch. Finally, an adaptive scale-balanced head (ASBHead) is constructed to adaptively learn the spatial fusion weights of features at different scales, achieving a dynamic balance of multi-scale information. Experimental results show that YOLO-DBME outperforms the original YOLOv8 on key metrics, including precision (93.4%), recall (94.3%), F1-score (0.938), mAP0.5 (97.2%), mAP0.5-0.95 (71.7%), computational cost (19.9G), and parameter count (6.2M): precision is improved by 1.4 percentage points, recall by 2 percentage points, F1-score by 0.016, mAP0.5 by 0.8 percentage points, and mAP0.5-0.95 by 2.3 percentage points, while the computational cost is reduced by 8.7G and the parameters by 5M. Compared with mainstream models such as YOLOv9 through YOLOv12 and RT-DETR, YOLO-DBME performs best across precision, recall, F1-score, mAP0.5, computational cost, and parameter count.
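To make the adaptive scale-balancing idea above more concrete, the following is a minimal PyTorch sketch of spatial fusion across three pyramid levels with per-pixel, softmax-normalised weights. It is an illustration only: the channel width, the number of levels, and the bilinear resizing are assumptions rather than the paper's exact ASBHead.

```python
# Sketch: adaptive spatial fusion over detection scales (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveScaleFusion(nn.Module):
    """Fuse three pyramid levels with per-pixel, softmax-normalised weights."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per level produces a single-channel weight map.
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)]
        )

    def forward(self, feats):
        # feats: list of three tensors (B, C, Hi, Wi); fuse at the first level's resolution.
        target_size = feats[0].shape[-2:]
        resized = [
            f if f.shape[-2:] == target_size
            else F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            for f in feats
        ]
        # Per-level spatial weight maps, normalised across levels at every pixel.
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, resized)], dim=1)
        weights = torch.softmax(logits, dim=1)            # (B, 3, H, W)
        fused = sum(weights[:, i:i + 1] * resized[i] for i in range(3))
        return fused


if __name__ == "__main__":
    fuse = AdaptiveScaleFusion(channels=128)
    p3 = torch.randn(2, 128, 80, 80)
    p4 = torch.randn(2, 128, 40, 40)
    p5 = torch.randn(2, 128, 20, 20)
    print(fuse([p3, p4, p5]).shape)  # torch.Size([2, 128, 80, 80])
```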

     

    Abstract: Precise identification of individual trees plays a vital role in precision agriculture and digital orchard management. However, existing approaches often suffer from limited spectral utilization, low robustness under canopy occlusion, and poor generalization across heterogeneous environments. In this study, a multispectral detection framework was developed on the basis of an enhanced You Only Look Once version 8 (YOLOv8) architecture to realize accurate, real-time detection of individual navel orange trees under complex and variable spectral environments. The improved multispectral object detection model, named YOLO-DBME, was proposed to integrate frequency-domain enhancement, dual-backbone structural optimization, and adaptive multi-scale feature balancing. Three modules were incorporated to strengthen feature extraction and detection. Firstly, a Frequency-Aware Multispectral Attention Module (FAMA) was introduced to project multispectral feature representations into the frequency domain using the discrete cosine transform. Discriminative frequency components were captured to emphasize informative spectral cues, and inter-channel dependencies were reinforced to suppress redundant responses, thereby improving the perception of subtle reflectance variations caused by leaf texture, chlorophyll concentration, and canopy density. Secondly, the backbone network was redesigned as a dual-backbone architecture, in which an invertible auxiliary branch preserved complete information transmission and stabilized gradient propagation during network optimization. Consistency was maintained between high-level semantic features and low-level spatial details, so that structural information could be extracted from dense and irregular canopies. Thirdly, an Adaptive Scale-Balanced Head (ASBHead) was developed to dynamically learn the spatial fusion weights among multiple feature scales, adapting to the size, density, and occlusion level of the targets. Multi-scale information was thereby effectively balanced, enhancing detection robustness in high-density orchard scenes. Experiments were conducted on a multispectral UAV dataset collected from Gannan navel orange orchards in southern China. The visible, red-edge, and near-infrared spectral bands were selected to capture canopy structure and photosynthetic characteristics. Six state-of-the-art models were evaluated for comparison, including YOLOv8 through YOLOv12 and RT-DETR. The results showed that, compared with YOLOv8, YOLO-DBME achieved improvements of 1.4 and 2.0 percentage points in precision (93.4%) and recall (94.3%), respectively, and an increase of 0.016 in F1-score (0.938), while reducing the computational cost by 8.7G and the number of parameters by 5M. Under both the lenient (mAP0.5) and strict (mAP0.5-0.95) IoU thresholds, YOLO-DBME outperformed YOLOv8 by 0.8 and 2.3 percentage points, respectively. The improvement in recall indicated that the model was more sensitive to small and partially occluded targets, while the high precision showed that orange crowns were reliably distinguished from background vegetation and soil interference. Ablation experiments further verified the contribution of each module. The FAMA module improved recall and F1-score by exploiting frequency-domain cues, indicating its importance for spectral feature enhancement.
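As an illustration of the frequency-domain attention described for FAMA, the sketch below pools a feature map, projects it onto a few 2D discrete cosine transform bases, and turns the per-channel frequency responses into channel weights. The pooled size, the selected frequencies, and the reduction ratio are assumptions made for this sketch, not the published implementation.

```python
# Sketch: DCT-based frequency channel attention (illustrative stand-in for FAMA).
import math
import torch
import torch.nn as nn


def dct_basis_2d(size: int, u: int, v: int) -> torch.Tensor:
    """Return the (u, v)-th 2D DCT-II basis of shape (size, size)."""
    xs = torch.arange(size, dtype=torch.float32)
    cos_u = torch.cos(math.pi * u * (2 * xs + 1) / (2 * size))
    cos_v = torch.cos(math.pi * v * (2 * xs + 1) / (2 * size))
    return cos_u[:, None] * cos_v[None, :]


class FrequencyChannelAttention(nn.Module):
    """Pool features, project onto selected DCT bases, and derive channel weights."""

    def __init__(self, channels: int, pooled: int = 7,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled)
        # Stack the selected DCT bases: (num_freqs, pooled, pooled).
        basis = torch.stack([dct_basis_2d(pooled, u, v) for u, v in freqs])
        self.register_buffer("basis", basis)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        p = self.pool(x)                                                  # (B, C, P, P)
        # Per-channel responses on each selected frequency, summed over frequencies.
        resp = torch.einsum("bchw,khw->bck", p, self.basis).sum(dim=-1)   # (B, C)
        w = self.fc(resp).view(b, c, 1, 1)                                # weights in (0, 1)
        return x * w


if __name__ == "__main__":
    attn = FrequencyChannelAttention(channels=64)
    y = attn(torch.randn(2, 64, 40, 40))
    print(y.shape)  # torch.Size([2, 64, 40, 40])
```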
The dual-backbone structure strengthened the feature fusion between hierarchical layers, reducing false detections and improving model stability during training. The ASBHead further boosted small-object recognition and overall robustness by adaptively optimizing feature fusion across scales, yielding a balanced architecture that efficiently learned both spectral and structural attributes. Visual comparisons showed that YOLO-DBME successfully detected all tree crowns within densely planted areas, whereas the conventional YOLOv8 and YOLOv9 models failed to identify several occluded targets. In summary, the YOLO-DBME framework significantly improved the precision, robustness, and generalization of multispectral object detection for navel orange trees. The frequency-domain attention enhanced spectral discrimination, the dual-backbone design reinforced gradient stability and information completeness, and the adaptive scale-balanced head optimized multi-scale feature fusion. Together, these improvements enabled real-time, high-accuracy detection on unmanned aerial vehicle platforms and overcame the key limitations of existing single-stage detectors in complex orchard environments. YOLO-DBME can therefore provide a practical and efficient solution for fruit tree monitoring, shows strong potential for integration into large-scale precision agriculture, and can contribute to UAV-based multispectral remote sensing.
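For readers curious how an auxiliary branch can supply extra gradients to a shared backbone during training, the rough sketch below keeps the auxiliary branch active only in training mode and drops it at inference. The layer widths and the simple consistency loss in the usage example are assumptions for illustration; the paper's invertible auxiliary design is not reproduced here.

```python
# Sketch: training-time auxiliary branch alongside a main backbone (illustrative only).
import torch
import torch.nn as nn


def conv_block(c_in: int, c_out: int, stride: int = 2) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class DualBranchBackbone(nn.Module):
    """Main branch used at inference; auxiliary branch only adds gradients in training."""

    def __init__(self):
        super().__init__()
        self.stem = conv_block(3, 32)
        self.main = nn.Sequential(conv_block(32, 64), conv_block(64, 128))
        self.aux = nn.Sequential(conv_block(32, 64), conv_block(64, 128))

    def forward(self, x):
        s = self.stem(x)
        main_feat = self.main(s)
        if self.training:
            # The auxiliary branch shares the stem output, so its loss sends an
            # extra gradient back through the shared layers during optimization.
            return main_feat, self.aux(s)
        return main_feat, None


if __name__ == "__main__":
    net = DualBranchBackbone().train()
    main_feat, aux_feat = net(torch.randn(2, 3, 256, 256))
    # A simple consistency term, shown purely as an illustration of auxiliary supervision.
    loss = (main_feat - aux_feat).abs().mean()
    loss.backward()
    print(main_feat.shape, aux_feat.shape)
```

Because the auxiliary branch is skipped at inference, a design of this kind adds no deployment cost, which is consistent with the abstract's emphasis on reduced computation and parameters.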

     

