Detecting citrus from multispectral remote sensing images using an improved YOLOv8 model
-
Abstract
Precise identification of individual trees plays a vital role in precision agriculture and digital orchards. However, existing approaches often suffer from limited use of spectral information, low robustness under canopy occlusion, and poor generalization across heterogeneous environments. In this study, a multispectral detection framework was developed on an enhanced You Only Look Once version 8 (YOLOv8) architecture to achieve accurate, real-time detection of individual navel orange trees under complex and variable spectral conditions. The improved multispectral object detection model, named YOLO-DBME, integrates frequency-domain enhancement, dual-branch structural optimization, and adaptive multi-scale feature balancing. Three modules were incorporated to strengthen feature extraction and detection. Firstly, a Frequency-Aware Multispectral Attention Module (FAMA) was introduced to project the multispectral feature representations into the frequency domain using the discrete cosine transform. Discriminative frequency components were captured to emphasize informative spectral cues, and inter-channel dependencies were reinforced to reduce redundant responses, thereby improving the perception of subtle reflectance variations caused by leaf texture, chlorophyll concentration, and canopy density. Secondly, the backbone network was redesigned as a Dual-Backbone architecture, in which an invertible auxiliary branch supports complete information transmission and stable gradient propagation during network optimization. This design maintained consistency between high-level semantic features and low-level spatial details, so that structural information could be extracted from dense and irregular canopies.
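The frequency-domain attention idea can be illustrated with a small sketch. The abstract does not give the internals of FAMA, so the code below is only an assumed, simplified form: each channel is projected onto one 2D DCT-II basis function, and the resulting scalar drives a sigmoid gate that rescales the channel. The function names, the per-channel frequency assignment `freqs`, and the scalar `gains` (a stand-in for learned weights) are hypothetical and are not taken from the paper.

```python
import math


def dct2_basis(h, w, u, v):
    """2D DCT-II basis function of frequency (u, v) on an h x w grid."""
    return [
        [
            math.cos(math.pi * u * (i + 0.5) / h)
            * math.cos(math.pi * v * (j + 0.5) / w)
            for j in range(w)
        ]
        for i in range(h)
    ]


def frequency_descriptor(channel, u, v):
    """Project one channel (h x w nested list) onto a single DCT basis.

    With (u, v) = (0, 0) the basis is all ones, so the descriptor is the
    channel mean, i.e. ordinary global average pooling.
    """
    h, w = len(channel), len(channel[0])
    basis = dct2_basis(h, w, u, v)
    total = sum(channel[i][j] * basis[i][j] for i in range(h) for j in range(w))
    return total / (h * w)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frequency_channel_attention(features, freqs, gains):
    """Rescale each channel by a gate computed from its DCT descriptor.

    features: list of channels, each an h x w nested list
    freqs:    one (u, v) frequency index per channel (assumed assignment)
    gains:    one scalar per channel, standing in for learned weights
    """
    gated, gates = [], []
    for channel, (u, v), gain in zip(features, freqs, gains):
        gate = sigmoid(gain * frequency_descriptor(channel, u, v))
        gates.append(gate)
        gated.append([[gate * x for x in row] for row in channel])
    return gated, gates


if __name__ == "__main__":
    flat = [[2.0] * 4 for _ in range(4)]
    _, gates = frequency_channel_attention(
        [flat, flat], [(0, 0), (1, 0)], [1.0, 1.0]
    )
    print(gates)
```

In this toy case a spatially constant channel responds only to the lowest frequency, so the channel assigned to (1, 0) receives a neutral gate near 0.5. A practical module would operate on tensors and learn the channel interactions rather than use fixed gains.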
Thirdly, an Adaptive Scale-Balancing Head (ASBHead) was developed to dynamically learn the spatial fusion weights among multiple feature scales, adapting to the size, density, and occlusion level of the targets. The multi-scale information was thereby balanced to enhance detection robustness in high-density orchard scenes. To verify the model, a multispectral unmanned aerial vehicle (UAV) dataset was collected from Gannan navel orange orchards in southern China. The visible, red-edge, and near-infrared spectral bands were selected to capture canopy structure and photosynthetic features. Six state-of-the-art models were evaluated, including YOLOv8–YOLOv12 and RT-DETR. Compared with YOLOv8, YOLO-DBME improved precision by 1.4 percentage points (to 93.4%), recall by 2.0 percentage points (to 94.3%), and F1-score by 0.016 (to 0.938), while reducing computational cost by 8.7G and parameters by 5M. Under both lenient (mAP0.5) and strict (mAP0.5-0.95) IoU thresholds, YOLO-DBME outperformed YOLOv8 by 0.8 and 2.3 percentage points, respectively. The recall improvement indicated that the model was more sensitive to small and partially occluded targets, while the high precision indicated that it reliably distinguished orange crowns from background vegetation and soil interference. Ablation experiments further verified the contribution of each module. The FAMA module improved recall and F1-score using frequency-domain cues, indicating its importance for spectral feature enhancement. The Dual-Backbone structure strengthened feature fusion between hierarchical layers, reducing false detections and improving model stability during training. The ASBHead further boosted small-object recognition and overall robustness by adaptively optimizing feature fusion across scales.
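The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the figures quoted above; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported YOLO-DBME precision (93.4%) and recall (94.3%).
f1 = f1_score(0.934, 0.943)
print(round(f1, 3))  # 0.938
```

The result agrees with the stated F1-score of 0.938.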
A balanced architecture was achieved that efficiently learns both spectral and structural attributes. Visual comparisons showed that YOLO-DBME detected all tree crowns within densely planted areas, whereas the conventional YOLOv8 and YOLOv9 models failed to identify several occluded targets. In summary, the YOLO-DBME framework significantly improved the precision, robustness, and generalization of multispectral object detection for navel orange trees. The frequency-domain attention enhanced spectral discrimination, the dual-branch design reinforced gradient stability and information completeness, and the adaptive balancing head optimized multi-scale feature fusion. Together, these components enabled real-time, high-accuracy detection on unmanned aerial vehicle platforms and overcame key limitations of existing single-stage detectors in complex orchard environments. YOLO-DBME provides a practical and efficient solution for fruit tree monitoring and shows strong potential for integration into large-scale precision agriculture. These findings can contribute to the application of UAV multispectral remote sensing.
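The adaptive multi-scale fusion summarized above can be sketched in a few lines. The abstract does not specify how the ASBHead computes its weights, so this is an assumed form in which each spatial position receives softmax-normalized weights over the scales. The feature maps are assumed to be already resized to a common resolution, and `logit_maps` is a hypothetical stand-in for the output of learned layers.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_scale_fusion(scale_maps, logit_maps):
    """Fuse same-sized feature maps with per-position softmax weights.

    scale_maps: list of h x w nested lists, one per feature scale
    logit_maps: list of h x w nested lists of unnormalized weights,
                one per scale (assumed to come from learned layers)
    """
    h, w = len(scale_maps[0]), len(scale_maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Weights over scales sum to 1 at every position.
            weights = softmax([logits[i][j] for logits in logit_maps])
            fused[i][j] = sum(
                wt * fmap[i][j] for wt, fmap in zip(weights, scale_maps)
            )
    return fused


if __name__ == "__main__":
    maps = [[[1.0, 1.0]], [[3.0, 3.0]], [[5.0, 5.0]]]
    # Position 0 weights all scales equally; position 1 favors the last scale.
    logits = [[[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 50.0]]]
    print(adaptive_scale_fusion(maps, logits))
```

Because the weights are computed per position, a region dominated by small crowns can draw mostly on one scale while a neighboring region draws on another, which is the balancing behavior the abstract attributes to the head.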
-