Abstract:
As forest resource management increasingly emphasizes precision and digitization, unmanned aerial vehicle (UAV) technology has emerged as a promising tool for intelligent, automated forest inventory. However, challenges such as imprecise crown segmentation, limited accuracy in single-tree volume estimation, and the high cost of high-precision Light Detection and Ranging (LiDAR) point cloud data have hindered its broad adoption. To address these limitations, this study developed a method for single-tree volume estimation from UAV-derived visible-light imagery and low-density point cloud data, focusing on enhancing crown segmentation precision and improving volume estimation through multi-source feature integration. A crown segmentation network, CrownSeg, was introduced; it operates on UAV visible-light imagery and is built on the YOLOv11 framework with several specialized modules. The ScaleEdgeExtractor (SEE) module employed a three-stage mechanism (shallow filtering, edge enhancement, and cross-layer fusion), combining directional Sobel convolution, multi-scale downsampling, and adaptive edge-feature fusion to preserve and enhance crown boundary information. The Gated Feature Pyramid Network (GatedFPN) adopted a bi-directional hierarchical structure with spatial-channel dual-attention gating, enabling closed-loop multi-scale optimization and more refined crown segmentation across different canopy densities. The C2BRA module introduced bi-level routing attention and a channel-spatial dual-attention mechanism to enhance boundary perception while suppressing background interference from complex forest environments. Meanwhile, the DilatedFusion (DF) module leveraged parallel dilated convolutions with shared kernels to extract multi-granularity contextual information, improving adaptability to trees of varying shapes and sizes.
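As a rough illustration of the edge-enhancement idea behind the SEE module, the sketch below applies directional Sobel convolutions to a single-channel image and combines the horizontal and vertical gradients into an edge map. The kernels, the naive convolution loop, and the toy "crown" patch are standard illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Standard directional Sobel kernels (horizontal and vertical gradients).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(image, kernel):
    """Valid-mode 2-D correlation of a single-channel image with a 3x3 kernel."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
    return out

def edge_magnitude(image):
    """Combine directional gradient responses into a single edge map."""
    gx = conv2d(image, SOBEL_X)
    gy = conv2d(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Toy example: a bright disc (a stand-in for a crown) on a dark background.
yy, xx = np.mgrid[:32, :32]
patch = ((yy - 16) ** 2 + (xx - 16) ** 2 < 100).astype(float)
edges = edge_magnitude(patch)
# The response peaks on the crown boundary and vanishes in the flat interior,
# which is what makes such features useful for delineating crown outlines.
print(edges.shape, edges[15, 15], edges.max() > 0)
```

In the actual network these responses would be learned features fused back into the backbone rather than a fixed filter, but the principle of amplifying boundary gradients is the same.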
These modules worked collaboratively to enhance spatial detail retention and semantic feature abstraction, yielding high-quality segmentation outputs. For volume estimation, a model was developed that combines crown morphological, spectral, and textural features extracted from UAV imagery with tree height data from low-density LiDAR point clouds. A progressive feature-combination strategy and a weighted ensemble learning technique were employed to integrate these multi-source inputs for robust prediction. The CrownSeg network achieved an Average Precision at an Intersection over Union threshold of 0.5 (AP50) of 94.9% and an AP50-95 of 66.2%, surpassing the baseline model by 1.5 and 3.8 percentage points, respectively, owing to enhanced boundary delineation and multi-scale feature representation. The weighted ensemble model for volume estimation yielded a coefficient of determination (R²) of 0.9215, a mean absolute error (MAE) of 0.0228 m³, and a mean absolute percentage error (MAPE) of 17.00%, outperforming standalone models. Comparative analyses showed that integrating morphological, spectral, and textural features significantly reduced estimation errors, with the ensemble model demonstrating superior stability and generalization across diverse forest conditions. These findings were validated with experimental data from 749 individual trees in a plantation forest, where error metrics were consistently lower than those of individual algorithms such as Random Forest or neural networks. Visual inspections confirmed CrownSeg's strength in handling complex canopy structures and minimizing segmentation errors in dense or heterogeneous stands. Together, the high-precision crown segmentation network and the accurate single-tree volume estimation model leverage UAV-based data to offer a cost-effective, efficient alternative to traditional ground-based surveys, providing a practical technical framework for UAV remote sensing in precision forestry. Future work should explore multi-modal data integration, such as combining LiDAR and optical imagery, to further improve segmentation and estimation accuracy in varied forest environments.
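The weighted-ensemble step can be sketched as follows: each base learner receives a weight inversely proportional to its validation error, and the final volume prediction is the weighted average of the base predictions. The inverse-MAE weighting rule and the toy numbers here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ensemble_weights(val_maes):
    """Inverse-error weights, normalised to sum to 1: better models count more."""
    inv = 1.0 / np.asarray(val_maes, dtype=float)
    return inv / inv.sum()

def ensemble_predict(base_preds, weights):
    """Weighted average over base models; base_preds has shape (models, samples)."""
    return np.average(np.asarray(base_preds, dtype=float), axis=0, weights=weights)

# Toy example: three base regressors with validation MAEs of 0.02, 0.03, 0.05 m^3
# predicting the volumes of two trees.
w = ensemble_weights([0.02, 0.03, 0.05])
preds = [[0.10, 0.20],   # model 1
         [0.12, 0.18],   # model 2
         [0.09, 0.25]]   # model 3
volume = ensemble_predict(preds, w)
print(np.round(w, 3), np.round(volume, 4))
```

Such a scheme keeps any single weak learner from dominating the prediction, which is consistent with the ensemble's reported stability across forest conditions.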