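Remote sensing extraction of greenhouse facilities in Yunnan Plateau Mountainous Areas using various improved semantic segmentation models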

Lin Xiancheng; Peng Li

doi:10.11975/j.issn.1002-6819.202505265

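Abstract: Existing greenhouse extraction methods lack systematic model comparison and offer little analysis of scene adaptability. To address these gaps, this study proposes a deep-learning-based method for extracting greenhouse facilities from remote sensing imagery, aimed at improving both extraction accuracy and scene adaptability. Taking Anning City, Yunnan Province, as the study area, a multi-feature dataset of 14,000 samples was built from 0.3 m resolution WorldView-3 imagery. Three improved semantic segmentation models, Enhanced-SegUNet, Dense-SegUNet++ and MS-DeepLabV3+, incorporating multi-scale features, dynamic convolution and spectral enhancement, were trained and compared. Dense-SegUNet++ performed best, reaching an intersection over union (IoU) of 0.92 and an F1 score of 0.93, 13% higher than the baseline model, and it preserved spatial detail well in complex, scattered greenhouse areas. MS-DeepLabV3+ performed strongly in areas with marked color differences, and Enhanced-SegUNet recognized greenhouses well in shadowed and low-contrast scenes. The results can provide effective technical support for monitoring the non-grain use of cropland.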
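The IoU and F1 values reported above are pixel-level scores derived from a confusion matrix between a predicted mask and a reference mask. The sketch below shows the standard binary definitions, together with mean pixel accuracy (mPA), in plain Python. The function names are hypothetical, and whether the paper averages per tile or pools all test pixels is an assumption left open here.

```python
"""Pixel-level metrics for binary greenhouse masks (1 = greenhouse, 0 = background).

A minimal sketch using standard confusion-matrix definitions; the paper's exact
averaging scheme (per tile vs. pooled over the test set) is an assumption.
"""


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized flat sequences of 0/1 labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:  # predicted greenhouse, reference background
            fp += 1
        elif t:  # missed greenhouse pixel
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pixel_metrics(pred, truth):
    """Return greenhouse-class IoU and F1, plus mean pixel accuracy (mPA)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    # mPA: per-class pixel accuracy (recall), averaged over greenhouse and background.
    acc_fg = tp / (tp + fn) if tp + fn else 1.0
    acc_bg = tn / (tn + fp) if tn + fp else 1.0
    return {"iou": iou, "f1": f1, "mpa": (acc_fg + acc_bg) / 2}


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    pred = [1, 1, 1, 0, 1, 0, 0, 0]
    print(pixel_metrics(pred, truth))  # {'iou': 0.6, 'f1': 0.75, 'mpa': 0.75}
```

In this toy case three greenhouse pixels are hit, one is missed and one background pixel is falsely flagged, so IoU = 3/5 while F1 = 6/8.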

Abstract: Accurate mapping of greenhouse facilities supports the monitoring of non-grain utilization of cropland and fine-scale regulation amid the rapid transformation of farming in China. In this study, an integrated deep learning framework was developed to extract greenhouses from very-high-resolution remote sensing imagery, and three improved semantic segmentation models with different architectural strategies were compared systematically. Anning City in Yunnan Province, China, a typical plateau-mountain region where greenhouse area increased by 23.7% per year from 2020 to 2022, was selected as the study area because its greenhouses are small, scattered and embedded in complex terrain. A multi-feature dataset of 14,000 image tiles was constructed from 0.3 m pansharpened WorldView-3 imagery. A four-level experimental design explicitly categorized greenhouse types, common backgrounds, transient disturbances (e.g., shadows and clouds) and non-ideal samples (e.g., partially occluded or damaged structures). All models adopted an encoder-decoder architecture for robust feature learning. Three improved models were implemented: (1) Enhanced-SegUNet, which combines residual blocks, CBAM attention and a hybrid Dice-Boundary loss to strengthen deep feature representation and boundary precision under low contrast or shadow; (2) Dense-SegUNet++, which extends UNet++ with densely connected skip paths, multi-branch deep supervision and dynamic convolution to enhance multi-level feature fusion and adapt to complex textures; and (3) MS-DeepLabV3+, which augments DeepLabV3+ with an extended multi-rate ASPP module, a spectral enhancement branch that exploits the red-edge and NIR bands, and lightweight depthwise separable convolutions. All models were trained under a unified strategy with transfer learning, two-stage fine-tuning, label smoothing, spatial dropout and L2 regularization.
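Two of the MS-DeepLabV3+ ingredients listed above come down to simple arithmetic. A depthwise separable convolution replaces one k×k convolution with a per-channel k×k (depthwise) convolution followed by a 1×1 (pointwise) convolution, which is why it counts as lightweight. A dilated (atrous) convolution, the building block of ASPP, widens a kernel's spatial extent without adding weights. The sketch below counts weights (ignoring biases) and computes that extent; the function names, the 256-channel setting and the dilation rates 6, 12 and 18 are illustrative assumptions, not the authors' configuration.

```python
"""Back-of-envelope arithmetic for two MS-DeepLabV3+ ingredients.

Assumptions (not taken from the paper): bias terms are ignored, and the channel
counts and dilation rates below are illustrative, not the authors' settings.
"""


def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv (one filter per channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def dilated_kernel_extent(k, rate):
    """Side length of the window covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 2))  # 589824 67840 8.69
    # Rates 6/12/18 are common DeepLabV3+ defaults, used here only as an example.
    print([dilated_kernel_extent(3, r) for r in (1, 6, 12, 18)])  # [3, 13, 25, 37]
```

With 256 input and output channels the separable form needs roughly 8.7 times fewer weights than a standard 3×3 convolution, and dilation rates of 6, 12 and 18 let the same 3×3 kernel span 13, 25 and 37 pixels, which is how an ASPP module gathers context at several scales in parallel.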
A systematic evaluation was conducted using pixel-level (IoU, F1 and mPA) and object-level (detection rate, false alarm rate and segmentation quality index) metrics, as well as inference efficiency (FPS, GMACs and peak memory). Experiments were performed on an independent test set of 2,800 tiles. All three improved models outperformed the baseline U-Net. Dense-SegUNet++ achieved the best overall performance, with an IoU of 0.924, an F1-score of 0.931 and a segmentation quality index 11.6% above that of U-Net, and it preserved fine spatial structures when delineating greenhouse boundaries in densely clustered, background-rich environments. MS-DeepLabV3+ excelled in areas with strong spectral contrast and diverse materials owing to its strengthened multi-scale and spectral modeling, whereas Enhanced-SegUNet showed superior robustness in shadowed and low-contrast scenes and on dark-grey greenhouse roofs. Efficiency analysis on an NVIDIA RTX 4070 platform indicated that the framework is operationally viable: Dense-SegUNet++ reached 41 FPS at an acceptable computational cost (21.6 GMACs), while MS-DeepLabV3+ offered a favorable balance between accuracy and speed. When embedded in a processing chain, the framework improved county-level monitoring efficiency by a factor of 18.3 compared with conventional manual interpretation. Its standardized vector products were integrated directly into non-grain cropland monitoring workflows for area statistics, detection and regulatory zoning. These results reveal the complementary strengths of different deep architectures across scenarios and highlight the framework's practical potential for high-frequency, fine-scale supervision of non-grain cropland utilization in complex plateau-mountain agricultural regions. Limitations remain in distinguishing greenhouses from spectrally similar bright buildings and in recognizing severely damaged or newly emerging structures with few training examples. Future work could incorporate multi-temporal and multi-source (e.g., SAR) data and lightweight architectures for edge deployment to improve robustness and scalability.
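The object-level detection rate and false alarm rate in the evaluation count whole greenhouses rather than pixels. The abstract does not specify the matching rule, so the sketch below assumes one common convention: objects are 4-connected components of the binary mask, and a reference greenhouse counts as detected when a not-yet-matched predicted object overlaps it with IoU ≥ 0.5. The function names and the threshold are hypothetical, and the segmentation quality index is not reproduced.

```python
"""Object-level detection and false alarm rates for binary greenhouse masks.

Assumptions (not from the paper): objects are 4-connected components, and a
reference object is detected when an unmatched predicted object has IoU >= 0.5.
"""
from collections import deque


def connected_components(mask):
    """Return one set of (row, col) pixels per 4-connected region of 1s in a 2-D list."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append(region)
    return objects


def object_metrics(pred, truth, iou_threshold=0.5):
    """Return (detection_rate, false_alarm_rate) using greedy one-to-one IoU matching."""
    pred_objs = connected_components(pred)
    true_objs = connected_components(truth)
    matched = set()  # indices of predicted objects already paired with a reference object
    for t in true_objs:
        for i, p in enumerate(pred_objs):
            if i not in matched and len(t & p) / len(t | p) >= iou_threshold:
                matched.add(i)
                break
    detection_rate = len(matched) / len(true_objs) if true_objs else 1.0
    false_alarm_rate = (len(pred_objs) - len(matched)) / len(pred_objs) if pred_objs else 0.0
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    truth = [[1, 1, 0, 0, 0],
             [1, 1, 0, 0, 0],
             [0, 0, 0, 1, 1],
             [0, 0, 0, 1, 1]]
    pred = [[1, 1, 0, 0, 1],
            [1, 0, 0, 0, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0]]
    # One reference greenhouse is found (IoU 0.75), one is missed, one prediction is spurious.
    print(object_metrics(pred, truth))  # (0.5, 0.5)
```

Greedy matching is adequate here because, with a threshold of 0.5, two disjoint predicted objects cannot both exceed it against the same reference object; a stricter evaluation could swap in optimal (Hungarian) matching.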
