Extraction of Cultivated Land from High-Resolution Remote Sensing Images Based on the SE-VUNet Model

朱登峰; 牛全福; 王刚; 程西安; 邵东虎; 周鑫蓉

doi:10.11975/j.issn.1002-6819.202412171

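Summary: To address blurred boundaries and missing detail in cultivated land segmentation across diverse terrain, an improved U-Net method for cultivated land extraction is proposed. The method uses a VGG network to deepen the backbone feature extractor (V-UNet), embeds the Squeeze-and-Excitation (SE) attention mechanism to improve feature localization and edge detail, and uses Batch Normalization (BN) layers to suppress overfitting. Embedding the SE module at five key positions in the V-UNet network yields five SE-VUNet models, SE-VUNet(1) to SE-VUNet(5). Using Gaofen-2 RGB data from the GID dataset, the models were compared with PSPNet, HRNet, DeepLabv3+, and U-Net on two cultivated land terrain types: flat homogeneous land and complex heterogeneous land. The results show that all five SE-VUNet models outperformed the comparison networks on both terrain types. The SE-VUNet with the SE module placed before downsampling gave the best segmentation of flat homogeneous cultivated land (MIoU = 96.66%, F1-score = 97.57%), and the SE-VUNet with the SE module placed in the feature learning part gave the best segmentation of complex heterogeneous cultivated land (MIoU = 94.40%, F1-score = 97.11%). The model can serve as a technical reference for addressing blurred boundaries and missing detail in cultivated land segmentation across diverse terrain.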

Abstract: Accurate extraction of cultivated land from high-resolution remote sensing (HRRS) imagery is critical for food security, agricultural planning, and ecological management. However, existing deep learning methods struggle with boundary ambiguity, detail loss, and adaptability across diverse terrains, particularly in areas with fragmented parcels, spectral heterogeneity (e.g., varying crop types, soil moisture, phenology), and complex mixtures with spectrally similar non-cropland covers. This study aims to overcome these limitations by developing a terrain-adaptive segmentation model for robust cultivated land extraction. We propose SE-VUNet, an enhanced U-Net architecture that integrates three key innovations. 1) VGG-enhanced encoder: the standard encoder is replaced with a VGG-based deep feature extractor that captures richer multi-scale contextual information, improving the representation of local textures (e.g., field ridges, ditches) and global patterns (e.g., plain vs. terrace distributions). 2) Terrain-adaptive Squeeze-and-Excitation (SE) attention: SE modules are strategically embedded to dynamically recalibrate channel-wise feature importance, enhancing vegetation-relevant channels while suppressing noise. Five variants, SE-VUNet(1) to SE-VUNet(5), were created by embedding SE modules at the shallow feature layer (1), before downsampling (2), at the skip connection (3), at decoder fusion (4), and in the feature learning module (5). 3) Batch Normalization (BN) optimization: BN layers are integrated after each convolutional block to mitigate internal covariate shift, accelerate convergence, reduce overfitting (crucial given limited labeled data), and enhance generalization.
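The channel recalibration performed by an SE module can be illustrated with a minimal pure-Python sketch. It follows the general squeeze, excitation, and scale steps of an SE block and is not the authors' implementation. The function name `se_block`, the weight matrices `w_reduce` and `w_expand`, the omission of bias terms, and the toy input are all assumptions made for illustration.

```python
import math


def se_block(feature_map, w_reduce, w_expand):
    """Apply a Squeeze-and-Excitation step to a feature map.

    feature_map: list of C channels, each an H x W nested list of floats.
    w_reduce:    (C // r) x C weight matrix of the bottleneck layer (hypothetical).
    w_expand:    C x (C // r) weight matrix of the expansion layer (hypothetical).
    Bias terms are omitted in this sketch.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return gates, scaled


# Toy input: two 2 x 2 channels and a one-unit bottleneck.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
gates, scaled = se_block(features, w_reduce=[[1.0, -1.0]], w_expand=[[1.0], [-1.0]])
print(gates)
```

In a trained network the two weight matrices are learned, so the gates differ per channel and per input. Where the block is inserted (for example before downsampling or in the feature learning module) changes which features it reweights, not the computation itself.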
Comprehensive experiments used the Gaofen Image Dataset (GID), derived from Gaofen-2 (GF-2) satellite RGB imagery, and evaluated performance on two terrain types: (i) flat homogeneous land (large, contiguous fields, uniform spectra, low interference) and (ii) complex heterogeneous land (small, irregular fields, blurred boundaries, high spectral variability, significant non-cropland mixing). The SE-VUNet variants were benchmarked against PSPNet, HRNet, DeepLabv3+, and the baseline U-Net. All five SE-VUNet variants outperformed the baselines on both terrains, validating the integration of VGG feature extraction and SE attention. Crucially, the optimal SE placement was terrain-dependent. On flat homogeneous terrain, SE-VUNet(2) (SE before downsampling) performed best, with a mean Intersection over Union (MIoU) of 96.66% and an F1-score of 97.57%. This configuration amplifies high-resolution shallow features early, preserving fine linear details such as field boundaries and irrigation canals. It outperformed the best baseline (typically DeepLabv3+) by over 4.85% in accuracy and 6.36% in F1-score. On complex heterogeneous terrain, SE-VUNet(5) (SE in the feature learning module) performed best, with an MIoU of 94.40% and an F1-score of 97.11%. This placement enhances adaptive multi-scale feature fusion and deep feature refinement, significantly improving the discrimination of spectrally ambiguous classes (e.g., crops vs. grasslands) and the resolution of intricate, fragmented boundaries. Gains over the strongest baseline (typically HRNet or DeepLabv3+) were substantial, exceeding 6% in accuracy and 11% in MIoU. Quantitative analysis confirmed that SE-VUNet significantly reduced boundary localization errors and captured small-field details better than all baselines. The explicit terrain-based module optimization strategy proved highly effective.
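For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute MIoU and F1-score from a predicted mask and a ground-truth mask. It assumes a two-class setting (cultivated land = 1, background = 0) with MIoU averaged over both classes. The abstract does not state the exact definitions used in the study, so the function names, the class encoding, and the toy masks are illustrative assumptions.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two binary masks of identical shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def mean_iou(tp, fp, fn, tn):
    """Average the IoU of the cultivated-land class and the background class."""
    ious = []
    for intersection in (tp, tn):
        union = intersection + fp + fn
        if union > 0:  # skip a class that appears in neither mask
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the cultivated-land class."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator > 0 else 0.0


# Toy 2 x 4 masks (hypothetical data, not from the GID dataset).
predicted = [[1, 1, 0, 0], [1, 0, 0, 0]]
reference = [[1, 1, 1, 0], [0, 0, 0, 0]]
tp, fp, fn, tn = confusion_counts(predicted, reference)
print(mean_iou(tp, fp, fn, tn), f1_score(tp, fp, fn))
```

On real data the counts would be accumulated over all test tiles before the ratios are taken, so that large and small tiles are weighted by their pixel counts.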
This study demonstrates the critical importance of terrain-aware model customization for high-precision agricultural remote sensing. SE-VUNet provides a robust framework by synergistically combining deep VGG feature extraction, channel-wise SE attention recalibration, and BN-stabilized training. The findings highlight that strategically optimizing attention mechanism deployment based on landscape heterogeneity is essential for overcoming boundary blurring and detail loss. The proposed terrain-adaptive architecture significantly enhances cultivated land mapping accuracy under diverse topographic conditions. Future work will extend the framework using multi-temporal and multi-spectral data to further boost capabilities for dynamic agricultural monitoring and precision farming.
