Abstract:
Accurately extracting cultivated land from high-resolution remote sensing (HRRS) imagery is critical for food security, agricultural planning, and ecological management. However, existing deep learning methods struggle with boundary ambiguity, detail loss, and limited adaptability across diverse terrains, particularly in areas with fragmented parcels, spectral heterogeneity (e.g., varying crop types, soil moisture, and phenology), and complex mixtures with spectrally similar non-cropland covers. This study addresses these limitations by developing a terrain-adaptive segmentation model for robust cultivated land extraction. We propose SE-VUNet, an enhanced U-Net architecture that integrates three key components: 1) VGG-Enhanced Encoder: the standard encoder is replaced with a VGG-based deep feature extractor that captures richer multi-scale contextual information, improving the representation of local textures (e.g., field ridges, ditches) and global patterns (e.g., plain vs. terrace distributions). 2) Terrain-Adaptive Squeeze-and-Excitation (SE) Attention: SE modules are embedded to dynamically recalibrate channel-wise feature importance, enhancing vegetation-relevant channels while suppressing noise. Five variants, SE-VUNet(1) to SE-VUNet(5), were created by embedding the SE modules at the Shallow Feature Layer (1), Pre-Downsampling (2), Skip-Connection (3), Decoder Fusion (4), and Feature Learning Module (5). 3) Batch Normalization (BN) Optimization: BN layers are added after each convolutional block to mitigate internal covariate shift, accelerate convergence, reduce overfitting (crucial given limited labeled data), and enhance generalization.
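The channel-wise recalibration performed by an SE module can be illustrated with a small, dependency-free sketch. It follows the generic squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and scale steps; it is not the SE-VUNet implementation. The function name `se_recalibrate`, the reduction ratio, the omission of biases, and the random weights and inputs are all assumptions made for illustration.

```python
import math
import random


def se_recalibrate(feature_map, w1, w2):
    """Generic SE recalibration on a C x H x W feature map (nested lists).

    w1: (C // r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C // r) weights of the expansion layer (hypothetical values).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce to C // r units and apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C units and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every pixel of a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy example: 4 channels, reduction ratio 2, 3 x 3 spatial size (all assumed).
random.seed(0)
C, R, H, W = 4, 2, 3, 3
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]

scaled, gates = se_recalibrate(features, w1, w2)
print("channel gates:", [round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel is attenuated in proportion to its learned importance while the spatial layout is left unchanged. Because the module preserves the feature map shape, it can in principle be placed at any of the five positions listed above.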
Comprehensive experiments used the Gaofen Image Dataset (GID), derived from Gaofen-2 (GF-2) satellite RGB imagery, and evaluated performance on two key terrain types: (i) Flat Homogeneous Land (large, contiguous fields, uniform spectra, low interference) and (ii) Complex Heterogeneous Land (small, irregular fields, blurred boundaries, high spectral variability, significant non-cropland mixing). The SE-VUNet variants were benchmarked against PSPNet, HRNet, DeepLabv3+, and the baseline U-Net. All five SE-VUNet variants outperformed the baselines on both terrains, supporting the combination of VGG feature extraction and SE attention. Crucially, the optimal SE placement was terrain-dependent. On Flat Homogeneous Terrain, SE-VUNet(2) (SE at Pre-Downsampling) performed best, with a Mean Intersection over Union (MIoU) of 96.66% and an F1-score of 97.57%. This configuration amplifies high-resolution shallow features early, preserving fine linear details such as field boundaries and irrigation canals. It outperformed the best baseline (typically DeepLabv3+) by more than 4.85% in accuracy and 6.36% in F1-score. On Complex Heterogeneous Terrain, SE-VUNet(5) (SE in the Feature Learning Module) delivered the best results, with an MIoU of 94.40% and an F1-score of 97.11%. This placement enhances adaptive multi-scale feature fusion and deep feature refinement, significantly improving the discrimination of spectrally ambiguous classes (e.g., crops vs. grasslands) and the resolution of intricate, fragmented boundaries. Gains over the strongest baseline (typically HRNet or DeepLabv3+) were substantial, exceeding 6% in accuracy and 11% in MIoU. Quantitative analysis confirmed that SE-VUNet significantly reduces boundary localization errors and captures small-field details better than all baselines. The explicit terrain-based module optimization strategy proved highly effective.
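For reference, the reported metrics can be computed from a pixel-level confusion matrix as sketched below. The sketch assumes the standard definitions (per-class IoU = TP / (TP + FP + FN), MIoU as the unweighted mean over classes, F1 = 2TP / (2TP + FP + FN), accuracy as the fraction of correctly labeled pixels); the abstract does not state whether F1 is reported for the cultivated class or averaged over classes. The function name `segmentation_metrics` and the confusion matrix values are hypothetical and are not results from this study.

```python
def segmentation_metrics(conf):
    """Compute accuracy, per-class IoU and F1, and MIoU.

    conf[i][j] is the number of pixels of true class i predicted as class j.
    """
    n = len(conf)
    ious, f1s = [], []
    for k in range(n):
        tp = conf[k][k]
        fp = sum(conf[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(conf[k]) - tp                       # truly k, predicted another class
        iou_denominator = tp + fp + fn
        f1_denominator = 2 * tp + fp + fn
        ious.append(tp / iou_denominator if iou_denominator else 0.0)
        f1s.append(2 * tp / f1_denominator if f1_denominator else 0.0)
    total = sum(sum(row) for row in conf)
    accuracy = sum(conf[k][k] for k in range(n)) / total
    return {
        "accuracy": accuracy,
        "iou_per_class": ious,
        "f1_per_class": f1s,
        "miou": sum(ious) / n,
    }


# Hypothetical two-class matrix: row/column 0 = background, 1 = cultivated land.
confusion = [
    [50, 10],
    [5, 35],
]
metrics = segmentation_metrics(confusion)
print({name: value for name, value in metrics.items()})
```

With these toy counts, the cultivated class has 35 true positives, 10 false positives, and 5 false negatives, giving an IoU of 35 / 50 = 0.70 and an F1 of 70 / 85, or about 0.82. The gap between the two shows why IoU and F1 gains over a baseline are not directly comparable in magnitude.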
This study demonstrates the importance of terrain-aware model customization for high-precision agricultural remote sensing. SE-VUNet provides a robust framework that combines deep VGG feature extraction, channel-wise SE attention recalibration, and BN-stabilized training. The findings indicate that choosing where to place the attention mechanism according to landscape heterogeneity is essential for overcoming boundary blurring and detail loss. The proposed terrain-adaptive architecture significantly improves cultivated land mapping accuracy under diverse topographic conditions. Future work will extend the framework with multi-temporal and multi-spectral data to support dynamic agricultural monitoring and precision farming.