Abstract:
Accurate mapping of greenhouse facilities can support the monitoring of non-grain cropland use and fine-scale regulation in China's rapidly transforming agricultural sector. In this study, an integrated deep learning framework was developed to extract greenhouses from very high-resolution remote sensing imagery, and three improved semantic segmentation models with different architectural strategies were systematically compared. The study area was a typical plateau–mountain region in Anning City, Yunnan Province, China, where the greenhouse area increased by 23.7% per year from 2020 to 2022. It was selected because its greenhouses are small, scattered and embedded in complex terrain. A multi-feature dataset of 14,000 image tiles was constructed from 0.3 m pansharpened WorldView-3 imagery. A four-level experimental design was used to explicitly classify greenhouse types, common backgrounds, transient disturbances (e.g., shadows and clouds) and non-ideal samples (e.g., partially occluded or damaged structures). Robust feature learning was achieved with encoder–decoder architectures. Three improved models were implemented: (1) Enhanced-SegUNet, which combines residual blocks, CBAM attention and a hybrid Dice–Boundary loss to improve segmentation accuracy and boundary precision under low contrast or shadow; (2) Dense-SegUNet++, which extends UNet++ with densely connected skip paths, multi-branch deep supervision and dynamic convolution to enhance multi-level feature fusion and adapt to complex textures; and (3) MS-DeepLabV3+, which augments DeepLabV3+ with an extended multi-rate ASPP module, a spectral enhancement branch that exploits the red-edge and NIR bands, and lightweight depthwise separable convolutions. All models were trained under a unified strategy with transfer learning, two-stage fine-tuning, label smoothing, spatial dropout and L2 regularization.
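For illustration only, the Dice component of a hybrid Dice–Boundary loss, combined with binary label smoothing, might be sketched as follows. This is a minimal sketch and not the implementation used in the study: the function names, the smoothing factor `eps`, and the `smooth` constant are assumptions, and the boundary term is omitted because its formulation is not given here.

```python
def smooth_labels(labels, eps=0.1):
    """Binary label smoothing (eps is an assumed value): 1 -> 1 - eps/2, 0 -> eps/2."""
    return [y * (1.0 - eps) + 0.5 * eps for y in labels]


def soft_dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.Y| + smooth) / (|P| + |Y| + smooth).

    probs  : flat list of predicted greenhouse probabilities in [0, 1]
    labels : flat list of (possibly smoothed) ground-truth values
    smooth : additive constant that keeps the ratio defined for empty masks
    """
    intersection = sum(p * y for p, y in zip(probs, labels))
    total = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


# Toy example: four pixels, two of which belong to a greenhouse.
probs = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
loss_hard = soft_dice_loss(probs, labels)
loss_smoothed = soft_dice_loss(probs, smooth_labels(labels))
```

In a hybrid loss, a term of this kind would typically be added to a boundary-sensitive term with a weighting factor; the weighting is likewise not specified here.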
The models were systematically evaluated using pixel-level metrics (IoU, F1 and mPA), object-level metrics (detection rate, false alarm rate and segmentation quality index) and inference efficiency (FPS, GMACs and peak memory). Experiments were performed on an independent test set of 2,800 tiles. All three improved models outperformed the baseline U-Net. Dense-SegUNet++ achieved the best overall performance, with an IoU of 0.924, an F1-score of 0.931 and a segmentation quality index gain of 11.6% over U-Net. It preserved fine spatial structures and delineated greenhouse boundaries in densely clustered and background-rich environments. MS-DeepLabV3+ excelled in areas with strong spectral contrast and diverse materials, owing to its strengthened multi-scale and spectral modeling, whereas Enhanced-SegUNet showed superior robustness in shadowed and low-contrast scenes and on dark-grey greenhouse roofs. Efficiency analysis on an NVIDIA RTX 4070 platform indicated that the framework was operationally viable: Dense-SegUNet++ reached 41 FPS at an acceptable computational cost (21.6 GMACs), while MS-DeepLabV3+ provided a favorable balance between accuracy and speed. When embedded into a processing chain, the framework improved county-level monitoring efficiency by a factor of 18.3 compared with conventional manual interpretation. Standardized vector products were directly integrated into non-grain cropland monitoring workflows for area statistics, detection and regulatory zoning. The results demonstrate the complementary strengths of different deep architectures under various scenarios and highlight the practical potential of the framework for high-frequency, fine-scale supervision of cropland non-grain utilization in complex plateau–mountain agricultural regions. Limitations remained in distinguishing greenhouses from spectrally similar bright buildings.
Recognizing severely damaged or newly emerging structures with few training examples also remains challenging. Future work could incorporate multi-temporal and multi-source (e.g., SAR) data, together with lightweight architectures for edge deployment, to improve robustness and scalability.
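As a hedged illustration of the pixel-level metrics reported above, the following sketch computes IoU, F1 and mPA for a binary greenhouse mask. It assumes the standard definitions (IoU = TP / (TP + FP + FN); F1 = 2TP / (2TP + FP + FN); mPA as the mean of per-class pixel accuracy over the greenhouse and background classes); the helper names and the toy masks are hypothetical and are not taken from the study.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two binary masks given as nested lists."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    """Return num / den, or 0.0 when the metric is undefined (assumed convention)."""
    return num / den if den else 0.0


def pixel_metrics(pred, truth):
    """Return IoU, F1 and mPA for the greenhouse (positive) class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou = _ratio(tp, tp + fp + fn)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    # mPA: average of per-class pixel accuracy (greenhouse and background).
    mpa = 0.5 * (_ratio(tp, tp + fn) + _ratio(tn, tn + fp))
    return {"IoU": iou, "F1": f1, "mPA": mpa}


# Toy 2 x 3 masks: 1 = greenhouse pixel, 0 = background pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = pixel_metrics(pred, truth)
```

For the toy masks, two pixels are true positives, one is a false positive and one is a false negative, so the IoU is 2/4 and the F1-score is 4/6.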