Abstract:
A geo-parcel is one of the most basic geographical units for physical parameter inversion under space-time attributes, and its automatic recognition is a crucial step in object-based interpretation of high-resolution remote sensing imagery. Intelligent interpretation of such images supports the construction of spatial databases and the extraction of thematic information, and has emerged as a powerful tool for target recognition in complex geographic scenarios, with promising applications in precision agriculture, resource investigation, disaster assessment, and urban planning. Nevertheless, technical challenges remain in the intelligent recognition of geo-parcels from high-resolution remote sensing images, particularly in semantic segmentation. This article presents a top-down "Target-Primitive-Object" framework for recognizing farmland geo-parcels in high-resolution remote sensing imagery using a local focusing algorithm. Intelligent recognition is introduced prior to segmentation in a method referred to as LFAS (Local Focus Aided Segmentation), in which a local focus algorithm assists the labeling. Specifically, a YOLO backbone with a CNN architecture extracts and fuses features from high-resolution remote-sensing image blocks, so that both semantic and positional information is captured in the feature maps and multiple bounding boxes containing block targets are predicted. Non-maximum suppression then eliminates redundant prediction boxes, and the center coordinates of the final prediction boxes serve as auxiliary markers, which are mapped into the SAM feature space as position encodings. Finally, the original image is fed into the mask decoder to produce the final block segmentation, from which the target parcels are extracted (a minimal sketch of this detection-to-prompt pipeline is given after the abstract). Experimental results demonstrate that LFAS effectively learns global semantic features from remote sensing images against complex backgrounds and captures long-range dependencies of global features, handling fine edges and local details. The recognition accuracy of LFAS reached 91.15% pixel accuracy (PA), with completeness (Cp) of 87.42%, intersection over union (IoU) of 89.62%, and boundary-line extraction quality (Q) of 80.39%. The extracted farmland geo-parcels exhibited distinctly clear edge features, the objects in each plot were well defined and independent, and the extracted boundary lines aligned closely with the actual spatial forms of the parcels. A comparative experiment against several semantic segmentation algorithms, including U-Net, DeepLabV3, Swin Transformer, and SegFormer, showed that LFAS delivered superior parcel recognition performance: relative to SegFormer, the recognition accuracy (PA) for farmland geo-parcels improved by 2.73 percentage points, IoU by 3.01 percentage points, and boundary-line extraction quality (Q) by 3.55 percentage points, with notable gains in both recognition accuracy and edge quality. LFAS therefore first identifies the farmland geo-parcels and then segments them.
Two learning architectures (CNN and Transformer) are integrated to achieve feature-level fusion of spatial position and spectral attributes, improving the regional adaptability of the segmentation and the quality of farmland parcel extraction from high-resolution remote sensing imagery. In summary, LFAS shows strong generality for stable and reliable recognition, with minimal manual intervention over the entire process, and offers a viable approach for the accurate segmentation of high-spatial-resolution remote sensing images with broad application prospects. More complex agricultural scenarios and diverse plot structures are expected to further validate the algorithm's adaptability and robustness in future work.
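To make the detection-to-prompt step concrete, the Python sketch below shows how predicted box centers could be passed to SAM as foreground point prompts. It is an illustrative approximation only, not the authors' implementation: the detector weights (yolo_parcels.pt), the SAM checkpoint and variant (sam_vit_b.pth, vit_b), and the confidence threshold are assumed placeholders, and the public Ultralytics and segment-anything APIs stand in for the paper's fused CNN-Transformer pipeline.

    # Hypothetical sketch: detection-box centers as SAM point prompts.
    # Weights, checkpoint paths, and thresholds are placeholders, not the paper's settings.
    import numpy as np
    import cv2
    from ultralytics import YOLO                               # detector backbone (assumed weights)
    from segment_anything import sam_model_registry, SamPredictor

    def parcel_masks(image_path, det_weights="yolo_parcels.pt",
                     sam_ckpt="sam_vit_b.pth", conf=0.25):
        image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)

        # 1) Detect candidate parcel blocks; the Ultralytics predictor applies
        #    non-maximum suppression internally, so the returned boxes are already pruned.
        detector = YOLO(det_weights)
        boxes = detector(image, conf=conf)[0].boxes.xyxy.cpu().numpy()   # (N, 4) x1,y1,x2,y2

        # 2) Use each box center as a foreground point prompt for SAM.
        centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                            (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)

        sam = sam_model_registry["vit_b"](checkpoint=sam_ckpt)
        predictor = SamPredictor(sam)
        predictor.set_image(image)                             # encode the image once

        masks = []
        for x, y in centers:
            m, _, _ = predictor.predict(point_coords=np.array([[x, y]]),
                                        point_labels=np.array([1]),      # 1 = foreground point
                                        multimask_output=False)
            masks.append(m[0])                                 # boolean mask for one parcel
        return masks

In this sketch the detector's built-in non-maximum suppression plays the role of the pruning step described in the abstract, and each surviving box contributes a single positive point prompt; the additional feature-level fusion of image and prompt encodings inside the mask decoder described in the paper is not reproduced here.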