Cross-scale River Precise Identification Method With Fusion of Global Multilevel Features

闫烁月; 王庆; 钟康; 张昌民; 叶茂林; 付安琪; 刘远刚

Abstract

Automated, precise identification of rivers in high-resolution remote sensing images is important for river and lake environmental monitoring and for watershed change studies. However, rivers occupy a relatively small share of the image area, which easily leads to an imbalance between positive and negative samples in the dataset. In addition, rivers vary widely in morphology and undergo complex scale changes, so river identification is prone to discontinuous boundaries and grid effects. To address these problems, this paper proposes a cross-scale river precise identification method with fusion of global multilevel features. The method has three main parts. First, we select meandering and braided rivers with distinctive features from around the world and build a multi-feature river dataset to increase data diversity. Second, we build the R-Seg model with the lightweight semantic segmentation model Segformer as the backbone network and design the Global and Adaptive Scale Pyramid Pooling (GASPP) module for global multilevel feature extraction. Cascading this module with the Transformer at each stage extracts multi-scale features, so that the model better captures contextual information in river images, reduces information loss, and amplifies global cross-dimension interaction features. Finally, we propose a cross-scale river image prediction method based on mask-weighted voting. A large-scene river image is cropped with a sliding window, each unit prediction block is multiplied by a specific weighting mask to obtain a sub-prediction, and the sub-predictions are stitched in sequence by overlapping voting to form the final result, enabling precise identification of river images at different scales.

Experiments on the constructed multi-feature dataset of meandering and braided rivers, with comparisons against other methods, show the following. Qualitatively, the overall R-Seg network structure maintains identification accuracy for main rivers while alleviating breaks in small rivers and effectively smoothing river boundaries, and it is robust on 500×500 small-scale river images. In addition, the mask-weighted voting method effectively reduces the loss at unit block edges caused by grid effects, makes full use of the unit block predictions, and improves both adaptability to larger-scene remote sensing images and river prediction accuracy. Quantitatively, the method is relatively the best across the accuracy evaluation metrics, with an overall accuracy of up to 99.49%. Identification takes less than 1 s per image, which is efficient enough for most practical requirements. Furthermore, compared with a pure overlap prediction strategy, the mask-weighted voting strategy raises the overall river identification accuracy by approximately 0.28% to 6.93%. Adjusting the overlap parameter shows that accuracy is not positively correlated with overlap; accuracy is relatively optimal at an overlap of about 12.5%. By designing the R-Seg network model and introducing the mask-weighted voting prediction method, the approach reduces, to some extent, discontinuous river boundary identification and grid effects, and effectively improves river identification accuracy in remote sensing images across different scenes, with good robustness and visual quality. The identification results have important application value for geological exploration of rivers and for studies of watershed change.
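
The sliding-window prediction with mask-weighted overlapping voting can be illustrated with a minimal sketch. The abstract does not specify the shape of the mask or the exact voting rule, so everything below is an assumption made for illustration: the mask simply down-weights a ring of pixels near each tile edge, the vote is a weighted average of per-pixel river probabilities followed by a threshold, and the names `tile_starts`, `edge_weight_mask`, `predict_large`, and `predict_tile` are hypothetical rather than taken from the paper.

```python
def tile_starts(size, tile, stride):
    """Start offsets so that tiles of length `tile` cover [0, size)."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the image border
    return starts


def edge_weight_mask(tile, border, edge_weight=0.1):
    """Assumed mask: 1.0 in the interior, `edge_weight` within `border` pixels of an edge."""
    mask = []
    for r in range(tile):
        row = []
        for c in range(tile):
            near_edge = min(r, c, tile - 1 - r, tile - 1 - c) < border
            row.append(edge_weight if near_edge else 1.0)
        mask.append(row)
    return mask


def predict_large(image, predict_tile, tile=500, overlap=0.125, threshold=0.5):
    """Predict a binary river mask for a large image by weighted overlapping votes.

    `image` is a list of rows; `predict_tile` maps a tile x tile patch to a
    tile x tile grid of river probabilities (a stand-in for the trained model).
    """
    height, width = len(image), len(image[0])
    if height < tile or width < tile:
        raise ValueError("image must be at least one tile in each dimension")

    stride = max(1, int(tile * (1.0 - overlap)))
    border = (tile - stride) // 2  # down-weight half of the overlapping strip
    mask = edge_weight_mask(tile, border)

    votes = [[0.0] * width for _ in range(height)]
    weights = [[0.0] * width for _ in range(height)]

    for top in tile_starts(height, tile, stride):
        for left in tile_starts(width, tile, stride):
            patch = [row[left:left + tile] for row in image[top:top + tile]]
            probs = predict_tile(patch)
            for r in range(tile):
                for c in range(tile):
                    w = mask[r][c]
                    votes[top + r][left + c] += w * probs[r][c]
                    weights[top + r][left + c] += w

    # Every pixel is covered by at least one tile with a positive weight,
    # so the weighted average is always defined.
    return [
        [1 if votes[r][c] / weights[r][c] >= threshold else 0 for c in range(width)]
        for r in range(height)
    ]
```

With the defaults, a 500-pixel tile and 12.5% overlap give a stride of 437 pixels, so neighbouring tiles share a 63-pixel strip. Because a pixel near the edge of one tile usually lies in the interior of a neighbouring tile, the interior prediction dominates the vote, which is one plausible way to suppress the seams that appear when tiles are simply pasted side by side.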
