A crop semantic segmentation model integrating high-resolution imagery and time-series NDVI

ZHAO Xu (赵旭); LI Hao (李浩); ZHU Yihu (朱益虎); WANG Shengli (王胜利); HE Yanlan (何燕兰)

doi:10.11975/j.issn.1002-6819.202407098


Abstract

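Accurate and timely crop classification information obtained through remote sensing is essential for agricultural production management, yield estimation, and planting structure adjustment. As the performance of optical sensors keeps improving, the resolution of remote sensing imagery continues to increase, and agricultural remote sensing is gradually entering an era of high precision. However, current high-resolution crop semantic segmentation models have difficulty exploiting time-series data that carry crop phenological information, especially in regions with complex planting structures where single-season and double-season crops coexist. To address this problem, this paper proposes MCSNet (multi-source crops segmentation network), a semantic segmentation model that fuses high-resolution remote sensing imagery with medium-resolution time-series NDVI. The model adopts a dual-encoder structure that simultaneously mines, each with a dedicated encoder, the spatial details of the high-resolution imagery and the spatiotemporal features of the medium-resolution time series, and an attention-guided data fusion module fully fuses the spatiotemporal information, which improves crop classification accuracy. Experiments show that adding the time-series NDVI data greatly improves the model's classification accuracy. In the comparative experiments, the model achieved the highest mean intersection over union (mIoU) and overall accuracy (OA), 77.75% and 89.56%, respectively. Through the joint effect of the convolutional long short-term memory (ConvLSTM) units and the residual double attention module, mIoU and OA increased by 3.84 and 4.24 percentage points, respectively. Applied to the study area, Xuyi County, the model produced high-resolution county-scale crop classification results with excellent mapping quality, and every evaluation metric was higher than those of the pixel-based and object-oriented bidirectional long short-term memory (Bi-LSTM) methods. This work offers a feasible approach to crop mapping over large areas with complex planting structures using deep-learning semantic segmentation.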
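The phenological cue that time-series NDVI adds can be illustrated with a toy example. The sketch below computes NDVI as (NIR - Red) / (NIR + Red) and counts green-up peaks in a short series: one peak suggests a single-season crop, two suggest a double-season rotation. The reflectance values, the 0.5 threshold, and the peak-counting rule are hypothetical and are not taken from the paper; MCSNet learns such temporal patterns rather than applying a fixed rule.

```python
from typing import Sequence


def ndvi(nir: float, red: float, eps: float = 1e-9) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against division by zero."""
    return (nir - red) / (nir + red + eps)


def count_growth_peaks(series: Sequence[float], threshold: float = 0.5) -> int:
    """Count local NDVI maxima above `threshold` (toy heuristic, not the paper's method)."""
    peaks = 0
    for i in range(1, len(series) - 1):
        if series[i] > threshold and series[i - 1] < series[i] >= series[i + 1]:
            peaks += 1
    return peaks


# Hypothetical (NIR, Red) reflectance pairs for five acquisition dates.
single_season = [ndvi(n, r) for n, r in
                 [(0.20, 0.15), (0.30, 0.10), (0.45, 0.05), (0.35, 0.08), (0.22, 0.14)]]
double_season = [ndvi(n, r) for n, r in
                 [(0.20, 0.15), (0.45, 0.05), (0.22, 0.14), (0.45, 0.05), (0.20, 0.15)]]
print(count_growth_peaks(single_season), count_growth_peaks(double_season))  # 1 2
```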

Abstract: The spatial resolution of remote sensing imagery has been increasing along with optical sensor performance. Accurate and rapid crop classification is essential for agricultural production management, yield prediction, and planting structure adjustment. However, high-resolution imagery alone cannot capture the rich phenological information of the crop growth cycle, particularly in regions with complex planting structures where single- and double-season crops coexist. This limitation significantly constrains the performance of high-resolution crop semantic segmentation models. This study proposes a multi-source crop semantic segmentation model, MCSNet (multi-source crops segmentation network), which integrates high-resolution remote sensing imagery with medium-resolution time-series normalized difference vegetation index (NDVI) data. Its dual-encoder structure consists of a high-resolution encoder (HR-Encoder) and a time-series encoder (TS-Encoder). The HR-Encoder extracts spatial detail features from the high-resolution imagery, such as crop field boundaries and texture differences. The TS-Encoder processes the medium-resolution time-series NDVI data, which are sensitive to the spectral changes of crops throughout their growth cycles, so that phenological features can be fully exploited to distinguish single- from double-season crops. Convolutional long short-term memory (ConvLSTM) units were incorporated into the TS-Encoder to strengthen the modeling of complex temporal information: they extract local spatial features while capturing their dynamic changes along the temporal dimension. A multi-feature fusion encoder (MF-Encoder) then fuses the multi-source features from the HR-Encoder and TS-Encoder.
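As a rough illustration of what a ConvLSTM unit in the time-series encoder computes, the sketch below implements the standard ConvLSTM cell in NumPy: the input, forget, and output gates and the candidate state are all convolutions over the current NDVI frame concatenated with the previous hidden state, so each pixel's memory is updated from its spatial neighbourhood at every date. The kernel size, channel counts, random weights, and the choice to keep only the last hidden state are assumptions for illustration; the abstract does not specify MCSNet's configuration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Naive zero-padded 'same' 2-D cross-correlation.

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H, W)
    """
    k = w.shape[-1]
    pad = k // 2
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((w.shape[0], height, width))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + height, dx:dx + width]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]


class ConvLSTMCell:
    """Standard ConvLSTM cell: every gate is a convolution over [x_t, h_{t-1}]."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.hid_ch = hid_ch
        # One weight tensor producing the input, forget, output and candidate maps.
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros(4 * hid_ch)

    def step(self, x, h, c):
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w, self.b)
        i, f, o, g = np.split(z, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
        h = sigmoid(o) * np.tanh(c)                   # emit the new hidden state
        return h, c


def encode_series(cell: ConvLSTMCell, series: np.ndarray) -> np.ndarray:
    """Run the cell over a (T, C_in, H, W) stack and return the last hidden state."""
    _, _, height, width = series.shape
    h = np.zeros((cell.hid_ch, height, width))
    c = np.zeros_like(h)
    for x_t in series:  # iterate over acquisition dates
        h, c = cell.step(x_t, h, c)
    return h


# Hypothetical input: 12 NDVI dates, 1 band, 16 x 16 pixels.
rng = np.random.default_rng(42)
ndvi_stack = rng.uniform(-0.1, 0.9, size=(12, 1, 16, 16))
features = encode_series(ConvLSTMCell(in_ch=1, hid_ch=8), ndvi_stack)
print(features.shape)  # (8, 16, 16)
```

A convolutional cell is used instead of a plain LSTM so that the recurrent state keeps its spatial layout, which is what allows the temporal features to be fused later with the spatial features of the high-resolution branch.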
A residual double attention module emphasizes the critical feature channels and spatial positions, so that the temporal features and high-resolution spatial details are fused in a way that strengthens the accuracy and robustness of crop classification. In the experiments, adding the time-series NDVI data to MCSNet significantly improved crop classification accuracy compared with using the high-resolution imagery alone. In the comparative experiments, MCSNet achieved the highest mean intersection over union (mIoU), 77.75%, and the highest overall accuracy (OA), 89.56%. The ConvLSTM units and the residual double attention module jointly enhanced the modeling of spatiotemporal features, increasing mIoU and OA by 3.84 and 4.24 percentage points, respectively. MCSNet was then applied to a large study area with a complex planting structure, Xuyi County, Huaian City, Jiangsu Province, China. Compared with pixel-based and object-oriented classification using Bi-LSTM (bidirectional long short-term memory), MCSNet showed clear advantages in both mapping quality and classification accuracy: it achieved an OA of 89%, a Kappa coefficient of 0.85, a mean weighted F1 score (mF1) of 0.89, and an mIoU of 0.78, outperforming both comparison methods on every metric. These results demonstrate the effectiveness and practicality of MCSNet for crop classification over large areas with complex planting structures. In summary, by integrating high-resolution imagery with time-series NDVI data, MCSNet offers a viable technical pathway for multi-source data processing, and its dual-encoder structure, ConvLSTM units, and residual double attention module effectively improve the accuracy and stability of crop classification in regions with complex planting structures.
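The fusion step described above can be approximated with a CBAM-style sketch: the two encoders' feature maps are brought to a common grid and concatenated, channel attention reweights feature channels, spatial attention reweights pixel positions, and the attended result is added back to the input through a residual connection. This is an assumption-laden stand-in, not MCSNet's actual module: concatenation, nearest-neighbour upsampling by a factor of 4, the reduction ratio of 8, the 1x1 spatial mixing (CBAM itself uses a 7x7 convolution), and all weights are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """Per-channel weights in (0, 1) from a shared MLP on avg- and max-pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # bottleneck MLP with ReLU
    return sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))[:, None, None]


def spatial_attention(x, ws):
    """Per-pixel weights in (0, 1) from channel-wise mean and max maps (1x1 mixing)."""
    return sigmoid(ws[0] * x.mean(axis=0) + ws[1] * x.max(axis=0))[None, :, :]


def residual_double_attention(x, w1, w2, ws):
    """Channel attention, then spatial attention, with a residual connection."""
    y = x * channel_attention(x, w1, w2)
    y = y * spatial_attention(y, ws)
    return x + y


rng = np.random.default_rng(0)
hr_feat = rng.normal(size=(32, 16, 16))  # hypothetical high-resolution encoder output
ts_feat = rng.normal(size=(8, 4, 4))     # hypothetical coarser time-series features
ts_up = np.repeat(np.repeat(ts_feat, 4, axis=1), 4, axis=2)  # nearest-neighbour x4
fused = np.concatenate([hr_feat, ts_up], axis=0)             # (40, 16, 16)

c, r = fused.shape[0], 8  # r: hypothetical channel reduction ratio
w1 = rng.normal(0.0, 0.1, size=(c // r, c))
w2 = rng.normal(0.0, 0.1, size=(c, c // r))
ws = np.array([1.0, 1.0])
out = residual_double_attention(fused, w1, w2, ws)
print(out.shape)  # (40, 16, 16)
```

The residual path means the attention can only add a scaled copy of the features, never suppress the input entirely, which tends to keep training stable.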
These findings also provide theoretical and technical support for multi-source remote sensing data fusion in crop production and in planting structure optimization for sustainable agriculture.
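The four accuracy metrics reported above (OA, Kappa, mF1, mIoU) can all be derived from a pixel-level confusion matrix. The sketch below shows the standard formulas; the confusion matrix is hypothetical, not the paper's data, and interpreting "mean weighted F1" as per-class F1 weighted by reference pixel counts is an assumption, since the abstract does not define it.

```python
import numpy as np


def metrics_from_confusion(cm) -> dict:
    """cm[i, j] = number of pixels of reference class i predicted as class j.

    Assumes every class appears in both the reference and the prediction
    (otherwise IoU, precision, or recall divide by zero).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    ref = cm.sum(axis=1)   # reference (row) totals
    pred = cm.sum(axis=0)  # predicted (column) totals
    oa = tp.sum() / n
    pe = (ref * pred).sum() / n**2          # expected chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    iou = tp / (ref + pred - tp)
    precision, recall = tp / pred, tp / ref
    f1 = 2 * precision * recall / (precision + recall)
    weighted_f1 = (f1 * ref).sum() / n      # per-class F1 weighted by support
    return {"OA": float(oa), "Kappa": float(kappa),
            "mIoU": float(iou.mean()), "mF1": float(weighted_f1)}


# Hypothetical 3-class confusion matrix (rows: reference, columns: prediction).
cm = [[50, 5, 0],
      [4, 40, 6],
      [1, 3, 41]]
m = metrics_from_confusion(cm)
print({k: round(v, 4) for k, v in m.items()})
# {'OA': 0.8733, 'Kappa': 0.8095, 'mIoU': 0.7756, 'mF1': 0.8728}
```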
