Abstract:
The spatial resolution of remote sensing imagery has been increasing with improvements in optical sensor performance. Accurate and rapid crop classification is often required for agricultural production, yield prediction, and planting structure adjustment. However, traditional high-resolution imagery cannot fully capture the rich phenological information in the crop growth cycle, particularly in regions with complex planting structures that include both single- and double-season crops. This limitation can significantly constrain the performance of high-resolution crop semantic segmentation models. This study proposes a multi-source crop semantic segmentation model, MCSNet (multi-source crops segmentation network), that integrates high-resolution remote sensing imagery with medium-resolution time-series normalized difference vegetation index (NDVI) data. The model uses a dual-encoder structure composed of a high-resolution encoder (HR-Encoder) and a time-series encoder (TS-Encoder). The HR-Encoder processes the high-resolution imagery to extract spatial detail features, such as crop plot boundaries and texture differences. Meanwhile, the TS-Encoder processes the medium-resolution time-series NDVI data. The vegetation index sensitively captures the spectral variations of crops throughout their growth cycles, so that phenological features can be exploited to distinguish between single- and double-season crops. Furthermore, convolutional long short-term memory (ConvLSTM) units were incorporated into the TS-Encoder to enhance the modeling capacity for complex temporal information: local spatial features are extracted while the dynamic changes along the temporal dimension are captured. Subsequently, a multi-feature fusion encoder (MF-Encoder) was integrated to fuse the multi-source features from the HR-Encoder and TS-Encoder.
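The reason a time-series NDVI input can help separate single- and double-season crops can be illustrated with a minimal sketch. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red); the function names, the 0.5 threshold, and the peak-counting heuristic are hypothetical illustrations and are not part of MCSNet, which learns temporal patterns with ConvLSTM units rather than with a hand-written rule.

```python
def ndvi(nir, red, eps=1e-12):
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if abs(denom) < eps:
        # Avoid division by zero for pixels with no signal (assumed convention).
        return 0.0
    return (nir - red) / denom


def ndvi_series(nir_series, red_series):
    """NDVI for one pixel at each acquisition date."""
    return [ndvi(n, r) for n, r in zip(nir_series, red_series)]


def count_peaks(series, threshold=0.5):
    """Count local maxima above a threshold.

    Hypothetical heuristic: one peak suggests a single growing season,
    two peaks suggest a double-season rotation.
    """
    peaks = 0
    for i in range(1, len(series) - 1):
        is_local_max = series[i] > series[i - 1] and series[i] >= series[i + 1]
        if series[i] > threshold and is_local_max:
            peaks += 1
    return peaks
```

Under these assumptions, a pixel whose NDVI curve rises and falls twice within a year would be flagged as a double-season candidate, which is the kind of phenological signal that a single high-resolution image cannot provide.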
Residual double attention was also used to emphasize the critical feature channels and spatial positions. The temporal features and high-resolution spatial details were thereby fused to strengthen the precision and robustness of crop classification. In the experiments, integrating the time-series NDVI data into MCSNet significantly improved crop classification accuracy compared with traditional methods that rely only on high-resolution imagery. The comparative experiments showed that MCSNet achieved the best performance among the compared models, with a mean intersection over union (mIoU) of 77.75% and an overall accuracy (OA) of 89.56%. Furthermore, the ConvLSTM and the residual double attention enhanced the network's modeling capability for spatiotemporal features, increasing mIoU and OA by 3.84% and 4.24%, respectively. The MCSNet model was then applied to the large-scale and complex study area of Xuyi County, Huaian City, Jiangsu Province, China. Compared with pixel- and object-oriented classification methods, such as Bi-LSTM (bidirectional long short-term memory), MCSNet exhibited significant advantages in both mapping and classification accuracy. Specifically, MCSNet achieved an OA of 89%, a Kappa coefficient of 0.85, a mean weighted F1 score (mF1) of 0.89, and an mIoU of 0.78, outperforming the comparison methods across all metrics. These results demonstrate the effectiveness and practicality of MCSNet for classification tasks in large-scale regions with complex planting structures. In summary, MCSNet offers a viable technical pathway for multi-source data processing by integrating high-resolution imagery and time-series NDVI data. The dual-encoder structure, the ConvLSTM units, and the residual double attention module effectively enhance crop classification accuracy and stability in regions with complex planting structures.
These findings can also provide theoretical and technical support for multi-source remote sensing data fusion in crop production and structure optimization for sustainable agriculture.
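For readers unfamiliar with the reported metrics, the following minimal sketch shows how OA, the Kappa coefficient, and mIoU are commonly derived from a confusion matrix. It follows the textbook definitions; the function names are hypothetical, and details such as skipping classes that never occur are assumptions that may differ from the evaluation protocol used in the study. The weighted F1 score is omitted for brevity.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA: correctly classified pixels divided by all pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa(cm):
    """Cohen's Kappa: agreement beyond what chance alone would give."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    po = overall_accuracy(cm)
    # Expected chance agreement from row (true) and column (predicted) totals.
    pe = sum(sum(cm[i]) * sum(cm[j][i] for j in range(n)) for i in range(n)) / total ** 2
    if pe == 1.0:
        return float("nan")  # Kappa is undefined in this degenerate case.
    return (po - pe) / (1.0 - pe)


def mean_iou(cm):
    """mIoU: mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[j][i] for j in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # Assumption: skip classes absent from both maps.
            ious.append(tp / union)
    return sum(ious) / len(ious)
```

In practice the label sequences would be the flattened reference and predicted class maps; libraries such as scikit-learn provide equivalent, tested implementations.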