
Inversion of winter wheat canopy equivalent water thickness based on a multi-modal depth feature model

  • Abstract: To improve the accuracy of unmanned aerial vehicle (UAV) remote sensing monitoring of winter wheat canopy equivalent water thickness (EWT), this study collected UAV remote sensing data (visible-light imagery, multispectral imagery, and three-dimensional point clouds) and measured EWT samples at the turning green, jointing, and heading stages of winter wheat, and extracted spectral, texture, and structural features. The recursive feature elimination (REF), BORUTA, and least absolute shrinkage and selection operator (LASSO) algorithms were applied to analyze the importance contributions of the multi-modal features. A composite neural network capable of extracting multi-modal deep features (multi-modal depth feature neural network, MDFNN) was constructed by combining a convolutional neural network with a bidirectional long short-term memory network, and its canopy EWT estimation performance was compared with that of four machine learning algorithms (k-nearest neighbors, partial least squares, random forest, and support vector machine). The results showed that fusing multi-modal features improved the EWT estimation accuracy: on the validation set, the four machine learning models achieved coefficients of determination (R²) of 0.709−0.810, root mean square errors of 0.054−0.063 mm, and mean absolute percentage errors of 9.00%−21.5%; REF optimized the models more effectively than BORUTA and LASSO; and the MDFNN built on the REF feature combination outperformed the machine learning models, achieving the best canopy EWT estimation (R² of 0.882, root mean square error of 0.050 mm, and mean absolute percentage error of 6.60%). The results can serve as a reference for UAV remote sensing monitoring of winter wheat canopy EWT in the field.
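As a rough illustration of the feature-screening step described above, the following Python sketch ranks a multi-modal feature matrix with REF, LASSO, and (optionally) BORUTA. The arrays `X_features` and `y_ewt`, the wrapped random forest, and the number of retained features are placeholders, not the configuration used in the study.

```python
# Minimal sketch of the three feature-selection strategies named in the abstract
# (REF, BORUTA, LASSO). X_features (n_samples x n_features) and y_ewt are
# hypothetical arrays of multi-modal features and measured canopy EWT (mm).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X_features = rng.random((120, 30))   # placeholder multi-modal feature matrix
y_ewt = rng.random(120) * 0.5        # placeholder EWT samples (mm)

# 1) Recursive feature elimination wrapped around a random forest.
ref = RFE(RandomForestRegressor(n_estimators=200, random_state=0),
          n_features_to_select=20)
ref.fit(X_features, y_ewt)
ref_mask = ref.support_              # True for the retained features

# 2) LASSO: features whose coefficients shrink to zero are discarded.
lasso = LassoCV(cv=5, random_state=0).fit(X_features, y_ewt)
lasso_mask = lasso.coef_ != 0

# 3) BORUTA (via the third-party boruta package, if installed):
# from boruta import BorutaPy
# boruta = BorutaPy(RandomForestRegressor(n_estimators=200), n_estimators='auto')
# boruta.fit(X_features, y_ewt)
# boruta_mask = boruta.support_

print("REF kept %d features, LASSO kept %d" % (ref_mask.sum(), lasso_mask.sum()))
```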

     

    Abstract: Accurate monitoring of canopy equivalent water thickness (EWT) in winter wheat can greatly contribute to precision water management and irrigation strategies in sustainable agriculture. However, traditional machine learning struggles to capture the complex, non-linear relationships that arise from feature interactions. This study aimed to enhance the estimation of wheat canopy EWT using multi-modal unmanned aerial vehicle (UAV) remote sensing features and advanced deep learning techniques, so that the canopy EWT of winter wheat could be monitored with high accuracy. An optimal estimation model was established to facilitate high-precision, field-scale monitoring of crop water status throughout the growth stages. Firstly, field experiments under different water and nitrogen treatments were conducted in the 2022−2023 winter wheat season, with observations at the turning green, jointing, and heading stages. The experimental treatments included three irrigation levels (rainfed; 30 mm at overwintering and 30 mm at the jointing stage; and 60 mm at overwintering and 60 mm at the jointing stage) and four nitrogen fertilization rates (0, 100, 200, and 300 kg/hm²). High-resolution images of the wheat canopies were collected at the turning green, jointing, and heading stages using visible (red, green, and blue) and multispectral (red-edge and near-infrared) sensors carried by the UAV platform. Canopy EWT was measured simultaneously by destructive manual sampling in the field. Secondly, spectral features (visible and multispectral indices), textural features (normalized difference texture index, ratio texture index, and difference texture index), and structural features (plant height and canopy cover) were extracted by band calculation, the gray-level co-occurrence matrix, and three-dimensional point cloud processing, respectively. Feature importance scores were analyzed with the recursive feature elimination (REF), BORUTA, and least absolute shrinkage and selection operator (LASSO) algorithms. Finally, a composite neural network (multi-modal depth feature neural network, MDFNN) was constructed to extract multi-modal deep features by combining a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The CNN module contained three independent convolutional branches for deep feature extraction at the three growth stages; the deep features of each growth stage were concatenated by a fully connected layer as the input of the LSTM, and the final output was the EWT. Four machine learning algorithms were adopted for comparison of model accuracy: k-nearest neighbors, partial least squares, random forest, and support vector machine. The results showed that multi-modal feature fusion improved the estimation accuracy of the EWT compared with any single-modal feature, with coefficients of determination (R²) of 0.709−0.810, root mean square errors (RMSE) of 0.054−0.063 mm, and mean absolute percentage errors (MAPE) of 9.00%−21.5%. The features most frequently retained by the three feature selection algorithms were texture features, followed by spectral and structural features. REF provided the best optimization (8 spectral features, 10 texture features, and 2 structural features), whereas BORUTA and LASSO failed to improve model performance and even reduced model accuracy. Among the machine learning models driven by a single feature modality, texture features performed best, with R² of 0.629−0.706, RMSE of 0.063−0.078 mm, and MAPE of 20.20%−29.10%. Compared with the four machine learning models, the MDFNN significantly improved the estimation accuracy of the wheat canopy EWT and reduced the prediction error on all feature selection datasets. The MDFNN with the REF features achieved the highest estimation accuracy, with an R² of 0.882, an RMSE of 0.050 mm, and a MAPE of 6.60%. Inversion maps generated from the optimal MDFNN model accurately reflected the temporal and spatial variation of canopy EWT. The UAV-based monitoring framework thus combined multi-modal data and deep learning for high-resolution estimation of wheat canopy EWT; the MDFNN model outperformed conventional machine learning and effectively captured complex feature interactions across spatial and temporal dimensions. These findings can provide a theoretical reference for UAV remote sensing monitoring of winter wheat canopy EWT, with potential applications in irrigation scheduling and water stress monitoring.
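As an illustration of the texture-feature step, the sketch below derives one GLCM statistic per band with scikit-image and combines two bands into normalized difference, ratio, and difference texture indices in their commonly used forms; the band windows, quantization, and GLCM parameters are assumptions rather than the authors' exact processing chain.

```python
# Sketch of GLCM texture extraction and pairwise texture indices (assumed forms:
# NDTI = (T1 - T2)/(T1 + T2), RTI = T1/T2, DTI = T1 - T2), using scikit-image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture(band, prop="contrast", levels=32):
    """Return one GLCM statistic for a quantized band window."""
    q = np.digitize(band, np.linspace(band.min(), band.max() + 1e-9, levels)) - 1
    glcm = graycomatrix(q.astype(np.uint8), distances=[1],
                        angles=[0, np.pi / 2], levels=levels,
                        symmetric=True, normed=True)
    return graycoprops(glcm, prop).mean()

rng = np.random.default_rng(0)
red = rng.random((64, 64))   # placeholder canopy image windows
nir = rng.random((64, 64))

t_red, t_nir = glcm_texture(red), glcm_texture(nir)
ndti = (t_red - t_nir) / (t_red + t_nir)   # normalized difference texture index
rti = t_red / t_nir                        # ratio texture index
dti = t_red - t_nir                        # difference texture index
```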
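The MDFNN described above (one CNN branch per growth stage, fully connected fusion, and an LSTM that outputs EWT) could be organized roughly as in the PyTorch sketch below; layer sizes, the 1-D convolutions over the per-stage feature vectors, and the single-direction LSTM are assumptions, not the published architecture.

```python
# High-level PyTorch sketch of the MDFNN idea described in the abstract:
# one CNN branch per growth stage -> fully connected fusion -> LSTM -> EWT.
import torch
import torch.nn as nn

class StageCNN(nn.Module):
    """Deep-feature extractor for the feature vector of one growth stage."""
    def __init__(self, n_features, out_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, x):                      # x: (batch, n_features)
        h = self.conv(x.unsqueeze(1)).squeeze(-1)
        return self.fc(h)                      # (batch, out_dim)

class MDFNN(nn.Module):
    """Three stage branches, FC fusion, LSTM over the stage sequence, EWT head."""
    def __init__(self, n_features, stage_dim=32, hidden=64):
        super().__init__()
        self.branches = nn.ModuleList([StageCNN(n_features, stage_dim) for _ in range(3)])
        self.fuse = nn.Linear(stage_dim, stage_dim)
        self.lstm = nn.LSTM(stage_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # scalar EWT (mm)

    def forward(self, stages):                 # stages: (batch, 3, n_features)
        feats = [self.fuse(b(stages[:, i])) for i, b in enumerate(self.branches)]
        seq = torch.stack(feats, dim=1)        # (batch, 3, stage_dim)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1]).squeeze(-1)

model = MDFNN(n_features=20)
ewt_pred = model(torch.randn(8, 3, 20))        # EWT predictions for 8 samples
```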
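Finally, a minimal sketch of the reported accuracy metrics (R², RMSE in mm, MAPE in %) and of the four machine learning baselines named in the abstract, assuming scikit-learn; the synthetic data, split, and hyperparameters are placeholders, not the study's settings.

```python
# Sketch of the accuracy metrics (R^2, RMSE, MAPE) and the four ML baselines
# named in the abstract: KNN, PLS, random forest, and SVM.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

def report(y_true, y_pred):
    rmse = mean_squared_error(y_true, y_pred) ** 0.5
    mape = mean_absolute_percentage_error(y_true, y_pred) * 100
    return r2_score(y_true, y_pred), rmse, mape

rng = np.random.default_rng(0)
X = rng.random((120, 20))                                    # placeholder selected features
y = 0.3 + 0.4 * X[:, 0] + 0.05 * rng.standard_normal(120)   # placeholder EWT (mm)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baselines = {
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "PLS": PLSRegression(n_components=5),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVM": SVR(kernel="rbf", C=10.0),
}
for name, est in baselines.items():
    est.fit(X_tr, y_tr)
    y_hat = np.ravel(est.predict(X_te))        # PLS returns a 2-D array
    print(name, "R2=%.3f RMSE=%.3f mm MAPE=%.1f%%" % report(y_te, y_hat))
```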

     
