Interpretability on Yield Estimation of Winter Wheat Based on LightGBM

  • Abstract: Machine learning models are widely used to monitor crop growth and estimate crop yield, but the internal mechanisms of complex models are difficult to understand. To estimate crop yield accurately while also giving reasonable explanations, this study used the vegetation temperature condition index (VTCI) and winter wheat yield data to build yield estimation models for winter wheat in the Guanzhong Plain, PR China, based on the light gradient boosting machine (LightGBM). Global and local interpretability methods, including local interpretable model-agnostic explanations (LIME), submodular pick-LIME, partial dependence plots (PDP), and individual conditional expectation (ICE) plots, were then applied to further interpret the model estimates. Compared with other machine learning methods, LightGBM optimized by grid search estimated winter wheat yield accurately: the coefficient of determination (R²) between the estimated yield and the official (measured) yield records was 0.32, the root mean square error (RMSE) was 809.10 kg/hm², and the mean relative error (MRE) was 16.55%, reaching the extremely significant level (P<0.01), which indicated that the model had high prediction accuracy and generalization ability. The interpretability experiments showed that the grid-search-optimized LightGBM extracted the information contained in the data. From the global perspective, among the four growth stages of winter wheat, VTCI at the jointing stage was the most important for yield formation, followed by VTCI at the heading-filling stage and at the dough stage, while VTCI at the turning green stage had the least effect, which was consistent with prior knowledge. From the local perspective, given the spatial pattern of winter wheat yield (high in the west and low in the east), the local interpretability methods further identified the reasons for yield differences among counties (districts). These results provide a reference for field management in the Guanzhong Plain and have application value for stabilizing and increasing winter wheat yield.
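The three accuracy measures reported above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes R² is the squared Pearson correlation between estimated and measured yield (the abstract does not state which definition was used) and that MRE is the mean of |estimated - measured| / measured expressed as a percentage. The function names and the yield values are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root mean square error, in the same units as the yields (e.g. kg/hm^2)."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def mre_percent(observed, estimated):
    """Mean relative error in percent (assumed definition: mean of |e - o| / o)."""
    n = len(observed)
    return 100.0 * sum(abs(e - o) / o for o, e in zip(observed, estimated)) / n


def r_squared(observed, estimated):
    """Squared Pearson correlation between observed and estimated values
    (an assumption; other definitions of R^2 exist)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov ** 2 / (var_o * var_e)


# Hypothetical county-level yields in kg/hm^2, for illustration only.
observed = [4000.0, 5000.0, 6000.0]
estimated = [4400.0, 5000.0, 5400.0]

print(f"R2   = {r_squared(observed, estimated):.3f}")
print(f"RMSE = {rmse(observed, estimated):.2f} kg/hm2")
print(f"MRE  = {mre_percent(observed, estimated):.2f} %")
```

Because RMSE is scale-dependent and MRE is relative, reporting both alongside R² gives a fuller picture of estimation error than any one measure alone.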

     
