
Crop identification in plateau mountainous areas based on the CCI-LightGBM-SHAP model

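  • Abstract: High-precision crop identification models underpin food security, the optimization of planting structures, and the development of smart agriculture. In plateau mountainous areas, however, the terrain is fragmented and the environment is complex. The resulting strong spatial heterogeneity limits the performance of global models in local areas, while spatial statistical methods such as geographically weighted regression (GWR) struggle to capture the high-dimensional nonlinear relationships among multi-source remote sensing features and suffer from high computational cost and limited interpretability, which restricts their practical use. This study therefore builds a high-precision crop identification framework based on CCI-LightGBM-SHAP. First, a classification confusion index (CCI), derived from the spatial confusion characteristics of the samples, is combined with topographic factors and multi-source remote sensing features, and a recursive algorithm divides the study area into multiple spatial subdomains that are internally homogeneous in spectral and geomorphological characteristics. Second, ensemble learning models such as the light gradient boosting machine (LightGBM) are built within each subdomain as local "experts" that independently optimize local feature subsets and decision rules, allowing fine-grained modeling of local feature structure. Finally, Shapley additive explanations (SHAP) are used to reveal the feature response mechanisms behind high-precision crop identification in different geographic partitions. The experimental results show that: 1) overall classification accuracy improved, with a mean overall accuracy (OA) of 0.928 and a Matthews correlation coefficient (MCC) of 0.915. The local model in every partition outperformed the global baseline model; LightGBM performed best in stability and generalization, and F1-scores for key crop categories improved by up to 6.28%, confirming the effectiveness of "partitioned modeling + ensemble learning"; 2) the model adapts well to spatial non-stationarity: SHAP analysis shows that local models adjust their feature dependencies to different geomorphological environments, with classification decisions shifting dynamically from "structure-dominated" to "physiology-dominated"; 3) multi-source remote sensing features drive the identification of different crop types in differentiated, synergistic ways: rice identification is jointly affected by topography and seasonal hydrological rhythms, corn depends on relatively homogeneous terrain and sufficient heat, rapeseed is mainly affected by spectral changes caused by phenological differences, and perennial crops such as orchards and tea gardens are more strongly associated with stable environmental factors such as elevation. The proposed CCI-LightGBM-SHAP method mitigates the constraint that spatial non-stationarity places on identification accuracy and, through SHAP, makes the model's decision process more interpretable. It achieves a good balance between accuracy and interpretability and can provide an efficient, generalizable technical path for fine-grained remote sensing crop mapping in large-scale, highly heterogeneous mountainous regions.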

     

    Abstract: High-precision crop identification models are fundamental for ensuring food security, optimizing planting structures, and advancing the development of smart agriculture. However, in plateau mountainous regions, remote sensing classification faces significant bottlenecks due to highly fragmented terrain, rugged topography, and extreme environmental complexity. These factors result in pronounced geospatial non-stationarity, where the relationship between spectral signatures and crop types varies across space, causing global models to suffer from a "homogenization constraint" that often suppresses localized signals. Furthermore, traditional spatial statistical methods, such as Geographically Weighted Regression (GWR), often struggle to characterize the high-dimensional nonlinear relationships inherent in multi-source remote sensing features and are frequently constrained by prohibitive computational overhead and a lack of model interpretability. To address these issues and systematically alleviate the impact of geospatial non-stationarity, this paper constructs a high-precision crop identification framework based on CCI-LightGBM-SHAP. First, using the Classification Confusion Index (CCI) based on sample spatial confusion characteristics, and integrating topographic factors with multi-source remote sensing features, the study area is recursively delineated into multiple spatial subdomains with internally homogeneous spectral and geomorphological characteristics. Second, ensemble learning models such as the Light Gradient Boosting Machine (LightGBM) are deployed as local "experts" within each subdomain to achieve fine-grained modeling of local feature structures by autonomously optimizing local feature subsets and decision rules, thereby bypassing global homogenization constraints. 
Finally, the SHapley Additive exPlanations (SHAP) method is employed to reveal the feature response mechanisms driving high-accuracy crop identification across different geographic partitions and complex environmental gradients. The experimental results demonstrate that: (1) The overall classification accuracy was enhanced, achieving an average Overall Accuracy (OA) of 0.928 and a Matthews Correlation Coefficient (MCC) of 0.915. Quantitative assessments indicate that partitioned local models consistently outperform the global baseline model in all subdomains. Among the evaluated ensemble learning algorithms, LightGBM exhibited the highest degree of stability and generalization capability, with F1-scores for key crop categories improving by as much as 6.28%. These metrics validate the effectiveness of the "partitioning + ensemble learning" strategy in mitigating classification errors induced by spatial heterogeneity. (2) The model possesses strong adaptability to spatial non-stationarity. SHAP analysis indicates that local models can autonomously adjust feature dependencies in response to varying geomorphological environments, capturing a dynamic decision-making transition from a "structure-dominated" to a "physiology-dominated" paradigm. (3) Multi-source remote sensing features exhibit differentiated synergistic driving mechanisms for various crop types. Specifically, rice identification is jointly influenced by topography and seasonal hydrological rhythms; corn relies on relatively homogeneous terrain and sufficient thermal conditions; rapeseed is primarily driven by spectral variations induced by phenological differences; whereas perennial crops such as orchards and tea gardens show stronger associations with stable environmental factors like altitude. 
In conclusion, the results suggest that the CCI-LightGBM-SHAP modeling approach mitigates the impact of spatial non-stationarity on identification accuracy by effectively adapting to localized environmental gradients. Through the integration of predictive precision and interpretability, this framework establishes a robust technical methodology for fine-grained remote sensing crop mapping. These findings provide highly valuable spatial decision-support data for agricultural management and food security assessments in complex, heterogeneous mountainous regions.
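The two headline metrics reported above, Overall Accuracy (OA) and the Matthews Correlation Coefficient (MCC), can be computed directly from reference and predicted labels. The following is a minimal sketch using the standard multiclass form of MCC; it is not the authors' evaluation code, and the function names and the toy crop labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Uses the standard formulation
        MCC = (c*s - sum_k t_k*p_k)
              / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the sample count, c the number of correct predictions,
    t_k the number of reference samples in class k, and p_k the number
    of samples predicted as class k.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k = Counter(y_true)
    p_k = Counter(y_pred)
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_true = s * s - sum(v * v for v in t_k.values())
    cov_pred = s * s - sum(v * v for v in p_k.values())
    denominator = sqrt(cov_true * cov_pred)
    # By convention, return 0.0 when the coefficient is undefined
    # (for example, when every sample is predicted as one class).
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels, for illustration only (not data from the study).
reference = ["rice", "rice", "corn", "corn", "rapeseed", "rapeseed"]
predicted = ["rice", "rice", "corn", "rice", "rapeseed", "rapeseed"]

oa = overall_accuracy(reference, predicted)
mcc = multiclass_mcc(reference, predicted)
print(f"OA = {oa:.3f}, MCC = {mcc:.3f}")
```

Unlike OA, MCC accounts for how errors are distributed across classes, so it is less easily inflated by a dominant crop class. That is one plausible reason for reporting the two metrics together.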

     
