Spatial modeling of soil particle-size fractions by coupling log-ratio-transformed geostatistical interpolation with random forest

Liu Shuiqing; Liu Yuxuan; Shang Songhao

doi:10.11975/j.issn.1002-6819.202507239


Spatial modeling of soil particle size distribution based on log-ratio transformation and random forest optimization

Abstract

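Abstract: To systematically evaluate how suitable different log-ratio transformations are for the spatial modeling of closed soil particle-size fraction data, and to explore a high-accuracy pathway for mapping soil particle-size fractions and soil texture that combines spatial interpolation with machine-learning optimization, this study took the Yarkant River Plain oasis as a representative area. It compared three log-ratio transformations for the spatial modeling of soil particle-size fractions: additive log-ratio (ALR), centered log-ratio (CLR), and isometric log-ratio (ILR). It also built a joint modeling workflow, operating on log-ratio-transformed data, that couples co-Kriging (CK), using inverse distance weighting (IDW) results as a covariate, with random forest (RF). Surface soil samples were collected, interpolation and fitting were carried out in log-ratio space, and model performance was assessed with both cross-validation and hold-out validation. The results showed that the ILR transformation performed best both in preserving the distributional characteristics of the data and in improving prediction accuracy; introducing IDW as a covariate reduced interpolation error; and RF optimization further improved the generalization ability and spatial consistency of the model. The joint workflow achieved high accuracy and stability in the spatial mapping of soil particle-size fractions and provides a transferable technical pathway for high-accuracy soil texture mapping.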
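The three transformations compared above can be illustrated with a minimal pure-Python sketch for a three-part composition (for example sand, silt, and clay). Several choices here are assumptions, not details taken from the paper: the last part serves as the ALR reference, the ILR coordinates use the pivot (sequential) orthonormal basis, and any zero parts are assumed to have been replaced beforehand, because log-ratios are undefined at zero. The helper names (`closure`, `alr`, `ilr_inv`, and so on) are hypothetical.

```python
import math

# Hypothetical helpers for a D-part composition (e.g. sand, silt, clay fractions).
# Zero parts must be replaced beforehand; log-ratios are undefined at 0.

def closure(x):
    """Rescale positive parts so they sum to 1."""
    s = sum(x)
    return [v / s for v in x]

def alr(x):
    """Additive log-ratio: each of the first D-1 parts relative to the last part."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inv(y):
    """Undo ALR: exponentiate, append the reference part (exp(0) = 1), then close."""
    return closure([math.exp(v) for v in y] + [1.0])

def clr(x):
    """Centered log-ratio: log of each part minus the mean log (D coordinates summing to 0)."""
    mean_log = sum(math.log(v) for v in x) / len(x)
    return [math.log(v) - mean_log for v in x]

def clr_inv(c):
    """Undo CLR: exponentiate and close."""
    return closure([math.exp(v) for v in c])

def pivot_basis(d):
    """Orthonormal pivot basis of the clr hyperplane (one possible ILR basis)."""
    basis = []
    for i in range(d - 1):
        k = d - i - 1  # number of parts after the pivot part i
        v = [0.0] * d
        v[i] = math.sqrt(k / (k + 1))
        for j in range(i + 1, d):
            v[j] = -1.0 / math.sqrt(k * (k + 1))
        basis.append(v)
    return basis

def ilr(x):
    """Isometric log-ratio: project clr(x) onto the orthonormal basis (D-1 coordinates)."""
    c = clr(x)
    return [sum(ci * vi for ci, vi in zip(c, v)) for v in pivot_basis(len(x))]

def ilr_inv(z):
    """Undo ILR: rebuild clr coordinates from the basis, then apply clr_inv."""
    d = len(z) + 1
    c = [0.0] * d
    for zi, v in zip(z, pivot_basis(d)):
        for j in range(d):
            c[j] += zi * v[j]
    return clr_inv(c)

if __name__ == "__main__":
    sample = [0.45, 0.40, 0.15]  # hypothetical sand/silt/clay fractions
    for name, fwd, inv in [("ALR", alr, alr_inv), ("CLR", clr, clr_inv), ("ILR", ilr, ilr_inv)]:
        coords = fwd(sample)
        print(name, [round(v, 4) for v in coords], [round(v, 4) for v in inv(coords)])
```

ALR gives D - 1 coordinates that depend on the chosen reference part and are not isometric; CLR gives D coordinates that sum to zero, so their covariance matrix is singular; ILR gives D - 1 orthonormal coordinates, so Euclidean distances between ILR vectors equal Aitchison distances between the compositions. That last property is one common argument for fitting variograms and kriging on ILR coordinates.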

Abstract: This study compared three log-ratio transformations, namely additive log-ratio (ALR), centered log-ratio (CLR), and isometric log-ratio (ILR), and evaluated a combined interpolation and machine-learning workflow for mapping soil particle-size fractions in the Yarkant River Plain oasis. The aims were to identify the most suitable log-ratio transformation for closed compositional data and to quantify the performance gains from using inverse distance weighting (IDW) output as an auxiliary variable in co-Kriging (CK), followed by random forest (RF) optimization. Surface soil samples were collected, and the three particle-size fractions were determined by laboratory measurement. Compositional data in the training set were transformed with ALR, CLR, and ILR; all interpolation and variogram fitting were conducted in the transformed Euclidean space, and the results were back-transformed and normalized for evaluation. Preliminary interpolation was performed with IDW, ordinary Kriging (OK), and CK using IDW outputs as auxiliary covariates. Performance was assessed by cross-validation and hold-out validation using root mean square error (RMSE), mean absolute error (MAE), and the correlation coefficient. Finally, the IDW-assisted CK predictions were used as inputs to an RF model trained against the measured values, which applied a nonlinear correction and produced the final soil maps. Across validation schemes, ILR reproduced the distributional characteristics of the data (mean, skewness, and kurtosis) most faithfully and achieved the best interpolation accuracy. In cross-validation, ILR-based interpolation yielded the lowest mean RMSE (0.027) and the highest mean correlation coefficient (0.915). Introducing IDW as an auxiliary covariate in CK substantially reduced interpolation error in cross-validation: the mean RMSE across the three fractions decreased, and correlations increased markedly.
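The interpolation-and-validation step described above (interpolate in log-ratio space, back-transform, then score with RMSE, MAE, and the correlation coefficient) can be sketched for the IDW case in pure Python. This is a minimal illustration under assumptions made here, not the authors' code: the pivot ILR basis, an IDW power of 2, leave-one-out as the cross-validation scheme, and the synthetic 5 x 5 grid are all hypothetical choices. Co-kriging and the RF stage are not reproduced.

```python
import math

# Toy, hypothetical setup: IDW applied in ILR space, leave-one-out cross-validation,
# back-transformation, then RMSE / MAE / Pearson r per fraction.

S6, S2 = math.sqrt(6.0), math.sqrt(2.0)
BASIS = [[math.sqrt(2.0 / 3.0), -1.0 / S6, -1.0 / S6],  # pivot ILR basis for D = 3
         [0.0, 1.0 / S2, -1.0 / S2]]

def ilr3(x):
    """ILR coordinates of a 3-part composition (clr projected onto BASIS)."""
    mean_log = sum(math.log(v) for v in x) / 3.0
    c = [math.log(v) - mean_log for v in x]
    return [sum(a * b for a, b in zip(c, v)) for v in BASIS]

def ilr3_inv(z):
    """Back-transform ILR coordinates to a closed composition summing to 1."""
    c = [z[0] * BASIS[0][j] + z[1] * BASIS[1][j] for j in range(3)]
    e = [math.exp(v) for v in c]
    s = sum(e)
    return [v / s for v in e]

def idw(points, values, target, power=2.0):
    """Inverse distance weighting of vector-valued observations at one target location."""
    num = [0.0] * len(values[0])
    den = 0.0
    for (px, py), val in zip(points, values):
        d = math.hypot(px - target[0], py - target[1])
        if d == 0.0:
            return list(val)  # exact hit: IDW is an exact interpolator
        w = d ** -power
        den += w
        for k, v in enumerate(val):
            num[k] += w * v
    return [n / den for n in num]

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def pearson_r(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def loo_idw_ilr(points, comps, power=2.0):
    """Leave-one-out CV: interpolate each site from all others in ILR space, then back-transform."""
    coords = [ilr3(c) for c in comps]
    preds = []
    for i, target in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        vals = [z for j, z in enumerate(coords) if j != i]
        preds.append(ilr3_inv(idw(others, vals, target, power)))
    return preds

if __name__ == "__main__":
    # Synthetic 5 x 5 grid with a smooth texture gradient (not data from the study).
    pts = [(float(i), float(j)) for i in range(5) for j in range(5)]
    comps = []
    for x, y in pts:
        sand, silt, clay = 0.3 + 0.08 * x, 0.3 + 0.05 * y, 0.2
        s = sand + silt + clay
        comps.append([sand / s, silt / s, clay / s])
    preds = loo_idw_ilr(pts, comps)
    for k, name in enumerate(["sand", "silt", "clay"]):
        obs = [c[k] for c in comps]
        prd = [p[k] for p in preds]
        print(name, round(rmse(obs, prd), 4), round(mae(obs, prd), 4), round(pearson_r(obs, prd), 3))
```

Averaging in ILR space and back-transforming amounts to a weighted geometric mean of the compositions followed by closure, so every prediction stays positive and sums to 1 without any ad hoc renormalization of negative or overshooting fractions.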
However, in hold-out validation the IDW contribution was variable and spatially dependent: the clay fraction improved markedly, whereas silt-fraction performance deteriorated at some hold-out samples, indicating that the benefit depends on context. Subsequent RF optimization of the IDW-assisted CK outputs further improved generalization: whole-sample RMSE and MAE decreased, and the overall correlation increased. Maps produced after RF optimization preserved local extrema and showed more realistic texture patterns, with notable changes in texture-class proportions relative to the uncorrected predictions. In the study area, the ILR transformation combined with IDW-assisted CK and RF optimization provided higher accuracy and more realistic spatial patterns for closed compositional soil data than the other configurations tested. Gains were larger where sampling density was higher and where the auxiliary and primary variables were strongly correlated; conversely, benefits were limited or even negative where sampling was sparse or the correlation was weak. The proposed workflow therefore offers a practical, empirically supported approach to mapping soil particle-size fractions, although its transferability should be tested under different sampling schemes and with alternative auxiliary data sources.
