Abstract:
This study compared three log-ratio transformations, namely additive log-ratio (ALR), centered log-ratio (CLR), and isometric log-ratio (ILR), and evaluated a combined interpolation and machine-learning workflow for mapping soil particle-size fractions in the Yarkant River Plain oasis. The aims were to identify the most suitable log-ratio transformation for closed compositional data and to quantify the performance gains from using inverse distance weighting (IDW) predictions as an auxiliary variable in co-Kriging (CK), followed by Random Forest (RF) optimization. Surface soil samples were collected, and the three particle-size fractions (clay, silt, and sand) were determined by laboratory measurement. Compositional data in the training set were transformed using ALR, CLR, and ILR; all interpolation and variogram fitting were conducted in the transformed Euclidean space, and results were back-transformed and normalized for evaluation. Preliminary interpolation was performed using IDW, ordinary Kriging (OK), and CK with IDW outputs as auxiliary covariates. Cross-validation and hold-out validation were used to assess performance in terms of root mean square error (RMSE), mean absolute error (MAE), and the correlation coefficient. Finally, the IDW-assisted CK predictions were used as inputs to an RF model trained against measured values for nonlinear correction and final soil mapping. Across validation schemes, ILR reproduced the distributional statistics of the measured data (mean, skewness, kurtosis) most faithfully and achieved the best interpolation accuracy. In cross-validation, ILR-based interpolation yielded the lowest mean RMSE (0.027) and the highest mean correlation coefficient (0.915). Introducing IDW as an auxiliary covariate in CK substantially reduced interpolation error in cross-validation: mean RMSE across the three fractions decreased and correlations increased markedly. 
However, in hold-out validation, the contribution of IDW was variable and spatially dependent: marked improvement was observed for the clay fraction, while performance for the silt fraction deteriorated in some hold-out samples, indicating that the benefits are context-sensitive. Subsequent RF optimization using IDW-assisted CK outputs further improved generalization: whole-sample RMSE and MAE decreased, and the overall correlation increased. Spatial maps after RF optimization preserved local extrema and revealed more realistic texture patterns, with notable adjustments in class proportions compared with the uncorrected predictions. Overall, the ILR transformation combined with IDW-assisted CK and RF optimization provided superior accuracy and more realistic spatial patterns for closed compositional soil data in the study area. Gains were larger where sample density was higher and where the auxiliary and primary variables were strongly correlated. Conversely, benefits were limited or negative in regions with sparse sampling or where the auxiliary and primary variables were weakly correlated. The proposed workflow therefore offers a practical, empirically supported approach for mapping soil particle-size fractions, although its transferability should be tested under different sampling schemes and with alternative auxiliary data sources.
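
For readers unfamiliar with the three transformations compared above, the following is a minimal illustrative sketch in Python with NumPy, not the implementation used in this study. The function names, the choice of the last part as the ALR denominator, the particular orthonormal (Helmert-type) basis used for ILR, and the example composition are all assumptions made for illustration. The sketch shows the forward transforms and the back-transformation with closure (normalization to unit sum) for a three-part composition.

```python
import numpy as np

def closure(x):
    """Rescale each composition so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log-ratio; the last part is the (assumed) denominator."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def alr_inv(y):
    """Back-transform ALR coordinates to a closed composition."""
    y = np.asarray(y, dtype=float)
    ones = np.ones(y.shape[:-1] + (1,))
    return closure(np.concatenate([np.exp(y), ones], axis=-1))

def clr(x):
    """Centered log-ratio: log parts minus their mean log."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(y):
    """Back-transform CLR coordinates to a closed composition."""
    return closure(np.exp(np.asarray(y, dtype=float)))

# One possible orthonormal basis of the zero-sum subspace for D = 3 parts
# (an assumed choice; any orthonormal basis gives a valid ILR).
V = np.array([
    [1 / np.sqrt(2), -1 / np.sqrt(2), 0.0],
    [1 / np.sqrt(6), 1 / np.sqrt(6), -2 / np.sqrt(6)],
])

def ilr(x):
    """Isometric log-ratio: CLR coordinates projected onto the basis V."""
    return clr(x) @ V.T

def ilr_inv(y):
    """Back-transform ILR coordinates to a closed composition."""
    return clr_inv(np.asarray(y, dtype=float) @ V)

# Hypothetical three-part composition (values chosen only for illustration).
sample = np.array([0.2, 0.5, 0.3])
roundtrip_alr = alr_inv(alr(sample))
roundtrip_clr = clr_inv(clr(sample))
roundtrip_ilr = ilr_inv(ilr(sample))
```

In a workflow like the one summarized above, interpolation would operate on the transformed coordinates, and the inverse functions would map predictions back to fractions that sum to one. ILR differs from ALR in that it preserves distances between compositions, and from CLR in that its coordinates are not constrained to sum to zero.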