Predicting soil detachment capacity using improved machine learning

闫婧歆; 张宽地; 陈俊英; 王雨新; 杨洋; 刘娟娟

doi:10.11975/j.issn.1002-6819.202409182

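Abstract (translated from the Chinese original): Predicting soil detachment capacity (Dc) has long been an important topic in soil erosion research, and its recent combination with machine learning has greatly advanced the field. Machine learning can make full use of existing experimental data to predict Dc accurately even when the underlying mechanisms are not fully understood, which greatly reduces the time and cost of experiments. This study built a data-driven, high-precision Dc prediction model by combining intelligent optimization algorithms with feature dimensionality reduction. Soil samples from five typical land-use types in the study area were used, and their Dc was measured in flume scouring experiments. The input features were reduced in dimension by principal component analysis (PCA). On this basis, a hybrid model system optimized by two intelligent algorithms was constructed: a PCA-GA-BP neural network optimized by the genetic algorithm (GA) and a PCA-SSA-BP network model optimized by the sparrow search algorithm (SSA). Both were compared with a traditional PCA-BP (back propagation, BP) neural network model. The results show that, relative to the unoptimized model (BP), the intelligent algorithms (GA, SSA) reduced the root mean square error by 31.122% to 38.061% and improved the goodness of fit by 15.125% to 16.625%. The PCA-SSA-BP network model performed best (coefficient of determination between estimated and measured Dc of 0.933, root mean square error of 0.061); it adapts better to complex conditions and predicts Dc effectively. The results have positive theoretical and practical significance for predicting soil detachment capacity in the loess region.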
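The abstract does not say how the genetic algorithm is coupled to the BP network. One common arrangement, assumed here, is that the GA searches for good initial weights and biases, which backpropagation would then refine. The sketch below illustrates only that search step on toy data (a sine curve standing in for Dc measurements). The network size, population size, mutation rate, and every function name are hypothetical choices for illustration, not the authors' settings.

```python
import math
import random


def forward(weights, x, n_hidden):
    """Tiny 1-input, 1-output network with one tanh hidden layer.

    Gene layout: [input weights | hidden biases | output weights | output bias].
    """
    w1 = weights[0:n_hidden]
    b1 = weights[n_hidden:2 * n_hidden]
    w2 = weights[2 * n_hidden:3 * n_hidden]
    b2 = weights[3 * n_hidden]
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(h * w for h, w in zip(hidden, w2)) + b2


def rmse(weights, data, n_hidden):
    """Root mean square error of the network over (x, y) pairs."""
    errs = [(forward(weights, x, n_hidden) - y) ** 2 for x, y in data]
    return math.sqrt(sum(errs) / len(errs))


def ga_search(data, n_hidden=3, pop_size=20, generations=30, seed=0):
    """Search for low-RMSE initial weights with a simple elitist GA."""
    rng = random.Random(seed)
    n_genes = 3 * n_hidden + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best RMSE at the start of each generation, plus the final one
    for _ in range(generations):
        pop.sort(key=lambda w: rmse(w, data, n_hidden))
        history.append(rmse(pop[0], data, n_hidden))
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied to each gene with probability 0.1
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.1 else g
                     for g in child]
            children.append(child)
        pop = elite + children
    pop.sort(key=lambda w: rmse(w, data, n_hidden))
    history.append(rmse(pop[0], data, n_hidden))
    return pop[0], history


# Toy data only: a sine curve stands in for real measurements.
toy_data = [(x / 10, math.sin(x / 10)) for x in range(10)]
best_weights, best_rmse_history = ga_search(toy_data)
```

Because the better half of each generation is carried over unchanged, the best RMSE never increases from one generation to the next. In a full GA-BP workflow, `best_weights` would seed gradient-based training rather than serve as the final model.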

Abstract: Soil detachment capacity (Dc) is a key indicator in soil erosion dynamics. Accurate and rapid prediction of Dc is increasingly required for soil and water conservation. However, conventional prediction methods are often constrained by experimental time and cost, and machine learning offers a promising alternative. This study constructed a high-precision prediction model by integrating intelligent optimization with feature dimensionality reduction. Soil samples were collected under five typical land-use scenarios: cultivated land, wasteland, orchard, forest land, grassland and shrubland. Dc, soil properties, and root parameters were determined under combinations of four freeze-thaw cycles and nine flow slopes. A multi-dimensional dataset was established, and min-max normalization was applied during preprocessing to eliminate dimensional differences among the soil and root trait data. The dataset was partitioned into training and test sets of 630 and 270 samples, respectively, the split that minimized model error and maximized the coefficient of determination (R²). Further experiments determined that a hidden layer with 10 nodes yielded the optimal model configuration (root mean square error (RMSE) = 0.007). Correlation analysis explored the direction and degree of influence of soil properties and root traits on Dc, and revealed significant correlations among the factors. Principal component analysis (PCA) was then used to reduce feature dimensionality (cumulative contribution rate = 88.91%), eliminating data redundancy and mitigating its potential impact on model performance. A hybrid model system was constructed using two intelligent algorithms: a PCA-GA-BP neural network (optimized by the genetic algorithm, GA) and a PCA-SSA-BP network (optimized by the sparrow search algorithm, SSA). These were compared with a traditional PCA-BP model. The results showed that all three models effectively predicted Dc trends. However, the goodness of fit of the optimized models improved significantly: R² was 0.800 for PCA-BP, while PCA-GA-BP and PCA-SSA-BP reached 0.921 and 0.933, respectively (increases of 15.125% and 16.625%), with PCA-SSA-BP performing best. Error analysis indicated that the prediction error of PCA-BP fluctuated most, whereas the optimized models showed markedly narrower fluctuation ranges and greater stability. Key error metrics confirmed that the mean absolute error (MAE) and RMSE of PCA-GA-BP and PCA-SSA-BP were significantly lower than those of PCA-BP (MAE = 0.044 kg/(m²·s), RMSE = 0.098 kg/(m²·s)). PCA-SSA-BP achieved the greatest reduction (MAE decreased by 20.316%, RMSE decreased by 38.061%). Overall, PCA-SSA-BP delivered the best performance, with an R² of 0.933 between measured and predicted values and an RMSE of 0.061 kg/(m²·s). The hybrid model combines the interpretability of a mechanistic model with the efficiency of a data-driven model. It can better adapt to complex conditions and provides a new method for predicting soil detachment capacity. Finally, some open scientific problems in predicting soil detachment capacity by machine learning are analyzed, and development prospects are discussed.
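The error metrics and percentage changes quoted in the abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. The function names are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption, since the abstract does not state which definition the authors used.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Mean absolute error between measured and predicted values."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def relative_change_percent(baseline, value):
    """Change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100


# R² values reported in the abstract: PCA-BP 0.800, PCA-GA-BP 0.921, PCA-SSA-BP 0.933
r2_gain_ga = relative_change_percent(0.800, 0.921)
r2_gain_ssa = relative_change_percent(0.800, 0.933)

# RMSE values reported in the abstract: PCA-BP 0.098, PCA-SSA-BP 0.061
rmse_drop_ssa = -relative_change_percent(0.098, 0.061)
```

With the rounded values above, the R² gains come out at 15.125% and 16.625%, matching the abstract. The RMSE reduction recomputed from the rounded figures is about 37.76%, slightly below the reported 38.061%, presumably because the reported percentage was derived from unrounded RMSE values.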

Predicting soil detachment capacity using improved machine learning