Abstract:
Soil detachment capacity (Dc) is a key indicator in soil erosion dynamics, and soil and water conservation efforts increasingly require its accurate and rapid prediction. However, conventional prediction methods are often constrained by experimental time and cost. Machine learning offers a promising approach for accurate Dc prediction. This study constructed a high-precision prediction model by integrating intelligent optimization with feature dimensionality reduction. Soil samples were collected from five typical land-use types: cultivated land, wasteland, orchard, forest land, grassland and shrubland.
Dc, soil properties, and root parameters were determined under combinations of four freeze-thaw cycles and nine flow slopes. A multi-dimensional dataset was established, and min-max normalization was applied during preprocessing to eliminate dimensional differences among the soil and root trait data. The dataset was partitioned into a training set of 630 samples and a test set of 270 samples, the split that minimized model error and maximized the coefficient of determination (R²). Further experiments determined that a hidden layer with 10 nodes yielded the optimal model configuration (root mean square error (RMSE) = 0.007). Correlation analysis quantified the direction and strength of the influence of soil properties and root traits on Dc, revealing significant correlations among the factors. Principal component analysis (PCA) was then used to reduce feature dimensionality (cumulative contribution rate = 88.91%), eliminating data redundancy and mitigating its potential impact on model performance. A hybrid model system was constructed using two intelligent algorithms: a PCA-GA-BP neural network (optimized by a Genetic Algorithm, GA) and a PCA-SSA-BP neural network (optimized by the Sparrow Search Algorithm, SSA). These were compared with a traditional PCA-BP model. All three models effectively predicted Dc trends, but the optimized models fitted the data significantly better: R² was 0.800 for PCA-BP, whereas PCA-GA-BP and PCA-SSA-BP reached 0.921 and 0.933, respectively (relative increases of 15.125% and 16.625%). PCA-SSA-BP performed best. Error analysis indicated that the prediction error of PCA-BP fluctuated the most, while the optimized models exhibited markedly narrower fluctuation ranges and greater stability. The mean absolute error (MAE) and RMSE of PCA-GA-BP and PCA-SSA-BP were significantly lower than those of PCA-BP (MAE = 0.044 kg/(m²·s), RMSE = 0.098 kg/(m²·s)). PCA-SSA-BP achieved the greatest reduction (MAE decreased by 20.316%, RMSE decreased by 38.061%). Overall, the PCA-SSA-BP model delivered the best performance, with an R² of 0.933 between measured and predicted values and an RMSE of 0.061 kg/(m²·s). The hybrid model combines the interpretability of a mechanistic model with the efficiency of a data-driven model. It adapts better to complex conditions and provides a new method for predicting soil detachment capacity. Finally, open scientific problems in predicting soil detachment capacity with machine learning are analyzed, and the development prospects of the approach are discussed.
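
The preprocessing and evaluation steps named above (min-max normalization, PCA selected by cumulative contribution rate, a 630/270 split, and the R², RMSE and MAE metrics) can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the data are synthetic, the 0.85 cumulative-contribution threshold is an assumed value (the abstract reports only the resulting 88.91%), and a linear least-squares fit stands in for the GA- or SSA-optimized BP network, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def pca_reduce(X, min_cumulative):
    """Keep the fewest principal components whose cumulative
    explained-variance ratio reaches min_cumulative."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    cum = np.cumsum(s**2 / np.sum(s**2))
    k = min(int(np.searchsorted(cum, min_cumulative)) + 1, len(s))
    return Xc @ vt[:k].T, float(cum[k - 1])

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for a set of predictions."""
    err = y_pred - y_true
    ss_res = np.sum(err**2)
    ss_tot = np.sum((y_true - y_true.mean())**2)
    return {"r2": float(1.0 - ss_res / ss_tot),
            "rmse": float(np.sqrt(np.mean(err**2))),
            "mae": float(np.mean(np.abs(err)))}

def relative_change_percent(new, old):
    """Relative change of new with respect to old, in percent."""
    return (new - old) / old * 100.0

# Synthetic stand-in data (assumption: NOT the study's measurements).
rng = np.random.default_rng(0)
latent = rng.normal(size=(900, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(900, 8))
y = latent @ np.array([0.5, -0.3, 0.2]) + 0.05 * rng.normal(size=900)

X_norm = min_max_normalize(X)
# Assumed threshold; the abstract does not state the cutoff that was used.
Z, cum_ratio = pca_reduce(X_norm, min_cumulative=0.85)

# 630 training and 270 test samples, as in the abstract.
idx = rng.permutation(900)
train, test = idx[:630], idx[630:]

# Linear least squares as a simple stand-in for the BP neural network.
A_train = np.column_stack([Z[train], np.ones(len(train))])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
A_test = np.column_stack([Z[test], np.ones(len(test))])
metrics = regression_metrics(y[test], A_test @ coef)

print("components kept:", Z.shape[1], "cumulative ratio:", round(cum_ratio, 4))
print("test metrics:", metrics)
print("R² gain, PCA-GA-BP vs PCA-BP (%):", relative_change_percent(0.921, 0.800))
```

The last line recomputes the reported relative R² increase from the two R² values given in the abstract; the same helper applies to the MAE and RMSE reductions. In a real workflow the normalization bounds and principal components would usually be fitted on the training set only, to avoid leaking test information.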