Abstract:
Soil moisture (SM) is a key variable governing agricultural productivity, land–atmosphere interactions, and hydrological processes. Accurate retrieval of cropland surface soil moisture (SSM) from remote sensing is therefore often required. However, under moderate to high vegetation cover, strong vegetation attenuation and scattering can substantially degrade radar sensitivity to soil conditions. Existing approaches generally rely on either physically based models or data-driven machine learning (ML). Standalone mechanistic models often suffer from limited generalizability and accuracy under complex canopy conditions, whereas purely data-driven models can capture nonlinear relationships but usually lack physical interpretability and robustness. The jointing stage of winter wheat, a critical phenological period, is characterized by high vegetation coverage and strong soil–vegetation interaction, which makes soil moisture retrieval substantially more difficult. This study aims to improve SSM retrieval accuracy for winter wheat at the jointing stage by coupling the water cloud model with random forest. The study area is a typical winter wheat-producing region in Yuanyang County, Henan Province, in the North China Plain. Multi-source remote sensing data from Sentinel-1 C-band synthetic aperture radar (SAR) and Sentinel-2 multispectral imagery were integrated with in situ measurements of surface (0–10 cm) volumetric soil moisture collected during the satellite overpass period. A classical physical model, the water cloud model (WCM), and six widely used ML algorithms were systematically evaluated: support vector regression, multilayer perceptron, random forest (RF), gradient boosting, k-nearest neighbor, and decision tree.
A cascaded hybrid framework, termed WC-RFM (water cloud–random forest model), was proposed to exploit the complementary strengths of physical modeling and machine learning. The WCM served as a front-end mechanistic module to explicitly characterize vegetation attenuation and soil backscattering. Key physical intermediate variables were derived from the WCM, including the water content, attenuation factors, and the vegetation and soil backscattering components. These physically meaningful features were then incorporated into the RF model. A feature pool was constructed from SAR backscattering and optical vegetation indices. Features were selected using correlation analysis and multicollinearity diagnostics, together with a parameter sensitivity analysis of the WCM, to improve the stability and interpretability of the model. Performance was evaluated using 5-fold cross-validation. The results show that RF achieved the best performance among the standalone ML models, with a coefficient of determination (R²) of 0.871 and a root mean squared error (RMSE) of 0.020 m³/m³. The WC-RFM hybrid model further improved retrieval accuracy and robustness, with an R² of 0.910 and an RMSE of 0.015 m³/m³ on the test set, as well as the lowest performance variability (RMSE standard deviation of ±0.002 m³/m³). Compared with the standalone WCM and RF models, the WC-RFM effectively mitigated the performance degradation of physical models under dense vegetation while compensating for the absence of physical constraints in purely data-driven approaches. Spatial mapping revealed pronounced heterogeneity of soil moisture in the study area, consistent with regional irrigation patterns, soil texture variability, and hydrological conditions, which further supports the reliability of the model. Overall, coupling a physically interpretable water cloud model with machine learning can significantly enhance soil moisture retrieval accuracy under vegetated conditions. The WC-RFM framework offers a promising approach for high-resolution, high-accuracy dynamic monitoring of cropland soil moisture during key growth stages of winter wheat in the North China Plain. Multi-temporal meteorological variables and multi-growth-stage observations could be further integrated to strengthen the spatiotemporal applicability of the framework for agricultural monitoring.
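
As a rough illustration of how a WCM front end could supply physically meaningful features to a downstream regressor, the sketch below uses the commonly cited form of the water cloud model, in which total backscatter is a vegetation term plus a soil term attenuated twice by the canopy. The abstract does not give the parameterization used in this study, so the function names, the coefficients A, B, C, and D, the use of vegetation water content as the canopy descriptor, and all numeric values are assumptions chosen only for illustration.

```python
import math


def wcm_forward(sm, vwc, theta_deg, A, B, C, D):
    """Total backscatter (dB) from the water cloud model.

    sm: volumetric soil moisture (m3/m3); vwc: canopy descriptor
    (assumed here to be vegetation water content); theta_deg: incidence
    angle in degrees; A, B: vegetation parameters; C, D: soil parameters
    of an assumed linear dB relation. All coefficients are hypothetical.
    """
    cos_t = math.cos(math.radians(theta_deg))
    tau2 = math.exp(-2.0 * B * vwc / cos_t)       # two-way canopy attenuation
    sigma_veg = A * vwc * cos_t * (1.0 - tau2)    # vegetation term (linear units)
    sigma_soil = 10.0 ** ((C + D * sm) / 10.0)    # soil term, dB converted to linear
    return 10.0 * math.log10(sigma_veg + tau2 * sigma_soil)


def wcm_features(sigma0_db, vwc, theta_deg, A, B):
    """Split an observed backscatter value into WCM intermediate variables."""
    cos_t = math.cos(math.radians(theta_deg))
    tau2 = math.exp(-2.0 * B * vwc / cos_t)
    sigma_veg = A * vwc * cos_t * (1.0 - tau2)
    sigma_soil = (10.0 ** (sigma0_db / 10.0) - sigma_veg) / tau2
    # The soil term can turn non-positive when the vegetation term is overestimated.
    soil_db = 10.0 * math.log10(sigma_soil) if sigma_soil > 0 else float("nan")
    return {"tau2": tau2, "sigma_veg": sigma_veg, "sigma_soil_db": soil_db}


# Round trip with made-up parameters: simulate a pixel, then decompose it.
params = dict(A=0.01, B=0.1, C=-20.0, D=30.0)
obs_db = wcm_forward(sm=0.25, vwc=2.0, theta_deg=39.0, **params)
feats = wcm_features(obs_db, vwc=2.0, theta_deg=39.0, A=params["A"], B=params["B"])
sm_hat = (feats["sigma_soil_db"] - params["C"]) / params["D"]
print(feats, sm_hat)
```

In a cascaded setup of the kind described, quantities such as `tau2`, `sigma_veg`, and `sigma_soil_db` would be appended to the SAR and optical features before fitting a regressor such as scikit-learn's `RandomForestRegressor`; that step is omitted here to keep the sketch dependency-free.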