Abstract:
To address the limited generalizability of single mechanistic models and the poor physical interpretability of standalone machine learning (ML) models in remote sensing-based soil moisture (SM) retrieval over complex vegetation-covered cropland, this study focused on the jointing stage of winter wheat, a critical period with high vegetation cover and considerable retrieval difficulty. Yuanyang County, a typical winter wheat-producing area in the North China Plain, was selected as the study area. By integrating Sentinel-1 synthetic aperture radar (SAR) and Sentinel-2 multispectral imagery with in-situ measurements of volumetric soil moisture, we systematically compared the classical water cloud model (WCM) with six typical ML models, among which random forest (RF) performed best. On this basis, we constructed a novel, deeply cascaded hybrid model, WC-RFM (water cloud-random forest model), which couples physical mechanisms with data-driven learning. Radar backscattering features, optical vegetation indices, and physical intermediate variables derived from the WCM were extracted; rigorous feature selection and parameter sensitivity analysis then determined the optimal set of input variables, and model performance was evaluated using 5-fold cross-validation. The results indicate that among the standalone ML models, RF achieved the best performance (R² = 0.871, RMSE = 0.020 m³/m³). The proposed WC-RFM hybrid model further improved retrieval accuracy and robustness (test set R² = 0.910, RMSE = 0.015 m³/m³). WC-RFM effectively mitigates the limitations of traditional physical models under moderate to high vegetation cover while addressing the lack of physical constraints in purely data-driven models. It is also more robust than the single models, with an RMSE standard deviation of ±0.002 m³/m³. This study provides a new approach for high-accuracy, high-spatial-resolution dynamic monitoring of cropland soil moisture during the winter wheat jointing stage in the North China Plain. Future work will explore incorporating sequential meteorological factors and multi-growth-stage observations to deepen the model's mechanistic understanding and extend its spatiotemporal applicability to operational use.
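For reference, a commonly used first-order form of the water cloud model is sketched below. The abstract does not state which vegetation descriptor or soil backscatter term the study adopted, so the single vegetation descriptor V (for example NDVI), the calibration coefficients A, B, C, D and the linear dB soil-moisture relation are assumptions. θ is the radar incidence angle, m_v the volumetric soil moisture, and backscatter is in linear power units except where marked dB.

```latex
\begin{aligned}
\sigma^0_{\mathrm{total}} &= \sigma^0_{\mathrm{veg}} + \gamma^2\,\sigma^0_{\mathrm{soil}}, \\
\sigma^0_{\mathrm{veg}} &= A\,V\cos\theta\,\bigl(1-\gamma^2\bigr), \\
\gamma^2 &= \exp\!\left(-\frac{2BV}{\cos\theta}\right), \\
\sigma^0_{\mathrm{soil}}[\mathrm{dB}] &= C + D\,m_v, \\[4pt]
\text{inversion:}\quad
\sigma^0_{\mathrm{soil}} &= \frac{\sigma^0_{\mathrm{total}} - \sigma^0_{\mathrm{veg}}}{\gamma^2},
\qquad
m_v = \frac{\sigma^0_{\mathrm{soil}}[\mathrm{dB}] - C}{D}.
\end{aligned}
```

The terms γ², σ⁰_veg and σ⁰_soil are natural candidates for the "physical intermediate variables" passed from the WCM to a data-driven stage, since each isolates one physical contribution to the observed backscatter.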
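The abstract does not list the WCM intermediate variables used in WC-RFM. The following is a minimal Python sketch. It assumes they are the two-way attenuation γ², the vegetation term σ⁰_veg and the soil term σ⁰_soil, that NDVI serves as the vegetation descriptor, and that A and B have already been calibrated. It shows how such terms could be derived per pixel and stacked with SAR and optical inputs. `WcmParams`, `wcm_intermediates`, `hybrid_features` and the six-element feature layout are hypothetical illustrations, not the study's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WcmParams:
    """Site-calibrated WCM coefficients (hypothetical; must be fitted to field data)."""
    A: float  # vegetation scattering parameter
    B: float  # vegetation attenuation parameter


def db_to_linear(db: float) -> float:
    return 10.0 ** (db / 10.0)


def linear_to_db(lin: float) -> float:
    return 10.0 * math.log10(lin)


def two_way_attenuation(veg: float, theta_deg: float, p: WcmParams) -> float:
    """gamma^2 = exp(-2 B V / cos(theta))."""
    return math.exp(-2.0 * p.B * veg / math.cos(math.radians(theta_deg)))


def wcm_forward(sigma_soil: float, veg: float, theta_deg: float, p: WcmParams) -> float:
    """Total backscatter in dB from a linear-unit soil backscatter term."""
    cos_t = math.cos(math.radians(theta_deg))
    gamma2 = two_way_attenuation(veg, theta_deg, p)
    sigma_veg = p.A * veg * cos_t * (1.0 - gamma2)
    return linear_to_db(sigma_veg + gamma2 * sigma_soil)


def wcm_intermediates(sigma_total_db: float, veg: float, theta_deg: float,
                      p: WcmParams) -> Dict[str, float]:
    """Invert the WCM for one pixel and return its physical intermediate terms."""
    cos_t = math.cos(math.radians(theta_deg))
    gamma2 = two_way_attenuation(veg, theta_deg, p)
    sigma_veg = p.A * veg * cos_t * (1.0 - gamma2)
    sigma_soil = (db_to_linear(sigma_total_db) - sigma_veg) / gamma2
    if sigma_soil <= 0.0:
        # Vegetation term alone exceeds the observation: A or B is likely miscalibrated.
        raise ValueError("vegetation term exceeds observed backscatter; recheck A and B")
    return {"gamma2": gamma2, "sigma_veg": sigma_veg,
            "sigma_soil": sigma_soil, "sigma_soil_db": linear_to_db(sigma_soil)}


def hybrid_features(vv_db: float, vh_db: float, ndvi: float, theta_deg: float,
                    p_vv: WcmParams) -> List[float]:
    """Hypothetical input vector: raw SAR, an optical index, and VV-based WCM terms."""
    w = wcm_intermediates(vv_db, ndvi, theta_deg, p_vv)
    return [vv_db, vh_db, ndvi, w["gamma2"], w["sigma_veg"], w["sigma_soil_db"]]
```

Running `wcm_forward` and then `wcm_intermediates` with the same coefficients recovers the original soil term, which is a quick consistency check before the features are handed to a random forest or any other regressor.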
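The abstract reports 5-fold cross-validation and an RMSE spread of ±0.002 m³/m³ but does not say how that spread was computed. The following pure-Python sketch assumes it is the sample standard deviation of the per-fold test RMSE. `fit` is any callable that trains on the training folds and returns a predictor; in practice it might wrap scikit-learn's `RandomForestRegressor`. The names `kfold_indices`, `cross_validate` and `mean_baseline` are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate(X: Sequence[Sequence[float]], y: Sequence[float], fit: Fitter,
                   k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the per-fold test RMSE."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_hat = [predict(X[i]) for i in test_idx]
        scores.append(rmse([y[i] for i in test_idx], y_hat))
    return statistics.mean(scores), statistics.stdev(scores)


def mean_baseline(X_train: List[Sequence[float]], y_train: List[float]) -> Predictor:
    """Trivial stand-in model: always predicts the training-set mean."""
    m = statistics.mean(y_train)
    return lambda x: m
```

Keeping `seed` fixed gives every candidate model (WCM, the six ML models, WC-RFM) identical folds, so their per-fold RMSE values and spreads are directly comparable.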