Abstract:
Groundwater table prediction plays an essential role in agricultural production, drainage engineering, and water-resources regulation in arid and semi-arid regions, where shallow groundwater responds quickly to climate forcing and anthropogenic disturbances. This study focused on twenty-three daily monitoring stations in Linwei District, Weinan City, Shaanxi Province, a typical semi-arid region where groundwater dynamics are governed by complex non-linear interactions between meteorological forcing and human activities. A regional groundwater level prediction framework was developed and evaluated by integrating site-specific explanatory variable selection with Long Short-Term Memory (LSTM) networks and intelligent hyperparameter optimization. To address the significant spatial heterogeneity across the study area, an independent modeling strategy was adopted, in which a customized prediction model was constructed for each observation well. This approach allowed the framework to implicitly capture localized hydrogeological characteristics and varying responses to external stressors without explicit spatial parameterization. The methodology began with data preprocessing. A standard-score (z-score) outlier detection method based on the 3σ criterion was applied to the daily water level series to identify and remove anomalous data points, and the resulting gaps were filled by linear interpolation. To ensure the effectiveness of the model inputs, candidate features were screened using Pearson correlation analysis. Eight candidate meteorological and environmental variables were evaluated: reference crop evapotranspiration, daily maximum temperature, daily minimum temperature, precipitation, relative humidity, sunshine hours, wind speed, and soil heat flux. 
Only those variables whose Pearson correlation coefficient with the local groundwater level exceeded 0.30 in absolute value were selected as input features. This site-adaptive input selection reduced data redundancy and mitigated the risk of overfitting by excluding weakly correlated variables. To overcome the limitations of manual hyperparameter tuning, two intelligent optimization strategies were implemented and compared: the Salp Swarm Algorithm (SSA) and Bayesian Optimization (BO). The SSA simulated the swarming behavior of salp chains to explore the parameter space through a leader-follower mechanism. In contrast, BO employed a Gaussian Process as a surrogate model of the objective function and used an Upper Confidence Bound acquisition function to balance exploration and exploitation. Both optimizers were used to determine the model configuration automatically. The results demonstrated that intelligent optimization enhanced the predictive accuracy and robustness of the baseline LSTM models. The baseline LSTM models achieved average Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Nash-Sutcliffe Efficiency (NSE) values of 0.15, 0.24, and 0.71, respectively. Introducing the SSA reduced the average MAE and RMSE to 0.14 and 0.22. The LSTM network coupled with BO (LSTM-BO) performed best, with the average MAE and RMSE further decreasing to 0.13 and 0.20. Crucially, the average NSE of the LSTM-BO models reached 0.85, and every individual station maintained an NSE above 0.61. This indicated that BO provided a more stable and reliable parameter search than the SSA, particularly at stations with complex dynamics where the baseline models had failed or shown significant performance degradation. 
The findings confirmed that the proposed framework, by combining site-specific feature selection with probabilistic hyperparameter optimization, offered a powerful and practical tool for groundwater resource management and agricultural drainage engineering in semi-arid environments.
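
The preprocessing, feature screening, and NSE evaluation summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`clean_levels`, `select_features`, `nse`), the use of pandas, and the choice to compute the mean and standard deviation over the whole series are assumptions made for illustration. Only the 3σ criterion, linear interpolation, and the 0.30 correlation threshold come from the text.

```python
import numpy as np
import pandas as pd


def clean_levels(levels: pd.Series, k: float = 3.0) -> pd.Series:
    """Remove points whose z-score exceeds k in magnitude, then fill gaps linearly.

    Assumption: the z-score uses the mean and standard deviation of the
    whole series; the abstract does not state the window used.
    """
    z = (levels - levels.mean()) / levels.std(ddof=0)
    masked = levels.where(z.abs() <= k)  # outliers become NaN
    return masked.interpolate(method="linear", limit_direction="both")


def select_features(drivers: pd.DataFrame, levels: pd.Series,
                    threshold: float = 0.30) -> list:
    """Keep driver columns whose |Pearson r| with the water level exceeds threshold."""
    r = drivers.corrwith(levels)  # Pearson correlation, column by column
    return r[r.abs() > threshold].index.tolist()


def nse(observed, simulated) -> float:
    """Nash-Sutcliffe Efficiency: 1 minus residual variance over observed variance."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

Under these assumptions, each well would be processed independently: `clean_levels` is applied to that well's daily series, and `select_features` is then called with the eight candidate variables as columns of `drivers`, so that different wells may end up with different input sets.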