Combining Physical Hydrological Model with Explainable Machine Learning Methods to Enhance Water Balance Assessment in Glacial River Basins

Yang, Ruibiao; Wu, Jinglu; Gan, Guojing; Guo, Ru; Zhang, Hongliang

doi:10.3390/w16243699


by Ruibiao Yang ^1,2, Jinglu Wu ^1,2,*, Guojing Gan ^1,2, Ru Guo ^1,2 and Hongliang Zhang ^1,2

¹ State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences (CAS), Nanjing 210008, China

² University of Chinese Academy of Sciences, Beijing 100049, China

* Author to whom correspondence should be addressed.

Water 2024, 16(24), 3699; https://doi.org/10.3390/w16243699

Submission received: 18 November 2024 / Revised: 18 December 2024 / Accepted: 21 December 2024 / Published: 22 December 2024


Abstract

Accurate water balance assessment in glacier basins is essential for the management and sustainable development of their water resources. In this study, a hybrid modeling framework was constructed to enhance runoff prediction and water balance assessment in glacier basins. An improved physical hydrological model (SEGSWAT+) was combined with a machine learning (ML) model to capture the relationship between runoff residuals and water balance components through the Shapley additive explanations (SHAP) method. After the runoff fit of the existing model is improved, the runoff residuals are decomposed and used to correct the hydrological process component values, thus improving the accuracy of the water balance results. We evaluated the performance and correction results of the method using various ML methods. We analyzed the results for two consecutive periods from 1959 to 2022 for the glacial sub-basins of three tributaries of the Upper Ili River Basin in central Asia. The results show that the hybrid framework based on extreme gradient boosting (XGBoost) performs best, with an average NSE value of 0.93, and that the bias of the evapotranspiration and soil water content change components is reduced by 3.2–5%, demonstrating the effectiveness of the water balance correction. This study advances the interpretation of ML models for hydrologic assessment of areas with complex hydrodynamic characteristics.

Keywords: water balance; explainable machine learning; glacial basin; hydrological model

1. Introduction

Accurate hydrological simulations are pivotal for understanding the impacts of climate change and human activities on water cycle dynamics. They are also crucial for the ongoing observation and management of water resource distribution [1]. A wide range of hydrological models, based on physical and statistical methodologies or their combination, has been developed and applied worldwide [2]. These models serve as essential tools for researchers to understand historical and current watershed changes and to assess the effects of management practices and drivers of change [3]. Choosing the appropriate hydrological model for a specific river basin is particularly important and depends on the region’s climate, hydrology, and land conditions [4]. Given that many of the world’s major river systems originate in mountainous regions, where much freshwater is stored in snow and glaciers [5], accurate and efficient hydrological modeling of these basins improves the accuracy of the process of hydrological simulation, which is critical for water balance assessment.

Due to inadequate representation of snowpack and glacial hydrological processes, traditional distributed rainfall–runoff models often struggle to accurately simulate and predict hydrological dynamics in mountainous regions, especially in watersheds where glacial runoff is the primary source of water [6]. The performance of such models is controlled by their parameters, and the simulation becomes more uncertain as the number of parameters and their possible values increase [7]. Efforts to incorporate snowpack and glacier-related processes into hydrological modeling, by enhancing the Variable Infiltration Capacity (VIC) and Soil and Water Assessment Tool (SWAT) models with temperature index methods and glacier mass balance theory, have significantly improved model performance [8,9,10]. However, the addition of new modules has increased the complexity of model parameter calibration and validation, posing a challenge to achieving highly reliable predictions. In recent years, statistical methods, especially machine learning (ML) methods, have been increasingly used in streamflow simulation and prediction, where they provide highly accurate results with fewer parameters to optimize and less reliance on expert experience. In addition to the traditional single-model approaches, ensemble learning methods have also been widely applied [11,12,13,14,15]. Deep learning (DL) models, which are growing in complexity and size, further account for the time series nature of hydrological data and have begun to show excellent performance in the hydrological modeling of alpine basins [1,16,17,18]. However, this excellent performance is mainly confined to runoff simulation. Because ML models discard the physical hydrological parameters, their ‘black box’ nature limits researchers’ ability to investigate whether the internal water balance is consistent with reality. 
It is essential to note that the accuracy of meteorological data will directly affect the performance of hydrological modeling, whether it is a physical model or a statistical approach. In some data-scarce study areas, such as arid mountainous regions or glacial basins, the accuracy of observations or products is consistently disputed [19]. Therefore, there is an urgent need for a new approach that effectively combines the advantages of both types of hydrological modeling to solve or alleviate the problem of accurate hydrological modeling in the presence of data scarcity or inaccuracy.

Many studies nowadays emphasize that hybrid models combining ML techniques with physically based hydrological modeling approaches can effectively transcend the limitations of a single model and produce superior results [6,20]. As an example, in a study conducted in a glacier basin in Northwest China, the hybrid model constructed by Yang et al. [17] by combining long short-term memory (LSTM) networks and a gated recurrent unit (GRU) with the SWAT model showed significant advantages in hydrological simulation at the monthly scale. Wang et al. [21] used the Shapley additive explanations (SHAP) method based on ‘game theory’ for attribution analysis of runoff simulation in the Xiaoqing River basin. Furthermore, in recent studies, the hybrid modeling approach has shown excellent capability in both data-sparse regions and large alpine basins [22,23]. However, to the best of our knowledge, no study combines an interpretable machine learning approach with a physical model type to improve water balance accuracy in hydrological modeling, especially for complex catchments containing glacial runoff.

To address the above issues, we developed an interpretable machine learning (ML) hydrological modeling approach combined with a physical model for watersheds containing glacial runoff to improve water balance accuracy in hydrological modeling. The research focuses on (1) evaluating the accuracy of hybrid hydrological modeling approaches constructed with different ML models for glacial watersheds and (2) interpreting the ML model fits through SHAP and using the interpreted results to quantitatively correct biases in the water balance. To achieve these objectives, four representative ML methods were selected for model performance comparison. The seasonal trend decomposition procedure based on LOESS (STL) was used to decompose the time series of the feature values to improve the accuracy of SHAP interpretation. The modeling approach is applied to simulate monthly runoff over two long periods (1959–1996 and 1997–2022) in the upper watershed of the Ili River Basin in central Asia. This work thus aims to improve the accuracy of hydrological modeling in water balance studies in glacial basins.

2. Materials and Methods

2.1. SEGSWAT+

The SWAT+ model is a significant improvement on the widely used SWAT model [24,25]. It divides the watershed into spatial entities such as hydrologic response units (HRUs) to fully integrate hydrologic processes. Luo [9] and Yang et al. [26] introduced a new implementation for modeling glacier hydrology in SWAT+. Rooted in temperature index methods and glacier volume–area mass balance methods, the approach incorporates glacier meltwater, culminating in the streamlined yet powerful GSWAT+ model. This approach produced reasonable runoff simulation results in the high mountainous terrain of Asia [27]. However, given the stochastic nature of the parameter calibration process within the model, the simulated water balance components may not be consistent with reality. Therefore, to minimize such errors, we replaced the original sublimation calculation in the GSWAT+ model with a physically constrained one. We used a mass transfer method that has been tested in the high mountainous regions of Asia and proved effective in improving snow and ice sublimation calculations, given by the following equation [28]:

\mathrm{Sublimation} = a (1 + b u) \, \mathrm{VPD}

(1)

where u is the near-surface wind speed, VPD is the vapor pressure deficit (hPa), and a and b are two parameters, which were taken as 0.26 and 0.54, respectively, as in the work of Zhou et al. [29]. To differentiate it from the original GSWAT+ model, the modified model is named the sublimation-enhanced glacier SWAT+ model (SEGSWAT+). In the hydrological process of SEGSWAT+, the main inputs are rainfall after snowfall (Rain), snowmelt (M_snow), and glacier melt (M_glacier). The outputs are divided into five elements: surface runoff (Q_sur), seepage of water from the soil into deeper layers (W_seep), evapotranspiration (ET), underground runoff (Q_gw), and soil water content change (ΔSW). The modeled water balance equation is as follows:

\mathrm{Rain} + M_{snow} + M_{glacier} = Q_{sur} + Q_{gw} + W_{seep} + \mathrm{ET} + \Delta \mathrm{SW}

(2)

The similarities and differences between the SWAT+ model, the GSWAT+ model, and the SEGSWAT+ model are shown in Table A1 in Appendix A.
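As a minimal sketch of Equations (1) and (2), the functions below evaluate the mass-transfer sublimation term and the closure error of the water balance. The function names and the monthly values are hypothetical and chosen only for illustration; the source does not state the output unit of the sublimation formula, so none is assumed here.

```python
def sublimation(u, vpd, a=0.26, b=0.54):
    """Mass-transfer sublimation, Eq. (1): a * (1 + b * u) * VPD.

    u is the near-surface wind speed and vpd the vapor pressure deficit (hPa);
    a and b default to the values quoted in the text.
    """
    return a * (1.0 + b * u) * vpd


def water_balance_residual(rain, m_snow, m_glacier, q_sur, q_gw, w_seep, et, d_sw):
    """Inputs minus outputs of Eq. (2); zero means the balance closes."""
    inputs = rain + m_snow + m_glacier
    outputs = q_sur + q_gw + w_seep + et + d_sw
    return inputs - outputs


# Hypothetical monthly values (mm), picked so that the balance closes exactly.
month = dict(rain=40.0, m_snow=25.0, m_glacier=10.0,
             q_sur=12.0, q_gw=18.0, w_seep=5.0, et=35.0, d_sw=5.0)
closure_error = water_balance_residual(**month)

# Illustrative wind speed and VPD, not taken from the study.
sub_example = sublimation(u=2.0, vpd=1.5)
```

A nonzero `closure_error` for model output would indicate that one or more components are biased, which is the quantity the later correction step tries to attribute to individual components.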

2.2. Machine Learning Models

With limited data availability, complex machine learning (ML) models do not necessarily lead to superior results. Instead, they may reduce efficiency, increase computational cost, and raise the risk of overfitting. Therefore, in this study, we selected four machine learning models that are currently popular in hydrological modeling, namely artificial neural networks (ANNs), long short-term memory (LSTM) networks, random forest (RF), and extreme gradient boosting (XGBoost), and compared their performance during model selection. These four models represent a single machine learning model (ANN), a deep learning model (LSTM), and ensemble learning models (RF and XGBoost), which together cover the current mainstream machine learning methods. They were selected because they all show strong generalization ability and are representative of different scenarios. By comparing the performance of these models, we aim to identify the most suitable prediction model under limited data conditions and provide a scientific basis for subsequent practical applications. The detailed list and hyperparameters of the models are shown in Table A2 in Appendix A. All models are implemented based on the scikit-learn and PyTorch libraries in Python.

2.3. STL and SHAP

The seasonal trend decomposition procedure based on LOESS (STL) provides more robust time series decomposition results by mitigating the effects of outliers through local fitting at each time point using a locally weighted regression (LOESS) technique [30]. This approach refines the time series into trend, seasonality, and residual components and is applicable to any data with a seasonal frequency exceeding one. It was used in this study to remove the disturbances of trend and seasonality from the time series of the water balance variables. It is calculated as follows:

Y_{t} = T_{t} + S_{t} + R_{t}

(3)

where Y_{t} is the observed value at time t, and T_{t}, S_{t}, and R_{t} are the trend component, seasonal component, and residual component at time t, respectively.
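The additive structure of Equation (3) can be illustrated with a short sketch. Note that this is not STL itself: it uses a classical moving-average decomposition as a simplified stand-in, whereas STL estimates the components with robust LOESS fits (available, for example, as `STL` in `statsmodels.tsa.seasonal`). The function name and the synthetic monthly series are assumptions made for the example.

```python
import math


def decompose_additive(y, period=12):
    """Split y into trend, seasonal and residual parts so that y = T + S + R.

    Classical moving-average decomposition (a stand-in for STL). Assumes an
    even period such as 12. Returns lists aligned with the interior points
    where the centered moving average is defined.
    """
    half = period // 2
    idx = range(half, len(y) - half)
    # Centered 2 x period moving average: the two end points get half weight.
    trend = [
        (0.5 * y[t - half] + sum(y[t - half + 1:t + half]) + 0.5 * y[t + half]) / period
        for t in idx
    ]
    detrended = [y[t] - tr for t, tr in zip(idx, trend)]
    # Mean detrended value per position in the cycle, centered to sum to zero.
    by_phase = [[] for _ in range(period)]
    for t, d in zip(idx, detrended):
        by_phase[t % period].append(d)
    phase_mean = [sum(v) / len(v) for v in by_phase]
    offset = sum(phase_mean) / period
    seasonal = [phase_mean[t % period] - offset for t in idx]
    resid = [d - s for d, s in zip(detrended, seasonal)]
    observed = [y[t] for t in idx]
    return observed, trend, seasonal, resid


# Synthetic six-year monthly series: linear trend + annual cycle + small noise.
y = [50 + 0.1 * t + 20 * math.sin(2 * math.pi * t / 12) + 0.5 * ((7 * t) % 5 - 2)
     for t in range(72)]
observed, trend, seasonal, resid = decompose_additive(y)
```

In the workflow described in this section, the residual component R_t of each water balance variable would be the quantity carried forward as an ML feature once trend and seasonality have been removed.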

The Shapley additive explanations (SHAP) method is a machine learning result interpretation method inspired by cooperative game theory, where all features are considered as contributors [31]. In the SHAP analysis framework, each sample corresponds to one predicted value, and a SHAP value is the contribution assigned to each feature of that sample. In a regression task, the sign of a SHAP value indicates whether the feature has a positive or negative impact on the prediction, and its magnitude indicates the degree of impact. The advantage of the SHAP method is that each prediction is analyzed individually, so an explanation is available for every sample. Assuming that the i-th sample is x_{i}, the j-th feature of the i-th sample is x_{ij}, the predicted value of the model for the sample is y_{i}, and the baseline of the model (the mean value of the target variable over the samples) is y_{base}, the SHAP values satisfy the following formula:

y_{i} = y_{base} + f(x_{i1}) + f(x_{i2}) + \dots + f(x_{ik})

(4)

where f(x_{i1}) is the contribution of the first feature of the i-th sample to the final predicted value y_{i}, and k is the number of features. When f(x_{i1}) > 0, the feature increases the predicted value, which is a positive effect; when f(x_{i1}) < 0, it decreases the predicted value.
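The additivity in Equation (4) can be checked on a small example by computing exact Shapley values through brute-force enumeration of feature subsets. This is only a sketch: the toy model, the background rows, and the choice of baseline (the mean model prediction over the background rows) are assumptions for illustration, and practical SHAP implementations rely on much faster algorithms for tree and neural models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley contributions of each feature of x to predict(x).

    The value of a feature subset S is the mean prediction when the features
    in S are taken from x and the remaining ones from each background row.
    """
    k = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(k)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for j in range(k):
        others = [m for m in range(k) if m != j]
        contrib = 0.0
        for size in range(k):
            # Shapley weight of a coalition of this size that excludes feature j.
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {j}) - value(s))
        phi.append(contrib)
    return value(set()), phi


def toy_model(v):
    """Hypothetical regression model with one interaction term."""
    return 2.0 * v[0] - v[1] + 0.5 * v[0] * v[2]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
x = [1.0, 2.0, 3.0]
y_base, contributions = shapley_values(toy_model, x, background)
```

Here `y_base + sum(contributions)` reproduces `toy_model(x)`, which is the property that allows a predicted runoff residual to be split among the water balance components.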

2.4. Hybrid Modeling Framework

The residuals between the observed and modeled runoff values indicate a corresponding error in the modeled water balance results. In many hybrid hydrological modeling studies, the runoff residuals are considered a more suitable target for ML model predictions than the runoff values themselves [17]. Although the runoff residuals cannot fully represent the bias in the hydrological process, given the attenuation that exists in the hydrological process, a correct decomposition of the residuals can improve the accuracy of the water balance components to a large extent. Therefore, we use the runoff simulation residuals generated by SEGSWAT+ as the target of the ML model, use the trend-decomposed water balance components as the features of the model, and use SHAP to interpret the training results of the optimal model. The residuals, quantitatively decomposed according to the interpreted contribution values, are then used to correct the model’s hydrological process components. The process framework of this hybrid modeling approach is shown in Figure 1, where Q_obs denotes the observed runoff values, Q_sim denotes the fitted runoff values, Q_res denotes the residuals between the simulated and observed runoff values, and Q_{(res, sim)} denotes the residuals simulated by the ML model. The meteorological and subsurface data are the inputs to the physical model; the STL residual components of the water balance components output from the physical model are the input features of the ML model; and the difference between the fitted runoff and the measured values (i.e., the residuals) is its target. Hybrid modeling is expected to improve the accuracy of runoff simulation while also ensuring the physical realism of hydrological processes within the model framework.
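The residual-correction step of the framework can be sketched as follows. The learner here is a plain k-nearest-neighbor regressor, used purely as a lightweight stand-in for the ML models compared in the study (ANN, LSTM, RF, XGBoost); the function names, the two synthetic features, and the sign convention Q_res = Q_obs − Q_sim are assumptions for illustration.

```python
import math


def fit_knn(features, targets, k=3):
    """Return a predictor that averages the targets of the k nearest training rows."""
    train = list(zip(features, targets))

    def predict(row):
        nearest = sorted(
            train, key=lambda ft: sum((a - b) ** 2 for a, b in zip(ft[0], row))
        )[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


def hybrid_correct(q_obs, q_sim, features, train_fraction=0.7):
    """Learn Q_res = Q_obs - Q_sim from the features, then add it back to Q_sim."""
    n_train = round(len(q_obs) * train_fraction)
    q_res = [o - s for o, s in zip(q_obs, q_sim)]
    model = fit_knn(features[:n_train], q_res[:n_train])  # chronological split
    q_res_sim = [model(row) for row in features]
    q_hybrid = [s + r for s, r in zip(q_sim, q_res_sim)]
    return q_hybrid, n_train


# Synthetic ten-year monthly example: the physical model error depends on two
# water balance features (hypothetical snowmelt and soil water change series).
months = range(120)
m_snow = [10 + 8 * math.sin(2 * math.pi * t / 12) for t in months]
d_sw = [5 * math.cos(2 * math.pi * t / 12) for t in months]
features = list(zip(m_snow, d_sw))
q_obs = [60 + 3 * s for s in m_snow]
q_sim = [o - (0.3 * s - 0.2 * w) for o, s, w in zip(q_obs, m_snow, d_sw)]

q_hybrid, n_train = hybrid_correct(q_obs, q_sim, features)
sse_before = sum((o - s) ** 2 for o, s in zip(q_obs[n_train:], q_sim[n_train:]))
sse_after = sum((o - h) ** 2 for o, h in zip(q_obs[n_train:], q_hybrid[n_train:]))
```

Because the synthetic error is a deterministic function of the features, the corrected series nearly reproduces the observations in the held-out 30%; with real data the improvement depends on how much of the residual the features can explain.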

In addition, several commonly used metrics were employed to assess the model performance and the goodness of fit of the runoff simulation: the coefficient of determination (R²), normalized root mean square error (NRMSE), correlation coefficient (r), Nash–Sutcliffe efficiency coefficient (NSE), absolute percentage of bias (APBIAS), and Kling–Gupta efficiency (KGE), all of which have been widely used in previous hydrologic modeling studies [32,33,34]. The specific equations are as follows:

R^{2} = 1 - \frac{\sum_{t=1}^{N} (Q_{res}^{t} - Q_{res,sim}^{t})^{2}}{\sum_{t=1}^{N} (Q_{res}^{t} - \bar{Q}_{res})^{2}}

(5)

NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{t=1}^{N} (Q_{res}^{t} - Q_{res,sim}^{t})^{2}}}{\max(Q_{res}) - \min(Q_{res})}

(6)

r = \frac{\sum_{t=1}^{N} (Q_{sim}^{t} - \bar{Q}_{sim}) (Q_{obs}^{t} - \bar{Q}_{obs})}{\sqrt{\sum_{t=1}^{N} (Q_{sim}^{t} - \bar{Q}_{sim})^{2} \sum_{t=1}^{N} (Q_{obs}^{t} - \bar{Q}_{obs})^{2}}}

(7)

NSE = 1 - \frac{\sum_{t=1}^{N} (Q_{obs}^{t} - Q_{sim}^{t})^{2}}{\sum_{t=1}^{N} (Q_{obs}^{t} - \bar{Q}_{obs})^{2}}

(8)

APBIAS = \left| \frac{\sum_{t=1}^{N} (Q_{sim}^{t} - Q_{obs}^{t})}{\sum_{t=1}^{N} Q_{obs}^{t}} \right| \times 100

(9)

KGE = 1 - \sqrt{\left(\frac{\bar{Q}_{sim}}{\bar{Q}_{obs}} - 1\right)^{2} + \left(\frac{\sigma_{sim}}{\sigma_{obs}} - 1\right)^{2} + (r - 1)^{2}}

(10)

where N is the number of time steps, \bar{Q}_{res} is the mean of the runoff residuals, \bar{Q}_{sim} and \bar{Q}_{obs} represent the mean values of the fitted and observed runoff, respectively, and \sigma_{sim} and \sigma_{obs} represent the standard deviations of the simulated and observed values, respectively.
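For reference, Equations (5)–(10) can be written out directly in code. The sketch below is a plain-Python transcription with hypothetical function names; R² in Equation (5) has the same form as NSE and is obtained by passing the residual series Q_res and Q_res,sim to `nse`. The four-value example series is invented for the check only.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Eq. (8); also Eq. (5) when given residual series."""
    m = _mean(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - m) ** 2 for o in obs)


def nrmse(obs, sim):
    """RMSE normalized by the range of the reference series, Eq. (6)."""
    rmse = math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))
    return rmse / (max(obs) - min(obs))


def pearson_r(obs, sim):
    """Correlation coefficient, Eq. (7)."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((s - ms) * (o - mo) for o, s in zip(obs, sim))
    return cov / math.sqrt(sum((s - ms) ** 2 for s in sim) * sum((o - mo) ** 2 for o in obs))


def apbias(obs, sim):
    """Absolute percentage of bias, Eq. (9)."""
    return abs(sum(s - o for o, s in zip(obs, sim)) / sum(obs)) * 100.0


def kge(obs, sim):
    """Kling-Gupta efficiency, Eq. (10)."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r = pearson_r(obs, sim)
    return 1.0 - math.sqrt((ms / mo - 1) ** 2 + (ss / so - 1) ** 2 + (r - 1) ** 2)


# Invented example: a simulation that overestimates every value by 10%.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [11.0, 22.0, 33.0, 44.0]
scores = {"NSE": nse(obs, sim), "APBIAS": apbias(obs, sim),
          "KGE": kge(obs, sim), "r": pearson_r(obs, sim), "NRMSE": nrmse(obs, sim)}
```

The example shows why several metrics are reported together: a uniform 10% overestimate leaves r at 1 while APBIAS, NSE, and KGE all register the bias.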

2.5. Study Area

The present study is centered on the glacial sub-basin of the upper Ili River in the Ili-Lake Balkhash Basin, which is located on the border between China and Kazakhstan and includes the Kash River, the Tekes River, and the Kunes River as its main tributaries. These rivers join to form the Ili River, which flows into Lake Balkhash. The study watershed has an area of about 47,600 square kilometers, ranges from 692 to 6302 m above sea level, and has varied topography. According to the first glacier inventory in China, there are 2721 glaciers at higher elevations in the basin, covering an area of about 2300 square kilometers. Snowmelt and glacier meltwater provide a large amount of water resources for the three main tributaries, especially the Tekes and Kash Rivers [35]. The study period was chosen to be from 1959 to 2022, considering the data collection time for the two glacier inventories in China, the warm-up requirements of the hydrological model, the variability in the underlying surface, and the inter-annual variability of runoff from hydrological stations at the basin outlet. The period was further divided into two phases, 1959–1996 and 1997–2022, to refine the water balance assessment. The detailed geographical location of the study area is shown in Figure 2.

2.6. Data and Preprocessing

The hydrological model is driven by meteorological data, including daily maximum and minimum temperatures, total precipitation, solar radiation, and wind speeds at a spatial resolution of 0.1 degrees from 1959 to 2022 from the ERA5-Land reanalysis dataset, corrected for bias in the mountainous regions of central Asia [36], using the in situ precipitation through the method of He et al. [37]. Actual evapotranspiration data from SynthesizedET and remotely sensed soil moisture data from ESA CCI were used to constrain the model and validate the calibration results [38,39]. We modeled the water balance in the study area for the period 1961–2022 and calibrated and validated the model using monthly runoff observations from 1961–1979 and 2000–2008 at five hydrological stations in the basin, as detailed in Table 1. Considering the differences in the continuity of the data at these stations, the calibration process was adjusted to the available data at each station. According to the best practice of previous studies, 70% of the data were used for model calibration and training, and the remaining 30% were used for validation to ensure a reliable assessment of the accuracy of the model predictions.

In addition, Copernicus DEM data at 30 m spatial resolution, which are widely available at high elevations, were used for stream delineation, watershed outlining, and elevation zoning [40]. The high-resolution Digital Soil Open Map (DSOLMap) at 250 m resolution and the 30 m Global Land Cover Product (GLC_FCS30) were resampled and classified for HRU identification [41]. Considering the span of the two study periods, GLC_FCS30 data for 1985 and 2010 were selected for the first and second periods, respectively [42]. Most importantly, to simulate the glacier hydrological processes, the First China Glacier Inventory (FCGI) and Second China Glacier Inventory (SCGI) datasets were introduced to determine the initial state of the glaciers in the two periods. These datasets were superimposed on the corresponding land use data as the final input data depending on the geographic location of the study area, data availability, and modeling start time. Based on the modeling approach, the study area was divided into 23 sub-basins with 5501 and 5655 HRUs for the two periods, of which 335 and 309, respectively, were glacier HRUs (GHRUs).

Prior to ML model training, the target variables and features were normalized to speed up model convergence and reduce numerical issues when computing the loss function. The scaled predictions were then back-transformed to obtain the actual predicted values, thus ensuring the practical applicability of the model outputs.
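A minimal sketch of the normalization and back-transformation step is given below. The source does not state which scaling was used, so min–max scaling is assumed here; the function names and the residual values are hypothetical.

```python
def fit_minmax(values):
    """Return the (min, max) pair used for scaling; fit on training data only."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values to [0, 1] relative to the fitted range."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Back-transform scaled values (e.g. model predictions) to original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical runoff residuals (m^3/s) for a training period.
train_residuals = [-12.0, -3.5, 0.0, 4.2, 18.0]
lo, hi = fit_minmax(train_residuals)
scaled = scale(train_residuals, lo, hi)
restored = unscale(scaled, lo, hi)
```

Fitting the scaling range on the calibration portion only, and reusing it for validation data and for back-transforming predictions, keeps information from the validation period out of training.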

3. Results

3.1. Model Evaluation

To optimize the performance of the SEGSWAT+ model, sensitive parameters for simulating runoff were selected using the Sobol index provided by R-SWAT and the model parameters were optimized [43]. Parameters related to snowpack and glacial hydrological processes were selected for optimal calibration, as well as those most frequently calibrated in previous studies [26]. Detailed calibration procedures for these parameters can be found in previous studies. In this study, the warm-up period of the model was set to 2 years using the optimal parameters, and multi-site calibration was used to obtain the results and derive the runoff values for the two simulation periods and for the different control stations in the watershed. The mean values of the assessment metrics for these stations for different periods are shown in Table 2. For daily scale hydrologic simulation, NSE greater than 0.5 is acceptable, which indicates that the model has sufficient performance to predict runoff. For APBIAS, a deviation of 15% is acceptable, and 10% is good. The KGE value reflects the overall fit superiority of the runoff series, with the most favorable value of 1.

The results show that both SEGSWAT+ and GSWAT+ achieve good NSE and KGE values (all higher than 0.5), indicating that both models have good runoff simulation performance in the study area. The r-value of SEGSWAT+ is always higher than 0.7, while GSWAT+ has its lowest value of 0.61 in the validation period of the subsequent period. SEGSWAT+ has the highest NSE value (0.88), and the average of the two models is close to 0.80, while GSWAT+ has the highest KGE value (0.81), and the average of the two models is close to 0.73, indicating that the runoff time series simulated by the two models are close to each other. In addition, all models had APBIAS values greater than 10% for all periods. Considering the NSE results, the models captured the overall trend and magnitude of hydrologic processes in the study area; however, as indicated by APBIAS, they still fall short in capturing peak flows. Considering that the models have been fully parameter optimized, the driving data may be contributing to this bias, which therefore needs to be reduced in conjunction with the ML model.

Although the runoff performance of the two models in the study area is relatively close, it should be noted that the APBIAS of ET, one of the important water balance elements, is more pronounced in GSWAT+, which uses the original snow and ice sublimation calculation, than in SEGSWAT+ (Figure 3). Compared with the actual ET remote sensing product (SynthesizedET), the absolute error of ET simulated by SEGSWAT+ with the improved snow and ice sublimation was limited to less than 10% in the study area (with a maximum value of about 8.8%), whereas the maximum error of GSWAT+ was 17%. Overall, the improved model has an average error of 3.6%, which is an improvement of about 50% for ET alone compared to the average 7.6% error of GSWAT+. Keeping the error in each water balance element as small as possible benefits the accuracy of the water balance assessment. Therefore, we used the water balance elements from SEGSWAT+ rather than GSWAT+ as inputs for the subsequent ML models.

During the hyperparameter tuning of the ML models, we explored various structural configurations to ensure that the models all achieved optimal performance. In this phase, for ANN and LSTM, we tested dynamic learning rates from 0.1 to 0.0001; evaluated fully-connected layer sizes of 64, 128, and 256; and tried network stacking layers of 1, 2, and 4. Moreover, the number of decision trees from 1 to 100 was tried for RF and XGBoost. The number of training epochs was set to 1000, 2000, and 4000 to measure the learning ability and stability of the model over time. To reduce the risk of overfitting, dropout rates between 0.2 and 0.5 were scrutinized, a key step in ensuring that the model both learns accurately from the training data and remains robust to new unseen data. These experiments resulted in the adoption of appropriate parameters for the ML models under study, with the hyperparameters for each model shown explicitly in Table A2.
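The search space described above can be enumerated as a simple grid. In this sketch, the layer sizes, stacking depths, epoch counts, and tree counts come from the text, while the intermediate learning-rate and dropout steps are assumptions (the text gives only their ranges, and the "dynamic" learning rate may refer to a schedule rather than fixed candidates). The dictionary keys are hypothetical names, not parameters of a specific library.

```python
from itertools import product

# Candidate settings for the neural models (ANN and LSTM).
nn_grid = {
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],  # assumed steps within 0.1 to 0.0001
    "hidden_size": [64, 128, 256],
    "num_layers": [1, 2, 4],
    "epochs": [1000, 2000, 4000],
    "dropout": [0.2, 0.3, 0.4, 0.5],              # assumed steps within 0.2 to 0.5
}

# Candidate settings for the tree ensembles (RF and XGBoost).
tree_grid = {"n_estimators": list(range(1, 101))}


def expand(grid):
    """List every combination of the grid as a dict of settings."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


nn_configs = expand(nn_grid)
tree_configs = expand(tree_grid)
```

Each configuration would then be trained on the calibration data and scored on the validation data, keeping the setting with the best validation metric.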

Figure 4 presents the performance of all models in predicting runoff residuals from water balance element features, combining data from both phases of the study period. The model performance is evaluated by two indicators, R² and NRMSE: R² indicates the degree of model fit (the higher the value, the better the fit), and NRMSE indicates the model prediction error (the lower the value, the better the prediction accuracy). The R² values of ANN are 0.78 and 0.63 in the calibration and validation phases, respectively, which are the lowest among all the models, indicating that the ANN fits poorly. In addition, the NRMSE value of the ANN was higher than 0.2 in both phases, further indicating its lower prediction accuracy. In contrast, LSTM and RF perform better. Both have R² values above 0.7, but their performance varies slightly between stages: LSTM outperforms RF in the calibration phase and performs slightly worse in the validation phase. The XGBoost model performs best. Its R² values reached 0.89 and 0.83 in the calibration and validation phases, respectively, which were clearly higher than those of the other models. Meanwhile, the NRMSE value of XGBoost is only 0.08 in the calibration stage, the smallest prediction error among all models, further demonstrating its excellent performance in runoff residual prediction. Combining the above results, we conclude that the traditional single machine learning model has an obvious bottleneck in hydrological modeling tasks, with low fitting ability and prediction accuracy, especially when the quantity of data is small, which is in line with the conclusions of previous studies [44]. Deep learning models and ensemble learning methods performed significantly better than traditional methods and were robust. 
Despite the competitiveness of the deep learning models in some phases, the ensemble learning methods (especially XGBoost) show superior overall performance after sufficient hyperparameter tuning.

3.2. Comparison of Runoff Prediction

Figure 5 illustrates the monthly flow series simulated and predicted by the proposed models for the two periods, along with the corresponding performance assessments at each stage. The results demonstrate that the residual-corrected runoff models consistently outperform the standalone hydrological model, regardless of the machine learning (ML) method employed. This finding aligns with previously published studies, even though the training features for ML in this research differ significantly. During the calibration phase, the hybrid models accurately captured both peak and low-flow conditions, with Nash–Sutcliffe efficiency (NSE) values exceeding 0.8 for both periods, a level generally regarded as very good in runoff simulation research. Among the hybrid models, the highest NSE during calibration was achieved by XGBoost (0.92), followed closely by LSTM and RF (both at 0.91). In the validation phase, the performance of the ML models improved in the order of ANN, LSTM, RF, and XGBoost, and this ranking remained consistent in the subsequent stages of this study. The absolute percent bias (APBIAS) metric showed significant improvement across all hybrid methods compared to the standalone hydrological model, with biases reduced to within 10%. Notably, XGBoost demonstrated exceptional performance, limiting the bias to less than 2% across all periods and stages. Analysis of the time series data reveals that the primary advantage of the hybrid models lies in their ability to capture extreme values more effectively. This observation is further corroborated by the Taylor diagrams, which indicate that the correlation coefficients of the hybrid models exceed 0.9. Additionally, their standard deviation ratios deviate from unity by no more than 0.1, except for ANN and RF, whose deviations are slightly larger but remain close to 0.1. 
Among all metrics, XGBoost consistently exhibited the best performance, followed by RF and LSTM, which showed comparable results, with ANN ranking last. These findings are consistent with performance comparisons reported in other hydrological modeling studies [1,45]. Overall, the results indicate that the inter-model coupling strategy, which incorporates water balance components as inputs for ML models, significantly enhances runoff simulation accuracy in the study area.

3.3. Water Balance Correction

We use SHAP values to quantify the influence of each water balance element on the predicted model residuals. An element contributes positively to the residual prediction when its SHAP value is greater than 0 and negatively when it is less than 0. Figure 6 shows the distribution of SHAP values for the four coupled models, while Figure 7 shows the average contribution of each element. The distribution of each element is similar across the four models. Among the water-generating elements, snowmelt water (M_snow) has the largest overall SHAP influence on runoff residuals, followed by glacial meltwater (M_glacier) and net precipitation (Rain). Specifically, the three water-generating elements contribute positively to the residual prediction mostly at low values and negatively at high values. This suggests that, in the study area, inaccuracies in the inputs of the water-generating elements during periods of low precipitation, snowmelt, or ice melt are the leading cause of residuals. Among the water-consuming elements, the change in soil water content (ΔSW) had the greatest impact on the prediction of runoff residuals, and its contribution increased with ΔSW: the larger the increase in soil water content, the more pronounced its effect on the corresponding runoff residuals. As for the remaining elements, evapotranspiration (ET) contributes positively to the runoff residuals at most low values, whereas the SHAP distribution of groundwater shows the opposite behavior. Infiltration and surface runoff show no clear pattern between high and low values, and most of their SHAP values are concentrated near 0, so their contribution to the runoff residual ranks lower.
Overall, in terms of relative SHAP magnitude, ΔSW is the most influential of all water balance elements on the residual predictions, followed by ET. A related study showed that precipitation forcing data in the upper Ili River remained significantly underestimated even after calibration against measurements, which also led to inaccurate ET [46]. The deviation of these two elements from the observed values during the study period may therefore be the main reason for the residuals, which supports the viewpoints of previous studies. Corrections to the water balance elements may thus compensate for these biases. Because the pattern of SHAP distributions is similar in all modeling results, we used the outputs of the best-performing XGBoost model for the residual-based water balance correction.
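The sign convention and the mean-absolute-SHAP ranking described above can be reproduced on a toy problem. The sketch below computes exact Shapley values by brute-force enumeration of coalitions, replacing absent features with a baseline value. It is not the SHAP library and not the study's XGBoost model: `toy_residual_model`, its coefficients, and the sample values are hypothetical and chosen only so that low meltwater values give positive contributions.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

NAMES = ["M_snow", "M_glacier", "dSW"]

def toy_residual_model(features):
    # Hypothetical linear residual model; coefficients are arbitrary.
    m_snow, m_glacier, dsw = features
    return -0.5 * m_snow - 0.2 * m_glacier + 0.3 * dsw

samples = [[10.0, 5.0, 20.0], [30.0, 15.0, 60.0], [50.0, 25.0, 40.0]]
baseline = [sum(col) / len(col) for col in zip(*samples)]  # feature means

shap_rows = [shapley_values(toy_residual_model, s, baseline) for s in samples]
# Global importance: mean of absolute SHAP values per feature (as in Figure 7).
mean_abs = [sum(abs(row[k]) for row in shap_rows) / len(shap_rows)
            for k in range(len(NAMES))]
ranking = [name for _, name in sorted(zip(mean_abs, NAMES), reverse=True)]
print(shap_rows[0], ranking)
```

For the first sample, the low M_snow value yields a positive SHAP value, and the SHAP values sum to the difference between the model output for that sample and for the baseline.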

To achieve local interpretability, this study ranked the residual predictions of XGBoost by magnitude and divided them into four equal parts (25% of the data per part). The median sample of each part was used to decompose the feature contributions quantitatively, and the results are shown in Figure 8. At lower model outputs (a), M_glacier and M_snow push the predicted values higher, while other elements such as ET and surface runoff push them lower. ET has the largest influence on the output, accounting for 39.5% of the total influence and pushing the predicted runoff residual down by about 7.9 m³/s. At this point, the value of ET itself is 79.54 mm, which is a relatively high level of evapotranspiration. When the residual prediction rises to the second median (b), the main elements pushing the prediction up are ET and W_seep, and the main elements pushing it down are ΔSW and Q_gw. These four elements contribute 10.5%, 14.1%, 43.5%, and 11.8%, respectively, with ΔSW having the largest influence. In the third median sample (c), no element has a notably high degree of influence; M_snow is the most important element pushing the prediction up and ΔSW the most important element pushing it down. Finally, in the median sample (d), which represents the highest residual predictions, ET is the most important element pushing the prediction up, with a degree of influence of about 31.0%, followed by Rain, with about 16.7%. M_snow is the most important element pushing the prediction down at this point. In all four samples, M_glacier pushes the prediction up, and it does so more strongly as the residuals change from negative to positive values.
Similarly, ET pushes the prediction up when the predicted residual is positive and down when it is negative, and its value is high (>20 mm) in three of the samples (a, b, and d), suggesting that ET bias may dominate the generation of runoff residuals.
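The selection of median samples and the percentage contributions quoted above can be sketched as follows. This assumes that the "degree of influence" of an element is its absolute SHAP value divided by the sum of absolute SHAP values for that sample. Only the ET contribution (-7.9 m³/s, 39.5%) comes from the text; all other SHAP values in `sample_a`, and the `predictions` list, are hypothetical.

```python
def quartile_median_indices(predictions):
    """Rank predictions, split them into four equal parts, and return
    the index of the (lower-)middle sample of each part."""
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    n = len(order)
    picks = []
    for q in range(4):
        part = order[q * n // 4:(q + 1) * n // 4]
        picks.append(part[(len(part) - 1) // 2])
    return picks

def contribution_shares(shap_row):
    """Share of each element in the total absolute SHAP value, in percent."""
    total = sum(abs(v) for v in shap_row.values())
    return {name: 100.0 * abs(v) / total for name, v in shap_row.items()}

# Hypothetical residual predictions (m3/s) for eight months.
predictions = [5.0, -3.0, 8.0, 0.0, -7.0, 2.0, 11.0, -1.0]
median_samples = quartile_median_indices(predictions)

# Sample (a): ET value taken from the text, remaining values invented.
sample_a = {"ET": -7.9, "M_glacier": 3.1, "M_snow": 2.4, "Q_surf": -2.6,
            "dSW": -1.8, "Rain": 1.2, "W_seep": -0.6, "Q_gw": 0.4}
shares = contribution_shares(sample_a)
print(median_samples, round(shares["ET"], 1))
```

Negative SHAP values push the prediction lower and positive values push it higher; the share itself ignores the sign, so direction and magnitude have to be read together.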

Although the accuracy of every water balance element cannot be fully known, it can be assessed using data available for some of the elements. Given the scarcity of data in the high mountain areas of Asia, remotely sensed data were used to evaluate the calibration results, and Figure 9 shows the deviations between the simulated results and the remote sensing products. The boxplots show that the APBIAS of both ΔSW and ET improved markedly after calibration. The mean absolute deviation of ΔSW decreased from about 14% before calibration to about 9% after calibration, and the largest deviation decreased from about 29% to about 22%. The narrower box indicates more concentrated data and lower overall uncertainty in the deviation. For ET, the mean absolute deviation decreased from 8.7% before calibration to 5.5% after calibration, and the maximum deviation decreased from 19% to 15%, with a similar narrowing of the box. After calibration, the absolute deviations of both ΔSW and ET are smaller than before in terms of median, mean, and error range. The improvement is larger for ΔSW than for ET: the ΔSW deviations are more concentrated and their median is lower after calibration. This suggests that calibrating the water balance elements effectively reduces the absolute deviations of the soil water content change (ΔSW) and evapotranspiration (ET), and that the hybrid modeling approach improves the reliability and accuracy of the simulation results.
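A comparison of this kind can be sketched as below, assuming that each box in Figure 9 summarizes per-sample absolute relative deviations between simulated values and the remote sensing product. The exact APBIAS formula used in the study is not restated in this section, so the function below is an assumed form, and all ET values are invented.

```python
from statistics import mean, median

def abs_percent_deviation(simulated, reference):
    """Per-sample absolute relative deviation from the reference product, in percent."""
    return [100.0 * abs(s - r) / abs(r) for s, r in zip(simulated, reference)]

def summarize(deviations):
    """Summary statistics comparable to what a boxplot displays."""
    return {"mean": mean(deviations),
            "median": median(deviations),
            "max": max(deviations)}

# Hypothetical monthly ET in mm: reference product, and model before/after correction.
reference_et = [50.0, 80.0, 100.0, 40.0]
et_before = [55.0, 72.0, 112.0, 43.0]
et_after = [52.0, 76.0, 106.0, 41.0]

before = summarize(abs_percent_deviation(et_before, reference_et))
after = summarize(abs_percent_deviation(et_after, reference_et))
print(before, after)
```

Reporting the mean, median, and maximum together mirrors the way the text compares both the center and the spread of the deviations before and after calibration.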

Given that glaciers are particularly sensitive to climate change, glacier melt rates can be a valuable indicator for assessing the water balance elements in simulations that incorporate glacier runoff, especially in the absence of measured data [3]. Based on data from the First and Second China Glacier Inventories (FCGI and SCGI), the average annual rate of glacier area reduction in the study area during the early part of the study period was 0.84%. However, the simulated value had an error of up to 19% before correction. After the water balance correction, the error is reduced to 8%, and the corrected result is closer to the actual degradation rate of the glacier area. In addition, evapotranspiration (ET) in the study area has shown a clear upward trend over the last 60 years, so the proportion of ET in the water balance should have increased [46]. However, the pre-correction result showed the opposite (a decrease of 0.33%). After the correction, the ET proportion shows an increase of 1.12%, which is consistent with the observed trend. Regarding the two periods, the average correction to the element bias was 10.8% in the former period and 9.6% in the latter. Thus, by evaluating the input and output components of the water balance in each period, our interpretable hybrid framework significantly improves the accuracy and validity of the water balance assessment.
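To put the glacier figures in absolute terms, the following sketch converts the quoted errors into offsets from the inventory-derived rate. It assumes the 19% and 8% errors are relative to the 0.84% per year reduction rate; the text does not state the direction of the error, so only the size of the offset is computed.

```python
def implied_offset(reference_rate, relative_error_pct):
    """Absolute offset implied by a relative error (same unit as reference_rate)."""
    return reference_rate * relative_error_pct / 100.0

reference_rate = 0.84  # % of glacier area per year, from the inventories

offset_before = implied_offset(reference_rate, 19.0)  # before correction
offset_after = implied_offset(reference_rate, 8.0)    # after correction
print(f"{offset_before:.4f} vs {offset_after:.4f} percentage points per year")
```

Under this reading, the correction narrows the gap from roughly 0.16 to roughly 0.07 percentage points per year.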

4. Discussion

In this study, the hybrid model constructed for glacier basins demonstrated excellent performance in enhancing water balance assessments. Simultaneously, the SHAP method effectively explained the influence of water balance components on residual runoff predictions. In previous research on the upper Ili River, numerous studies have focused on identifying the primary factors influencing runoff variability [23,47]. Here, we explore the seasonal differences in the impact of meteorological and anthropogenic factors on runoff residuals through the lens of water balance components. The standardized SHAP values of each component across different seasons are presented in Figure 10.

The effect of the snowmelt factor on the residuals was most significant in the spring, gradually approaching zero over time. This indicates that snowmelt runoff plays an important role in the spring. In contrast, in other seasons, as the amount of snowmelt decreases, the contribution of snowmelt runoff to the runoff residuals also decreases. Similarly, the contribution of glacier runoff to the runoff residual is mainly in summer, which is the main period of glacier meltwater generation in the Ili River basin. Rain, as the most dominant water source, has a significant seasonal contribution to the residual, as does glacial meltwater. Gan et al. [46] showed that increased rainfall and increased snow and ice melt dominated the increase in runoff in the Kash watershed, a sub-watershed of this study. Thus, the uncertainty in hydrologic modeling may increase further with future warming and increased precipitation in the upper Tien Shan region of the watershed [48].
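A seasonal grouping like the one in Figure 10 can be sketched as follows. The standardization method is not specified in this section, so a z-score is assumed here, together with meteorological seasons (March to May as spring, and so on). The monthly SHAP values for snowmelt are hypothetical.

```python
from statistics import mean, pstdev

# Assumed season definition (Northern Hemisphere meteorological seasons).
SEASON_BY_MONTH = {3: "Spr", 4: "Spr", 5: "Spr", 6: "Sum", 7: "Sum", 8: "Sum",
                   9: "Aut", 10: "Aut", 11: "Aut", 12: "Win", 1: "Win", 2: "Win"}

def standardize(values):
    """Z-score standardization (assumed; other scalings are possible)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def seasonal_groups(months, shap_values):
    """Standardize SHAP values, then collect them by season for box plotting."""
    groups = {"Spr": [], "Sum": [], "Aut": [], "Win": []}
    for month, z in zip(months, standardize(shap_values)):
        groups[SEASON_BY_MONTH[month]].append(z)
    return groups

months = list(range(1, 13))
# Hypothetical SHAP values of M_snow for one year, peaking in spring.
snow_shap = [0.1, 0.2, 3.0, 4.5, 2.5, 0.8, 0.2, 0.1, 0.0, -0.1, 0.0, 0.1]

groups = seasonal_groups(months, snow_shap)
seasonal_means = {season: mean(vals) for season, vals in groups.items()}
print(seasonal_means)
```

Standardizing before grouping puts elements with very different SHAP magnitudes on a common scale, which makes the seasonal boxes comparable across elements.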

Among all water balance components, ΔSW is the only factor whose contribution to runoff residuals does not show a clear seasonal trend. In the SWAT model, soil water changes are calculated after other components, often leading to notable deviations [25]. This explains why ΔSW consistently exhibits a significant contribution to residuals in most cases. On the other hand, groundwater, a widely studied component, shows different patterns of contribution to runoff residuals during the two distinct periods analyzed. Considering the complexity of groundwater flow as a physical process and given that SWAT has specialized modules for improving groundwater simulation, the present study cannot effectively elucidate how groundwater dynamics influence runoff residuals [49]. This limitation highlights an area for improvement in this research. Exploring groundwater variability in the study area could serve as a valuable direction for future investigations.
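Why a component computed last tends to absorb errors can be shown with a simplified closure of the water balance. The sketch assumes ΔSW is obtained as inputs minus outputs, using the elements named in this section; the actual SWAT+ bookkeeping is more detailed, and all numbers are hypothetical.

```python
def delta_sw(rain, m_snow, m_glacier, et, q_surf, w_seep, q_gw):
    """Soil water change closed as the residual of the other components (mm)."""
    return (rain + m_snow + m_glacier) - (et + q_surf + w_seep + q_gw)

# Hypothetical "true" monthly components in mm.
true_components = dict(rain=60.0, m_snow=25.0, m_glacier=15.0,
                       et=45.0, q_surf=20.0, w_seep=10.0, q_gw=12.0)

# Same month with rainfall underestimated by 6 mm and ET overestimated by 4 mm.
biased_components = dict(true_components,
                         rain=true_components["rain"] - 6.0,
                         et=true_components["et"] + 4.0)

dsw_true = delta_sw(**true_components)
dsw_biased = delta_sw(**biased_components)
dsw_error = dsw_biased - dsw_true
print(dsw_true, dsw_biased, dsw_error)
```

Both errors act in the same direction here, so the residual term carries their sum (-10 mm) even though neither error originates in the soil water calculation.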

Accurate water balance assessments are critical for sustainable water resource management and for mitigating the adverse impacts of climate change and human activities on regional hydrology. The introduction of interpretable hybrid models has significantly improved the accuracy of runoff residual predictions in glacial basins, enhancing the precision and reliability of water balance assessments by offering meaningful feature interpretations. However, the computation time required for SHAP-based interpretation increases with the number of features, underscoring the need for optimization in future studies. Potential strategies, some of which have been attempted in current research, include exploring new diagnostic methods that can be interpreted locally and other alternative methods that balance interpretation accuracy and computational efficiency [50,51]. Furthermore, the flexibility of the model coupling strategy used in constructing the hybrid model extends its applicability beyond glacial basins, suggesting potential for broader regional applications that merit further validation. This water balance correction approach is expected to yield more pronounced and verifiable results when applied at larger scales. Although this study encompasses a significant area, generalizing these findings to larger watersheds or regional scales remains a challenge, limiting their broader applicability. Future research should extend the water balance correction analyses to larger areas, such as the entire Ili–Lake Balkhash basin, to evaluate the scalability of the proposed corrections.
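The cost issue can be made concrete: an exact Shapley computation enumerates every coalition of features, so the work doubles with each added feature. One generic way to trade accuracy for speed is to average marginal contributions over randomly sampled feature orderings. The sketch below illustrates that sampling idea only; it is not the method used in this study, and `toy_model` is a hypothetical stand-in for a trained residual model.

```python
import random

def exact_coalition_count(n_features):
    """Number of feature coalitions an exact Shapley computation must consider."""
    return 2 ** n_features

def sampled_shapley(model, x, baseline, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate from random feature orderings.

    Each ordering costs len(x) + 1 model evaluations.
    """
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        previous_value = model(current)
        for j in order:
            current[j] = x[j]                # switch feature j to its actual value
            value = model(current)
            phi[j] += value - previous_value  # marginal contribution of j
            previous_value = value
    return [p / n_permutations for p in phi]

def toy_model(features):
    # Hypothetical model with one additive term and one interaction term.
    a, b, c = features
    return 2.0 * a + b * c

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
estimate = sampled_shapley(toy_model, x, baseline)
print(exact_coalition_count(8), estimate)
```

Because the marginal contributions along each ordering telescope, the estimates always sum to the difference between the model output at `x` and at the baseline; only the split between interacting features is subject to sampling error.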

Notably, the corrected water balance structures revealed significant differences between the two study periods, highlighting the extensive impacts of climate change and human activities on the upper Ili River Basin over the past six decades. While this study primarily focused on water balance corrections, the impacts of these changes, particularly regarding glacier dynamics, were not explored in depth. Follow-up research dedicated to quantifying these impacts could provide a more comprehensive understanding of water quantity changes across the closed basin, thereby enhancing the accuracy of hydrological assessments on broader spatial and temporal scales. The integration of physical and statistical modeling with interpretable methods demonstrated in this study underscores their effectiveness in glacial basin hydrological modeling and analysis. Beyond improved prediction accuracy, the proposed approach also significantly enhanced the accuracy of individual water balance components. In conclusion, the proposed hybrid approach broadens the application of interpretable ML models in hydrological studies and opens the possibility of broader regional applications.

5. Conclusions

This study presents an interpretable hybrid framework that integrates an improved traditional hydrological model (SEGSWAT+), several ML models, and an interpretation method (SHAP) applied to water balance components, in order to achieve more accurate monthly and interannual glacier basin runoff simulations and water balance results. In addition, this study provides insights into how the interpretation of different water balance components varies across seasons. To demonstrate the effectiveness of the proposed hybrid model, it was applied to two consecutive periods in the Upper Ili River sub-basin, and the comparison included ANN, LSTM, RF, and XGBoost models. Based on the analyses, the following main conclusions were drawn:

(1) Hybrid models built on deep learning or ensemble learning models outperform single-model approaches, and the interpretable XGBoost-based hybrid model exhibits the most robust and accurate runoff residual prediction in the study area compared with the other models.

(2) The hybrid framework substantially improves the accuracy of water balance results derived from conventional hydrological models of glacial basins. The local interpretability it provides demonstrates the great potential of interpretable machine learning methods for improving water balance assessment.

Author Contributions

Conceptualization, R.Y. and J.W.; methodology, R.Y., J.W. and G.G.; software, R.Y.; formal analysis, R.Y.; data curation, R.G. and H.Z.; writing—original draft preparation, R.Y.; writing—review and editing, J.W.; supervision, J.W.; funding acquisition, J.W. and G.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Third Xinjiang Scientific Expedition and Research Program (Grant No. 2022xjkk070201) and the National Natural Science Foundation of China (U2003202, 42071054).

Data Availability Statement

Meteorological datasets were provided by the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) (http://cds.climate.copernicus.eu, accessed on 20 April 2024). Glaciological and hydrological datasets were provided by the National Cryosphere Desert Data Centre (http://www.ncdc.ac.cn, accessed on 20 April 2024). Other model codes and data supporting the results of this study are available upon request from the corresponding author.

Acknowledgments

The authors express their gratitude to everyone who provided assistance in the present study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Similarities and differences between the SWAT+ hydrologic model and its derived versions.

	SWAT+	GSWAT+	SEGSWAT+
Glacier dynamics simulation	No	Yes	Yes
Additional data requirements	No	Yes (glacier inventory)	Yes (glacier inventory)
Glacier/snow sublimation calculation	Snow: evaporation residual subtraction; Glacier: no	Snow: evaporation residual subtraction; Glacier: additional parameters	Snow/Glacier: empirical formula
Additional water balance elements	No	Glacial melt	Glacial melt

Table A2. ML models and their hyperparameters.

	ANN	LSTM	RF	XGBoost
Type	Machine learning	Deep learning	Ensemble learning	Ensemble learning
Modeling method	Scikit-learn	PyTorch	Scikit-learn	XGBoost
Parameter optimization	Bayesian optimization	Bayesian optimization	Bayesian optimization	Bayesian optimization
Hyperparameters	learning_rate: 0.01; num_hidden_layers: 2; num_units: 128; dropout: 0.2; batch_size: 64	learning_rate: 0.01; hidden_units: 2; time_steps: 12; dropout: 0.2; optimizer: Adam; batch_size: 64	n_estimators: 20; max_depth: 3; max_features: 7	learning_rate: 0.01; n_estimators: 20; max_depth: 3
Number of parameters	1.8 × 10⁵	5.2 × 10⁵	Tree-based: 4 × 10²	Tree-based: 4 × 10²

References

1. Yang, H.; Zhang, Z.; Liu, X.; Jing, P. Monthly-Scale Hydro-Climatic Forecasting and Climate Change Impact Evaluation Based on a Novel DCNN-Transformer Network. Environ. Res. 2023, 236, 116821.
2. Gupta, A.; Govindaraju, R.S. Uncertainty Quantification in Watershed Hydrology: Which Method to Use? J. Hydrol. 2023, 616, 128749.
3. Chen, Y.; Li, W.; Fang, G.; Li, Z. Review Article: Hydrological Modeling in Glacierized Catchments of Central Asia—Status and Challenges. Hydrol. Earth Syst. Sci. 2017, 21, 669–684.
4. Uncertainties in Prediction of Streamflows Using SWAT Model—Role of Remote Sensing and Precipitation Sources. Available online: https://www.mdpi.com/2072-4292/14/21/5385 (accessed on 18 November 2024).
5. Beniston, M. Climatic Change in Mountain Regions: A Review of Possible Impacts. Clim. Chang. 2003, 59, 5–31.
6. Ren, W.W.; Yang, T.; Huang, C.S.; Xu, C.Y.; Shao, Q.X. Improving Monthly Streamflow Prediction in Alpine Regions: Integrating HBV Model with Bayesian Neural Network. Stoch. Environ. Res. Risk Assess. 2018, 32, 3381–3396.
7. Uusitalo, L.; Lehikoinen, A.; Helle, I.; Myrberg, K. An Overview of Methods to Evaluate Uncertainty of Deterministic Models in Decision Support. Environ. Modell. Softw. 2015, 63, 24–31.
8. Adnan, M.; Kang, S.; Zhang, G.; Anjum, M.N.; Zaman, M.; Zhang, Y. Evaluation of SWAT Model Performance on Glaciated and Non-Glaciated Subbasins of Nam Co Lake, Southern Tibetan Plateau, China. J. Mt. Sci. 2019, 16, 1075–1097.
9. Luo, Y.; Arnold, J.; Liu, S.; Wang, X.; Chen, X. Inclusion of Glacier Processes for Distributed Hydrological Modeling at Basin Scale with Application to a Watershed in Tianshan Mountains, Northwest China. J. Hydrol. 2013, 477, 72–85.
10. Zhang, L.; Su, F.; Yang, D.; Hao, Z.; Tong, K. Discharge Regime and Simulation for the Upstream of Major Rivers over Tibetan Plateau. J. Geophys. Res. Atmos. 2013, 118, 8500–8518.
11. Chan, H.-C.; Chen, P.-A.; Lee, J.-T. Rainfall-Induced Landslide Susceptibility Using a Rainfall–Runoff Model and Logistic Regression. Water 2018, 10, 1354.
12. Bray, M.; Han, D. Identification of Support Vector Machines for Runoff Modelling. J. Hydroinf. 2004, 6, 265–280.
13. Srinivasulu, S.; Jain, A. A Comparative Analysis of Training Methods for Artificial Neural Network Rainfall–Runoff Models. Appl. Soft Comput. 2006, 6, 295–306.
14. Behrouz, M.S.; Yazdi, M.N.; Sample, D.J. Using Random Forest, a Machine Learning Approach to Predict Nitrogen, Phosphorus, and Sediment Event Mean Concentrations in Urban Runoff. J. Environ. Manag. 2022, 317, 115412.
15. Wang, S.; Peng, H. Multiple Spatio-Temporal Scale Runoff Forecasting and Driving Mechanism Exploration by K-Means Optimized XGBoost and SHAP. J. Hydrol. 2024, 630, 130650.
16. Rahimzad, M.; Moghaddam Nia, A.; Zolfonoon, H.; Soltani, J.; Danandeh Mehr, A.; Kwon, H.-H. Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting. Water Resour. Manag. 2021, 35, 4167–4187.
17. Yang, C.; Xu, M.; Kang, S.; Fu, C.; Hu, D. Improvement of Streamflow Simulation by Combining Physically Hydrological Model with Deep Learning Methods in Data-Scarce Glacial River Basin. J. Hydrol. 2023, 625, 129990.
18. Yang, R.; Zheng, G.; Hu, P.; Liu, Y.; Xu, W.; Bao, A. Snowmelt Flood Susceptibility Assessment in Kunlun Mountains Based on the Swin Transformer Deep Learning Method. Remote Sens. 2022, 14, 6360.
19. Lu, X.; Tang, G.; Wang, X.; Liu, Y.; Jia, L.; Xie, G.; Li, S.; Zhang, Y. Correcting GPM IMERG Precipitation Data over the Tianshan Mountains in China. J. Hydrol. 2019, 575, 1239–1252.
20. Xu, T.; Longyang, Q.; Tyson, C.; Zeng, R.; Neilson, B.T. Hybrid Physically Based and Deep Learning Modeling of a Snow Dominated, Mountainous, Karst Watershed. Water Resour. Res. 2022, 58, e2021WR030993.
21. Wang, S.; Peng, H.; Hu, Q.; Jiang, M. Analysis of Runoff Generation Driving Factors Based on Hydrological Model and Interpretable Machine Learning Method. J. Hydrol. Reg. Stud. 2022, 42, 101139.
22. Li, B.; Sun, T.; Tian, F.; Tudaji, M.; Qin, L.; Ni, G. Hybrid Hydrological Modeling for Large Alpine Basins: A Semi-Distributed Approach. Hydrol. Earth Syst. Sci. 2024, 28, 4521–4538.
23. Jin, C.; Wang, B.; Cheng, T.F.; Dai, L.; Wang, T. How Much We Know about Precipitation Climatology over Tianshan Mountains––the Central Asian Water Tower. NPJ Clim. Atmos. Sci. 2024, 7, 1–10.
24. Arnold, J.G.; Srinivasan, R.; Muttiah, R.S.; Williams, J.R. Large Area Hydrologic Modeling and Assessment Part I: Model Development. JAWRA J. Am. Water Resour. Assoc. 1998, 34, 73–89.
25. Bieger, K.; Arnold, J.G.; Rathjens, H.; White, M.J.; Bosch, D.D.; Allen, P.M.; Volk, M.; Srinivasan, R. Introduction to SWAT+, a Completely Restructured Version of the Soil and Water Assessment Tool. JAWRA J. Am. Water Resour. Assoc. 2017, 53, 115–130.
26. Yang, C.; Xu, M.; Fu, C.; Kang, S.; Luo, Y. The Coupling of Glacier Melt Module in SWAT+ Model Based on Multi-Source Remote Sensing Data: A Case Study in the Upper Yarkant River Basin. Remote Sens. 2022, 14, 6080.
27. Du, X.; Silwal, G.; Faramarzi, M. Investigating the Impacts of Glacier Melt on Stream Temperature in a Cold-Region Watershed: Coupling a Glacier Melt Model with a Hydrological Model. J. Hydrol. 2022, 605, 127303.
28. Zhang, Y.; Ohata, T.; Ersi, K.; Tandong, Y. Observation and Estimation of Evaporation from the Ground Surface of the Cryosphere in Eastern Asia. Hydrol. Process. 2003, 17, 1135–1147.
29. Zhou, J.; Wang, L.; Zhang, Y.; Guo, Y.; Li, X.; Liu, W. Exploring the Water Storage Changes in the Largest Lake (Selin Co) over the Tibetan Plateau during 2003–2012 from a Basin-Wide Hydrological Modeling. Water Resour. Res. 2015, 51, 8060–8086.
30. Wang, W.; Gu, M.; Hong, Y.; Hu, X.; Zang, H.; Chen, X.; Jin, Y. SMGformer: Integrating STL and Multi-Head Self-Attention in Deep Learning Model for Multi-Step Runoff Forecasting. Sci. Rep. 2024, 14, 23550.
31. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
32. Chadalawada, J.; Babovic, V. Review and Comparison of Performance Indices for Automatic Model Induction. J. Hydroinf. 2017, 21, 13–31.
33. Shrestha, M.K.; Recknagel, F.; Frizenschaf, J.; Meyer, W. Assessing SWAT Models Based on Single and Multi-Site Calibration for the Simulation of Flow and Nutrient Loads in the Semi-Arid Onkaparinga Catchment in South Australia. Agric. Water Manag. 2016, 175, 61–71.
34. Wang, N.; Liu, W.; Wang, H.; Sun, F.; Duan, W.; Li, Z.; Li, Z.; Chen, Y. Improving Streamflow and Flood Simulations in Three Headwater Catchments of the Tarim River Based on a Coupled Glacier-Hydrological Model. J. Hydrol. 2021, 603, 127048.
35. Xu, Y.; Chen, Y.; Li, W.; Fu, A.; Ma, X.; Gui, D.; Chen, Y. Distribution Pattern of Plant Species Diversity in the Mountainous Region of Ili River Valley, Xinjiang. Environ. Monit. Assess. 2011, 177, 681–694.
36. Munoz-Sabater, J.; Dutra, E.; Agusti-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A State-of-the-Art Global Reanalysis Dataset for Land Applications. Earth Syst. Sci. Data 2021, 13, 4349–4383.
37. He, J.; Yang, K.; Tang, W.; Lu, H.; Qin, J.; Chen, Y.; Li, X. The First High-Resolution Meteorological Forcing Dataset for Land Process Studies over China. Sci. Data 2020, 7, 25.
38. Elnashar, A.; Wang, L.; Wu, B.; Zhu, W.; Zeng, H. Synthesis of Global Actual Evapotranspiration from 1982 to 2019. Earth Syst. Sci. Data 2021, 13, 447–480.
39. Dorigo, W.; Wagner, W.; Albergel, C.; Albrecht, F.; Balsamo, G.; Brocca, L.; Chung, D.; Ertl, M.; Forkel, M.; Gruber, A.; et al. ESA CCI Soil Moisture for Improved Earth System Understanding: State-of-the Art and Future Directions. Remote Sens. Environ. 2017, 203, 185–215.
40. Franks, S.; Rengarajan, R. Evaluation of Copernicus DEM and Comparison to the DEM Used for Landsat Collection-2 Processing. Remote Sens. 2023, 15, 2509.
41. Lopez-Ballesteros, A.; Nielsen, A.; Castellanos-Osorio, G.; Trolle, D.; Senent-Aparicio, J. DSOLMap, a Novel High-Resolution Global Digital Soil Property Map for the SWAT plus Model: Development and Hydrological Evaluation. Catena 2023, 231, 107339.
42. Zhang, X.; Liu, L.; Chen, X.; Gao, Y.; Xie, S.; Mi, J. GLC_FCS30: Global Land-Cover Product with Fine Classification System at 30m Using Time-Series Landsat Imagery. Earth Syst. Sci. Data 2021, 13, 2753–2776.
43. Nguyen, T.V.; Dietrich, J.; Dang, T.D.; Tran, D.A.; Van Doan, B.; Sarrazin, F.J.; Abbaspour, K.; Srinivasan, R. An Interactive Graphical Interface Tool for Parameter Calibration, Sensitivity Analysis, Uncertainty Analysis, and Visualization for the Soil and Water Assessment Tool. Environ. Model. Softw. 2022, 156, 105497.
44. Lyu, Y.; Yong, B. A Novel Double Machine Learning Strategy for Producing High-Precision Multi-Source Merging Precipitation Estimates over the Tibetan Plateau. Water Resour. Res. 2024, 60, e2023WR035643.
45. Guo, S.; Wen, Y.; Zhang, X.; Chen, H. Monthly Runoff Prediction Using the VMD-LSTM-Transformer Hybrid Model: A Case Study of the Miyun Reservoir in Beijing. J. Water Clim. Chang. 2023, 14, 3221–3236.
46. Gan, G.; Wu, J.; Hori, M.; Fan, X.; Liu, Y. Attribution of Decadal Runoff Changes by Considering Remotely Sensed Snow/Ice Melt and Actual Evapotranspiration in Two Contrasting Watersheds in the Tienshan Mountains. J. Hydrol. 2022, 610, 127810.
47. Wang, H.; Li, Y.; Huang, G.; Ma, Y.; Zhang, Q.; Li, Y. Analyzing Variation of Water Inflow to Inland Lakes under Climate Change: Integrating Deep Learning and Time Series Data Mining. Environ. Res. 2024, 259, 119478.
48. Cui, T.; Li, Y.; Yang, L.; Nan, Y.; Li, K.; Tudaji, M.; Hu, H.; Long, D.; Shahid, M.; Mubeen, A.; et al. Non-Monotonic Changes in Asian Water Towers’ Streamflow at Increasing Warming Levels. Nat. Commun. 2023, 14, 1176.
49. Melaku, N.D.; Wang, J. A Modified SWAT Module for Estimating Groundwater Table at Lethbridge and Barons, Alberta, Canada. J. Hydrol. 2019, 575, 420–431.
50. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 2024, 16, 45–74.
51. Arel-Bundock, V.; Greifer, N.; Heiss, A. How to Interpret Statistical Models Using Marginaleffects for R and Python. J. Stat. Softw. 2024, 111, 1–32.

Figure 1. Hybrid modeling framework.

Figure 2. Geographic location of the study area.

Figure 3. Distribution of absolute values of relative errors between hydrologic model simulated evapotranspiration (ET) and remotely sensed products.

Figure 4. Comparison of calibration (Cal) and validation (Val) performance between ML models. The vertical axis represents the R² and NRMSE values, both ranging from 0 to 1. Higher R² values indicate better model fit, while lower NRMSE values denote greater predictive accuracy.

Figure 5. Modeling results and performance assessment of total outlet runoff from a watershed. (I) Runoff sequence fitting results of multiple hybrid models in two consecutive periods, with black hollow circles as observations. (II) Performance evaluation in two consecutive periods, with (a,c) as calibration periods and (b,d) as validation periods. (III) Comparison of the simulation performance of different combinations of methods in four periods, where the pink line represents the NRMSE and the gray solid line indicates the correlation coefficient (r) between simulations and observations. The horizontal and vertical axes represent the ratio of the standard deviation of the observed values to the corresponding simulated values. Specifically, REF on the axes indicates that the standard deviation is zero; the simulated results are in perfect agreement with the observations.

Figure 6. Summary of SHAP value distributions. Red represents high values of the corresponding feature and blue represents low values. A SHAP value greater than 0 indicates a positive contribution to the residual prediction, and a SHAP value less than 0 indicates a negative contribution. The upper dashed line marks the water-generating elements and the lower line marks the water-dissipating elements.

Figure 7. Importance ranking of water balance elements among different models, expressed using the average of the absolute values of SHAP.

Figure 8. Local interpretation of residual predictions, from low to high, using force plots. Red represents elements pushing the prediction higher and blue represents elements pushing it lower. The length of each bar represents the degree of contribution. f(x) is the residual prediction output by the model. (a) Median of the first residual quartile; (b) median of the second residual quartile; (c) median of the third residual quartile; (d) median of the fourth residual quartile.

Figure 9. Relative errors of soil water content change (ΔSW) and evapotranspiration (ET) before and after calibration. Lower values indicate smaller errors.

Figure 10. Box plots of standardized SHAP values for different seasons for water balance elements. The blue line represents the previous period of this study, and the red line represents the subsequent period (Spr, Sum, Aut, and Win represent spring, summer, fall, and winter seasons, respectively).

Table 1. Study area sites and observation data coverage period.

Hydrological Station	Previous Period	Subsequent Period
Yamadu	1961–1979	2000–2008
Tuohai	1961–1979	-
Wulasitai	1961–1979	2000–2005
Qiafuqihai	1961–1979	2000–2008
Jiefangdaqiao	-	2006–2008

Table 2. Hydrologic model runoff indicator assessment (multi-site mean).

Indicators	SEGSWAT+				GSWAT+
	Previous Period		Subsequent Period		Previous Period		Subsequent Period
	Calibration	Validation	Calibration	Validation	Calibration	Validation	Calibration	Validation
r	0.87	0.80	0.88	0.78	0.83	0.7	0.90	0.61
NSE	0.79	0.77	0.88	0.83	0.87	0.71	0.84	0.77
KGE	0.77	0.70	0.79	0.67	0.81	0.67	0.77	0.71
NRMSE	0.21	0.24	0.17	0.28	0.19	0.27	0.20	0.31
APBIAS	10.28	10.57	10.10	11.65	11.32	13.57	12.90	10.72

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
