1. Introduction
Streamflow prediction is a critical component of water resource management, particularly in watersheds and basins where snowmelt significantly contributes to river discharge [1]. The Upper Indus Basin (UIB) of Pakistan, a region dominated by snow and glacier melt, is a critical water resource for agriculture and hydropower generation in the country [2]. Understanding and predicting streamflow in this region is crucial for effective water resource management, especially in the context of climate change, which significantly impacts snow cover and glacial melt [3]. The Moderate Resolution Imaging Spectroradiometer (MODIS), one of the most recent developments in remote sensing technologies, offers useful information on snow-covered areas (SCAs), facilitating improved monitoring and analysis of these crucial factors [3,4] (Tayyab et al., 2018; Bilal et al., 2019). Globally, MODIS snow cover products have been utilized to monitor snow dynamics in regions like the Tibetan Plateau, where they play a critical role in understanding water availability for millions of people downstream [5]. Additionally, research utilizing MODIS data has provided insights into the impacts of climate variability on snow cover persistence and melt timing in Canada based on surface snow depth observations [6]. Other applications of MODIS include assessing snow cover in mountainous regions such as Turkey [7], Austria [8], the Colorado Rocky Mountains, the Upper Rio Grande, California’s Sierra Nevada, the Nepal Himalaya [9], China [10], and the Moroccan Atlas Mountains [11]. Combining meteorological, hydrometric, and remotely sensed data therefore supports the development of more accurate hydrological models.
Traditionally, hydrological modeling has relied heavily on statistical [12], empirical [13], stochastic [14], and physically based [15] models to predict streamflow. However, these models often struggle to capture the complex and non-linear relationships between various hydrological variables [16]. Recent advances in machine learning (ML) have significantly improved the accuracy and reliability of hydrological models. Preliminary results indicate that ML models consistently outperform traditional approaches in terms of NSE and RMSE, demonstrating their superior ability to capture non-linear relationships within hydrological data. For instance, studies have shown that hybrid models incorporating deep learning architectures yield significantly higher NSE values compared to conventional models when applied to similar datasets [17,18,19].
In this respect, AI-based data-driven techniques and ML models have been successfully employed in modeling and simulating sophisticated hydrological events, such as streamflow prediction [17,18]. Among the ML models developed for hydrological applications, artificial neural networks (ANNs) play a dominant role. Early applications of ANNs, such as shallow neural networks, to streamflow and river flow prediction demonstrated their superior capability in capturing the non-linear nature of surface flow rate compared to conventional conceptual and empirical models, owing to the availability of authentic datasets (e.g., hydrometric and meteorological data) [19].
Some researchers have applied shallow learning ANNs and tree-based models for modeling and predicting streamflow in the UIB and Himalayan basins. Rahman et al. [20] compared the capability of two different types of hydrological models, namely the empirical soil and water assessment tool (SWAT) and a multi-layer perceptron ANN, to simulate streamflow in the UIB. The authors reported that the ANN model outperformed the SWAT model. In a similar study, Raaj et al. [21] integrated the SWAT model with several ML models (e.g., ANN and XGBoost) for the estimation of peak flow in the Himalayan River basin. The general outcome of the study highlighted the successful integration of SWAT and ML models in achieving promising results for peak flow prediction. Mushtaq et al. [22] utilized three ML models, including CART (classification and regression tree), XGBoost (extreme gradient boosting), and RF (random forest), for 10-daily streamflow prediction in the UIB region. The models were built based on climate data (precipitation, snow water equivalent, temperature, and evapotranspiration). The findings of the study demonstrated the significant ability of ML models to predict streamflow.
Over the course of the last decade, with the advent of deep learning, advanced ANN models such as long short-term memory (LSTM) [23], bidirectional LSTM (BiLSTM) [24], gated recurrent unit (GRU) [25,26] and convolutional neural networks (CNNs) [27] have shown great promise in hydrological time series forecasting due to their ability to learn and model long-term dependencies. Deep learning is a subset of ML that utilizes multi-layered neural networks to automatically extract features from large datasets. In specific cases, this enables ML models to produce more accurate predictions without extensive feature engineering. This approach is particularly advantageous in hydrology, where complex non-linear relationships exist between various hydrological variables. Several studies have highlighted the effectiveness of deep learning models in capturing these relationships and improving prediction accuracy compared to traditional models. For instance, Imran et al. [28] applied stochastic models (e.g., SARIMA) as well as LSTM models for flood forecasting in the UIB region. The LSTM model was found to provide more accurate forecasts than the stochastic model.
Studies have demonstrated the efficacy of hybrid models that combine different types of deep learning strategies (i.e., CNN-LSTM, CNN-GRU, CNN-BiLSTM, and CNN-BiGRU) to enhance predictive performance [17,18,29,30]. According to Thapa et al. [1], the integration of CNN with LSTM (CNN-LSTM) has been effective in extracting features from high-resolution hydro-meteorological data for streamflow simulation in mountainous catchments, significantly improving model performance when combined with the Gamma test method to select optimal input variables. Wang et al. [31] developed a hybrid CNN-LSTM model to extract physical and meteorological characteristics from high-resolution data, significantly improving streamflow simulation accuracy in mountainous catchments. Similarly, a hybrid CNN-LSTM model introduced by Zhou et al. [32], which integrates self-attention mechanisms with CNN and LSTM, has shown robust performance in hourly runoff prediction by effectively considering temporal and feature dependencies. The use of bidirectional LSTM (BiLSTM) and bidirectional GRU (BiGRU) models, which process data in both forward and backward directions, could further improve the capture of complex temporal dependencies in streamflow data [33].
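To make the idea of processing a sequence "in both forward and backward directions" concrete, the following is a minimal NumPy sketch of a bidirectional GRU forward pass. It is illustrative only: the weights are random and untrained, the layer sizes (5 inputs, 8 hidden units, a 12-step window) are assumptions, and the helper names `init_gru`, `gru_pass`, and `bigru` are hypothetical rather than taken from the models used in this study.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_gru(n_in, n_hid, rng):
    """Random (untrained) weights for one GRU direction; index 0/1/2 = update/reset/candidate."""
    return {
        "W": rng.normal(0.0, 0.1, (3, n_hid, n_in)),
        "U": rng.normal(0.0, 0.1, (3, n_hid, n_hid)),
        "b": np.zeros((3, n_hid)),
    }

def gru_pass(x_seq, p):
    """Run a GRU over x_seq of shape (T, n_in); return hidden states of shape (T, n_hid)."""
    W, U, b = p["W"], p["U"], p["b"]
    h = np.zeros(U.shape[1])
    states = []
    for x in x_seq:
        z = sigmoid(W[0] @ x + U[0] @ h + b[0])             # update gate
        r = sigmoid(W[1] @ x + U[1] @ h + b[1])             # reset gate
        h_cand = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
        h = (1.0 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)

def bigru(x_seq, p_fwd, p_bwd):
    """Concatenate a forward pass with a backward pass re-aligned to the original time order."""
    fwd = gru_pass(x_seq, p_fwd)
    bwd = gru_pass(x_seq[::-1], p_bwd)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 5))  # 12 time steps, 5 input features (assumed sizes)
states = bigru(x, init_gru(5, 8, rng), init_gru(5, 8, rng))
print(states.shape)
```

At each time step the forward half of the output depends only on earlier inputs, while the backward half depends only on later inputs within the same window, which is why a bidirectional layer sees the full input window at every position.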
The aforementioned research suggests that current developments in machine learning and deep learning have transformed methods for predicting streamflow, moving beyond traditional statistical approaches that often fail to capture the complexities of hydrological processes. Recent studies have shown that hybrid models combining various ML techniques and data extracted from MODIS can significantly enhance the predictive performance of each method [34,35,36,37].
According to the studied literature, the integration of snow cover data from MODIS, along with advanced deep learning (DL) models like LSTM, GRU, and CNN, offers a robust framework for accurate and reliable streamflow prediction, essential for effective water resource planning and flood control. In this study, we aim to leverage these advanced deep learning models for monthly streamflow prediction in the Upper Indus Basin located in Pakistan. In the recent literature, CNN-BiGRU and CNN-BiLSTM models have been successfully applied to predict various hydrological variables [38,39,40,41]. However, these models have not been extensively evaluated or compared for streamflow prediction. Addressing this research gap motivated us to select CNN-BiLSTM and CNN-BiGRU as the primary deep learning models in this study for streamflow prediction. Lagged streamflow values and snow cover area (SCA) data were chosen as inputs in this study. The autocorrelation function (ACF) feature selection technique was employed to identify key input variables from these datasets. These inputs were selected based on their proven effectiveness in previous studies on streamflow simulation [42,43,44,45]. The ACF technique was adopted due to its successful application in identifying influential variables for hydrological modeling [46,47]. By integrating lagged streamflow values identified using autocorrelation analysis with SCA data derived from MODIS, we propose a novel approach to enhance prediction accuracy. Specifically, the MODIS MOD10CM remote sensing product was utilized to extract SCA. This product was selected based on its established effectiveness in deriving SCA for high-altitude regions, as reported in the literature [48,49,50]. To evaluate the performance of the proposed models, four statistical metrics were used: root mean square error (RMSE), mean absolute error (MAE), the coefficient of determination (R²), and Nash–Sutcliffe efficiency (NSE). The selection of these metrics is justified by their frequent and successful application in hydrological modeling studies [51,52]. The specific objectives of this research are as follows:
(i) To develop and compare the performance of various deep learning models (LSTM, GRU, CNN, BiLSTM, BiGRU, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-BiGRU) for streamflow prediction in the UIB.
(ii) To assess the impact of including SCA data from MODIS on the prediction accuracy of these models.
(iii) To identify the model that best captures the non-linear relationships between past streamflow values based on the autocorrelation function (ACF) and SCA data, providing the most accurate monthly streamflow predictions.
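The ACF-based selection of lagged streamflow inputs mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used in the study: the synthetic monthly series, the maximum lag of 12, and the approximate 95% significance bound of 1.96/sqrt(n) are all choices made for the example, and the function names are hypothetical.

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def significant_lags(series, max_lag=12):
    """Lags whose |ACF| exceeds the approximate 95% bound 1.96 / sqrt(n)."""
    r = acf(series, max_lag)
    bound = 1.96 / np.sqrt(len(series))
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]

# Synthetic monthly flow with an annual cycle (illustrative data, not from the UIB)
months = np.arange(240)
rng = np.random.default_rng(0)
flow = 500 + 300 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 30, months.size)

lags = significant_lags(flow)
print(lags)
```

For a strongly seasonal series such as this one, lags near 1 and 12 months show high positive autocorrelation, which is the kind of pattern that motivates using streamflow from the previous month and from about a year earlier as inputs.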
The novelty of this research lies in the integration of MODIS-derived SCA data with hybrid deep learning models for streamflow prediction, which has not been extensively explored in existing literature. Previous studies have focused primarily on traditional hydrological models or have used individual deep learning models incorporating remote sensing data [34,35]. By combining these advanced methodologies, our research provides a more robust and accurate prediction framework, which can significantly improve water resource management. To our knowledge, no previous research has looked at the combined impact of catchment features and climate, specifically snow conditions, on catchment storage and low flows in the UIB region. In addition, the contributions of this study are threefold: (i) a comprehensive evaluation of various deep learning architectures for streamflow prediction in a snow-dominated basin; (ii) a novel way to incorporate SCA data from MODIS into hybrid deep learning models (e.g., CNN-BiLSTM and CNN-BiGRU) to enhance their prediction capabilities; and (iii) insights into the effectiveness of different model architectures in capturing the complex dynamics of streamflow in the UIB, contributing to the broader field of hydrological modeling and management.
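For reference, the four evaluation metrics named in this section can be written compactly in NumPy. This is a sketch using their standard textbook definitions; computing R² as the squared Pearson correlation is an assumption, since the exact formula is not stated in this section, and the sample values are made up for illustration.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(sim - obs)))

def r2(obs, sim):
    """Coefficient of determination, taken here as the squared Pearson correlation (assumption)."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance relative to the observed variance."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Made-up observed and simulated flows, for illustration only
obs = np.array([100.0, 200.0, 300.0, 400.0])
sim = np.array([110.0, 190.0, 310.0, 390.0])

print(rmse(obs, sim), mae(obs, sim), r2(obs, sim), nse(obs, sim))
```

Note that NSE penalizes bias while the squared correlation does not, so the two can diverge for a model that tracks the shape of the hydrograph but is systematically offset.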
5. Results
To analyze the performance of LSTM models in streamflow prediction, we compare the results from both the training and testing stages in Table 1. In this table, Qt-1, SCA, and MN refer to the streamflow from the previous month, snow cover area, and month number, respectively; by the same notation, Qt-11 and Qt-12 denote the streamflow 11 and 12 months earlier. In the training stage, the RMSE values decrease as more input features are added, from 221.3 with just Qt-1 to 104.1 when all features (Qt-1, Qt-11, Qt-12, SCA, MN) are included. Similarly, in the testing stage, RMSE values decrease from 234.6 with Qt-1 to 132.5 with all features. The RMSE values are consistently higher in the testing stage than in the training stage, indicating that the model performs better on the training data. However, the trend of improvement with additional inputs is consistent in both stages, suggesting that the model generalizes well. MAE decreases from 164.4 with Qt-1 to 57.4 with all inputs in the training stage, while the MAE of the testing stage decreases from 167.5 with Qt-1 to 74.5 with all inputs. Like RMSE, the MAE values are higher during testing, but the reduction in error with more input features remains consistent across both stages. R² in the training stage increases from 0.514 with Qt-1 to 0.896 with all inputs, indicating a better fit as more features are added. The R² of the testing stage increases from 0.476 with Qt-1 to 0.841 with all inputs. The R² values are slightly lower in the testing stage, reflecting a decrease in model performance on unseen data. However, the overall pattern of improvement with additional inputs is mirrored in both training and testing, showing that the model’s ability to explain variance improves with more features. In the training stage, NSE increases from 0.511 with Qt-1 to 0.891 with all inputs, indicating better predictive power. The NSE of the testing stage increases from 0.465 with Qt-1 to 0.838 with all inputs. The NSE values are also lower during testing, similar to R², but the upward trend with additional inputs remains. This suggests that while the model is not overfitting, there is a slight drop in performance on testing data.
The models consistently improve performance metrics (lower RMSE and MAE, higher R² and NSE) as more input features are included in the training and testing phases. This indicates that the additional input variables (SCA, MN) provide valuable information for predicting streamflow. There is a noticeable difference between training and testing results, with training performance being better. This is typical in machine learning, where the model performs slightly better on the data it was trained on. However, the consistency in trends across both datasets suggests that the model generalizes well to unseen data, although there is room for improvement.
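The full input combination discussed above (Qt-1, Qt-11, Qt-12, SCA, MN) can be assembled into a supervised-learning matrix along the following lines. This is a sketch under assumptions that the text does not confirm: SCA is taken from the same month as the target, the series is assumed to start in January so that the month number can be derived from the index, and `build_inputs` is a hypothetical helper name.

```python
import numpy as np

def build_inputs(flow, sca, lags=(1, 11, 12)):
    """Return (X, y) where each row of X holds lagged flows, SCA, and month number."""
    flow = np.asarray(flow, dtype=float)
    sca = np.asarray(sca, dtype=float)
    t = np.arange(max(lags), len(flow))   # first usable target index is the largest lag
    columns = [flow[t - k] for k in lags]  # Q(t-1), Q(t-11), Q(t-12)
    columns.append(sca[t])                 # SCA of the target month (assumption)
    columns.append(t % 12 + 1)             # month number 1..12, assuming a January start
    return np.column_stack(columns), flow[t]

# Toy series, two years of monthly values (illustrative only)
flow = np.arange(24, dtype=float)
sca = np.arange(24, dtype=float) + 100.0
X, y = build_inputs(flow, sca)
print(X.shape, y.shape)
```

Dropping columns from `X` reproduces the smaller input combinations (for example, Qt-1 alone), which is how the effect of adding SCA and MN can be compared across otherwise identical models.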
Table 2 compares the performance of bidirectional long short-term memory (BiLSTM) models during both the training and testing stages. The models consistently improve predictive accuracy as more features are added. Training RMSE drops from 217.8 with only Qt-1 to 102.8, while testing RMSE decreases from 225.9 to 125.6. MAE and NSE metrics follow a similar pattern, with MAE decreasing from 153.2 to 54.07 in training and from 161.5 to 70.62 in testing, and NSE increasing from 0.526 to 0.901 in training and from 0.503 to 0.853 in testing. R² improves from 0.529 to 0.908 in training and from 0.509 to 0.859 in testing. Including SCA and MN consistently enhances model performance, indicating these features help capture seasonal variations and improve prediction accuracy, even with slightly better results on training data.
The performance of CNN-LSTM models during both the training and testing stages of streamflow prediction is compared in Table 3. RMSE decreases from 214.6 to 91.4 in training and from 219.6 to 119.5 in testing as more features are included. MAE follows this trend, dropping from 151.3 to 51.82 in training and from 157.8 to 68.15 in testing. R² and NSE also show improvements, with R² increasing from 0.549 to 0.926 in training and from 0.526 to 0.87 in testing, and NSE improving from 0.544 to 0.918 in training and from 0.514 to 0.862 in testing. The consistent improvement across all metrics with the inclusion of SCA and MN demonstrates the model’s effectiveness in capturing both seasonal and temporal data despite slightly higher testing errors than training.
Table 4 summarizes the performance of CNN-BiLSTM models during the training and testing stages. The models show significant improvement in accuracy with more features. Training RMSE decreases from 209.7 to 75.8, while testing RMSE drops from 215.7 to 101.6. MAE and NSE improvements are also noted, with training MAE reducing from 147.3 to 42.3 and testing MAE from 156.2 to 54.9. R² increases from 0.565 to 0.952 in training and from 0.543 to 0.905 in testing. Including SCA and MN results in consistent performance improvements across all metrics, suggesting these features effectively capture temporal and seasonal variations, leading to better predictions.
Table 5 illustrates the outcomes of GRU (gated recurrent unit) models during both the training and testing stages. The models benefit from additional input features, with training RMSE decreasing from 218.3 to 102.7 and testing RMSE from 226.4 to 127.7. MAE shows similar reductions, from 154.42 to 55.03 in training and from 162.3 to 74.6 in testing. R² and NSE improvements are also observed, with R² increasing from 0.527 to 0.904 in training and from 0.507 to 0.850 in testing, and NSE from 0.524 to 0.897 in training and from 0.501 to 0.844 in testing. The consistent reduction in errors with the inclusion of SCA and MN highlights their importance in enhancing model performance by capturing seasonal variations.
The performance of BiGRU (bidirectional gated recurrent unit) models during both the training and testing stages is reported in Table 6. The models show improved accuracy with added features, with training RMSE decreasing from 215.5 to 97.62 and testing RMSE from 221.3 to 122.8. MAE follows the same trend, decreasing from 152.6 to 52.45 in training and from 159.2 to 69.6 in testing. R² and NSE values increase, indicating better model fit, with R² improving from 0.545 to 0.917 in training and from 0.516 to 0.862 in testing, and NSE from 0.542 to 0.912 in training and from 0.509 to 0.856 in testing. Including SCA and MN enhances model performance across all metrics, reflecting the value of incorporating both temporal and seasonal data for accurate streamflow predictions.
Table 7 reports the performance of CNN-GRU (convolutional neural network–gated recurrent unit) models in streamflow prediction during the training and testing stages. The models demonstrate significant accuracy gains with additional features. Training RMSE decreases from 212.7 to 80.24, while testing RMSE reduces from 217.3 to 111.2. MAE decreases from 150.8 to 47.03 in training and from 157.4 to 66.32 in testing. R² and NSE also improve, with R² increasing from 0.553 to 0.942 in training and from 0.532 to 0.884 in testing, and NSE increasing from 0.551 to 0.936 in training and from 0.529 to 0.881 in testing. The consistent improvement across all metrics with the inclusion of SCA and MN highlights the model’s ability to generalize well and accurately predict streamflow variations.
Table 8 presents the performance of CNN-BiGRU (convolutional neural network–bidirectional gated recurrent unit) models in streamflow prediction during the training and testing stages. The models show substantial improvements in performance with more features. Training RMSE decreases from 207.8 to 71.6, and testing RMSE drops from 213.6 to 95.7. MAE, R², and NSE metrics also show consistent improvement, with MAE decreasing from 144.7 to 39.62 in training and from 155.2 to 50.7 in testing, R² improving from 0.578 to 0.962 in training and from 0.558 to 0.929 in testing, and NSE increasing from 0.574 to 0.957 in training and from 0.553 to 0.921 in testing. Including SCA and MN significantly enhances model performance across all metrics, demonstrating their importance in capturing the temporal and seasonal dynamics of streamflow.
All models show significant RMSE reductions as more input features are added in the training stage (Tables 1–8). CNN-based models, particularly CNN-BiGRU, achieve the lowest RMSE, indicating superior training data fitting accuracy. The CNN-BiGRU also reduces MAE during training, followed closely by CNN-LSTM and CNN-GRU, showing these models’ precision in aligning predictions with observed values. The CNN-BiGRU and CNN-GRU achieve the highest R² and NSE values during training, indicating robust model fits and reliable predictions. The CNN-BiGRU maintains the lowest RMSE and MAE in testing, demonstrating excellent generalization to unseen data. CNN-GRU also performs well, showing consistency in both training and testing. These models (CNN-BiGRU and CNN-GRU) also achieve high R² and NSE in testing, suggesting they effectively capture the variance and dynamics of streamflow, even with unseen data.
The LSTM and BiLSTM models show solid improvements with added features, particularly in training, but generally perform slightly lower than CNN-integrated models in testing. They offer good baseline performance with the advantage of simpler architectures compared to CNN-based models. GRU models perform comparably to LSTM models, with BiGRU showing slightly better results due to its bidirectional nature. These models effectively handle temporal sequences, making them reliable for streamflow prediction but slightly behind CNN-based models in testing performance. The hybrid CNN models benefit from combining CNN’s spatial feature extraction and LSTM/GRU’s temporal handling. They consistently perform well in both stages, with CNN-GRU slightly outperforming CNN-LSTM in testing, indicating robust generalization and strong predictive power. The CNN-BiGRU model stands out across all metrics in both stages. It combines the strengths of CNN for spatial pattern recognition and BiGRU for capturing temporal dynamics from both directions, leading to the best overall performance in accuracy, precision, and generalization.
The CNN-BiGRU and CNN-GRU are the top performers across both training and testing stages, excelling in RMSE, MAE, R², and NSE metrics. These models effectively balance spatial and temporal data processing, achieving high predictive accuracy and reliability. For applications requiring high accuracy and generalization, CNN-BiGRU is recommended. For a balance between complexity and performance, CNN-GRU offers strong predictive capabilities with a slightly simpler architecture. While simpler models like LSTM, BiLSTM, and GRU are effective and easier to implement, including CNN layers significantly enhances performance, especially in more complex temporal-spatial scenarios like streamflow prediction. The bidirectional capabilities of BiGRU further boost model performance, making CNN-BiGRU the most robust and reliable option among the models evaluated.
The models’ prediction performances are also compared graphically using scatter plots, Taylor diagrams, and violin plots (Figure 7, Figure 8 and Figure 9). The scatterplots (Figure 7) demonstrate that hybrid models incorporating CNN layers, particularly CNN-BiGRU and CNN-BiLSTM, significantly outperform standalone LSTM, GRU, and BiGRU models. CNN-BiGRU, in particular, shows the highest R² value, indicating its strong predictive power and ability to capture complex streamflow dynamics accurately. The Taylor diagram (Figure 8) shows that CNN-BiGRU outperforms other models in correlation, variability, and RMSE, followed closely by CNN-BiLSTM. This diagram confirms that hybrid models incorporating CNN and bidirectional recurrent layers (GRU or LSTM) offer superior performance for streamflow prediction. The other models, such as LSTM, GRU, and BiLSTM, perform moderately well but fall short of the hybrid CNN models in accurately capturing the observed streamflow characteristics. The violin plots (Figure 9) highlight that hybrid models with CNN layers, particularly CNN-BiGRU and CNN-BiLSTM, provide the most accurate representation of observed streamflow distribution. CNN-BiGRU has the best alignment with the observed distribution, capturing both the variability and extreme values effectively. This visualization reinforces the findings from previous analyses, suggesting that CNN-BiGRU is the most robust model for accurately predicting streamflow, followed closely by CNN-BiLSTM. In contrast, standalone GRU and LSTM models show limitations in capturing the full range of streamflow values, especially for extreme flows.
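A Taylor diagram summarizes three statistics at once: the correlation between observed and simulated series, their standard deviations, and the centered RMSE. The sketch below computes these quantities in NumPy and checks the geometric identity that links them; the sample values are made up, and `taylor_stats` is a hypothetical helper rather than the plotting code behind Figure 8.

```python
import numpy as np

def taylor_stats(obs, sim):
    """Correlation, standard deviations, and centered RMSE used in a Taylor diagram."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    sd_obs, sd_sim = obs.std(), sim.std()  # population standard deviations (ddof=0)
    crmse = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return {"r": r, "sd_obs": sd_obs, "sd_sim": sd_sim, "crmse": crmse}

# Made-up observed and simulated monthly flows, for illustration only
obs = np.array([120.0, 340.0, 560.0, 410.0, 230.0, 150.0])
sim = np.array([140.0, 310.0, 520.0, 450.0, 210.0, 170.0])
stats = taylor_stats(obs, sim)

# Law-of-cosines identity: crmse^2 = sd_obs^2 + sd_sim^2 - 2 * sd_obs * sd_sim * r
rhs = stats["sd_obs"] ** 2 + stats["sd_sim"] ** 2 - 2 * stats["sd_obs"] * stats["sd_sim"] * stats["r"]
print(stats["crmse"] ** 2, rhs)
```

Because of this identity, a model plotted closer to the observed reference point has both a higher correlation and a standard deviation closer to the observed one, which is how a single point position can rank the models.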
Table 9 compares the performance of various models (LSTM, BiLSTM, CNN-LSTM, CNN-BiLSTM, GRU, BiGRU, CNN-GRU, and CNN-BiGRU) in predicting peak streamflow values during the testing stage. The key metric for comparison is the relative prediction error percentage for dates with observed peak streamflow greater than 807. CNN-BiGRU has the lowest absolute error (108.4), indicating the most accurate peak streamflow predictions. CNN-BiLSTM follows with an absolute error of 144.1, showing strong predictive accuracy. CNN-GRU and CNN-LSTM also perform well, with absolute errors of 157.1 and 211.9, respectively. LSTM and GRU models have higher absolute errors, with LSTM showing the highest error at 284.8. The performance varies across dates, but CNN-BiGRU consistently shows lower prediction errors compared to other models, especially in later years (e.g., 2010–2015). CNN-LSTM and CNN-BiLSTM models also perform well but show higher errors in some cases, particularly in earlier years (e.g., 2006–2007). Non-CNN models like LSTM and BiLSTM show higher errors, particularly in cases where observed values are higher. The trend shows that models incorporating CNN layers (especially in combination with BiGRU) tend to outperform others, suggesting that the convolutional layers help capture more intricate patterns in peak streamflow data. CNN-BiGRU is the most effective model for predicting peak streamflow, with the lowest overall prediction error, indicating its strength in handling complex patterns in peak data. CNN-BiLSTM and CNN-GRU also perform well, showing that models combining CNN with either BiLSTM or GRU can capture and predict peak streamflow more accurately than traditional LSTM or GRU models. LSTM and GRU models, while still useful, show higher errors in peak streamflow prediction, indicating potential limitations in capturing complex temporal patterns without the added CNN layers. The analysis suggests that for peak streamflow prediction, models that integrate CNN with BiGRU or BiLSTM provide the most accurate results.
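The peak-flow comparison can be illustrated with a short sketch that selects the dates where observed flow exceeds the threshold of 807 and computes absolute and relative errors on those dates only. How Table 9 aggregates the per-date errors is not stated in the text, so averaging over the peak dates is an assumption here, and the flow values are made up.

```python
import numpy as np

def peak_errors(obs, sim, threshold=807.0):
    """Absolute and relative (%) prediction errors on dates where obs exceeds the threshold."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mask = obs > threshold
    abs_err = np.abs(sim[mask] - obs[mask])
    rel_err = 100.0 * abs_err / obs[mask]
    return abs_err, rel_err

# Made-up observed and simulated flows; only 900 and 1000 exceed the threshold
obs = np.array([500.0, 900.0, 1000.0, 700.0])
sim = np.array([480.0, 810.0, 1100.0, 690.0])

abs_err, rel_err = peak_errors(obs, sim)
print(abs_err.mean(), rel_err.mean())
```

Reporting the relative error alongside the absolute error makes peaks of different magnitude comparable, since a fixed absolute miss matters less on a larger peak.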
Table 10 presents a seasonal analysis of the streamflow prediction models’ performance, comparing different metrics across winter, spring, summer, and autumn. The models include CNN-BiLSTM and CNN-BiGRU, with additional variations incorporating snow-covered area (SCA) data as an input feature. In winter, the CNN-BiGRU model with SCA data performs best, with the lowest RMSE (4.268 in training and 4.417 in testing) and highest R² (0.763 in training and 0.742 in testing). This indicates that the inclusion of SCA data helps capture wintertime streamflow patterns, which are heavily influenced by snow conditions. Spring has the highest overall R² values, with CNN-BiGRU + SCA again showing the best performance (R² of 0.941 in training and 0.937 in testing). The MAE and RMSE values also indicate high accuracy, with CNN-BiGRU + SCA achieving RMSE values of 48.61 in training and 45.64 in testing. This superior performance can be attributed to spring meltwater dynamics, which the models capture more accurately with SCA data integration. Summer shows a slight drop in R² and NSE scores compared to spring, likely due to increased variability in streamflow from rapid glacial melt and monsoon impacts. The CNN-BiGRU + SCA model again performs best, with an RMSE of 114.1 in training and 118.9 in testing, indicating it effectively captures the complex summer flow patterns when SCA data are included. The models maintain high accuracy into autumn, with CNN-BiGRU + SCA achieving the best results across all metrics. This model’s RMSE values are 40.19 in training and 37.84 in testing, and R² reaches 0.914 in training and 0.908 in testing, suggesting a reliable fit. The strong autumn performance may reflect lower seasonal variability in streamflow as snowmelt slows, allowing the model to achieve stable predictions. The integration of SCA data consistently enhances model accuracy across seasons, especially with CNN-BiGRU. The seasonal performance variation indicates that snow cover plays a critical role in predicting streamflow in the Upper Indus Basin, as evidenced by improved RMSE, MAE, R², and NSE values with SCA. The CNN-BiGRU model with SCA integration emerges as the most effective model for seasonal streamflow prediction, reflecting its ability to generalize across seasonal patterns with high reliability.
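A seasonal breakdown of this kind amounts to grouping monthly predictions by season and computing each metric within the group. The sketch below does this for RMSE. The season definitions (December to February as winter, and so on) are an assumption, since the month-to-season mapping is not given in this section, and the data are synthetic.

```python
import numpy as np

# Assumed meteorological seasons; the study's own mapping may differ
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
}

def seasonal_rmse(obs, sim, month):
    """RMSE computed separately over the months belonging to each season."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    month = np.asarray(month)
    result = {}
    for name, months in SEASONS.items():
        mask = np.isin(month, months)
        result[name] = float(np.sqrt(np.mean((sim[mask] - obs[mask]) ** 2)))
    return result

# Two synthetic years: errors of 20 in summer months and 5 elsewhere
month = np.tile(np.arange(1, 13), 2)
obs = 300.0 + 200.0 * np.sin(2 * np.pi * (month - 4) / 12)
sim = obs + np.where(np.isin(month, SEASONS["summer"]), 20.0, 5.0)

by_season = seasonal_rmse(obs, sim, month)
print(by_season)
```

Because RMSE is scale-dependent, seasonal RMSE values mostly reflect flow magnitude in each season, so they are best read together with scale-free metrics such as R² and NSE.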
6. Discussion
In this study, LSTM was applied as a baseline model for streamflow prediction. The results indicated that while LSTM performed reasonably well, its accuracy was surpassed by more complex models like CNN-BiGRU and CNN-BiLSTM, particularly in peak streamflow predictions. LSTM models are widely recognized in hydrological studies for their ability to capture long-term dependencies in time-series data. Studies like Nakhaei et al. [23] have shown that LSTM models significantly outperform traditional models, particularly in capturing the non-linear and temporal dependencies of streamflow data. However, these models can be further enhanced when combined with other deep learning techniques, such as CNN, to improve feature extraction and overall prediction accuracy. BiLSTM models have been noted for their superior performance in various hydrological modeling tasks compared to unidirectional LSTM. Abdoulhalik and Ahmed [24] highlighted that BiLSTM can better capture the bidirectional dependencies in streamflow data, making it more suitable for complex hydrological forecasting. However, consistent with our study, the literature suggests that BiLSTM’s performance can be further improved by integrating CNN or other deep learning architectures to enhance spatial feature extraction. GRU models are known for their ability to achieve similar performance to LSTM models but with fewer parameters and faster training times. Wegayehu et al. [26] and Vatanchi et al. [25] have demonstrated that GRU is effective for streamflow prediction, particularly in scenarios with limited data. However, similar to our findings, GRU often benefits from hybridization with CNN to improve accuracy. The use of CNN in hydrological modeling, especially when combined with RNN architectures like LSTM or GRU, has been shown to enhance model performance by effectively extracting spatial patterns from input data [27,31]. Studies such as those by Wu et al. [30] and Maiti et al. [29] support our findings, demonstrating that hybrid models like CNN-LSTM excel at handling complex, high-resolution datasets, thereby improving both accuracy and robustness in streamflow prediction. Similarly, research by Hassan and Hassan [85] and others highlights the critical role of incorporating remotely sensed data, such as SCA, into hydrological models to better represent snowmelt dynamics and enhance streamflow predictions. The use of MODIS-derived SCA data in our study aligns with these findings and underscores its importance in improving model performance, particularly in snow-fed river basins. Our results are consistent with existing literature, which shows that hybrid models combining CNN with LSTM, BiLSTM, GRU, and BiGRU outperform traditional and standalone deep learning approaches for streamflow prediction. This advantage is particularly evident in their ability to capture complex temporal and spatial patterns, as well as peak streamflow events.
Table 10 reveals distinct seasonal patterns in model performance, highlighting the critical role of snow cover and seasonal temperature shifts in predicting streamflow in snow-dominated basins like the Upper Indus Basin (UIB). The CNN-BiGRU model with SCA data consistently achieved the best performance across all seasons, demonstrating its capability to capture complex seasonal variations. Notably, model accuracy was highest in spring, as this period represents a steady increase in snowmelt-driven flow, which the model successfully predicts. Winter accuracy, although slightly lower, remains robust due to the model’s integration of SCA data, which helps capture low-flow dynamics. The summer season presented the greatest challenge due to heightened variability from glacial melt and monsoon influence, leading to higher RMSE values. Nonetheless, the CNN-BiGRU model effectively generalized across this variability, maintaining prediction reliability. This seasonal analysis underscores the added value of incorporating SCA data, particularly in snow-fed systems, and highlights the importance of model selection based on seasonal characteristics.
7. Conclusions
This study conducted a comprehensive evaluation of advanced deep learning models for streamflow prediction in the Upper Indus Basin (UIB), focusing on their ability to accurately forecast monthly and peak streamflow. The models applied included LSTM, BiLSTM, GRU, CNN, and their hybrid combinations (CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-BiGRU). A novel aspect of this research was the integration of MODIS-derived snow-covered area (SCA) data, which provided critical information on snowmelt dynamics, a significant contributor to streamflow in the UIB. The hybrid models, particularly CNN-BiGRU and CNN-BiLSTM, outperformed standalone models such as LSTM and GRU. For instance, CNN-BiGRU achieved the lowest RMSE (71.6 in training and 95.7 in testing) and MAE (39.62 in training and 50.7 in testing), and the highest R² (0.962 in training and 0.929 in testing) and NSE (0.957 in training and 0.921 in testing). These results highlight the efficacy of hybrid models in capturing complex temporal and spatial patterns in streamflow data.
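For readers who wish to reproduce this kind of comparison, the four scores can be computed as in the sketch below. It assumes the standard definitions of RMSE, MAE, and Nash-Sutcliffe efficiency (NSE), and it assumes that R² is the squared Pearson correlation between observed and simulated flow, which would explain why R² and NSE can differ. The function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def r2_pearson(obs, sim):
    """Squared Pearson correlation (an assumed definition of R2)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)

# Hypothetical flows, for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
simulated = [110.0, 190.0, 310.0, 390.0]

print(f"RMSE = {rmse(observed, simulated):.2f}")
print(f"MAE  = {mae(observed, simulated):.2f}")
print(f"NSE  = {nse(observed, simulated):.3f}")
print(f"R2   = {r2_pearson(observed, simulated):.3f}")
```

NSE penalizes bias as well as scatter, whereas the squared correlation does not, so NSE is never larger than this form of R² for the same series.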
The integration of SCA data from MODIS was found to enhance model accuracy substantially. For example, when SCA data were included, the CNN-BiGRU model’s RMSE improved from 83.6 to 71.6 during training and from 108.6 to 95.7 during testing. This indicates that SCA data are a crucial factor in improving the predictive capability of deep learning models in snow-dominated basins like the UIB. In peak streamflow prediction, CNN-BiGRU outperformed the other models with the lowest absolute error (108.4), followed by CNN-BiLSTM (144.1). This outcome is significant because it demonstrates the model’s ability to predict extreme events accurately, which is critical for flood forecasting and water resource management. The findings are consistent with the broader literature, where hybrid models integrating CNN with RNN architectures (such as LSTM, BiLSTM, and GRU) are shown to outperform traditional models in hydrological forecasting tasks. These results reinforce the notion that combining the spatial feature extraction of CNNs with the temporal dependencies captured by LSTM or GRU layers significantly enhances model accuracy.
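To put the RMSE change in relative terms, the values quoted above can be converted to percentage reductions. The short sketch below does only this arithmetic; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(before, after):
    """Percentage reduction of an error metric from `before` to `after`."""
    return 100.0 * (before - after) / before

# RMSE without SCA -> with SCA, as quoted in the text.
train_gain = relative_reduction(83.6, 71.6)
test_gain = relative_reduction(108.6, 95.7)

print(f"training RMSE reduction: {train_gain:.1f}%")  # 14.4%
print(f"testing RMSE reduction:  {test_gain:.1f}%")   # 11.9%
```

On these figures, adding SCA lowers RMSE by roughly 14% in training and 12% in testing.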
The study’s results underscore the importance of using hybrid deep learning models for hydrological forecasting in regions like the UIB, where snow and glacier melt significantly influence streamflow. By accurately capturing both the temporal dynamics of streamflow and the spatial characteristics of snow cover, these models provide a robust framework for water resource management. The quantitative outcomes of this study suggest that hybrid models could be particularly valuable for operational forecasting in similar snow-dominated basins globally. The demonstrated improvements in prediction accuracy, especially for extreme events, highlight the potential for these models to support more informed decision-making in flood risk management and water allocation. In conclusion, the integration of advanced deep learning techniques, particularly hybrid models like CNN-BiGRU and CNN-BiLSTM, with remotely sensed data like SCA offers a powerful approach for streamflow prediction. The quantitative improvements observed in this study (the reduction in RMSE and MAE and the increase in R² and NSE) demonstrate the significant potential of these models to enhance the accuracy and reliability of hydrological forecasting, ultimately contributing to better water resource management and planning in the UIB and similar regions.
The seasonal analysis findings underscore the utility of hybrid models, particularly CNN-BiGRU with SCA integration, in accurately predicting streamflow across distinct seasonal conditions. The ability of these models to handle complex temporal and spatial patterns was evident in their performance, particularly in the spring and autumn seasons. Although summer predictions were slightly less accurate due to increased flow variability, the CNN-BiGRU model demonstrated strong generalization capabilities. The seasonal breakdown highlights the importance of integrating snow cover data and supports the adoption of seasonally responsive models for hydrological forecasting in snow-dominated basins.
The accuracy of the models depends heavily on the quality and resolution of the input data. While MODIS provides valuable SCA data, the study may be limited by the availability of high-resolution, continuous datasets for other meteorological and hydrological variables. In addition, the models were trained and tested specifically for the UIB, which has unique hydrological characteristics, so the results may not be directly applicable to regions with different climatic and hydrological conditions without further calibration and validation. To improve model robustness and generalization, future studies should incorporate more diverse and higher-resolution datasets, including additional remote sensing products and in situ measurements. Expanding the temporal range of the training data could also enhance model accuracy. Finally, transfer learning techniques could allow models trained on the UIB to be adapted to regions with different hydrological characteristics, improving their generalizability.