1. Introduction
Integrated water resource management (IWRM) promotes a coordinated and holistic approach to managing water resources, considering the interconnections among water, land, and related resources. Dams, by their very function, affect the environment, industries, residential areas, and agricultural lands. Efficient management of a dam can minimize its negative impacts and enhance its positive effects on all of these. Environmentally, efficient management reduces adverse effects such as floods and droughts. For industries and residential areas, it improves energy efficiency and access to clean water. For agricultural lands, it provides more efficient irrigation opportunities. Achieving these benefits requires measuring and forecasting dam levels as part of IWRM. This study examines the optimal method for precisely predicting dam occupancy levels in the context of IWRM. We aim to enhance water and environmental sustainability by increasing the operational efficiency of dams, and to support business sustainability by proposing an efficient algorithm that uses less energy and requires less hardware than the alternatives in the scientific literature. To this end, we first propose combining physical calculations with data-driven models to predict dam occupancy levels accurately and precisely. Second, we implement AI algorithms, such as Extra Trees and deep learning, to enhance energy and hardware efficiency. This approach allows intervention strategies against scenarios such as droughts and floods to be implemented more precisely and helps preserve downstream ecological integrity.
The occupancy rates of seven different dams in Istanbul were estimated over the past five years. We estimated occupancy rather than water level because occupancy normalizes the varying water levels across different dams. The inputs for the prediction models consisted of weather data that have a high correlation with occupancy, Penman–Monteith equation parameters [1], industrial and residential water consumption, and historical reservoir data. Thus, reservoir level, rainfall, and evapotranspiration were evaluated during the prediction process. To understand the effect of weather data on the prediction, we compared the performance of models with and without weather data.
We applied AI algorithms to the proposed dataset to determine the most effective method through comparative analysis. The algorithms considered in this assessment include Orthogonal Matching Pursuit CV, Lasso Lars CV, Extra Trees, Random Forest, Ridge CV, Transformed Target Regressor, and LSTM. Our objective in choosing these algorithms was to compare methods commonly used in water level prediction with alternatives that we believed might offer superior results. To conduct this evaluation from multiple perspectives, we formulated four prediction scenarios: daily, weekly, bi-weekly (15-day periods), and monthly (30-day periods). We assessed the performance of these scenarios using statistical hypothesis tests.
The structure of this paper is as follows: first, the research background on dam level prediction studies and methods for predicting water levels is presented. Next, an in-depth introduction to selected dams in Istanbul is provided. This is followed by a detailed case study, results, and discussion. Finally, this paper concludes with a summary of key findings.
2. Literature Review
This section summarizes current research at the intersection of dam reservoir challenges and water level prediction methods. It reviews the existing literature on the subject and highlights research gaps that form the foundation for this and future work.
Water is a highly valuable resource that significantly contributes to both the environment and the economy. Therefore, the levels of water resources, such as reservoir levels, directly impact total added value [2] and the net present value of current and future incomes [3]. Water resources serve multiple purposes, including agriculture, energy generation, fish farming, and drinking water [4]. Consequently, both reservoir levels and their intended uses influence how these resources are utilized. Scientific research on water resources thus focuses on areas such as safety, water level measurement technologies, integrated water resource management, and the effects on the environment and agriculture, with a particular emphasis on dams. From a safety perspective, environmental risks and structural health issues, such as dam floods, concrete displacement, and seepage, are investigated in terms of water level [5,6,7,8] and water temperature [9,10]. Measuring the water level enables IWRM in dams, as maintaining the water level is essential not only for ensuring the efficiency of dam operations but also for effective reservoir management and a reliable freshwater supply [11]. The measurement of water levels relies on Internet of Things (IoT) technology, which involves placing sensors in a dam [12,13], or on remote sensing using artificial intelligence (AI) with satellite imagery [11,14,15]. While these technologies each improve dam management in different respects, the next step is to predict the water level and develop intervention strategies that build on those improvements [16]. The prediction of water levels significantly contributes to IWRM [17]. In particular, predicting climatic events such as droughts and excessive rainfall is required to develop effective intervention strategies [18,19]. Overall, the strength of the contribution depends on the accuracy of the prediction methods used [20]. This accuracy, in turn, depends on the algorithms used, the input parameters, the climate, and the behavior of the reservoir as a system.
In the context of water level prediction, inflow, outflow, historical data of the reservoir, maximum temperature, visibility, humidity, wind speed, cloud cover, and rainfall are utilized as input parameters for models [20,21,22,23,24,25]. Additionally, parameters such as evapotranspiration, solar radiation, and ambient temperature affect the water level of dams [1]. These parameters are utilized by various machine learning (ML) algorithms, including artificial neural networks [21,22], Long Short-Term Memory [16,24], nonlinear autoregressive models [24], multiple linear regression [22], and support vector machines [23]. Overall, water levels are predicted with a mean absolute percentage error (MAPE) between 1% and 12% [11,17,21,23], as presented in Table 1. Models provide predictions on a daily or monthly basis, with horizons ranging from one to thirteen days or one month [17,23,24,25]. Three different input combinations are commonly used. The first includes only historical reservoir data [22]. The second uses only weather data [21]. The third combines both weather data and historical reservoir data [21,22]. Because dam water levels are commonly autocorrelated, historical reservoir data are an indispensable input; however, predicting future states of the dam also requires information on weather events, such as rainfall variability, and on weather characteristics with environmental effects, such as droughts [26,27].
Previous studies have predicted dam water levels using time series analysis. However, the literature lacks research on predicting dam water levels with a hybrid model that integrates a physical model of a dam’s inflow and outflow, including evapotranspiration, with data-driven models incorporating weather data. This study aims to be the first to address this gap.
3. Dams of Istanbul
We considered seven dams in Istanbul as follows: Ömerli, Darlık, Elmalı, Terkos, Alibey, Büyükçekmece, and Sazlıdere. The Ömerli Dam (in Figure 1a), the largest in Istanbul, was constructed primarily to supply the city’s drinking water. This dam, which is of the earth-fill type, has a height of 52 m from the riverbed. At the normal water level, the reservoir volume is 386.50 hm³, and the reservoir area is 23.10 km². It provides 180 hm³ of drinking and utility water annually. Darlık Dam (in Figure 1b) was built to supply drinking water, with a height of 73 m from the riverbed. At the normal water level, the reservoir volume is 107 hm³, and the reservoir area is 5.56 km². The dam provides 108 hm³ of drinking and utility water annually. Elmalı Dam (in Figure 1c) was built to supply drinking, utility, and industrial water. The dam, which is of the concrete type, has a body volume of 103,000 m³ and a height of 42.5 m from the riverbed. At the normal water level, the reservoir volume is 10 hm³, and the reservoir area is 2.80 km². It ensures the supply of 10 hm³ of drinking water annually. Terkos Dam (in Figure 1d) was constructed as a concrete fill-type dam to supply drinking water. The height of the dam from the riverbed is 8.80 m. At the normal water level, the reservoir volume is 186.80 hm³, and the reservoir area is 30.40 km².
Alibey Dam (in Figure 2a) was constructed as an earth-fill type dam to supply drinking, utility, and industrial water. The dam has a body volume of 1,930,000 m³ and a height of 30 m from the riverbed. At the normal water level, the reservoir volume is 66.80 hm³, and the reservoir area is 4.66 km². It provides 39 hm³ of drinking water annually. Büyükçekmece Dam (in Figure 2b) was constructed as an earth-fill type dam to supply drinking, utility, and industrial water. The dam has a body volume of 2,020,000 m³ and a height of 13 m from the riverbed. At the normal water level, the reservoir volume is 161.61 hm³, and the reservoir area is 43 km². The dam provides 102 hm³ of drinking and utility water annually.
Sazlıdere Dam (in Figure 2c) was constructed to obtain drinking water. The dam, which is of the rock-fill type, has a body volume of 1,880,000 m³. Its height from the riverbed is 48 m. At the normal water level, the reservoir volume is 91.60 hm³, and the reservoir area is 11.81 km². The dam provides 50 hm³ of drinking water annually.
All dams contribute to supplying drinking water, but some dams also provide water for industrial and other utility purposes. Notably, these dams lack hydroelectric plants, and therefore do not contribute to electricity production. As a result, the water from these dams is used directly by consumers, making these dams irreplaceable. Therefore, accurately predicting their water occupancy levels is crucial for effective IWRM. This enables timely intervention strategies to be applied, ensuring better resource management and preventing shortages.
According to the occupancy data shown in Figure 3, some dams are nearing zero occupancy levels, posing a significant risk of drought. Additionally, some dams are observed to be full for several months, which increases the risk of dam flooding in the event of extreme rainfall. As indicated in Figure 3, predictive prevention strategies are necessary for these dams to mitigate the risks of both drought and flooding. The plots in Figure 3 are exceptionally smooth, indicating that sharp fluctuations in the occupancy levels of the dams are rare. The smoothness of the plots is beneficial, as AI methods are particularly adept at fitting smooth curves. Consequently, our investigation focuses on identifying key indicators that shed light on water filling and water loss dynamics within the dam.
4. Design of the Dataset
The model’s input data includes weather data, evapotranspiration data, daily water consumption data, and historical reservoir data. In this section, we provide a detailed explanation of each parameter within each dataset and present comprehensive calculations along with Pearson’s correlation values. We chose Pearson’s correlation because of several key advantages. First, it is widely recognized and used across various fields, which enhances the clarity and accessibility of our results. Second, Pearson’s correlation is particularly effective when the normality assumption is met, offering greater efficiency compared with non-parametric methods like Spearman’s or Kendall’s correlations. Third, it is ideal for continuous interval or ratio data, where the differences between values are meaningful.
In the first step, we built a dataset that includes a variety of parameters related to dam occupancy levels. Historical reservoir data, water consumption data, and weather data, collected using industrial-grade sensors, were downloaded from institutional data services. However, we calculated evapotranspiration ($ET_0$) based on weather data and the geographical information of Istanbul. $ET_0$ plays a significant role in predicting water levels in dams by providing critical information about water loss. The evapotranspiration data comprise the evapotranspiration value calculated using the Penman–Monteith method (as described in Equation (1)), as well as the method’s input variables, which include solar radiation, wind speed, and pressure [1]. $ET_0$, the combined process of water evaporation from the soil and transpiration from plants, is influenced by several key factors, including solar radiation, temperature, humidity, and wind speed. $ET_0$ was calculated from the slope of the vapor pressure curve ($\Delta$), the net radiation at the crop surface ($R_n$), the soil heat flux density ($G$), the mean daily air temperature at 2 m height ($T$), the wind speed at 2 m height ($u_2$), the saturation vapor pressure ($e_s$), the actual vapor pressure ($e_a$), the saturation vapor pressure deficit ($e_s - e_a$), and the psychrometric constant ($\gamma$), as given in Equation (1). Evapotranspiration has a noticeable correlation with occupancy levels compared with most weather parameters.
The saturation vapor pressure was calculated based on the daily temperature using Equation (2) below. Similarly, the actual vapor pressure was determined using the dew point, as shown in Equation (3).
The slope of the vapor pressure curve was calculated from the saturation vapor pressure using Equation (4), as shown below.
To calculate the psychrometric constant, we first determined the atmospheric pressure ($P$) using the latitude ($\varphi$) of Istanbul, which is 41.0151. Equation (5) represents the atmospheric pressure. In Equation (6), we calculated the psychrometric constant using the specific heat at constant pressure ($c_p$) value of 0.001013.
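The numbered equations themselves are not reproduced in this text. As a hedged reference, the standard FAO-56 forms that match the variable definitions above are sketched below, with $ET_0$ denoting evapotranspiration. That the study used exactly these forms and constants is an assumption: the dew point symbol $T_{dew}$, the ratio of molecular weights $\varepsilon = 0.622$, and the latent heat of vaporization $\lambda = 2.45$ MJ kg$^{-1}$ come from FAO-56, not from the text. Equation (5) for atmospheric pressure is left out because the text does not give enough detail to reconstruct it.

```latex
\begin{align}
ET_0 &= \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
             {\Delta + \gamma\,(1 + 0.34\,u_2)} \tag{1}\\[4pt]
e_s &= 0.6108\,\exp\!\left(\frac{17.27\,T}{T + 237.3}\right) \tag{2}\\[4pt]
e_a &= 0.6108\,\exp\!\left(\frac{17.27\,T_{dew}}{T_{dew} + 237.3}\right) \tag{3}\\[4pt]
\Delta &= \frac{4098\,e_s}{(T + 237.3)^2} \tag{4}\\[4pt]
\gamma &= \frac{c_p\,P}{\varepsilon\,\lambda} \tag{6}
\end{align}
```

In these forms, pressures are in kPa, temperatures in °C, $R_n$ and $G$ in MJ m$^{-2}$ day$^{-1}$, $u_2$ in m s$^{-1}$, and $ET_0$ in mm day$^{-1}$.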
The wind speed, net radiation, and temperature were obtained from weather data, while the soil heat flux density was assumed to be zero for the $ET_0$ calculation, as shown in Figure 4. The figure indicates that occupancy levels generally start decreasing in the spring and summer periods as $ET_0$ increases.
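The evapotranspiration calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard FAO-56 Penman–Monteith constants; the function and parameter names are hypothetical, and atmospheric pressure is passed in as an input rather than derived, since the text does not give enough detail to reproduce that step.

```python
import math


def saturation_vapor_pressure(temp_c: float) -> float:
    """Saturation vapor pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def slope_vapor_pressure_curve(temp_c: float) -> float:
    """Slope of the saturation vapor pressure curve (kPa per deg C)."""
    return 4098.0 * saturation_vapor_pressure(temp_c) / (temp_c + 237.3) ** 2


def psychrometric_constant(pressure_kpa: float) -> float:
    """Psychrometric constant (kPa per deg C) from atmospheric pressure."""
    specific_heat = 1.013e-3   # MJ kg^-1 degC^-1, the value quoted in the text
    epsilon = 0.622            # assumed FAO-56 ratio of molecular weights
    latent_heat = 2.45         # assumed FAO-56 latent heat of vaporization, MJ kg^-1
    return specific_heat * pressure_kpa / (epsilon * latent_heat)


def reference_et0(temp_c, dew_point_c, net_radiation, wind_speed_2m,
                  pressure_kpa, soil_heat_flux=0.0):
    """Daily evapotranspiration (mm/day) with the FAO-56 Penman-Monteith form.

    net_radiation and soil_heat_flux are in MJ m^-2 day^-1, wind speed in m/s.
    soil_heat_flux defaults to zero, matching the assumption stated in the text.
    """
    e_s = saturation_vapor_pressure(temp_c)
    # Actual vapor pressure is taken from the dew point, as the text describes.
    e_a = saturation_vapor_pressure(dew_point_c)
    delta = slope_vapor_pressure_curve(temp_c)
    gamma = psychrometric_constant(pressure_kpa)
    numerator = (0.408 * delta * (net_radiation - soil_heat_flux)
                 + gamma * 900.0 / (temp_c + 273.0) * wind_speed_2m * (e_s - e_a))
    denominator = delta + gamma * (1.0 + 0.34 * wind_speed_2m)
    return numerator / denominator


# Hypothetical single-day inputs, not taken from the study's dataset.
print(reference_et0(temp_c=22.0, dew_point_c=14.0, net_radiation=15.0,
                    wind_speed_2m=2.5, pressure_kpa=100.8))
```

Keeping pressure as an argument makes the sketch usable whichever way the pressure term is obtained.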
Secondly, we analyzed the correlations among weather data, water consumption, $ET_0$, and dam occupancy levels to identify the most influential parameters for accurate prediction. Since $ET_0$ and water consumption contribute to outflow, their relationship with dam occupancy levels is expected. Additionally, weather data has been observed to have a strong correlation with dam water levels [21], making it a crucial factor in our correlation analysis. The weather data consisted of temperature, felt temperature, humidity, dew point, cloud cover, rainfall, snow depth, and daylight duration. Table 2 shows that solar radiation, cloud cover, and daylight duration have a significantly stronger correlation with dam occupancy levels compared with the other weather data. Although water consumption and $ET_0$ data do not exhibit a meaningful correlation with dam occupancy levels, they are included in the dataset because of their relevance to outflow.
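The correlation screening described above can be illustrated with a small Pearson correlation function. This is a sketch using only the standard library; the function name and the toy values are hypothetical and are not drawn from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Hypothetical toy values: occupancy falls as solar radiation rises.
occupancy = [0.82, 0.80, 0.77, 0.71, 0.66, 0.60]
solar_radiation = [9.5, 11.0, 14.2, 18.9, 22.4, 25.1]
print(round(pearson_r(solar_radiation, occupancy), 3))
```

Applying the same function to each candidate input against occupancy would give one row of a correlation table of the kind summarized in Table 2.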
Thirdly, we analyzed daily water consumption data and historical reservoir data to identify the autocorrelation that influences occupancy rates. Daily and weekly historical reservoir data exhibited a strong correlation with dam water levels, demonstrating autocorrelation, as illustrated in Table 3. This made it feasible to predict the dam occupancy levels as a time series. To improve prediction accuracy, historical reservoir data were analyzed over periods ranging from 1 to 7 past intervals, with evaluations conducted on 1-, 7-, 15-, and 30-day bases.
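One way to turn an occupancy series into lagged inputs is sketched below. It assumes that "past intervals" means values taken at multiples of the prediction interval (1, 7, 15, or 30 days); that reading, and the function name, are assumptions rather than details confirmed by the text.

```python
def make_lagged_samples(series, step, n_lags=7):
    """Build (features, targets) pairs from a daily series.

    Each feature row holds the values observed step, 2*step, ..., n_lags*step
    days before the target day, most recent first.
    """
    if step < 1 or n_lags < 1:
        raise ValueError("step and n_lags must be positive")
    features, targets = [], []
    for t in range(step * n_lags, len(series)):
        features.append([series[t - k * step] for k in range(1, n_lags + 1)])
        targets.append(series[t])
    return features, targets


# Hypothetical daily occupancy values standing in for one dam's history.
daily_occupancy = [0.50 + 0.001 * day for day in range(120)]
weekly_x, weekly_y = make_lagged_samples(daily_occupancy, step=7, n_lags=7)
print(len(weekly_x), len(weekly_x[0]))
```

Only days with a full lag history produce a sample, so longer intervals leave fewer rows for training.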
Lastly, based on the correlation values (ranging from −1 to 1, represented by white to dark blue) presented in Table 2 and Table 3, we generated the dataset by incorporating weather data, consumption data, evapotranspiration, and historical reservoir data. From the weather data, we selected solar radiation, dew point, daylight duration, and rainfall, as these factors directly influence outflow and inflow. Despite rainfall showing a weak correlation with occupancy rates, it significantly enhanced prediction performance. As a result, the dataset includes our $ET_0$ calculations and industrial-grade sensor data on weather and dam conditions, which are directly related to dam inflows and outflows, all sourced from institutional data services. It includes daily data spanning approximately five years (1777 days), from 2019 to 2024, with no missing entries. All inputs were confirmed to follow a normal distribution through the KS test, yielding a p-value of 0. Thus, we applied the Z-Score outlier detection method using a Z-value threshold of 3. No outliers were detected in any of the inputs or outputs. Finally, we constructed two datasets. The first dataset included $ET_0$, weather data, and consumption data, which were parameters of the physical model for inflow and outflow, along with historical data based on a data-driven modeling approach. The second dataset included only historical reservoir data, relying on a purely data-driven approach. After constructing the datasets, we split each into 80% for training and 20% for testing during the model development process.
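The outlier check and the train/test split can be sketched as follows. The text does not state whether the 80/20 split was chronological or shuffled; this sketch assumes a chronological split, a common choice for time series, and all names are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # a constant series has no outliers by this rule
    return [i for i, v in enumerate(values) if abs((v - mean) / std) > threshold]


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# 1777 daily records, matching the dataset length quoted in the text.
days = list(range(1777))
train_days, test_days = chronological_split(days)
print(len(train_days), len(test_days))
```

Splitting by time keeps every test day later than every training day, which avoids leaking future occupancy values into training.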
5. Prediction Models
Because dam occupancy levels are autocorrelated, we selected algorithms commonly used for predictive modeling. The algorithms evaluated include Orthogonal Matching Pursuit CV, Lasso Lars CV, Extra Trees, Random Forest, Ridge CV, and Transformed Target Regressor, all of which are designed to mitigate overfitting. Additionally, we used LSTM, which is commonly used in water level prediction studies. We then compared the performance of the methods commonly used in water level prediction studies with alternative approaches that we believed might offer higher performance. To conduct this comparison from various perspectives, we designed four prediction scenarios: daily, weekly, bi-weekly (15-day periods), and monthly (30-day periods). This approach allowed us to analyze the performance of the algorithms across different prediction horizons and identify which algorithms are more suitable for intervention strategies. In the correlation analysis tables, we abbreviated the following terms: weather data (WD), consumption data (CD), evapotranspiration ($ET_0$), and historical reservoir data (HRD).
5.1. Long Short-Term Memory
Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) designed to capture long-term dependencies in sequential data. This capability makes them well-suited for tasks such as time series forecasting. The performance of the LSTM model is presented in Table 4.
The average MAPE of the LSTM model varies between 0.5% and 3.5% depending on the prediction horizon, typically remaining below 1% for daily predictions. To achieve this performance, we implemented an LSTM network with the following hidden layers: the first layer has 60 nodes with a ReLU activation function, and the second layer consists of 120 nodes, also with a ReLU activation function. We used the Adam optimizer with a learning rate of 0.015, a beta of 0.9, and an epsilon of $1 \times 10^{-7}$. We used MSE as the loss function during the training process.
5.2. Orthogonal Matching Pursuit CV
Orthogonal Matching Pursuit CV (OMPCV) identifies the best features for a cross-validated estimation process using a sparse approximation algorithm. It employs an orthogonal projection basis to find the optimal matching projections of multidimensional data. We implemented OMPCV with a 5-fold cross-validation parameter. The maximum number of iterations was limited to either 10% of the total number of features or five, whichever was greater.
Table 5 presents the performance of the OMPCV model for each combination of features and prediction horizons. While short-term predictions perform exceptionally well, their accuracy significantly decreases as the forecast horizon lengthens. Furthermore, weather data do not significantly affect the prediction performance of OMPCV.
5.3. Lasso Lars CV
Lasso Lars CV (LLCV) is a regression analysis method that simultaneously performs variable selection and regularization. This enhances the prediction accuracy and interpretability of the cross-validated estimation process on multidimensional data. We implemented LLCV with 5-fold cross-validation, similar to OMPCV. The maximum number of iterations was set to 500, and the maximum number of points for computing residuals in the cross-validation was 1000. The machine-precision regularization was $2.22044 \times 10^{-16}$.
Table 6 presents the performance of the LLCV model for each combination of features and prediction horizons. Like OMPCV, LLCV delivers outstanding short-term predictions. However, its accuracy significantly diminishes as the forecast horizon extends. Additionally, the positive impact of weather data, consumption data, and $ET_0$ becomes more evident as the prediction horizon lengthens.
5.4. Random Forest
The Random Forest (RF) algorithm combines ensemble learning with the decision tree algorithm to create multiple decision trees, each trained on a random sample of the data. The results of these trees are averaged to produce a final result, often leading to more accurate predictions and classifications. RF was implemented with mean squared error as the criterion to minimize variance, utilizing a forest of 100 trees. The minimum number of samples required to split an internal node was set to two, and at least one sample was required to form a leaf node. When searching for the best split, only one feature was considered.
Table 7 presents the performance of the RF model for each combination of features and prediction horizons. RF delivers outstanding short-term predictions and maintains relatively good performance as the forecast horizon extends. Also, weather data do not significantly affect the prediction performance of RF.
5.5. Extra Trees
The Extra Trees (ET) algorithm, similar to the Random Forest algorithm, generates multiple decision trees. However, unlike Random Forest, which trains each tree on a bootstrap sample drawn with replacement, Extra Trees by default trains each tree on the whole training set. Additionally, a specific number of features from the total set are randomly selected as split candidates. The most distinctive characteristic of Extra Trees is its random selection of splitting values for features. Instead of searching for the locally optimal threshold for each candidate feature, the algorithm draws a threshold at random and keeps the best of these random splits according to the split criterion. This approach enhances the diversity and reduces the correlation among the trees. ET was implemented with the same parameters as RF.
Table 8 presents the performance of the ET model for each combination of features and prediction horizons. ET is the most accurate method for predicting occupancy levels across all horizons. The impact of weather data on ET’s prediction accuracy is noticeable only for daily occupancy predictions.
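The difference between an optimized split threshold, as in Random Forest, and a randomly drawn one, as in Extra Trees, can be illustrated on a single feature. This is a simplified sketch with hypothetical names, not the implementation used in the study: a real Extra Trees node draws one random threshold for each of several candidate features and keeps the best of them.

```python
import random


def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_cost(xs, ys, threshold):
    """Total squared error of the two groups produced by x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sum_squared_error(left) + sum_squared_error(right)


def best_threshold(xs, ys):
    """Random Forest style: scan midpoints between distinct values, keep the best.

    Requires at least two distinct feature values.
    """
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))


def random_threshold(xs, rng):
    """Extra Trees style: draw the threshold uniformly within the feature range."""
    return rng.uniform(min(xs), max(xs))


# Hypothetical toy data with one obvious break between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
rng = random.Random(0)
print(best_threshold(xs, ys), random_threshold(xs, rng))
```

The random draw costs no search and is usually worse for a single node, which is why the method relies on averaging many decorrelated trees.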
6. Results and Discussion
The importance of water for both the environment and its economic value makes it a resource that must be managed carefully. In this context, we evaluated how to implement the most effective IWRM in dams, which are among the most crucial water resources in the world. Our review of studies presented in the scientific literature revealed that efficient water management requires both measuring water levels and predicting these levels for more advanced planning. Accordingly, we investigated the most suitable prediction model for achieving the most effective IWRM. In this context, we explored the advantages of a hybrid model that integrates a physical model-driven approach with a data-driven approach. We examined the correlations among weather data, water consumption, evapotranspiration, historical reservoir data related to inflow and outflow, and dam occupancy levels for seven dams in Istanbul, as shown in Table 2 and Table 3. The correlation analysis indicates that historical reservoir data and dam occupancy levels have a very strong correlation, suggesting that occupancy levels should be treated as a time series. Water consumption and rainfall do not exhibit a strong correlation with dam occupancy levels. Nevertheless, because all dams supply drinking water and rainfall contributes to their water supply, daily water consumption and rainfall are included as features in the proposed models. We concluded that the prediction model should be developed using parameters that account for both the water level in the dam and the inflow and outflow of water [1]. Additionally, solar radiation, humidity, cloud cover, daylight duration, and evapotranspiration have a meaningful correlation with dam occupancy levels.
Based on this analysis, we propose that the most effective model for predicting dam occupancy levels should incorporate weather data, water consumption data, evapotranspiration calculations, and historical reservoir data. To validate this approach, we employed RF and LSTM, which have demonstrated successful results in the scientific literature, as well as ET, OMPCV, and LLCV to make predictions for daily, weekly, bi-weekly, and monthly intervals. To assess the impact of weather data, water consumption data, and evapotranspiration on prediction performance, we compared the prediction performance of identical models created with a dataset consisting of weather data and historical reservoir data against models created using only historical reservoir data.
As shown in Figure 5a, $ET_0$, weather data, and consumption data have a positive effect on prediction accuracy as the prediction horizon increases. Therefore, the physical model parameters for inflow and outflow have a significant positive impact on average MAPE values. As illustrated in Figure 5b, LSTM, RF, and ET models demonstrate strong performance with low MAPE for long-term predictions as well. The average MAPE of LSTM models ranges from 1% to 3.5% for the monthly basis prediction of each dam’s occupancy. This accuracy is better than levels reported for LSTM in similar scientific studies. Notably, ET consistently achieved a MAPE ranging from 0.3% to 1.4% across all intervals, demonstrating a remarkable performance. Consequently, our research contributes to the scientific literature by proposing AI algorithms such as ET, OMPCV, and LLCV for predicting dam reservoir levels. We demonstrate that ET provides more precise predictions of dam occupancy levels compared with RF and LSTM, which are commonly used in the scientific literature.
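For reference, the MAPE values compared here follow the usual definition, sketched below with hypothetical names and toy numbers. MAPE is undefined when an actual value is zero and grows quickly when actual values approach zero, which matters for dams with very low occupancy.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical occupancy fractions and model outputs.
observed = [0.62, 0.61, 0.59, 0.58]
forecast = [0.63, 0.60, 0.59, 0.57]
print(round(mape(observed, forecast), 2))
```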
To reinforce our proposed model with more robust evidence, we conducted statistical tests on the MAPE values. This analysis aimed to identify the conditions that result in the smallest prediction errors. First, we performed the Kolmogorov–Smirnov (KS) test on the MAPE values for two different input sets. The first input set, which includes $ET_0$, weather data, consumption data, and historical reservoir data, fits the normal distribution with a p-value of $4.74 \times 10^{-33}$. The second input set, which includes only historical reservoir data, also fits the normal distribution with a p-value of $4.73 \times 10^{-33}$. Based on these results, we then performed a two-sided Z-Test to determine whether the samples were identical. The Z-Test resulted in a p-value of 0.0155. At the 1% significance level, this result does not reject the null hypothesis that the two samples are identical. To compare the means of the samples, we performed both the Z-Test and the paired T-Test with a less-than alternative hypothesis. The Z-Test produced a p-value of 0.0078, while the T-Test resulted in a p-value of $2.59 \times 10^{-6}$ with 139 degrees of freedom (DF). Consequently, the null hypothesis was rejected in both tests. Therefore, it can be concluded that $ET_0$, weather data, and consumption data reduce the prediction error. After validating our hypothesis with statistical tests, we sought the best model using the dataset “$ET_0$ + WD + CD + HRD”. First, we performed an ANOVA test on the performance data of all models. The test resulted in a p-value of $7.579 \times 10^{-11}$, which led us to accept the alternative hypothesis. Since the alternative hypothesis of the ANOVA test posits that the samples are not identical, we compared their performances using a paired T-Test with a less-than alternative hypothesis. First, we compared LLCV and OMPCV. The performance of LLCV was superior to that of OMPCV, as indicated by a p-value of 0.0008 with 27 degrees of freedom. Next, we compared LSTM and LLCV. The performance of LSTM was superior to that of LLCV, with a p-value of $5.535 \times 10^{-5}$ and 27 degrees of freedom. Third, we compared RF and LSTM. The performance of RF was superior to that of LSTM, with a p-value of $6.54 \times 10^{-5}$ and 27 degrees of freedom. Finally, we compared ET and RF. The performance of ET was superior to that of RF, with a p-value of $2.085 \times 10^{-9}$ and 27 degrees of freedom. As a result of this comparison, the best AI method for predicting dam occupancy levels is ET. The results of all statistical tests leading to this conclusion are summarized in Table 9.
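The paired comparisons above rest on the paired t statistic, sketched here with the standard library. The names and toy values are hypothetical. The reported 27 degrees of freedom are consistent with 28 paired MAPE values per model, for example seven dams times four horizons; this pairing is an inference, not something the text states. The sketch stops at the statistic because the standard library has no Student's t distribution for the p-value.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for paired samples.

    A negative t supports the "less-than" alternative that model A has
    lower error than model B.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two paired samples with at least two pairs")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("t is undefined when all paired differences are equal")
    t = statistics.fmean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical MAPE values (percent) for two models on the same four cases.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [2.0, 4.0, 5.0, 5.0]
print(paired_t_statistic(model_a, model_b))
```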
The best model in terms of average MAPE, ET, performs exceptionally well over the medium and long term (from weekly to monthly) using only historical reservoir data. However, on a daily basis, the ET model achieves lower MAPE when incorporating physical model parameters. The highest MAPE for ET, the best-performing model, is observed in the monthly predictions.
Figure 6a illustrates ET’s predictions for a one-month period at the Ömerli, Darlık, and Elmalı Dams, while Figure 6b shows the predictions for the Terkos, Alibey, Büyükçekmece, and Sazlıdere Dams. Both Figure 6 and the MSE values in Table 8 clearly indicate that ET surpasses other methods in accurately capturing and adapting to the trend. The average MSE of the best ET model for monthly predictions is $1.2 \times 10^{-4}$, as shown in Table 8, compared with $2.35 \times 10^{-4}$ for the LSTM model in Table 4 and $3.55 \times 10^{-4}$ for the RF model in Table 7.
7. Conclusions
IWRM is key to utilizing water resources that serve multiple purposes, including agriculture, energy generation, fish farming, and drinking water [4]. One of the significant contributions to the IWRM field [17] is the prediction of water levels. Predicting water levels enables the implementation of intervention strategies to enhance the efficiency and effectiveness of IWRM. However, achieving truly efficient and effective IWRM requires precise water level predictions. In this context, we propose that combining physical model-based calculations with measured data related to water inflow and outflow can significantly improve the accuracy of AI-based water level predictions. Physical model parameters, including water consumption data, $ET_0$, solar radiation, dew point, daylight duration, and rainfall, contribute positively to reducing the average MAPE of the methods considered. However, for ET, the most effective method, these parameters only have a positive impact on a daily basis. To illustrate this positive effect, we utilized two distinct datasets derived from data collected from seven dams in Istanbul. The first dataset includes “$ET_0$, WD, CD, and HRD” while the second dataset contains only HRD. We applied the LSTM, RF, LLCV, OMPCV, and ET algorithms to these datasets, aiming to validate our hypothesis and evaluate alternative AI algorithms against those commonly used in IWRM studies. Finally, we developed occupancy-level prediction models to standardize data from dams with varying depths.
Following the model development phase, we validated our proposed model. We contributed to the scientific literature with our hybrid model and explored significant input parameters for water level prediction, such as solar radiation, dew point, daylight duration, rainfall, daily water consumption, evapotranspiration, and historical reservoir data. Moreover, we compared AI algorithms that are not frequently used in the IWRM field with those that are. Finally, we found that ET outperformed commonly used algorithms such as LSTM and RF. ET also required less hardware and energy to operate and predicted the occupancy level one month in advance with only a 1% error margin. Our primary contribution to sustainability is predicting dam occupancy levels at this sensitivity with a more efficient approach than deep learning (LSTM), which provides the following benefits:
Ensures water is allocated optimally among various users (agriculture, industry, domestic) based on current water availability.
Enables proactive management of water resources during periods of drought by accurately predicting available water supplies.
Helps in managing dam releases to mitigate downstream flooding risks by maintaining appropriate water levels.
Maintains minimum ecological flows downstream to support aquatic habitats and biodiversity.
Increases the sustainability of businesses with less hardware requirement and less energy need.
In future studies, we plan to investigate how integrating weather forecasts and hyperparameter optimization can further enhance the performance of AI models. Additionally, we aim to explore novel approaches for incorporating real-time data streams and improving the robustness of predictive models in dynamic environments.