1. Introduction
Water is an important natural resource with economic and social significance, and human survival would be in jeopardy without it [1]. Surface water and groundwater are the two most significant sources of drinking water globally. Currently, over 1.1 billion people worldwide lack access to safe drinking water [2]. Both point and non-point pollution sources contribute to deteriorating surface water quality (WQ) worldwide [3]. Concern is growing globally over water quality degradation caused by extensive human activity [3]. Developing nations focus on water supply and cleanliness, while industrialized countries prioritize public health and population growth [4,5]. Water quality directly affects human health, biodiversity, and uses such as agriculture and industry throughout the world, including in Africa [6]. Contaminated water sources pose significant health risks, causing millions of illnesses and fatalities annually, particularly in developing regions, including East Africa [7]. Likewise, a lack of safe drinking water is observed in developing countries, including those in Africa and specifically East Africa, where Lake Tanganyika is located. Access to water in appropriate quantity and quality is one of the most basic requirements for survival.
Previously, conventional modeling methods based on Box-Jenkins and autoregressive models were used to assess WQ and water consumption, but they had several limitations. They often require gathering numerous parameters and rely on prior knowledge and calibration, which can be resource-intensive and hinder their applicability [8]. Similarly, physically based models require multiple assumptions and a comprehensive prior understanding of the subject matter [9]. For example, a physical model developed in Malaysia required substantial data collection and incurred significant expenses to evaluate floodplain dynamics and water levels [10]. Numerical models emerged as alternatives to address the limitations of physical models, as exemplified by the Yangtze River water level forecasting model by Wu et al. [11]. However, studies such as Guan et al. [12] revealed shortcomings in numerical modeling approaches: despite advancements, numerical models still struggle to replicate specific physical processes accurately. In recent years, data-driven approaches have gained traction for overcoming the shortcomings of traditional models. Machine learning (ML) algorithms, such as radial basis function (RBF) and support vector machines (SVM), have shown promise in forecasting hydrological characteristics like reservoir evaporation [13]. Similar success has been observed in predicting subsurface evaporation rates and daily water levels in reservoirs using SVM [14]. Although some challenges persist with ML algorithms, previous research indicates that they are more accurate than traditional methods. Adjusting model parameters and hyperparameters, such as weights and activation functions, is crucial to enhancing ML model performance [15]. Advancements in artificial intelligence (AI), particularly deep learning methods such as long short-term memory (LSTM) and nonlinear autoregressive neural networks (NARNET), have further improved water quality prediction accuracy. In addition, Theyazn and colleagues employed Naïve Bayes (NB), k-nearest neighbors (KNN), and SVM models to classify water quality; they achieved an accuracy of 97.01% with SVM, and NARNET reached an R2 of 96.17% [16]. Additionally, ML techniques like decision trees (DT) and boosted decision trees (BDT) have demonstrated success in predicting daily precipitation [17]. ML and deep learning algorithms have been employed in various studies to forecast and categorize water quality, highlighting their versatility and effectiveness [18].
Urban water distribution networks are crucial infrastructure in cities and require intelligent management to ensure sufficient water supply at the desired pressure and quantity [19]. Forecasting short-term water demand is a key aspect of water distribution network operation and management, and demand is influenced by factors like temperature, population, and water pressure [20]. Water demand forecasting methods encompass both linear and nonlinear approaches [21]. Linear methods, including exponential smoothing and the autoregressive integrated moving average (ARIMA), rely on time series analysis [22]. However, nonlinear methods such as artificial neural networks (ANNs) tend to offer better accuracy for short-term forecasting. For instance, Ghiassi et al. [23] employed ANNs for water demand prediction. Herrera utilized the support vector machine method. Additionally, fuzzy logic was applied in [24]. Various other ANNs were employed in previous research for urban water demand prediction, including the generalized regression neural network, radial basis function networks, feedforward neural networks [25], and the extreme learning machine method [26]. Despite their effectiveness, ML models face challenges in feature selection and overfitting [27]. Deep learning methods, like LSTM and gated recurrent units (GRU), show promise in improving accuracy for water demand prediction [28]. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are also employed, with recent studies favoring hybrid deep neural networks (DNNs) that combine both architectures [28,29,30]. Hybrid DNNs integrate CNNs for spatial feature extraction and RNNs for temporal feature modeling. These models have applications across various domains, including human activity recognition and energy forecasting. Despite their effectiveness, hybrid DNNs remain underexplored in water demand prediction, a task that is inherently challenging because of the complex time series nature of water demand data [30].
Natural elements and anthropogenic activities such as mining, urbanization, stormwater, construction waste, domestic waste, non-functional wastewater treatment plants (WWTPs), and agriculture are key factors influencing the WQ of Lake Tanganyika [31]. Point sources, like industrial sites and WWTPs, discharge pollutants, including heavy metals and organic compounds, directly into Lake Tanganyika, significantly impacting ecosystems [32]. Millions of people who rely on the lake’s resources for their livelihoods depend on clean water. The water quality of Lake Tanganyika is crucial for human health and for the productivity of aquatic ecosystems, particularly fish resources. Preserving high water quality in the Lake Tanganyika region is essential because the lake is a major source of drinking water and irrigation for local communities.
Moreover, sustainable management practices are crucial to mitigate water quality deterioration and safeguard public health and the ecosystems of Lake Tanganyika. This research is particularly significant because Lake Tanganyika is Bujumbura’s primary water source and is highly polluted [33]. In addition, water scarcity, population growth, urban expansion, industrial development, lifestyle changes, and inefficient distribution are major issues in Bujumbura, resulting in sporadic shortages and inconsistent supplies [34]. Accurate prediction of water use can support resource management and provide a steadier supply of water for the region’s expanding population. Monitoring water quality and consumption is, therefore, essential to lessening the effects of contaminated water.
Recent studies have effectively used recurrent neural networks and machine learning techniques to predict water quality and consumption and to classify water quality. Building on this success, the current study leverages the water quality index (WQI) and consumption data, incorporating diverse metrics to enhance prediction models. The main contributions of this paper include employing multi-model architectures such as GRU, LSTM, and bidirectional LSTM (BiLSTM), in contrast to previous single-model approaches. These models capture complex temporal dependencies, improve prediction accuracy and robustness, and offer insights into Lake Tanganyika’s water quality and consumption dynamics. The use of multiple models facilitates thorough comparisons, advancing research in water quality assessment and demand planning.
Additionally, SVM, KNN, and RF are employed for water quality categorization, broadening the study’s methodological scope and providing robust, interpretable results for environmental management. Accurate predictions aid in effective water resource management and pollution control. This pioneering study, focusing on the Lake Tanganyika region, offers valuable contributions to water quality management for Burundi, the Democratic Republic of Congo, Zambia, and Tanzania, ensuring ecological integrity and sustainable resource use.
5. Conclusions
Since clean water (CW) is essential to human health worldwide, water quality (WQ) monitoring is required. Environmental protection greatly depends on the modeling and prediction of WQ. Unlike conventional approaches, deep learning and machine-learning-based systems that use features from the water quality index yield reliable results for predicting water quality and water demand and for classifying water quality. This paper proposes multi-model architectures of recurrent neural networks, including GRU, LSTM, and BiLSTM, to forecast water quality and consumption with high accuracy and robustness. Rather than relying on a single model, this approach aims to leverage the unique strengths of each model and facilitate thorough comparisons. The suggested method performed exceptionally well in experiments using two datasets.
Because they handle nonlinear relationships, high-dimensional data, and varied feature interactions efficiently, SVM, KNN, and RF are used for water quality classification. This allows for accurate classification, which is crucial for evaluating water quality and guiding decision-making processes. The Random Forest (RF) model was highly effective for water quality classification (WQC), with a testing accuracy of 96% and macro and weighted averages of 0.96 across precision, recall, and F1-score. RF’s precision and recall were particularly strong in the “Excellent”, “Good”, and “Poor” categories, though slightly lower for “Very poor”, indicating consistent performance across most classes. The SVM model achieved the highest testing accuracy at 97%, with a macro average of 0.78 and a weighted average of 0.97; its scores were near-perfect in most categories except “Excellent”, where precision was 0.00, highlighting an issue in predicting this specific class. The KNN model, with a lower testing accuracy of 87%, showed decent precision and recall, particularly for the “Poor” and “Unfit” categories, but had a lower macro average of 0.76 and weighted average of 0.87, suggesting less consistent performance across classes than RF and SVM. The training accuracies of RF, SVM, and KNN are 99.89%, 98.52%, and 82.88%, respectively. Based on its training accuracy of 99.89%, RF fared better than SVM and KNN.
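The gap between a macro average and a weighted average, as reported for the SVM, follows from how the two averages are defined. The minimal Python sketch below illustrates this. The function name `macro_and_weighted`, the per-class scores, and the class supports are hypothetical values chosen for illustration; they are not taken from the study’s results.

```python
def macro_and_weighted(per_class_scores, support):
    """Return (macro average, support-weighted average) of per-class scores.

    per_class_scores: dict mapping class label -> score (e.g., precision)
    support: dict mapping class label -> number of test samples in that class
    """
    classes = list(per_class_scores)
    # Macro average: every class counts equally, regardless of its size.
    macro = sum(per_class_scores[c] for c in classes) / len(classes)
    # Weighted average: each class counts in proportion to its sample count.
    total = sum(support[c] for c in classes)
    weighted = sum(per_class_scores[c] * support[c] for c in classes) / total
    return macro, weighted


# Hypothetical illustration (NOT the study's data): one rare class is never
# predicted correctly, while the four larger classes are classified perfectly.
scores = {"Excellent": 0.0, "Good": 1.0, "Poor": 1.0, "Very poor": 1.0, "Unfit": 1.0}
counts = {"Excellent": 2, "Good": 48, "Poor": 50, "Very poor": 50, "Unfit": 50}

macro, weighted = macro_and_weighted(scores, counts)
print(f"{macro:.2f} {weighted:.2f}")  # prints "0.80 0.99"
```

In this illustration, a single small class scored at zero pulls the macro average down sharply while the weighted average barely moves, which is the same qualitative pattern as a macro average of 0.78 alongside a weighted average of 0.97.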
Similarly, GRU emerges as the most effective model for this dataset, achieving the highest accuracy and reliability with minimal computational complexity. This is demonstrated by its lowest MAE (0.3975%) and RMSE (0.6941%) and its highest R2 (0.78) and Nash-Sutcliffe efficiency (NSE) of 0.759 for WQ prediction. Although LSTM is designed for handling long-term dependencies, it performs the least effectively in this context, as shown by the highest MAE (0.4826%) and RMSE (0.7624%) and the lowest R2 (0.69) and NSE (0.610). The BiLSTM model, which processes data in both forward and backward directions and thus offers enhanced context understanding, shows intermediate performance: its MAE (0.4197%) and RMSE (0.7126%) are better than those of LSTM but not as good as those of GRU, and its R2 (0.73) and NSE (0.699) also fall between those of GRU and LSTM. The predicted and observed WQI values differed relatively little, as indicated by the NSE score obtained for GRU.
In the same way, for water demand, the GRU model outperforms the others with the lowest MAE (374) and RMSE (530) and the highest R2 (0.81) and NSE (0.720), indicating the most accurate predictions and smallest errors. Its simpler architecture enhances computational efficiency and effectiveness. The LSTM model shows moderate performance, with an MAE of 461, an RMSE of 675, an R2 of 0.70, and an NSE of 0.650, reflecting its capability to handle long-term dependencies while being less suitable for this dataset than GRU. The BiLSTM model has the highest errors (MAE 500, RMSE 726) and the lowest R2 (0.65) and NSE (0.590), making it the least effective for this dataset. According to the R2 and NSE obtained, the GRU model outperforms LSTM and BiLSTM for both WQI and water demand prediction.
It has been found that the simpler GRU recurrent neural network design performs better than more intricate deep learning models like LSTM and BiLSTM.
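For readers who want to reproduce the kind of comparison reported above, the four error measures can be computed as in the following minimal Python sketch. The function name `regression_metrics` is hypothetical. One assumption should be stated plainly: because the reported R2 and NSE values differ, the sketch takes R2 to be the squared Pearson correlation between observed and predicted values, while NSE is one minus the ratio of the residual sum of squares to the variance of the observations about their mean. The source does not state which definition of R2 was used.

```python
import math


def regression_metrics(observed, predicted):
    """Compute MAE, RMSE, R2 (squared Pearson correlation), and NSE.

    Assumes both sequences have the same length and are not constant.
    """
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_res = sum(e * e for e in errors)                       # residual sum of squares
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)       # spread of observations
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)    # spread of predictions
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))

    # NSE: 1 is a perfect fit; 0 means no better than predicting the mean.
    nse = 1.0 - ss_res / ss_obs
    # Assumed definition of R2: squared Pearson correlation coefficient.
    r2 = cov ** 2 / (ss_obs * ss_pred)

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "NSE": nse}
```

Under these definitions the two scores can diverge. For instance, predictions that are uniformly 1 unit too high on the observations [1, 2, 3, 4] give MAE = RMSE = 1 and a squared correlation of 1, but an NSE of only 0.2, because NSE penalizes systematic bias while correlation does not.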
The WQI and urban water demand are forecasted using LSTM, GRU, and BiLSTM because these models can identify short- and long-term dependencies in time-series data. This makes the forecasts accurate, which is important for managing and controlling WQ, planning water demand, protecting public health, and maintaining ecosystems. Using multi-model architectures is crucial for maximizing predictive accuracy, effectively capturing diverse data types, and advancing environmental monitoring and management research.
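Recurrent models such as these are typically trained on fixed-length windows of past observations paired with a future target value. The Python sketch below shows one common way to frame a series in this form. It is an illustration only: the function name `make_windows`, the `lookback` and `horizon` parameters, and the sample values are assumptions, and the windowing actually used in this study is not specified here.

```python
def make_windows(series, lookback, horizon=1):
    """Split a time series into (input window, target) pairs.

    lookback: number of past observations given to the model
    horizon:  how many steps ahead the target lies (1 = next step)
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples


# Hypothetical demand values; three past observations predict the next one.
pairs = make_windows([10, 11, 12, 13, 14], lookback=3)
# pairs == [([10, 11, 12], 13), ([11, 12, 13], 14)]
```

A longer `lookback` exposes the model to longer-term dependencies at the cost of fewer training samples, which matters when the available record is short.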
In addition to emphasizing the need for continuous efforts to monitor and regulate the quality of the local water supply, this article provides valuable insights into the contamination of water in the area. Based on the observed rise and fall in WQI, pollution in Lake Tanganyika is more severe in the dry season than in the wet season. Urban water demand is also higher during the dry season than during the wet season.
The increased water quality index (WQI) correlated with a significant rise in urban water consumption over the corresponding period in the study. This relationship suggests that as urban water usage increases, so does the potential for pollutants to enter water sources, leading to degradation in water quality. The influx of contaminants from various anthropogenic activities, such as industrial discharge and urban runoff, likely contributed to the deterioration in water quality, as reflected by the elevated WQI values. Therefore, the study indicates a direct association between heightened urban water consumption and deteriorating water quality, highlighting the need for sustainable water management practices to mitigate pollution and preserve water resources. Consequently, specific mitigation actions are required to stop further water quality degradation. These include continuous environmental monitoring, public awareness campaigns, and the establishment of stringent norms for the use and upkeep of lakes. The highest urban water demand is also observed during the dry season, necessitating increased awareness of water conservation and advancements in water-saving technology to progressively lower use. The correlation matrix helps identify important variables causing water quality degradation or improvement. It directs targeted activities for environmental management by providing insights into how variations in these parameters relate to changes in WQI. Both negative and positive correlations are observed in the findings illustrated in Figure 13.
One limitation of this research is the variability and limited availability of high-quality, long-term environmental data specific to this region. Additionally, the complex interactions between diverse ecological, climatic, and human factors make it challenging to create models that accurately capture and predict the lake’s dynamic conditions, and overfitting of the model could result. Using a smaller sample for the experiments is one approach, but doing so would leave too few samples for effective training and testing. As a result, we intend to expand the dataset in our upcoming studies. Further research is needed to forecast and project WQ and urban water demand while taking climate change into account. This will help identify the issues that Lake Tanganyika has faced and will face in the coming decades. A study of the water distribution network (WDN) of Bujumbura City is also needed to identify the impact of water pressure on water consumption.