4.1. Network Creation Using EPANET
We used EPANET to simulate hydraulic systems composed of nodes, pipes, tanks, and reservoirs. Nodes represent connection points with properties such as ID, coordinates, elevation, and base demand. Pipes connect nodes; their key properties are length, diameter, and roughness. Tanks store water and have modifiable properties such as elevation and capacity. Reservoirs are the primary water supply sources.
City selection was based on diverse criteria, including climate, population density, and infrastructure complexity. We selected cities with distinct climate zones (oceanic, Mediterranean, and semi-arid in the case of Lugo, Moratalla, and Balerma, respectively) to ensure our approach’s applicability to various environmental conditions. Additionally, factors such as altitude, average annual rainfall, and urban infrastructure were considered to observe how these variables impact water distribution optimization. This diversity enables our models to address a range of urban water challenges, enhancing generalizability.
4.2. Optimization Algorithm
Three algorithms were developed to optimize the WDNs created above. The simulation output is fed to a linear regression (LR) model. The LR implementation uses three global parameters:
threshold_optimization: This parameter determines when a node is considered to have optimized its water demand. It is initially set to 0.8 and should not exceed 1.0 (100%). With an 80% threshold, optimization applies when lps_demand (LD) is less than 80% of base_demand (BD); that is, LD must be at least 20% lower than BD. BD reflects the theoretical demand derived from area characteristics such as average water consumption per inhabitant and population.
unit_cost: This parameter is the cost per unit of base demand and is used to estimate total resource costs, which is critical for evaluating the financial savings achieved through water optimization. It has no effect on the LR itself; it is applied only when costs are calculated from the model’s predictions, providing a cost analysis of the network. It is initially set to 0.5, an arbitrary value no greater than 1, and users can set their own value. Changing it scales the resulting costs proportionally without altering the LR, as confirmed by tests in which the parameter was varied between 0.25 and 0.75. The parameter was added to quantify the resources that could be saved when nodes reach the optimization condition.
total_resources: This parameter is initially set to 0, the point from which total resources start to accumulate. Each LR run therefore yields the total resources for that run alone; initializing the parameter to any other value would offset every result. The parameter is tied to unit_cost: the accumulated total changes proportionally with unit_cost. It collects the total resources used in the system to perform node optimization.
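To make the role of the three parameters concrete, the following minimal sketch walks two nodes through the threshold check and the cost calculation. The helper name node_cost and the example demand values are hypothetical, and the sketch assumes that a node's initial resources equal base_demand multiplied by unit_cost; the 10% reduction is the one that Algorithm 1 applies to optimized nodes.

```python
THRESHOLD_OPTIMIZATION = 0.8  # optimize when lps_demand < 80% of base_demand
UNIT_COST = 0.5               # cost per unit of base demand (user-adjustable)
REDUCTION = 0.10              # 10% resource reduction for optimized nodes


def node_cost(base_demand, lps_demand,
              threshold=THRESHOLD_OPTIMIZATION, unit_cost=UNIT_COST):
    """Return (optimized, resources) for one node (hypothetical helper)."""
    # Assumption: initial resources are base demand times unit cost.
    resources = base_demand * unit_cost
    optimized = lps_demand < threshold * base_demand
    if optimized:
        resources *= 1.0 - REDUCTION
    return optimized, resources


total_resources = 0.0  # starts at 0 and accumulates node by node
for base, lps in [(10.0, 7.0), (10.0, 9.0)]:  # illustrative values only
    optimized, resources = node_cost(base, lps)
    total_resources += resources
```

With these defaults, a node with a base demand of 10 and a simulated demand of 7 passes the check (7 < 8) and its cost drops from 5.0 to 4.5, whereas a node with a simulated demand of 9 keeps its cost of 5.0.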
Next, we describe the three algorithms. Algorithm 1 reads the provided dataset, which contains the base demands and the lps (liters per second) demands of the nodes in our water distribution scenarios. A machine learning model is trained to predict LD (Local Demand) from BD (Base Demand), and its performance is evaluated with the mean absolute error (MAE). A loop then visits each node and calculates the cost of the resources required to supply its demand. If the simulated demand (the LD value) is lower than BD at the chosen threshold (80%), a message on the screen indicates that optimization has been achieved, and the node’s resources are reduced by 10%. The 10% reduction criterion was selected based on our preliminary simulation results: when LD is lower than BD, a conservative 10% reduction ensures sustainability without risking undersupply, particularly when the system operates near the 80% threshold, a critical level in balancing resource conservation and user demand. Otherwise, a message indicates that optimization has not been achieved. In both cases, the node’s resources are added to the running total for all nodes. In short, this algorithm optimizes water distribution by using an ML model to predict when LD will fall below BD at the chosen threshold and by reducing the resources supplied to those nodes.
Global variables are defined first, and then the dataframe is loaded, consisting of the base demand (X) and the lps demand (y), both extracted from the EPANET simulation. In lines 4 to 6, the LR model is trained and evaluated: the data are split, the model is fitted, values are predicted, and the MAE is calculated. For the optimization process, a dataframe with the data for each node is loaded (line 8). For each node, the id, base demand, and lps demand are obtained and the initial resources are calculated. If the lps demand is below the threshold, a 10% reduction in resources is applied; otherwise, the node is not considered optimized. The resources of every node are then accumulated. The aim of Algorithm 1 is to predict the lps demand and, by comparing the actual demand with the predicted demand at each node, to reduce water resources through optimization of water demand.
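Algorithm 1 Water Distribution System Optimization
1: Globals: threshold_optimization, unit_cost, total_resources
2: Load Data: X, y
3: Train Model and Evaluate:
4:   X_train, X_test, y_train, y_test ← train_test_split(X, y, 0.2, 42)
5:   model ← LinearRegression().fit(X_train, y_train)
6:   mae ← mean_absolute_error(y_test, model.predict(X_test))
7: Optimization Process:
8:   df ← pd.read_excel('city_.xlsx', names=['node_id', 'base_demand', 'lps_demand'])
9:   for each row in df do
10:     node_id, base_demand, lps_demand ← fields of row
11:     resources ← base_demand × unit_cost
12:     if lps_demand < threshold_optimization × base_demand then
13:       print("Successful optimization at node", node_id)
14:       resources ← resources × 0.9
15:     else
16:       print("Optimization was not achieved in the node", node_id)
17:     end if
18:     total_resources ← total_resources + resources
19:   end for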
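The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins for the EPANET export, the initial resources are assumed to be base_demand multiplied by unit_cost, and optimized nodes are collected in a list rather than reported with one message per node.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

THRESHOLD_OPTIMIZATION = 0.8
UNIT_COST = 0.5

# Synthetic stand-in for the EPANET export (illustrative values only).
rng = np.random.default_rng(42)
base = rng.uniform(1.0, 20.0, size=50)
df = pd.DataFrame({
    "node_id": [f"N{i}" for i in range(50)],
    "base_demand": base,
    "lps_demand": base * rng.uniform(0.6, 1.0, size=50),
})

# Train and evaluate: predict lps_demand from base_demand.
X = df[["base_demand"]]
y = df["lps_demand"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))

# Optimization loop: reduce resources by 10% at nodes below the threshold.
total_resources = 0.0
optimized_nodes = []
for row in df.itertuples(index=False):
    resources = row.base_demand * UNIT_COST  # assumed initial resources
    if row.lps_demand < THRESHOLD_OPTIMIZATION * row.base_demand:
        resources *= 0.9
        optimized_nodes.append(row.node_id)
    total_resources += resources  # accumulated for every node

print(f"MAE: {mae:.3f}")
print(f"Optimized nodes: {len(optimized_nodes)} of {len(df)}")
print(f"Total resources: {total_resources:.2f}")
```

Because unit_cost enters only the cost calculation, changing it rescales total_resources without affecting the fitted model or the MAE.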
Algorithm 2 implements a linear regression model that predicts base demand as a function of pipe diameter in a water distribution environment. The data come from the simulation results, which record the selected pipe diameters and the base demand corresponding to each diameter for all nodes deployed in each scenario. Arrays of diameter and base demand are extracted from the dataframe, and a scikit-learn linear regression model is trained with diameter as the independent variable and base demand as the dependent variable. The diameter array is reshaped into the two-dimensional form that the model expects. Finally, the regression coefficient and intercept of the fitted line are printed, and the line is evaluated over the diameter range for plotting. This linear regression yields a mathematical model of the relation between pipe diameter and water demand, allowing demand to be predicted for new diameter values.
Algorithm 3 reads a dataset containing node_id, base_demand, lps_demand, and node_inhabitants in order to calculate and analyze the potential water savings per node and the total savings of the network. Savings are calculated only for nodes that meet the optimization condition of Algorithm 1. The algorithm also identifies the nodes with the greatest opportunity to optimize consumption. The dataframe is loaded in lines 1 and 2, providing base_demand and lps_demand, extracted from the EPANET simulation, together with node_inhabitants. In lines 3 to 15, a function is defined to calculate the savings per node. Before savings are calculated, an optimization condition is checked (line 6) so that savings are obtained only for nodes that optimize resources. When the condition is fulfilled, the difference between base_demand and lps_demand gives the savings in liters per second, stored in the lps_saving variable. Because we want the savings in liters per day, a conversion factor (86,400 seconds per day) is applied to obtain lpd_saving, from which a savings value per inhabitant is then derived. The function is applied to each row of the dataframe, adding a column called saving_lpd that holds the savings per inhabitant of each node. Finally, the total resources saved are obtained by adding up all the individual savings. To aid visual analysis, the nodes are sorted from greatest to least resource savings. The objective of Algorithm 3 is to obtain the individual savings per inhabitant after optimization at each node and the total savings achieved by optimizing all the nodes in the network.
Algorithm 2 Linear Regression Model Training and Prediction
1: Load Data:
2:   df ← pd.read_excel('city.xlsx')
3:   diameter ← diameter column of df, reshaped to (-1, 1)
4:   base_demand ← base demand column of df
5: Create Linear Regression Model:
6:   model ← LinearRegression()
7: Train the Model:
8:   model.fit(diameter, base_demand)
9: Extract Coefficients:
10:   print(model.coef_)
11: Extract Intercept:
12:   print(model.intercept_)
13: Generate Prediction Data:
14:   diameters_line ← np.linspace(min(diameter), max(diameter), 100)
15:   base_demand_line ← model.predict(diameters_line.reshape(-1, 1))
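A minimal Python sketch of Algorithm 2 is given below. The diameter and demand values are hypothetical and deliberately exactly linear, so that the fitted coefficient and intercept are easy to check; the units (mm and L/s) and the variable name demand_line are likewise assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical measurements: pipe diameter (mm) and base demand (L/s).
diameter = np.array([50.0, 80.0, 100.0, 150.0, 200.0, 250.0])
base_demand = 0.04 * diameter + 1.0  # exactly linear, for illustration

# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1).
model = LinearRegression()
model.fit(diameter.reshape(-1, 1), base_demand)

print("coefficient:", model.coef_[0])
print("intercept:", model.intercept_)

# Evaluate the fitted line on an evenly spaced grid for plotting.
diameters_line = np.linspace(diameter.min(), diameter.max(), 100)
demand_line = model.predict(diameters_line.reshape(-1, 1))
```

The arrays diameters_line and demand_line can be passed directly to a plotting library to draw the regression line over the measured points.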
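Algorithm 3 Water Savings Algorithm
Require: threshold_optimization: Threshold value for optimization
1: Load Data:
2:   df ← dataset with node_id, base_demand, lps_demand, node_inhabitants
3: function calculate_saving(row)
4:   base_demand ← row[base_demand]
5:   lps_demand ← row[lps_demand]
6:   if lps_demand < threshold_optimization × base_demand then
7:     lps_saving ← base_demand − lps_demand
8:     conversion_factor ← 86,400
9:     lpd_saving ← lps_saving × conversion_factor
10:     saving_per_inhabitant ← lpd_saving / row[node_inhabitants]
11:     return saving_per_inhabitant
12:   else
13:     print “Optimization not achieved in ”, row[node_id]
14:     return 0
15:   end if
16: end function
17: function optimize_water_savings(df)
18:   df[saving_lpd] ← apply calculate_saving function to each row
19:   total_saving ← sum of df[saving_lpd]
20:   print “Total savings: ”, total_saving, “ liters/day”
21:   df_sorted ← df sorted by saving_lpd in descending order
22:
23:   return df_sorted, total_saving
24: end function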
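The savings calculation of Algorithm 3 can be sketched in Python as shown below. The three-node dataframe is hypothetical, and the sketch assumes that the per-node value is the daily saving divided by node_inhabitants, as the description of the algorithm indicates.

```python
import pandas as pd

THRESHOLD_OPTIMIZATION = 0.8
SECONDS_PER_DAY = 86_400  # converts liters per second to liters per day


def calculate_saving(row, threshold=THRESHOLD_OPTIMIZATION):
    """Saving in liters/day per inhabitant; 0 if the node is not optimized."""
    if row["lps_demand"] < threshold * row["base_demand"]:
        lps_saving = row["base_demand"] - row["lps_demand"]
        lpd_saving = lps_saving * SECONDS_PER_DAY
        return lpd_saving / row["node_inhabitants"]
    print("Optimization not achieved in", row["node_id"])
    return 0.0


def optimize_water_savings(df):
    """Add the saving_lpd column, total it, and rank nodes by saving."""
    out = df.copy()
    out["saving_lpd"] = out.apply(calculate_saving, axis=1)
    total_saving = out["saving_lpd"].sum()
    print("Total savings:", total_saving, "liters/day")
    out = out.sort_values("saving_lpd", ascending=False).reset_index(drop=True)
    return out, total_saving


# Hypothetical three-node network (illustrative values only).
nodes = pd.DataFrame({
    "node_id": ["N1", "N2", "N3"],
    "base_demand": [2.0, 1.0, 4.0],
    "lps_demand": [1.5, 0.95, 3.0],
    "node_inhabitants": [432, 200, 1728],
})
ranked, total = optimize_water_savings(nodes)
```

In this example, N1 and N3 satisfy the optimization condition and save 100 and 50 liters per day per inhabitant, respectively; N2 does not, so the total is 150 and the ranking is N1, N3, N2.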
In terms of complexity, the algorithm used for optimizing water distribution networks with a linear regression model is efficient in both time and space. The time complexity of training the linear regression model depends on n, the number of data points, and p, the number of features. This is relatively efficient, especially compared with the genetic algorithm-based approaches common in related studies, which generally have a higher complexity. The main optimization loop, which iterates over each node to calculate and apply resource-saving measures, has a linear time complexity of O(N), where N is the total number of nodes in the network, so the algorithm scales well with the number of nodes. The space complexity is also small. The linear regression model requires O(p) space to store its coefficients, which is manageable because the number of features p is generally small compared with the number of data points n. The algorithm also keeps memory for data storage and accumulations, giving an overall space complexity of O(N) for node-specific information plus O(1) for constants such as the threshold_optimization, unit_cost, and total_resources parameters. Consequently, the total cost of the algorithm is the training cost plus O(N) in time and O(N + p) in space, making it both time-efficient and space-efficient for real-time urban water network optimization.
This research compares water distribution in three scenarios using EPANET and linear regression. Scenarios with fewer nodes relied on relief characteristics for diverse results. The algorithms developed aim to optimize demand based on base demand and pipe diameter through machine learning, achieving water savings at most nodes. Further analysis will explore these findings and their broader implications for sustainable water management.