1. Introduction
Research on and analyses of water demand for the purposes of, e.g., housing, services, or public buildings have been gaining popularity in recent years. This is primarily due to the development of technology that allows water flow to be measured, or water meters to be read automatically (automatic meter reading, AMR), at a higher frequency than before (once a month, quarter, or even year, depending on the adopted billing system). It is also due to the increasing use of computer networks in the operation of modern water distribution systems, which allows any amount of data to be collected. This progress permits water flow to be measured at any time interval [1].
The preparation of an accurate diagram of water demand is essential for the mathematical modelling of water distribution systems. It also provides the basis for the appropriate design and dimensioning of water supply networks and connections, and for the selection of water meters [1,2].
Water demand is determined by a number of variables that depend on the type of object to which water is supplied. In housing, the primary factors determining the course of water consumption during the day include the daily behaviours of residents, their lifestyle, and routine activities involving the use of supplied water. Another important aspect is the day of the week, i.e., whether it is a working day or a day off (weekend or bank holiday). The announcement of the state of epidemic emergency in the country, caused by the SARS-CoV-2 virus, could have considerably affected the daily behaviours of residents. The COVID-19 pandemic has imposed massive health and economic burdens on communities worldwide, and no sector of society has been left untouched, including the water sector [3]. Moreover, while business is up in a handful of sectors, such as hospitals and some food production, other important water-using sectors have slowed down or shut down entirely. The average effect of the COVID-19 pandemic on total water demand varies depending on the relative proportion of residential water consumption [4]. Issues related to water consumption are a growing sustainability challenge, especially in developing countries, although achieving sustainable development of water resources is a matter of global importance [3], particularly during the pandemic.
Due to the high variability of water consumption, however, measuring water flow and collecting the data on a server are not sufficient for their use in, e.g., mathematical modelling. The data must be analysed with a statistical model that allows the most probable water consumption histograms to be standardised, classified, and selected [4], which is the topic of the research described in this article.
1.1. Background
Interest in processing databases of water meter records has been increasing in recent years due to the increasingly frequent use of computer hydraulic modelling in the management and dimensioning of water supply networks. Hydraulic modelling of water supply networks is performed with extended period simulation (EPS), which determines the quasi-dynamic behaviour of the system over time by calculating its state for a series of successive simulations in which hydraulic demand and boundary conditions change over time. For a hydraulic model to be credible for any analysis of the operation of the distribution system, it must be properly calibrated [5]. This process requires accurate measurement of water consumption with AMR throughout the water supply system, followed by the definition of consumption boundary conditions, not only in the form of mean daily consumption in computational nodes, but also of its variation over time. Models of water consumption that have already been developed can be found in the literature. For example, Obradović and Lonsdale [6] devoted a considerable part of their publication to models of water consumption for an exceptionally broad range of recipients, including a hospital, university, school, church, bank, hotel, government office, military barracks, prison, industrial factories by type, and many others. In the introduction, however, the authors emphasised that an attempt to determine “typical patterns” of water consumption is doomed to fail due to the overlapping of a number of factors, often difficult to identify, that affect water consumption [6]. They therefore processed the data into averaged hourly models of water consumption with their maximum and minimum envelopes.
As mentioned above, such models can also be obtained with AMR and further analysis of the readings. It should be emphasised, however, that this type of solution is not feasible for all recipients. It has to be limited to designated so-called reference recipients, for whom hourly patterns of water consumption are identified and then assigned to particular nodes of the hydraulic model [5,6].
Although allocating time-varying water consumption to particular computational nodes based on selected reference recipients appears simple, unfortunately, it is not common. The conventional method, described as the top-down approach, is applied considerably more frequently. In this approach, water consumption is determined only for entire water supply zones and then assigned to all nodes of the network using correlation coefficients [7]. This not only simplifies the development of computer models of water supply networks, but also allows the computational algorithms to converge quickly in the calibration process. The resulting picture of water flows, however, can carry a significant error.
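A minimal sketch of the top-down idea, assuming that the correlation coefficients can be represented as non-negative per-node weights. The demand measured for a whole zone is split across nodes in proportion to these weights, and every node inherits the same zone-wide hourly pattern. The function name and the weights are hypothetical illustration choices, not the coefficients used in [7].

```python
import numpy as np

def top_down_allocation(zone_hourly_demand, node_weights):
    """Split a zone-level hourly demand series across model nodes.

    zone_hourly_demand: array of shape (24,), total zone demand per hour.
    node_weights: array of shape (n_nodes,), non-negative allocation weights
        (hypothetical stand-ins for the coefficients used in practice).
    Returns an (n_nodes, 24) array: every node follows the same zone-wide
    temporal pattern, scaled by its share of the weights.
    """
    weights = np.asarray(node_weights, dtype=float)
    shares = weights / weights.sum()  # normalise to shares summing to 1
    return np.outer(shares, np.asarray(zone_hourly_demand, dtype=float))

# Hypothetical zone demand (m3/h) and three nodes with weights 1:2:1
zone = np.full(24, 40.0)
nodal = top_down_allocation(zone, [1, 2, 1])
print(nodal[:, 0])  # -> [10. 20. 10.]
```

Because all nodes share the zone's temporal pattern, differences in when individual recipients use water are lost. This is where the error mentioned above comes from.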
Considering the above, Blokker et al. [7] proposed an entirely different method of estimating water consumption in computational nodes of the network, described as the bottom-up approach. They developed a stochastic model of water consumption by the final recipient (for a single water supply connection). The model is based only on statistical information on water users, such as the number of residents, their age, the frequency of water use, the duration and flow rate of each water use event, and the frequency of different types of water use during the day (e.g., flushing the toilet, washing clothes, and washing hands). The model is called SIMDEUM (Simulation of water Demand; an End-Use Model) and is based on the results of earlier research by Buchberger and Wells [8]. They demonstrated that water consumption patterns for household purposes can be described by means of the non-homogeneous Poisson rectangular pulse (PRP) process.
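A Poisson rectangular pulse process can be sketched as follows. Water-use events (pulses) arrive according to a Poisson process whose rate varies by hour of the day, which makes it non-homogeneous. Each pulse has a random duration and a random constant flow intensity. The hourly rates, the exponential distributions for duration and intensity, and the rounding of pulses to whole minutes are hypothetical choices for illustration. Calibrated PRP models use parameters estimated from measurement campaigns.

```python
import numpy as np

def simulate_prp_day(hourly_rates, mean_duration_min=1.0, mean_intensity_lpm=6.0, seed=None):
    """Simulate one day of single-recipient demand at 1-min resolution.

    hourly_rates: 24 expected pulse counts, one per hour (non-homogeneous rate).
    Pulse durations (min) and intensities (L/min) are drawn from exponential
    distributions, an illustrative assumption rather than a calibrated model.
    Returns an array of 1440 flow values in L/min.
    """
    if len(hourly_rates) != 24:
        raise ValueError("expected 24 hourly arrival rates")
    rng = np.random.default_rng(seed)
    flow = np.zeros(24 * 60)  # L/min for every minute of the day
    for hour, rate in enumerate(hourly_rates):
        n_pulses = rng.poisson(rate)  # Poisson number of arrivals in this hour
        starts = hour * 60 + rng.uniform(0.0, 60.0, n_pulses)  # uniform arrival times within the hour
        for start in starts:
            duration = rng.exponential(mean_duration_min)
            intensity = rng.exponential(mean_intensity_lpm)
            first = int(start)
            last = min(int(np.ceil(start + duration)), flow.size)
            flow[first:last] += intensity  # rectangular pulse; overlapping pulses add up
    return flow

rates = [0.2] * 6 + [3.0] * 3 + [1.0] * 10 + [3.0] * 3 + [0.5] * 2
day = simulate_prp_day(rates, seed=1)
print(day.shape)  # -> (1440,)
print(round(day.sum(), 1), "L used in the simulated day")
```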
The PRP model has been applied to modelling water consumption in, among others, the USA [9], Spain [10], and Mexico [11]. In each case, the implementation required a measurement campaign, and the obtained parameters of the PRP model differed from one another. The SIMDEUM model appears to have more practical applications, because it does not require measurements of water flows and is limited to identifying the statistical parameters of end use. This makes it possible to generate water consumption models also for areas that do not yet exist and are planned for development. Moreover, the model can be applied not only to household water consumption, as in the case of the PRP model, but also to recipients such as hotels, office buildings, or social care facilities. Both models generate high-resolution consumption patterns for the individual recipient, whereas hydraulic models of water supply networks are calculated with a 1 h time step. Therefore, Alvisi et al. [12] proposed a spatial–temporal procedure for aggregating synthetic water consumption, originally generated at the level of a single recipient with a small time step (e.g., 1 min in SIMDEUM and PRP), into synthetic consumption series for a group of recipients with a temporal resolution of 1 h.
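The arithmetic core of such an aggregation can be sketched with a reshape: 1-min series of several recipients are summed into one hourly series for the group. This covers only the summation. The procedure of Alvisi et al. [12] involves more than this, and the function name and array layout here are assumptions.

```python
import numpy as np

def aggregate_to_hourly(minute_flows_lpm):
    """Aggregate per-recipient 1-min flows into hourly group volumes.

    minute_flows_lpm: array (n_recipients, n_minutes) of flows in L/min,
        with n_minutes a multiple of 60.
    Returns hourly volumes in litres for the whole group: a spatial sum over
    recipients, then a temporal sum over each 60-min block.
    """
    flows = np.asarray(minute_flows_lpm, dtype=float)
    group = flows.sum(axis=0)  # spatial aggregation over recipients
    if group.size % 60:
        raise ValueError("series length must be a whole number of hours")
    # a flow in L/min held for 1 min is a volume in L; sum 60 of them per hour
    return group.reshape(-1, 60).sum(axis=1)

flows = np.ones((3, 120))  # 3 recipients, 2 h, constant 1 L/min each
print(aggregate_to_hourly(flows))  # -> [180. 180.]
```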
In addition to the models discussed above, Aksela and Aksela [2] proposed another solution. It employs a probabilistic model to generate patterns of water demand in single-family and semi-detached housing with a temporal resolution of 1 h. The first two steps of the procedure are forecasting weekly water consumption with a linear regression model and dividing the analysed water recipients into four separate classes based on the recorded mean weekly consumption. The division was performed by cluster analysis, specifically k-means clustering. The probability distribution of water consumption over time was then modelled with Gaussian mixture models, and the final probabilistic water consumption models were obtained by sampling from these probability distributions.
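The final sampling step can be sketched as drawing hourly patterns from a mixture of Gaussian distributions. The mixture parameters below are hypothetical placeholders. In [2] they would be estimated from measured data for each consumer class, and that fitting is omitted here. Using diagonal covariances and clipping negative draws to zero are our simplifications.

```python
import numpy as np

def sample_gmm_patterns(weights, means, stds, n_samples, seed=None):
    """Draw hourly demand patterns from a diagonal Gaussian mixture.

    weights: (m,) mixture weights summing to 1.
    means, stds: (m, 24) per-component hourly means and standard deviations.
    Returns (patterns, components): an (n_samples, 24) array of patterns with
    negative draws clipped to zero, and the component index used for each row.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    comps = rng.choice(len(weights), size=n_samples, p=weights)  # pick a component per sample
    draws = rng.normal(means[comps], stds[comps])  # independent Gaussian draw per hour
    return np.clip(draws, 0.0, None), comps

weights = [0.7, 0.3]
means = np.vstack([np.full(24, 10.0), np.full(24, 30.0)])  # hypothetical hourly means
stds = np.ones((2, 24))
patterns, comps = sample_gmm_patterns(weights, means, stds, n_samples=100, seed=0)
print(patterns.shape)  # -> (100, 24)
```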
The above examples of publications on modelling water consumption patterns show that specialised computer models are currently available for determining the course of water consumption by individual recipients, primarily in housing. The issue nevertheless still attracts considerable interest in scientific circles, which provided the basis for this paper. For this purpose, a water consumption measurement campaign was conducted in three apartment buildings with a temporal resolution of 1 h. This resolution was dictated by the data recording capabilities of the existing AMR system. Patterns of hourly water demand in these buildings were determined by cluster analysis, namely k-means clustering.
1.2. Formulation of the Model
Cluster analysis is a branch of multivariate statistics that includes a group of methods for identifying homogeneous subsets of elements. Based on the variables that characterise the elements, cluster analysis finds groups (clusters) of elements that are similar to the other elements of the same cluster and differ from those in the remaining clusters [13]. Importantly, depending on the adopted method, the set can be divided into a number of groups (clusters) that is either defined or not defined a priori. The obtained groups are subsets of the analysed population that satisfy the conditions of disjointness and completeness [13], i.e., each element belongs to exactly one cluster. The set of water consumption diagrams was first divided with the hierarchical agglomeration method and then with a non-hierarchical method, k-means clustering.
The hierarchical agglomeration method initially treats each element as a separate cluster. It then gradually combines the mutually nearest (most similar) clusters into new clusters until a single cluster is obtained. Deciding whether two clusters are sufficiently similar requires defining a measure of distance between elements and clusters, and a rule for combining them. A review of available distance measures and their characteristics can be found in the publication by Stanisz [13] and the literature cited there [14]. The analyses employed the most popular distance metrics, which are also considered the most natural: the Euclidean metric and the squared Euclidean distance. The Euclidean distance d(x,y) between elements x and y is given by Formula (1) [13,15]:
$$d(x,y)=\sqrt{\sum_{i=1}^{p}\left(x_i-y_i\right)^2} \tag{1}$$
where $x=(x_1,\dots,x_p)$ and $y=(y_1,\dots,y_p)$. For p = 2 and p = 3, Formula (1) is equivalent to the distance between two points x and y on a plane and in space, respectively. The squared Euclidean distance is $d(x,y)^2$, i.e., the sum in Formula (1) without the square root.
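Formula (1) translates directly into code. A minimal NumPy sketch (the function names are ours) for two elements given as equal-length vectors, for instance two 24-value daily histograms:

```python
import numpy as np

def euclidean(x, y):
    """Formula (1): square root of the summed squared coordinate differences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def squared_euclidean(x, y):
    """Squared Euclidean distance: the sum in Formula (1) without the square root."""
    return euclidean(x, y) ** 2

print(euclidean([0, 0], [3, 4]))                # p = 2, distance on a plane -> 5.0
print(squared_euclidean([0, 0, 0], [1, 2, 2]))  # p = 3 -> 9.0
```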
A broad range of linkage algorithms is also available, primarily including single linkage, complete linkage, unweighted pair-group average, weighted pair-group average, unweighted pair-group centroid, weighted pair-group centroid, and Ward’s method. The analysis of water consumption employed the popular unweighted pair-group average method. In this method, the differences (distances) between all pairs of elements belonging to two clusters are calculated, and their average is adopted as the distance between those clusters. This makes it possible to determine which elements are similar to each other and can be included in the same cluster, and to what degree particular clusters are similar to each other and can be combined into larger clusters. For similar elements of the set, the method produces characteristic “chains” of elements combined into sequences that form clusters, which makes mutually strongly similar elements easy to identify. These chains are clearly visible in the dendrograms resulting from agglomeration, which graphically illustrate the structure of a set of elements in order of decreasing similarity between its elements (i.e., increasing linkage distance).
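The agglomeration with unweighted pair-group average linkage can be reproduced, for example, with SciPy, where `method="average"` in `scipy.cluster.hierarchy.linkage` is the UPGMA rule. The two prototype day profiles and the noise level below are invented for illustration only; they are not the measured data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

def shares(profile):
    """Normalise an hourly profile so that it sums to 100 %."""
    profile = np.asarray(profile, dtype=float)
    return 100.0 * profile / profile.sum()

# Hypothetical prototype days (24 hourly values each)
business = shares(np.r_[np.full(7, 2.0), np.full(3, 8.0), np.full(10, 3.0), np.full(4, 7.0)])
free = shares(np.r_[np.full(9, 2.0), np.full(3, 9.0), np.full(12, 4.0)])

# Rows = days: five noisy business days followed by two noisy free days
days = np.vstack([business + rng.normal(0, 0.2, 24) for _ in range(5)]
                 + [free + rng.normal(0, 0.2, 24) for _ in range(2)])

# Unweighted pair-group average linkage (UPGMA) with the Euclidean metric
Z = linkage(days, method="average", metric="euclidean")

# Cut the tree into two flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

With Matplotlib installed, `scipy.cluster.hierarchy.dendrogram(Z)` draws the corresponding dendrogram.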
The methods presented above produce a dendrogram in which lower-level clusters are contained in higher-level clusters. K-means clustering is considerably different. It is a non-hierarchical method that produces a partition in which no cluster is a sub-cluster of another. In k-means clustering, k clusters that differ from each other as much as possible are designated. The number of clusters k into which the set of elements is partitioned must be assumed a priori. The k subsets are created, and objects are then moved between them so that the distances between elements within each subset are as small as possible and the distances between clusters are as large as possible. The reassignment is repeated until the clusters are separated as effectively as possible. The procedure can be traced in detail in the calculation example presented by Larose [15]. In this study, k-means clustering was preceded by analysis with the agglomeration method. This established a potential range for the number of clusters to be considered when partitioning the patterns of hourly water demand in each building. The optimal number of clusters within that range was then determined by analysing the total within-cluster sum of squares (wss) and the Caliński–Harabasz index (CHIndex) for different numbers of clusters. The total within-cluster sum of squares wss is calculated with Formula (2):
$$\mathit{wss}=\sum_{i=1}^{k}\sum_{x\in C_i} d\left(x,m_i\right)^2 \tag{2}$$
where $k$ is the number of clusters, $x$ is an element of a cluster, $C_i$ is the $i$-th cluster, $m_i$ is the centroid of cluster $i$, and $d(x,m_i)$ is the Euclidean distance between two vectors (Formula (1)).
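The procedure of moving objects between subsets described above is commonly implemented as Lloyd's algorithm. A minimal NumPy sketch, assuming random initial centroids and a fixed iteration cap (library implementations usually add k-means++ initialisation and several restarts), which also returns wss from Formula (2):

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=None):
    """Lloyd's k-means; returns (labels, centroids, wss) with wss as in Formula (2)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k random elements as start
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)  # assignment step
        # update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid)
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = nearest_centroid(X, centroids)  # final assignment consistent with the centroids
    # Formula (2): squared distances of members to their own centroid, summed over clusters
    wss = float(sum(np.sum((X[labels == i] - centroids[i]) ** 2) for i in range(k)))
    return labels, centroids, wss

# Two tight groups of 24-value "histograms"
X = np.vstack([np.zeros((5, 24)), np.full((5, 24), 10.0)])
labels, centroids, wss = kmeans(X, k=2, seed=0)
print(labels, wss)
```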
The CHIndex is calculated with Formula (3) [16]:
$$\mathit{CHIndex}=\frac{\mathit{SSB}/(k-1)}{\mathit{SSW}/(N-k)} \tag{3}$$
where $k$ is the number of clusters, $N$ is the total number of observations (elements of the set), $\mathit{SSB}$ is the total variance between clusters (trace of the between-class covariance matrix), and $\mathit{SSW}$ is the total within-cluster variation (trace of the within-class covariance matrix).
The optimal number of clusters is taken to be the one at which CHIndex reaches its maximum. A high value of the index corresponds to a high ratio of SSB to SSW. This means that the clusters differ considerably from each other, while the elements grouped within each cluster are strongly similar (relatively weakly differentiated). The wss value is also important. It naturally decreases as the number of clusters increases, and once the optimal number of clusters is reached, the rate of this decrease slows down substantially [16].
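Formula (3) and the selection rule can be combined into a small search over k. In this sketch, k-means uses a deterministic farthest-point initialisation (our assumption, chosen so that the result is reproducible). SSB and SSW are computed as the between-cluster and within-cluster sums of squared Euclidean distances, i.e., the traces of the corresponding scatter matrices. The three synthetic 2-D groups are invented test data.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=100):
    """Compact Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    while len(centroids) < k:  # add the element farthest from the centroids chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)

def ch_index(X, labels):
    """Formula (3): (SSB / (k - 1)) / (SSW / (N - k))."""
    clusters = np.unique(labels)
    N, k = len(X), len(clusters)
    grand_mean = X.mean(axis=0)
    ssb = sum(np.sum(labels == c) * np.sum((X[labels == c].mean(axis=0) - grand_mean) ** 2)
              for c in clusters)
    ssw = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2) for c in clusters)
    return float((ssb / (k - 1)) / (ssw / (N - k)))

def choose_k(X, k_range):
    """Return the k with the largest CHIndex, plus CHIndex and wss for every k."""
    X = np.asarray(X, dtype=float)
    scores, wss = {}, {}
    for k in k_range:
        labels = kmeans_labels(X, k)
        scores[k] = ch_index(X, labels)
        wss[k] = float(sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                           for c in np.unique(labels)))
    return max(scores, key=scores.get), scores, wss

rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0, 0.5, (20, 2)) for c in centres])
best_k, scores, wss = choose_k(X, range(2, 7))
print(best_k)  # -> 3
```

On real histograms the CHIndex curve is usually much flatter than for these well-separated points, so it is worth reading it together with the wss curve, as described above.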
Once the number of clusters has been adopted and the k-means algorithm has been run, specific patterns can be assigned to the adopted clusters. Diagrams of averaged histograms of accumulated water demand can also be prepared for these clusters. The clustering algorithm was run three times with the bootstrap method, which estimates the distribution of estimation errors by repeated random sampling with replacement. This means that the clustering algorithm was applied each time to a random sample drawn from the entire set of water demand diagrams. The results of successive runs are compared, which makes it possible to determine the bootmean parameter. The bootmean parameter is calculated for each cluster as the average value of the Jaccard index (Jaccard similarity coefficient). The Jaccard coefficient measures the similarity between two sets and is defined as the ratio of the cardinality of their intersection to the cardinality of their union. The value of the bootmean parameter should be higher than 0.6, because from this value upwards it is presumed that the designated clusters do not include a random cluster, i.e., a cluster that contains patterns deviating from the remaining clusters but at the same time mutually dissimilar [16].
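A simplified sketch of how a bootmean value can be obtained. The name bootmean matches the output of the `clusterboot` function in the R package fpc, but the code below is a Python re-implementation under our own assumptions. Each original cluster, restricted to the days drawn in a resample, is compared with every cluster found in that resample, and the best Jaccard match is averaged over resamples. Average linkage stands in for k-means only to keep the example short; any function `cluster_fn(X, k)` that returns labels can be passed in.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def jaccard(a, b):
    """|A intersect B| / |A union B| for two collections of element indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def bootmean(X, cluster_fn, k, n_boot=100, seed=None):
    """Mean bootstrap Jaccard similarity per cluster (simplified clusterboot-style sketch).

    cluster_fn(X, k) must return one cluster label per row of X.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    base = np.asarray(cluster_fn(X, k))
    results = {int(c): [] for c in np.unique(base)}
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap resample of days
        boot = np.asarray(cluster_fn(X[idx], k))
        drawn = set(idx.tolist())
        for c in results:
            # original cluster restricted to the days that were drawn
            members = set(np.flatnonzero(base == c).tolist()) & drawn
            if not members:
                continue  # cluster absent from this resample
            best = max(jaccard(members, idx[boot == b].tolist()) for b in np.unique(boot))
            results[c].append(best)
    return {c: float(np.mean(v)) if v else float("nan") for c, v in results.items()}

def average_linkage(X, k):
    """Stand-in clustering function: UPGMA tree cut into k clusters."""
    return fcluster(linkage(X, method="average"), t=k, criterion="maxclust")

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 24)), rng.normal(5, 0.3, (20, 24))])
res = bootmean(X, average_linkage, k=2, n_boot=20, seed=0)
print(res)  # clusters with values above 0.6 pass the threshold used in the text
```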
3. Calculations
The data on hourly water consumption initially obtained from the AMR system for each building were presented in a table. One column specified the date and time of the reading, and another gave the total hourly flow through the water meter in dm³/h. The data were divided into two periods: before the outbreak of the COVID-19 pandemic and during the pandemic. The water flow data were then processed into data matrices in which rows represented successive days of the study and columns represented the percentage contribution of each hour's water consumption to the total daily water demand. It should be emphasised that each row of the resulting matrices formed an individual histogram of water consumption for one day. A total of 6 data matrices were prepared (3 buildings × 2 periods). They were used in subsequent stages for classification by means of cluster analysis.
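The reshaping of the AMR series into a day-by-hour matrix of percentage shares can be sketched as follows. The sketch assumes a gap-free hourly series that starts at 00:00; real AMR exports usually need missing readings handled first. Days with zero total consumption would produce NaN rows and should be removed before clustering. The function name is hypothetical.

```python
import numpy as np

def daily_share_matrix(hourly_flow_dm3):
    """Turn a continuous hourly flow series into a days x 24 matrix of shares.

    hourly_flow_dm3: 1-D array of hourly volumes (dm3), starting at 00:00 and
        covering whole days.
    Each row is one day and each column is that hour's percentage of the daily
    total, so every row sums to 100.
    """
    flows = np.asarray(hourly_flow_dm3, dtype=float)
    if flows.size % 24:
        raise ValueError("the series must contain whole days (multiples of 24 h)")
    days = flows.reshape(-1, 24)  # one row per day
    totals = days.sum(axis=1, keepdims=True)
    return 100.0 * days / totals  # percentage of daily demand per hour

flows = np.tile(np.r_[np.full(6, 5.0), np.full(18, 20.0)], 2)  # two identical hypothetical days
m = daily_share_matrix(flows)
print(m.shape)  # -> (2, 24)
```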
The first stage of the cluster analysis employed the hierarchical agglomeration method. The Euclidean distance metric was used to measure the distance between elements and the resulting clusters, and the unweighted pair-group average was used as the linkage rule. The agglomeration produced a graphical illustration of the structure of the set of elements in order of decreasing similarity between its elements (i.e., increasing linkage distance). Dendrograms for each building, divided into measurement periods, are presented in Figures 1–6. They suggest the occurrence of clusters, i.e., groups of demand patterns with similar histograms. Strong outliers (histograms) were also identified, for example, day number 124 for apartment building 1 (Figure 1).
It should be emphasised that cluster analysis is very sensitive to strong outliers, and it is therefore recommended to remove them from the set of elements. Before further analyses, the strongest outlier histograms identified in the dendrograms (Figures 1–6) were eliminated from the set, and the data were agglomerated again, resulting in new structures of the set of elements. Days with anomalous water consumption were excluded for buildings 2 and 3 in the period before the pandemic and for buildings 1 and 2 in the period during the pandemic. For apartment building 1 before the COVID-19 pandemic and apartment building 3 during the pandemic, the original dendrograms were retained, because their single outlier days did not considerably affect the subsequent computations.
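The outlier days in this study were identified by inspecting the dendrograms. A possible automated analogue, which is our assumption rather than the authors' procedure, is to cut the average-linkage tree at a chosen linkage distance and flag days that remain in single-element clusters. The cut distance is a hypothetical tuning parameter.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def singleton_days(day_matrix, cut_distance):
    """Return indices of days that stay isolated when the average-linkage
    tree (Euclidean metric) is cut at `cut_distance`."""
    Z = linkage(np.asarray(day_matrix, dtype=float), method="average", metric="euclidean")
    labels = fcluster(Z, t=cut_distance, criterion="distance")
    ids, counts = np.unique(labels, return_counts=True)
    lonely = ids[counts == 1]  # clusters that contain a single day
    return np.flatnonzero(np.isin(labels, lonely))

rng = np.random.default_rng(0)
days = 100 / 24 + rng.normal(0, 0.1, (10, 24))  # ten ordinary, nearly flat days
odd = np.zeros(24)
odd[3] = 100.0                                   # one day with all use at 3:00
X = np.vstack([days, odd])
print(singleton_days(X, cut_distance=5.0))  # -> [10]
```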
Figures 7 and 8 present the dendrograms for buildings 2 and 3 in the period before the pandemic, whose structures changed considerably. The set is clearly divided into two clusters, with single histograms deviating from them. Figures 9 and 10, for buildings 1 and 2 in the period during the COVID-19 pandemic, also show changed structures; in their case, however, the partitioning into clusters is ambiguous.
The dendrograms obtained by agglomerating the data made it possible to determine the potential range of the number of clusters to be considered when partitioning the patterns of hourly water consumption in the buildings. The optimal number of clusters required for k-means clustering was determined by analysing wss and CHIndex for different numbers of clusters. Both analyses were performed and are presented graphically for each building separately, for the period before the pandemic (Figures 11–13) and the period during the COVID-19 pandemic (Figures 14–16).
For all buildings in the study period before the COVID-19 pandemic (Figures 11–13), the optimal number of clusters k should be adopted as 3. This follows from the maximum CHIndex (CHIndex = 17.38, 41.07, and 44.59, respectively) and from the course of the wss diagram, whose rate of decrease clearly slows from the same point. For apartment building 1 (Figure 11), a second clear drop in wss is observed at a higher number of clusters, 13. For 13 clusters, however, the CHIndex value is not the highest in the analysed range of 2 to 20 clusters.
For the buildings in the period during the COVID-19 pandemic (Figures 14–16), the optimal number of clusters k differs between buildings: it should be 7 for buildings 1 and 2, and 5 for apartment building 3. This again follows from the maximum value of CHIndex and from the course of the wss diagram, whose rate of decrease clearly slows from the same point.
It should be emphasised that in this study period, CHIndex did not reach high values: CHIndex = 8.03 for apartment building 1, 7.83 for apartment building 2, and 10.17 for apartment building 3. This means that the clusters differ only slightly from each other relative to the variation of the histograms within each cluster, i.e., the elements grouped in particular clusters are only weakly similar.
After the optimal number of clusters had been determined for each set of normalised histograms of accumulated water consumption in the 3 buildings and both study periods (Table 3), k-means clustering was performed. The results and their analysis are presented in the following section.
4. Results and Discussion
The k-means clustering algorithm was run multiple times with the bootstrap method. This made it possible to determine the bootmean parameter, which measures how stably the elements of a given cluster remain together across resamples. The value of the parameter should be higher than 0.6, because from this value upwards it is presumed that the designated clusters do not include a random cluster, i.e., a cluster of outlier patterns that are mutually dissimilar.
Table 4 presents the values of the bootmean parameter for particular clusters determined during k-means clustering. Cluster numbers were assigned automatically by the computational algorithm. Not every cluster reached a bootmean value above 0.6. Values above or close to that threshold were obtained primarily for the study period before the outbreak of the COVID-19 pandemic, which means that fewer incidental and outlier events occurred in that period. During the COVID-19 pandemic, water use by recipients showed a certain degree of chaos that made it impossible to classify the data credibly, i.e., a large portion of the histograms of daily water consumption were mutually dissimilar.
The clusters obtained in the cluster analysis, i.e., averaged histograms of accumulated water consumption in the buildings (consumption patterns), were then analysed in terms of the types of days classified into them. The results of this procedure for all buildings before and during the pandemic are shown in Table 5. The data show a clear division of histograms into patterns for business days and days free from work. For example, for building 2, pattern No. 1 covers 99 business days and only two days free from work; therefore, it can be recognised as typical of business days. For apartment building 1, pattern No. 3 covers 74 business days and 35 days free from work; it can be recognised as typical of days free from work only after analysing the course of its diagram.
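Cross-tabulating cluster labels against day types, as in Tables 5 and 6, can be sketched with the standard library. The holiday set must be supplied by the user because bank holidays are not built in, and the function names are hypothetical.

```python
import datetime as dt
from collections import Counter

def day_type(day, holidays=frozenset()):
    """'business' for Monday to Friday outside the given holidays, otherwise 'free'."""
    return "free" if day.weekday() >= 5 or day in holidays else "business"

def cluster_day_types(dates, labels, holidays=frozenset()):
    """Count business and free days per cluster and the business-day share in %."""
    counts = Counter((lab, day_type(d, holidays)) for d, lab in zip(dates, labels))
    table = {}
    for lab in sorted(set(labels)):
        business, free = counts[(lab, "business")], counts[(lab, "free")]
        table[lab] = {"business": business, "free": free,
                      "business_%": round(100 * business / (business + free), 2)}
    return table

week = [dt.date(2021, 3, 1) + dt.timedelta(days=i) for i in range(7)]  # Mon 1 Mar to Sun 7 Mar 2021
labels = [1, 1, 1, 1, 1, 2, 2]
print(cluster_day_types(week, labels))
# -> {1: {'business': 5, 'free': 0, 'business_%': 100.0}, 2: {'business': 0, 'free': 2, 'business_%': 0.0}}
```

The same arithmetic gives the shares quoted for the pandemic period: 19 business days out of 30 yields round(100 * 19 / 30, 2) = 63.33%.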
Pattern No. 2 for building 1 also draws attention. It covered only 10 days among all days included in the analysis. This means that only 3.6% of the analysed days strongly deviated from water consumption diagrams typical of the object. This is also confirmed by the bootmean parameter value, which, for the remaining patterns, is higher than 0.7, i.e., histograms for these clusters are strongly similar.
Diagrams of accumulated water consumption obtained in the cluster analysis for the period before the pandemic, divided into patterns for business days and days free from work, are shown in Figures 17 and 18. Although the patterns come from three different buildings, their courses are similar, and they differ only in the amount of water consumed in a given hour.
Owing to their course, the histograms generated for business days (Figure 17) are distributions of daily water demand typical of that type of day. They have two peaks: the first in the morning (7:00–9:00), when people prepare for work or school, and the second, considerably higher, in the evening (19:00–22:00), when residents prepare meals and bathe.
For days free from work (Figure 18), the averaged histograms of water consumption have a somewhat different course than those for business days. The first difference is that the morning peak of water consumption is shifted, starting only at 9:00 and lasting until 12:00, which suggests that recipients rest longer on these days. Moreover, the peak before noon is considerably higher than the evening one. This suggests that water is used not only for daily hygiene but also for, e.g., cleaning or preparing meals [19].
As mentioned above, pattern No. 2 for building 1 showed strongly atypical diagrams of daily water consumption. This is also confirmed by the course of the averaged histogram for the 10 days classified into that cluster (Figure 19). Owing to its characteristics, i.e., maximum water demand around noon (between 9:00 and 14:00), the pattern could be classified as a pattern for days free from work. According to Table 5, however, the cluster primarily covers business days (70%). Moreover, due to the low number of days in the cluster and the low value of the bootmean parameter (0.2810), it cannot be considered representative for analyses such as mathematical modelling of water supply networks. These histograms should be interpreted individually.
For the data recorded after the announcement of the state of epidemic emergency, 7 synthetic patterns of water demand were generated for buildings 1 and 2, and 5 patterns for building 3. The entire data set was divided almost evenly among the clusters, and no cluster includes a large majority of the classified days (Table 6). Moreover, the bootmean values for the designated clusters reached or approached the 0.6 threshold only in single cases: pattern No. 2 in building 1 and patterns No. 1 and 3 in building 3 (Table 4). This shows a high degree of individuality in the way recipients use water.
When we attempted to verify the classification in terms of the types of days assigned to particular clusters, however, in most cases the numbers of each type of day in the clusters did not allow us to assign a cluster to a specific type of day (Table 6). For example, pattern No. 5 for building 1 covers 19 business days (63.33%) and 11 days free from work (36.67%). Similar situations are observed for pattern No. 6 for building 1, patterns No. 1 and 3 for building 2, and pattern No. 4 for building 3. The remaining patterns show a considerable dominance of one type of day.
Some clusters were also found to cover a low number of days compared with the entire set, for example, pattern No. 7 for building 1, with 8 days (4.3%), and pattern No. 5 for building 3, with only 3 days (1.6%).
The synthetic diagrams of water demand obtained for the buildings during the COVID-19 pandemic are presented in Figures 20 and 21, divided into patterns for business days and days free from work. The obtained patterns showed different courses. Moreover, they differ from those before the pandemic in, among other things, higher water consumption during the day and at night, between 2:00 and 4:00.
Because they have two peaks (in the morning and in the evening), the histograms generated for business days (Figure 20) are fairly typical distributions of daily water demand. Only the water consumption during the day is atypical, which is probably the effect of the introduction of remote work in most companies and remote learning at schools.
For days free from work (Figure 21), the averaged histograms of water consumption are somewhat closer to those generated for the period before the pandemic: the morning peak is shifted to 9:00, and the evening consumption is considerably lower than the morning consumption. Each synthetic pattern, however, has its own individual course, and the patterns show no similarity to one another. The synthetic diagrams of water demand presented in Figures 20 and 21 therefore confirm a certain disturbance in water use by residents of the 3 buildings studied.
5. Conclusions
The objective of this paper was to develop a methodology for clustering and generating synthetic distributions of daily water demand in apartment buildings for the purposes of mathematical modelling, which is increasingly applied in the management and dimensioning of water supply networks [5]. This particularly concerns hydraulic computations with extended period simulation (EPS), i.e., over longer time horizons. Such computations make it possible to understand the hydraulics of the distribution system, trace changes in flows over time, or designate zones of water mixing. This requires investigation of the temporal and spatial dynamics of water consumption in particular nodes. A currently accepted simplification is the top-down approach, in which the consumption recorded at measurement points or pumping stations is imposed on the mathematical model. A number of publications and scientific studies show that only the bottom-up approach is appropriate. It involves determining nodal consumption from the level of the individual recipient. This approach, however, requires knowledge of the patterns of water demand in different objects [20,21].
Daily time series of water consumption were studied in three similar apartment buildings constructed in the 1980s on the same housing estate in Bydgoszcz. Hourly water consumption was recorded for a total of 464 days. Of these, 276 days preceded the announcement of the state of epidemic emergency in the country, and the other 188 days were recorded during the COVID-19 pandemic [22]. The results were analysed in terms of the possibility of applying cluster analysis to clustering and generating synthetic distributions of daily water consumption. It should be emphasised, however, that before the k-means computations, outlier days had to be eliminated from the data set. This was done based on dendrograms developed with the hierarchical agglomeration method.
Cluster analysis with the k-means method made it possible to develop characteristic patterns of hourly water demand, separating business days from days off work and holidays. The division was particularly clear for the period before the pandemic. Three synthetic patterns of hourly water consumption were generated. Two of them were typical of business days, and the third was a pattern for days off work that also included a small number of business days, mostly adjacent to weekends (probably a result of the so-called "long weekends" in the holiday period). The exception was pattern 2 for building 1. It covered only 10 days that deviated completely from the remaining ones and were also dissimilar to one another, so they should be interpreted individually. The resulting averaged histograms of water consumption for the individual buildings could be used to determine nodal water consumption in mathematical models of water supply networks.
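A minimal sketch of how k-means turns normalised daily profiles into synthetic patterns: each cluster centroid is the hour-by-hour mean of its member days, i.e., an averaged histogram of the kind described above. The data are hypothetical (ten "business days" with early peaks and four "days off" with a late-morning peak), k = 2 is used for brevity rather than the three patterns reported, and the deterministic farthest-point seeding is an assumption; the study's own implementation and initialisation are not specified here (`sklearn.cluster.KMeans` is a common library alternative).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 24-hour profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(rows):
    """Hour-by-hour average of several profiles: the synthetic pattern."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def init_farthest(profiles, k):
    """Deterministic farthest-point (maximin) seeding, a simplified cousin of k-means++."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm: returns one cluster label per day and one centroid per cluster."""
    centroids = init_farthest(profiles, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles]
        if new_labels == labels:  # assignments no longer change: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = mean_profile(members)
    return labels, centroids


def make_day(peak_hours, rng, noise=0.2):
    """Hypothetical normalised day: base use of 1, +3 in peak hours, plus random noise."""
    raw = [1.0 + (3.0 if h in peak_hours else 0.0) + rng.uniform(0.0, noise) for h in range(24)]
    total = sum(raw)
    return [v / total for v in raw]


rng = random.Random(1)
business = [make_day({6, 7, 19, 20}, rng) for _ in range(10)]
days_off = [make_day({9, 10, 11, 19}, rng) for _ in range(4)]
labels, patterns = kmeans(business + days_off, k=2)
print(labels)
```

Comparing the resulting labels with the calendar (business day, weekend, holiday) is one simple way to check how unambiguous the division is, as the text reports for the pre-pandemic data.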
The situation was somewhat different for the data recorded during the COVID-19 pandemic. Considerably more patterns of water demand were generated, and no cluster included a large majority of the recorded days. Moreover, most of the resulting clusters were either too similar to each other or contained histograms of a random character, as indicated by the bootmean parameter value. A number of atypical consumption behaviours were also observed, for example water use at night, between 2:00 and 4:00, or increased and more varied water consumption during the day. This change in consumers' water use probably results from the introduction of remote work in most companies and remote learning in schools. It suggests that a credible classification of this data set is impossible because of the highly individual character of water consumption by residents. Using these data in a mathematical model of a water supply network would require partitioning the set into smaller subsets and repeating the clustering, and the resulting clusters may turn out to contain only one or a few days. There is no doubt, however, that the COVID-19 pandemic has greatly influenced daily water demand patterns worldwide [3].
Many studies [19,23,24,25,26,27,28,29,30] show that during the COVID-19 pandemic, water consumption patterns have followed other notable trends of the new normal 'stay at home' life. Families are starting their day later, with the morning consumption peak shifting two hours later. In addition, a comparison of pre-pandemic and current residential usage patterns shows that the largest increase in water use occurs in the afternoon, when schoolchildren and employees staying at home take breaks, use the restroom, wash their hands, and prepare meals. On average, people use the bathroom at home three times more each day and flush the toilet five times more per day than before the pandemic [24,27,29,30]. They also shower almost three more times per week than before. Besides the increase in the number of showers, the time of day at which people shower has also changed, further reflecting a later start to the day: midday and evening showers increased, while morning showers shifted later in the day [29,30].
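The bootmean value cited above as evidence of unstable pandemic-period clusters is, presumably, the clusterwise mean Jaccard similarity returned by `clusterboot` in the R package fpc: the data are resampled with replacement, re-clustered, and each original cluster (restricted to the resampled points) is compared with its most similar bootstrap cluster. The sketch below reproduces that idea in plain Python under stated assumptions: the function names, the number of resamples, the compact k-means used as `cluster_fn`, and the two-dimensional toy data are all hypothetical, and fpc's handling of details such as duplicate points may differ.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two sets of point indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootmean(data, cluster_fn, n_boot=50, seed=0):
    """Clusterwise mean bootstrap Jaccard similarity (modelled on fpc::clusterboot's bootmean).

    cluster_fn takes a list of points and returns one label per point.
    """
    rng = random.Random(seed)
    n = len(data)
    original = {}
    for i, lab in enumerate(cluster_fn(data)):
        original.setdefault(lab, set()).add(i)
    scores = {lab: [] for lab in original}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        boot = {}
        for i, lab in zip(idx, cluster_fn([data[i] for i in idx])):
            boot.setdefault(lab, set()).add(i)
        drawn = set(idx)
        for lab, members in original.items():
            present = members & drawn  # original cluster restricted to the resample
            if present:
                scores[lab].append(max(jaccard(present, b) for b in boot.values()))
    return {lab: sum(s) / len(s) for lab, s in scores.items() if s}


def kmeans_labels(points, k=2, max_iter=50):
    """Compact Lloyd's k-means with farthest-point seeding; returns labels only."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cents = [list(points[0])]
    while len(cents) < k:
        cents.append(list(max(points, key=lambda p: min(d2(p, c) for c in cents))))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: d2(p, cents[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rng = random.Random(2)
points = [(rng.random(), rng.random()) for _ in range(10)]
points += [(5 + rng.random(), 5 + rng.random()) for _ in range(10)]
stable = bootmean(points, kmeans_labels)
print(stable)  # two well-separated groups: values close to 1

noise = [(rng.random(), rng.random()) for _ in range(20)]
print(bootmean(noise, kmeans_labels))  # no real structure: typically lower values
```

As a rough rule of thumb, the fpc documentation treats mean Jaccard values of about 0.75 or more as indicating stable clusters and values below about 0.6 as clusters that should not be trusted.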
To sum up, this paper presents a method for clustering and generating synthetic diagrams of diurnal water consumption for the purposes of mathematical modelling. It also points to the variability among the generated patterns and to the way the COVID-19 pandemic affected water consumption. Because these patterns vary and can change abruptly, as they did during the pandemic, hydraulic analyses based on mathematical models require continuous records of consumers' water use so that the synthetic patterns of water consumption can be updated.
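One simple way to keep synthetic patterns current as continuous records arrive, offered here as a hypothetical illustration rather than a procedure from the study, is a sequential (online) k-means update: each newly recorded day is normalised, assigned to the nearest existing pattern, and folded into that pattern's running mean. The function names and the example patterns below are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 24-hour profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_patterns(patterns, counts, new_day):
    """Fold one newly recorded day into the nearest synthetic pattern (running mean).

    patterns: list of 24-value patterns, each a mean of normalised days (shares of daily use)
    counts: number of days already averaged into each pattern; updated in place
    new_day: 24 recorded hourly volumes for the new day
    Returns the index of the pattern that was updated.
    """
    total = sum(new_day)
    profile = [v / total for v in new_day]
    best = min(range(len(patterns)), key=lambda i: sq_dist(profile, patterns[i]))
    counts[best] += 1
    patterns[best] = [m + (x - m) / counts[best] for m, x in zip(patterns[best], profile)]
    return best


# Hypothetical patterns: flat use, and a morning peak between 6:00 and 9:00.
flat = [1 / 24] * 24
peak_raw = [3.0 if 6 <= h <= 8 else 1.0 for h in range(24)]
peak = [v / sum(peak_raw) for v in peak_raw]
patterns, counts = [flat, peak], [10, 10]

updated = update_patterns(patterns, counts, [2.5] * 24)  # a perfectly flat new day
print(updated, counts)  # → 0 [11, 10]
```

A plain running mean weights every past day equally, so it adapts slowly to a shift in habits such as the one observed during the pandemic; replacing `1 / counts[best]` with a fixed weight (exponential forgetting) would let older days fade out.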