1. Introduction
A few of the changes and problems the world’s population is currently facing are related to the climate, electricity, food, water, transport, utilities, health, education, administration, and industry. Cities use 75% of the energy produced and are answerable for 80% of all dioxide emissions, although they only cover 2% of the planet’s surface. Future cities will need to adapt in order to counteract the effects of factors such as environmental change, population expansion, and social mobility, together with migration, human conflicts and unfairness, economic globalization, technological advancements, food, water, and energy vulnerability, geostrategic shifts, etc. Future cities must manage infrastructure and resources more intelligently to meet people’s needs both now and, in the future, as the world becomes increasingly “urban” [
1,
2].
Mobility is at the core of modern society when the aforementioned factors are considered. Over the next 20 years, there will be a lot of changes in this area as global auto markets and the transportation industry are reshaped by electrification, shared mobility, vehicle networking, and autonomous vehicles. This transformation is supported by technological advancement and other crucial variables, such as legislative directives pointing these two sectors in the direction of low-carbon solutions and increased fuel efficiency [
3,
4,
5,
6].
In the medium and long term, automakers and major EV operators are ensuring that the decarbonization goals are treated to a higher standard. Around the world, there are currently more than 7 million electric vehicles (EVs) in use, and other aspects of road transportation, such as the freight industry, are being electrified.
The electric vehicle is not a recent invention; it first emerged alongside internal combustion engine vehicles. Electric cars outperformed all other car types between 1890 and 1900. Because they were less noisy and polluted than cars with internal combustion engines, they significantly increased in popularity at the beginning of the 20th century. However, the short battery life of electric automobiles meant that their owners could travel great distances.
In recent years, technological advancements and concerns related to climate change have progressively sparked the resurgence of electric automobiles. The transformation from internal combustion to electric engines, which is now occurring in the automotive area, represents the most significant change. To adjust to changing market conditions, automakers have undertaken massive financial investments and unforeseen alliances [
7,
8,
9].
EVs have been more and more popular in recent years due to their capacity to provide a variety of benefits, including [
9]:
Energy efficiency: Electric vehicles use less energy than vehicles powered by traditional internal combustion engines (ICEs).
Electric mobility improves energy security because the road transportation industry is so reliant on petroleum-based fuels. Additionally, electricity may be created using a range of materials and fuels and is frequently produced locally.
Air pollution: Since electric vehicles produce no emissions, they are an excellent solution to the issue of air pollution, particularly in densely populated regions and those nearby, where many citizens can be exposed to dangerous toxins from transit vehicles.
Greenhouse gas (GHG) emissions: Combined with a progressive increase in the production of low-carbon energy, increasing electric mobility can result in considerable reductions in GHG emissions from the transportation infrastructure when compared to other traditional vehicles. Additionally, electric vehicles can behave according to the integration of renewable energy, which is often unpredictable for generating electricity, and offer flexible services for power systems.
Noise reduction: Electric vehicles, especially those with two or three wheels, are quieter than ICE vehicles.
Industrial advancement: Given the relevance of energy storage for the switch to “clean” electricity, electric vehicles also have the capability of storing generated energy. In essence, battery technology, one of the fundamental factors in industrial competitiveness, is a possible facilitator for the significant reduction in cost in the electric car industry.
In many cities across the world today, one can find personal automobiles, common transport, car sharing, taxis, municipal parking lots, two- and three-wheeled machines (mainly electric scooters), as well as an expanding number of commercial and freight vehicle sectors. The proposed study topic fits within the background and is related to actual policies in the decarbonization of transportation areas, minimization of urban pollution, and the integration of EVs into electrical systems [
10,
11,
12].
This study’s objectives were to analyze the impact and potential effects of integrating many EVs into the power network, as the efficiency could be influenced in both positive and negative ways by the need to charge all these EVs, as well as suggest several strategies and measures to integrate them. The article discusses the effects of integrating electric vehicles into the electric distribution networks and analyses those effects from a technical standpoint in terms of how they affect urban electric distribution networks and how that affects changes in electricity usage.
One challenge is connected to energy management, the impact of which may be reduced if the charging of EVs is achieved outside the peak period, which is likely to occur when big fleets of electric vehicles are incorporated into power networks. Electric vehicles (EVs) can be viewed from the perspective of electrical networks as either basic tasks (with continuous consumption) or using the Vehicle-to-Grid (V2G) idea, where storage devices plan recharging intervals or they can inject grid energy using their energy storage devices [
9,
10]. In the case of a significant uptake of EVs, it will be important to coordinate their functioning both as a potential source of revenue and as a new aggregator that affects the electrical network by managing demand.
If EVs are gathered or aggregated, they can contribute to the balancing process by lowering grid usage or injecting electricity into the grid as needed by helping to coordinate the load.
The article is organized in the following way: a presentation of the clustering methods used in this study is first given; electrical load profiling, load-type profiles correlated with low voltage (LV) consumers, and a distribution network loading simulation are discussed along with the results obtained, and then the conclusions are presented.
In recent years, technological developments in the energy domain, including the introduction of smart meters and the transition to the concept of “Smart Grids”, have provided transmission and distribution operators with opportunities to forecast the load required by the system, modeling consumers to take into account their behavior, for the prevention of unplanned outages, the optimal load planning of generating units, etc. [
13]. In these cases, operators must manage a lot of data and perform complex analyses in order to make the best decisions regarding the optimal planning and operation of electricity networks. For the efficient management of large databases, there are two aspects that need to be considered: data extraction (data analysis to obtain specific knowledge, patterns, or models) and database management (data storage, processing, and querying). Both related concepts are crucial in the energy networks’ decision-making process [
14]. The choice of technology and the organization of installation, processing, and maintenance operations are the first steps in the adoption of smart metering systems. These technologies mark a significant development in the interaction between users and network operators. If consumer-mounted smart meters were totally integrated into a modern metering infrastructure and data were adapted, then distribution operators (ODs) could have access to full monitoring, which would make it easier to estimate the state of distribution networks [
15,
16,
17].
The transition to the “Smart Grids” concept can lead to the implementation of smart monitoring and remote communications equipment necessary for optimal power systems operation and planning that will lead to maximizing economic benefits and minimizing the environmental impact. The phrase “Smart Grids” represents a hyperbole that involves the management of the EEA without the intervention of the human factor. The key components of this idea include two-way communication with consumers and all other market participants, as well as digital control of the energy transmission and distribution network. This smart infrastructure will enable various energy services, markets, integrated distributed energy sources, and control systems. The world economy in the future will be supported by the smart grid. It suggests that, in many respects, electricity generation, transmission and distribution companies, regulators, and institutions, indeed, all levels of government, face a real challenge in terms of the energy sector which is the driving force behind the world economy [
18,
19,
20,
21,
22].
Starting from the aspects highlighted above, the efficient solution to many problems related to the management of the electric power system goes through the elaboration of solutions based on one form or another of artificial intelligence. Artificial Intelligence (AI) techniques aim to create intelligent computing systems, systems based on the characteristics of human intelligence: reasoning, ability to learn, solve and communicate, systems for problems for which there is no classical computational algorithm. Over time, with the development of AI techniques, hybrid algorithms have been developed and perfected, such as fuzzy logic-controlled neural networks, fuzzy genetic algorithms or expert systems, and artificial neural networks generated by genetic algorithms or neuro systems, having already proven their effectiveness [
23,
24,
25].
Because an AI system has more complex tasks to solve, the knowledge that needs to be represented in it increases (facts, rules and heuristics of the field, general concepts, and theories). In general, a system may work well, in line with the goal set by the knowledge provided, but any move outside its competence causes its performance to decline rapidly. This phenomenon is also called the fragility of knowledge [
26].
Several machine learning techniques are presented in the studied literature, including supervised and unsupervised approaches that have been utilized for energy consumption level predictions. Among the unsupervised learning techniques, clustering is considered one of the most frequently applied techniques in data mining and machine learning. Clustering involves partitioning objects with similar patterns under observation into different groups. A vast number of works on clustering electricity usage patterns have been presented by researchers. The article [
27] generated a typical load profile from data measured with automatic meter reading systems, then performed cluster analysis using three clustering algorithms, specifically, the hierarchical, k-means, and fuzzy c-means algorithms. In [
28], classified daily load curves in industrial parks, which can be regarded as microgrids from the energy network perspective using SOM, then exploited k-means to obtain a number of clusters, and [
29] demonstrated the possibility of applying disaggregation techniques on smart meter data via fuzzy c-means clustering. Similar work is in [
30], where they utilized the k-means algorithm to group residential houses with similar hourly electricity use profiles, and in [
31] where they proposed a method to characterize medium voltage electricity consumers by using several clustering algorithms. In order to choose the best one among the typical load profiles, they measured the performance of the clustering algorithms in terms of eight clustering validity indices. To deal with the scalability and computational complexity of the power consumption profiling process, the authors of [
32] proposed a multi-layered clustering method for power consumption profiling. First, they acquired local power consumption profiles using k-means, considering clusters with a low number of patterns as abnormal power consumption behavior. In the second stage, a global power consumption profile was derived from the local ones. Furthermore, Refs. [
33,
34] applied an improved k-means algorithm with particle swarm optimization (PSO) to open residential buildings datasets to divide their electricity consumption in an entire region into different levels. The authors of [
35] developed a methodology in which one-dimensional time series smart meter data were reshaped to two-dimensional arrays called load profile images. After performing image processing techniques on those images, they derived the class load image profiles via clustering algorithms. In addition, [
36] partitioned customers into electricity user groups based on similar electricity usage behavior with the SOM, k-means, and hierarchical clustering algorithms. Similar to group electricity consumption profiles, the authors of [
37] investigated a shape-based clustering method.
The discussed literature reveals several limitations of the employed techniques from various perspectives of energy consumption prediction. The literature lacks focus on capturing the recognizable patterns in building smart sensing data, which has a limited number of features. These features can be represented in low-dimensional feature space and may affect the overall performance of data analytic tasks. Many of the existing techniques enquire about the number of clusters to differentiate among distinct categories of data. In addition, the presentation of energy consumption for data analysts and common individuals is a common problem that has not been tackled effectively in the existing literature.
Therefore, based on the above-mentioned problems in household energy predictions, this paper presents a new framework with the following main contributions:
There is not a lot of energy consumption data obtained from smart sensors for residential buildings, and the presence of missing data is a difficult problem in statistical analysis. Less than 1% of missing data can be considered a common problem, and between 1 and 5% can be considered a solvable problem. If the percentage is greater than 5%, difficulties may arise in solving the respective problems. Thus, if the values are between 5 and 15%, the problem requires the application of sophisticated solution methods, and if the value exceeds 15%, the problem may have difficulties in interpretation. Finding recognizable patterns in such data is very difficult, which affects the performance of electricity consumption analysis. To solve this problem, five hierarchical clustering methods were used in the clustering process: average distance (Average Method), the center of gravity (Centroid Method), minimum distance (Single Linkage Method), maximum distance (Complete Linkage Method), and Ward, which were the basis for obtaining the load-type profiles presented in the article. Based on the clustering methods used, we propose a method based on load-type profiles which is more robust, reduces calculation errors for off-peak time/peak time, requires less computation volume, and converts small model representations from data with reduced dimensions in high-level representations, comparable to using large databases.
Clustering algorithms require an input parameter to divide data into multiple clusters. This article, by using the method of simultaneous layers in the hierarchical clustering process, average distance (Average Method), the center of gravity (Centroid Method), minimum distance (Single Linkage Method), maximum distance (Complete Linkage Method), and Ward, obtained an adaptive grouping to organize large data.
After dividing the data into several clusters, the regression model based on first- and second-degree polynomials for every consumption class was applied, performing a predictive statistical analysis on the data to determine which buildings had a high, medium, and low level of energy consumption.
2. Materials and Methods
2.1. Clustering Techniques
The use of clustering algorithms allows for the spatial distribution of characteristic vectors to be used as a basis for grouping input data. Each element connected with a set of data will be characterized by a vector whose components are represented by the representative qualities or attributes of the vector in order to examine the similarity or differences between them and to categorize them. The determination of the characteristic/attribute number and their definition requires a deeper analysis of the database designed on the available information and considering the expertise of specialists [
38,
39]. The vectors connected to the input data are grouped throughout the clustering phase based on the estimated distance between each of them. Depending on the analyzed topic, the clustering process will result in one or more clusters (groups, patterns, models, or classes), which describe the spatial position of the qualities taken into account for the process’ elements. Within each cluster, the elements are closer to a common center when compared to other centers belonging to other groups. This aspect is exemplified in
Figure 1.
In
Figure 1, the elements, represented by vectors with two characteristics
,
i = 1,…,
N (where
N is the maximum number of elements of the database subjected to the clustering procedure) were assembled using a similarity principle defined by the distance calculated between vectors. In this mode, two or more components can be associated with the same pattern if the distance between them is smaller in relation to the distance from the elements of another cluster. Finally, each cluster will be characterized by a representative element, determined by mediating the characteristics of the elements that make up the cluster [
40].
2.2. Stages of the Clustering Procedure
Clustering processes can be applied in various domains in order to group unlabeled components. These domains already involve various assumptions, terms, or techniques, related to clustering procedure phases, as a function of the addressed problems.
The steps that must be covered are described below and represented in
Figure 2 [
40,
41].
Step 1. The components subjected to the clustering procedure must be established. In this phase, it is necessary to consider the option from the database that is best suited to the aim of the problem. The type and size of the attributes available for the clustering procedure can be chosen.
Step 2. The attributes/characteristics of the components subjected to the process must be extracted. The identification of the most useful and important characteristics must be achieved. During this process, one or more component transformations can be accomplished to obtain new dominant attributes.
Step 3. A similarity measure must be defined. Usually, the similarity can be determined by measuring the distance connecting adjacent items. Once a vector has been attributed to each component, this length may indicate how similar two elements are. The literature has defined several different distance measurement techniques, with Euclidean distance being the most used.
Step 4. This phase represents the actual clustering procedure. It can be achieved in various modes depending upon the techniques chosen by the decisional factor. All clustering techniques should conduct several clusters for any input data set. If no clusters resulted from the process, other techniques can be applied in order to obtain the desired results. The obtained results can be “clear”, meaning that the separation of the components is achieved in well-defined clusters, or they can be “fuzzy”, meaning that each component has a degree of dependence on each cluster.
Step 5. The results must be extracted. For this, an accurate interpretation of the results is necessary so that their rendering can be achieved in a simple way that is easy to interpret by the decisional factor. In this scenario, either from the point of view of automatic analysis (where a computational system can perform further data processing effectively) or from the perspective of a human, simplicity is required (the representation used for results is easier to understand by decision makers). The extraction of the results from the clustering method is a brief illustration of each cluster using representative components.
Step 6. Evaluation of results. An assessment of the clustering procedure is taken into account when analyzing the validity of the outcomes (represented by clusters), and this evaluation often employs an optimization criterion. This impartial analysis examines whether the outcomes are accurate. If a cluster does not occur unintentionally or for other reasons, it is validated.
2.3. Clustering Methods
Next, certain terms will be defined and the notions that will be used will be briefly described in the next section.
The element , j = 1,…, N (where N is the total number of elements), is a unique object used in the clustering process. It is usually represented by an n-dimensional vector .
Scalar components , i, i = 1,..., n, are called characteristics/attributes of the vector and are established by the decision maker.
The distance between vectors is a metric in the space of the attributes , i = 1, .., n, corresponding to input vectors , j = 1,…, N, and used to determine the similarity between elements. The most used is the Euclidean distance.
For a database X, consisting of the vectors
,
j = 1,...,
N, with n characteristics,
, different distances between vectors can be defined. Thus, if two vectors
and
are taken into account, the distance can be calculated with the relation:
There are several ways to classify clustering methods in the literature. The most used classifications are given in [
40,
42]: hierarchical methods and the K-means method.
Hierarchical clustering methods can be subdivided, according to their meaning, into methods of agglomeration and division. In the case of agglomeration methods, for example, we start from the k clusters, each containing a single element
,
j = 1,...,
N, and by successive mergers, form a single cluster, containing all N elements. In the case of division methods, the direction of deployment is inverse, i.e., starting from a single cluster containing all the
,
j = 1,...,
N elements, we reach
k clusters, each containing a single element
. Agglomeration techniques are usually used more frequently. As shown above, in the hierarchical spatial grouping, an agglomeration process goes through a series of mergers/couplings of groups/classes,
. The first,
, consists of
n “groups” with a single element/object, and the last
includes a single group having all n elements/objects. At each stage, the method couples two close groups (at the first level, of course, this means the coupling of two elements/objects that are close to each other (in distance), since at the initial stage, each group has an element) [
40,
43].
The clustering process can be illustrated as a two-dimensional diagram named a dendrogram. These methods are suitable for small tables, having a few hundred rows. The desired number of clusters can be chosen after the proper shaft is designed by imposing a threshold [
30,
40].
The difference among the agglomeration techniques is given by the method for defining the distance between the clusters.
Considering two clusters,
and
, containing
and
elements, the average distance
is calculated on an Euclidean distance base:
can be expressed through the relation:
Starting from the above-mentioned, the most used hierarchical clustering methods are briefly presented below.
The following figure shows the result obtained by applying the hierarchical clustering algorithm (
Figure 3).
- a.
Minimum distance method (minimum method)
This method is the simplest, being based on the minimum distance, also known as the method to the closest neighbor. In such cases, the distance between the clusters represents the distance between the closest items:
where the
f component is attributed to the
cluster and the element
h to cluster
. In this case, the distance is calculated between each possible pair of elements (
f,
h). The minimum value is the distance between the cluster’s
and
. In other words, the distance between two clusters is given by the shortest link value. At every stage of the clustering process, the
and
clusters, for which
is minimal, will be coupled. A graphical interpretation of the minimum distance between the clusters is shown in
Figure 4.
- b.
Maximum distance method (maximum method)
This method, also known as the farthest neighbor method, is based on distance maximum, being the opposite of the minimum method. In this technique, the distance
is calculated using the equation:
where the
f element is attributed to the
cluster and the
h element to the cluster
. In such a case, the distance between two clusters is given by the longest link value. At each phase of the spatial hierarchical grouping, the clusters
and
, for which
is maximum, will be coupled.
Figure 5 presents a graphical interpretation of the distance between the clusters.
- c.
The average distance method
In this method, the distance between two clusters is defined as the average of the distances between all element pairs, where each pair contains one element from each cluster. The average distance
is calculated with the relation:
where
is the sum of all possible distances between the elements of
cluster and the elements of
and
nr and
ns represent the number of elements in the cluster
and
, respectively.
At each stage of the clustering process, the
and
clusters for which the distance
is minimum, are coupled.
Figure 6 illustrates how the average distance is defined.
- d.
Center of weighted method
In this technique, the distance between two clusters is described as the square Euclidean distance between their centers of weight.
where
and
are the mean vectors for the
and
clusters.
This method is much more robust, deviating from the average more than other methods of hierarchical clustering, but in other situations may not give as good results as the Ward method or the average distance method.
- e.
Ward method
This method seeks to form
partitions in a way that minimizes the information loss associated with each cluster and measures them in an easy-to-use interpreted form. At each step of the analysis, two clusters are combined, the fusion of which leads to results that minimize the increase in “lost information”. The lost information is defined by Ward in the conditions of the criterion of the square sum of the error.
where
and
are the mean vectors for the
and
clusters and
nr and
represent the number of elements from the
and
clusters, respectively.
- f.
K-means method
This method involves a simple and easy mechanism to classify the input data set into several K clusters (K fixed a priori). The basic plan implies defining K centers of weight, one by one for each group. These centers of weight must be rationally fixed because different locations lead to different results. The best choice is to fix them, if possible, as far apart as possible from each other. The next step is to take each element of the input data set and link it with the closest center of weight. The first grouping stage ends when there are no more ungrouped items. At this point, it is mandatory to recalculate new K centers of the groups arising from the previous phase. The process continues until the positions of the new centers no longer change significantly.
The objective of the method is to minimize an objective function (quadratic error function) given by the expression:
where
is the distance measured between point
,
l = 1,...,
nk, where
is the total number of components of the k cluster, and the center of the group
,
k = 1,...,
K.
2.4. Validation of Results
The evaluation of the results obtained from the clustering process is the main concern of cluster validation. At this stage, the density, size and shape, separation of clusters, and robustness of classification were examined. The literature mentions the following tests to validate the clustering process [
39,
40,
44,
45]:
External tests—data not included in the basic ranking are compared with the categorization results of the input data.
Internal tests—only input data are utilized to evaluate the classification’s quality; each cluster’s separate validation is carried out using this test.
Relative tests—this approach takes into account several classifications of the database, the results being analyzed using the same clustering method, but with various input data.
Internal cluster validation tests are more common and effective in real-world settings. Testing based on the creation of a global silhouette index of clusters is one of them and is also one of the most popular. This test determines the average shape width for each cluster, the median shape width of each element, and the average shape width of the entire collection of input data. With this method, each cluster might have a “shape” that is based on comparing its separation and density. The clustering procedure is then validated using the shape’s average width, and the ideal cluster number will also be set using the same information.
, the local silhouette coefficient, is calculated using the relation:
, the silhouette width coefficient for element
l, is calculated by:
is the average distance from element
l and the elements from cluster
k and
is the minimum median distance from component
l and the components in the closest cluster
k.
In Equation (12), if the element
l is unique inside a cluster, then
= 0. The literature proposes the following explanation of the GSI coefficient [
38,
40]:
0.71 to 1.00 (a strong structure was highlighted);
0.51 to 0.70 (a reasonable structure was obtained);
0.26 to 0.50 (the structure is weak and might be artificial);
<0.25 (no substantial structure was noticed).
Verifying the clustering process quality is one of the main steps in analyzing the inherent database characteristics. Its purpose is to evaluate the results of the process clustering and select the schema that best fits the elements in the database.
2.5. Electric Load Profiling
The loads from the nodes within the electrical networks range in consumption time and place. Consequently, distribution operators (ODs) need details regarding the load of fed consumers so that they will be able to optimally plan and operate the network and ensure proper power supply and operation modes, load management, and proper billing [
40,
46,
47,
48]. The load demanded by consumers depends on various parameters such as:
Consumer type: consumption type, with/without electric heating, or size of the building;
Time factor: time of day, weekday, and month;
Climatic factors: humidity, temperature, cloudiness, wind speed, etc.;
Other electrical charges related to the analyzed load.
For a certain consumer, his behavior is determined by a load profile correlated with the electricity consumption for each interval. The accessibility of this data is dependent on the type of consumer. In general, small consumers (such as residential ones) have an uncertain behavior, because the implementation of smart metering on a large scale would lead to large investments whose recovery time from their energy consumption would be too long. For these consumers, there is only the consumption of electricity at certain periods of time each year. For large consumers (such as industrial consumers), the installation of smart meters is facilitated by advantages related to billing (done every month) and high electricity consumption (justifying the investment by the fact that the detailed recording of consumption allows for the application of differentiated tariffs varying with consumption period).
In the traditional strategy for a distribution system plan, load profiles are employed to evaluate the maximum necessary load, in correlation with the simultaneity coefficient of the consumers coupled to the network node. Despite the fact that this strategy is appropriate, some major disadvantages arise:
There are inherent inaccuracies, due to the simultaneity coefficients, which must be highlighted;
The energy consumption and losses calculation does not have an increased precision;
The voltage within the network nodes from various hours is not known;
The load profiles of nodes having arbitrary variations in power requirement cannot be accurately modeled or evaluated.
Utilizing modern techniques of load analysis, load forecasts and the calculation of power and energy losses (for any period) can be achieved. The use of load profiles has some advantages [
40,
49]:
It is not necessary to estimate or calculate concurrency coefficients, as load profiles already include the information;
The energy consumption and power/energy losses calculation can be correctly achieved at any point within the network;
The main voltage and charging are known for any period;
The optimal position of the transformer plot can be determined, both for the peak load period, as well as for other times of the day;
The effect of overloading or increasing the load is modeled more accurately than in the traditional method.
Medium voltage (MV) and low voltage (LV) networks, mainly urban ones, have many nodes even if only the nodes where electric distribution substations (EDSs) are located are taken into account. Thus, monitoring the system consumption for every node can become overwhelming, sometimes virtually impossible. This is practically overcome if, in the organization studies corresponding to the networks, the load-type profiles are associated with the node groups.
2.6. Load-Type Profiles Correlated with Nodes in Electrical Distribution Networks
This paragraph presents an approach based on hierarchical clustering techniques (presented in
Section 2) for calculating the load-type profiles (LTP) correlated with nodes’ high voltage (HV) and medium voltage (MV) electrical distribution networks. Through knowledge of the load profile of the nodes, the OD can clarify the procedure for estimating the requirement in a certain sector.
In this respect, it is mandatory to understand the daily loading profiles. The load diagram of the nodes must be reconstructed by applying the standard load profile and daily demand. This standard profile is characterized by 24 or 48 load values.
The type and season influence the shape of load profiles. Because many profile responsibilities are correlated with various nodes of the network and may complicate the problem, they must be grouped into clusters, taking into consideration some similarities among them. For every cluster, the typical load profile can be established.
In this sense, all the measurements performed must be processed, by arrangement and normalization, using a convenient normalization factor (average power, peak power, or more frequently, the energy consumption of the studied period):
where
is the normal value of the power in
i node at
h hour,
is the real value of the power in
i node at
h hour, and
is the total energy consumption in the interval
T (24 h).
It is worth mentioning that after the clustering procedure is applied, clusters can be acquired as coherent and representative so that the diagrams within the same cluster are similar [
40,
49,
50]. In the end, every cluster will be related to a typical task profile, estimated using the graphs median.
where
is the cluster numbers derived from the node’s classification in concordance with the absorbed load (active power) and
is the node number from each cluster
,
.
2.7. Load-Type Profiles Correlated to Low-Voltage Consumers
In recent years, distribution operators have increasingly used smart metering systems (Smart Metering System) to monitor the electricity consumption of consumers. The development of these systems begins with technology selection and planning for installation, operation, and maintenance. In general, the implementation of residential or non-residential consumer categories is quite discrete in the case of many countries in the European Union. Currently, there are two alternative solutions to solve the problem of metering electricity consumption for consumers.
The first solution envisages the installation of smart meters for all consumers, being expensive and uneconomical, but the most accurate. The second solution envisages the continued use of traditional meters and the attachment of load-type profiles to the monthly energy consumption, which is then distributed over days and hours.
For large consumers, in the category of industrial ones, the presence of smart meters is necessary for several reasons: invoicing is done every month, the consumption of electricity is high (which implies rapid amortization of the investment), and it provides a detailed recording of consumption necessary for the process of invoicing.
This section presents an algorithm for determining the type of load profiles associated with LV consumers according to the category of energy consumption in which they fall. Consumption categories are identified from historical information and can be updated following changes in consumer behavior.
The algorithm phases are [
40]:
Phase 1. In this phase of load analysis and database formation, a representative specimen must be identified from the crowd of consumers who have installed smart meters and the sampling step for the purchase of load schedules must be described. Consequently, a database with the registered load program and the consumer category is created.
Phase 2. Technical problems in the pre-processing of the data may affect the quality of the database in real cases of monitoring consumer charges, requiring many meters spread over a large geographic region over an extended period. The most relevant and frequent problems are communication issues, interruptions, meter failure, and, occasionally, the irregular behavior of individual consumers. These issues will influence the records of the database, appearing as null values or exceeding a particular threshold set by the connection notice. Such records must be identified and working techniques must be applied to obtain the missing data, resulting in the substitution of missing or equal to zero data with some estimated values so the database can be made ready to obtain clusters.
Phase 3. The database with records of load schedules must be divided into clusters of consumption, defined by the consumer’s type: residential, commercial, or industrial.
Phase 4. This phase represents the clustering procedure. A hierarchization of clusters is achieved, taking into consideration the daily consumers’ energy consumption within every consumption macro-category. In this respect, the K-media clustering technique is applied. For every cluster, the representative load profile is determined by applying the average of the load graphs’ hourly values.
Phase 5. Task-type profiles must be assigned. For every customer’s class, a typical task profile is assigned as a function of the activity macro-category to which the consumption belongs.
The suggested algorithm was evaluated using a database with 296 load diagrams. After the macro-categories were divided, 147 consumers were distributed in the residential consumers class, 97 in the commercial consumers class, and 52 in the industrial consumers class.
The utilization of the k-means clustering technique within every activity macro-category resulted in five clusters within the residential consumers class and three clusters within the commercial and industrial consumers ones.
Using these standard profiles, consumers can be better delineated in connection with load modification than the standard profiles correlated to the complete activity macro-category. This detail can be emphasized if a correlation is achieved between the load-type profiles of every activity macro-category and the load-type profiles linked to these macro-categories (
Figure 7).
The change in consumers’ energy consumption determines important problems in the planning of activities associated with the technological procedure, in terms of adopting the optimal power supply and operation solutions. Solving these issues can be efficiently achieved by utilizing consumption profiles correlated to energy carriers.
This section proposes a perspective based on clustering methods for the determination of load-type profiles for electric vehicle charging networks. The standard profile’s forms, which take into account both the type of electrical equipment or installations supplied and the time of year the survey is conducted, represent the specifics of how consumers use electricity.
The methodology proposed for the load profiling process for electrical vehicles as (industrial) consumers is presented in
Figure 8 [
40].
The importance of every phase from
Figure 7 is the same as for all profiling procedures that are operated with clustering methods from the previous paragraphs.
2.8. Distribution Network Loading Simulation
In the distribution network, the basic component represents the loading simulation of MV/LV transformers. Because tens of thousands of transformers are positioned in the distribution matrix, their hourly load is hard to determine, due to the many distribution networks, including current and voltage sensors mounted in devices, transformers, and MV connections unequipped with recording meters with transmission capacity, remote, real-time recording, and load level. Consequently, it is hard to identify those transformers operating at overload or to estimate the loads of MV connections intended for transfer between distributors without a simulation.
The most effective route to evaluate a load of transformers, without performing real measurements, is represented by the utilization of simulation programs, with some of the following elements being taken into consideration [
40]:
The number of consumers coupled to every transformer;
The consumer type;
The annual energy consumption for every consumer;
Task type diagrams correlated with every consumer class;
Software able to calculate the load of transformers.
It must be emphasized that, when a load of transformers is estimated, the maximum and hourly active powers (within low voltage/medium voltage side) are computed on peak days and varying typical intervals (winter, summer, average working days of the week, etc.).
2.9. MV/LV Distribution Transformers Load Simulation by Clustering Procedure
The database structure that is necessary to simulate the load of the distribution transformers in the MV/LV substations includes:
The LV database having a “consumer link-substation (CS)” (number and class of consumers from every transformer in CS);
Basic profiles containing task-type profiles for all consumers classes;
A consumption database containing data on the annual energy consumption and consumer class.
In order to establish the typical load profiles correlated with consumers in LV networks, a database is required to include as many registered load diagrams as possible, representing all consumption categories. The decisional procedure of associating a typical task profile with a certain consumer constitutes a complicated problem. Consequently, a load profiling algorithm using clustering methods has been suggested for residential and non-residential consumers.
Figure 9 shows the implementation diagram for determining load-type profiles for residential and non-residential LV consumers [
40] and the software pseudocode. In the following figures, the software implementation for the proposed algorithm is represented. The software development was carried out in the LabVIEW programming language developed by National Instruments.
3. Results
An important issue in the optimal operation and planning of electric distribution networks by electric companies is the estimation of the maximum load of consumers. This problem occurs especially with home consumers. The decision to determine the optimal number of consumption categories within the same class of consumers and the estimation of the maximum load is a complex problem. Therefore, an improved regression method with clustering techniques is presented. Within the proposed method, to define the consumption categories of consumers, a classification can be made considering the monthly energy consumption and the maximum load (data obtained from the recorded load graphs). After identifying the consumption category, the maximum load of each consumer is estimated using a regression model corresponding to each consumption class. The major advantage of the proposed method is the exploration of the data using clustering techniques in order to obtain models/patterns/categories of consumption.
The contributions to modeling loads in electrical networks, the effectiveness of clustering techniques in their profiling, and the load simulation for MV/LV distribution transformers using clustering procedures are highlighted in the study.
3.1. Estimating the Maximum Load of LV Users (Consumers) Using Clustering Techniques
The estimation of the maximum load of users is a key parameter in the effective operation and planning of electricity distribution networks by energy providers. This problem occurs especially in household consumers, but also in the integration of electric vehicle charging. From analyzing the built databases and using data exploration techniques, load patterns/models can be identified that can be extrapolated to other potential consumers such as electric vehicle charging networks.
The decision to estimate the appropriate number of consumption categories within the same consumer class and the maximum burden is a complex issue. Thus, a customized regression method using clustering techniques is presented.
The monthly energy consumption, as well as the maximum load, can be classified using the suggested approach to determine the consumption groups of users (data obtained from the recorded load graphs). After determining the consumption category, a regression pattern matching every consumption class is used to estimate the maximum load of each consumer. The main benefit of the suggested approach is the data exploration utilizing clustering techniques to produce models, trends, and consumption categories that would aid in consumer modeling.
Its stages are presented below [
40]:
Stage 1. Database. For consumers in the selected pilot area where smart meters have been installed, the load graphs for the analyzed time period are recorded. For each consumer, the variables that characterize the consumption category (maximum load, monthly and annual energy) are extracted.
Stage 2. Data pre-processing. All records from the database are analyzed, and those that contain missing values, equal to zero or atypical, are subjected to the process of treatment with missing data techniques. After all processing, the data are used to classify consumption categories.
Stage 3. Data exploration process. The first step in this stage is the use of clustering techniques in order to obtain consumption categories according to the maximum load and monthly energy consumption. The K-media technique is utilized for the clustering process. Then, a regression model is built for each consumption category to estimate the maximum load related to consumers.
Step 4. Estimation of peak load. The regression models obtained in Stage 3 are used to estimate the maximum load absorbed by the monitored consumers by means of classical meters.
The structure of the database includes information on the type of consumers (residential, industrial, commercial, and public) and their daily/monthly/annual consumption respective to the maximum load absorbed. This information can be obtained using smart meters. The information acquired is represented by the load curves that describe the consumer’s behavior during the day. The processing of the load curves allows us to determine the data on energy consumption respective to the maximum load absorbed by each consumer. For the method implementation, the database was split in two, a working base and a testing base. The application of the working base will lead to mathematical regression models able to conduct the estimation of the maximum load for each type of consumer within the same category of energy consumption. Energy consumption categories are then acquired using the K-average clustering procedure for each consumer type (residential, industrial, commercial, and public). The outcomes of the clustering technique are confirmed using a group quality assessment, based on the shape coefficient. The method is then tested to estimate the maximum load absorbed by consumers belonging to the testing base.
The method was applied using an initial database containing records of load schedules for 1160 household consumers located in a countryside region over a one-month period. For each consumer, the characteristic variables related to the peak load (Pmax) and monthly energy consumption (Wluna) were extracted.
In Stage 2, the analysis of load graphs and characteristic variables led to the elimination of 15 consumers due to zero energy consumption. Thus, in the next phase, the database consisted only of the associated information for 1145 consumers. This base was split in two, a working and a testing base. The size of the working and testing bases varies in the literature and primarily depends on the total amount of records in the original database. The split between the two bases ranges from 90/10 to 60/40. In this instance, the working database contained 814 consumers (representing 66 percent of the database), leaving the testing database with the remaining customers (331 consumers, 33 percent of the database).
In Stage 3, the data exploration process was initialized through the K-media clustering method to obtain the consumption categories in which the consumers in the work base will be integrated.
The first phase of the K-average method consisted of the determination of the maximum cluster number (consumption categories) with the relation:
Then, for each
K = 2, …, 29, the
K-averages technique was applied. For the clustering processes initiated for the
K value of the number of clusters, the quality of the grouping process was estimated by calculating the global silhouette index (GSI). The values of the shape index obtained for each
K = 2, …, 29 are represented in
Figure 10. This grouping is the best result of the clustering technique since the GSI has the greatest value for
K = 5. The results for the global silhouette index (GSI = 0.775) show that the clustering technique was quite effective. The graphical representation of the clusters obtained in the case
is illustrated in
Figure 11.
The characteristic variables represented by the mean values (m) and the dispersion (σ) corresponding to the maximum load (P
max) and the monthly energy (W
month) for each cluster (consumption category) are summarized in
Table 1.
The analysis of data from each consumption category indicated a correlation between the maximum load and the monthly energy consumption that can be mathematically modeled by applying regression models. In
Figure 12,
Figure 13 and
Figure 14, regression models are represented based on first- and second-degree polynomials for every consumption class.
Regression models based on the degree II polynomial led to a better approximation than the degree I polynomial. Thus, regression models based on the degree II polynomial were adopted to determine the P
max (W
month) dependence; coefficients of the regression models determined for each consumption class are summarized in
Table 2.
In Stage 4, for all consumers in the test base, the maximum load is estimated by applying the regression model of the consumption class associated with every consumer, depending on the monthly energy consumed.
Figure 15 presents the real and forecasted values associated with the maximum consumer tasks within the test base, assembled according to the consumption class.
Of the 331 testing base consumers, 206 (approximately 62%) forecast errors were ≤3%, 77 (approximately 23%) errors were between 3 and 7%, and 48 (approximately 15%) errors were between 7 and 10% (
Figure 16); thus, the average estimation error in the testing base was 4.01%. This value is reasonable in the context that most consumers do not have permanent monitoring.
3.2. Simulation of a Load of MV/LV Distribution Transformers by Clustering Procedure Application
The future power supply networks of electric vehicles will be powered by medium and low-voltage networks.
The following algorithm is suggested for simulating the load of the distribution transformers from the MV/LV transformation stations [
40].
Phase 1. Database: representative consumers with smart meters will be sorted out from the database. Typical load diagrams will be recorded for every consumer, then, the main characteristic variables will be detached, i.e., minimum (Pmin) and maximum active power (Pmax), daily energy (Wz), and consumption class.
Phase 2. Pre-processing load diagrams: all records involving missing data or values will be excluded or subjected to processing. After being pre-processed and reduced, the results will be applied to obtain the classification into consumption categories (clusters) using the clustering procedure.
Phase 3. The division into consumption macro-categories: the database with the records of load schedules will be split into clusters described by the consumer’s type: residential, commercial, and industrial.
Phase 4. Clustering procedure: a clustering technique will be applied for load-type profile determination to determine the optimum results. In the end, a typical load profile for each consumption class will be obtained by applying the average of the hourly values for the load diagrams.
Phase 5. Determining the load profiles: a typical load profile will be attributed to every consumer’s class, depending on their consumption class.
Phase 6. Estimating the load of the MV/LV transformer: a simulation protocol will then be proposed based on the Equation [
40]:
where:
is the MV/LV transformer load from the transformer station at h, (kW);
is the consumer number, where k is the consumption class;
is the average energy consumption, where k is the consumption class, (kWh);
is the hourly coefficient of transformation for energy consumed by the consumers, (kW/kWh);
is the standard deviation of the power distribution necessary to the cluster consumption (kW/kWh);
is the cluster number (consumption class) correlated to the consumer’s feed.
The weight center technique was applied to evaluate the load-type profiles through a database containing 180 load curves registered by smart meters from residential consumers in a distribution system from the LV pilot.
To evaluate the representative load profiles, the gravity center was applied, taking into consideration the scattering of the LV pilot located in a region from Romania [
40]. Every load graph is set by 48 hourly values correlated with consumer behavior over one day. Missing load curves or abnormal values of zero throughout the day must be eliminated from the procedure, with only 144 consumers remaining eligible. The clustering procedure resulted in five consumption classes (clusters) (
Table 3).
It can be observed that the most representative class of consumption is C5 (50% of the total consumer number), the least representative being C2 (only 3.5%). The load-type profiles correlated to every cluster (consumption class) are depicted in
Figure 17,
Figure 18 and
Figure 19 and the consumer distribution is presented in
Figure 20.