1. Introduction
Electricity is part of a composite market that involves generation, transmission, and consumption agents. This free market has become highly competitive in recent years, attracting the participation of investors, electric companies, and public agencies [1]. Since these stakeholders seek to maximize their profits while minimizing their expenses, accurate prediction of energy generation has become mandatory for those who take part in this business, especially in a competitive electricity market driven by supply and demand conditions. Moreover, as most system operational decisions are made in response to data gathered and processed at the control center, data-driven platforms are crucial for extracting useful information and making intelligent choices [2]. These data-guided frameworks are particularly important in the Brazilian context, which is the focus of our work, since the national power grid is currently operated by a central grid operator that decides, based on official computer models, when and how much each power plant will produce [3]. As the Brazilian electricity matrix is composed mainly of renewable power sources, whose output is inherently variable [2], electricity market prices may reflect the stochastic behavior of these sources. In most cases, this means that wholesale market prices in Brazil are determined by the opportunity costs of these renewable power plants, based on the acquired data and on supply and demand trends [4].
Techniques devoted to forecasting electricity demand aim to estimate, from a historical time series, the amount of energy that will be needed for transmission and later consumption. Despite the adaptation of several machine learning models to this problem [5,6,7,8,9], tracking the progressive use of electricity is not a straightforward task in practice. In fact, electricity dispatch is intrinsically related to the internal operations of power systems, such as the periodic scheduling of power generation in hydroelectric plants, the preventive maintenance of generators, and the reliability evaluation of power systems [10]. Moreover, the problem becomes even more challenging when one has to deal with the highly nonlinear behavior of power data, which are modeled as highly oscillating time series whose parameters can be affected by exogenous variables such as weather/ambient conditions [11] and economic factors [12].
Formally, the problem of predicting power demand from a time series can be described as follows: given a time series of electric load $X = \{x_1, x_2, \ldots, x_n\}$, in which $x_i$ accounts for the historical energy load at instant $i$, $i = 1, 2, \ldots, n$, the goal is to predict the quantity $x_{n+h}$, where $h$ establishes the forecast horizon [13,14]. Taxonomically speaking, this kind of prediction usually comprises three categories of planning horizons: (i) long-term (years/months); (ii) regular-term (days/weeks); and (iii) short-term (minutes/hours). Estimating electricity demand becomes harder as the planning horizon increases, since the predictions can be strongly influenced by several nonuniform variables such as electric consumption, temperature, air humidity, and socioeconomic aspects. Moreover, long- and regular-term time series make the problem more difficult to manage and solve technically, as obtaining a computationally robust solution for real scenarios requires integrating customized tuning approaches and non-linear models into a unified framework [15,16,17,18,19]. Therefore, in this paper, our main interest lies in designing well-behaved forecasters to assess and predict the electricity demand in Brazil for both long- and regular-term time series.
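To make the formulation concrete, the sketch below turns a historical load series into supervised (input, target) pairs: each input is a window of consecutive past loads, and each target is the load observed $h$ steps after the window ends. It is a minimal illustration assuming a fixed-length sliding window; the function name `make_supervised`, the window length, and the sample loads are hypothetical and are not taken from the pipeline used in this work.

```python
from typing import List, Sequence, Tuple


def make_supervised(
    series: Sequence[float], window: int, h: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a load series into (input window, target) pairs.

    Each input holds `window` consecutive past loads, and the target is the
    load observed `h` steps after the last value in that window.
    """
    if window < 1 or h < 1:
        raise ValueError("window and h must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # The last start index must satisfy start + window - 1 + h <= len(series) - 1
    for start in range(len(series) - window - h + 1):
        end = start + window  # exclusive end of the input window
        inputs.append(list(series[start:end]))
        targets.append(series[end - 1 + h])  # load h steps after the window
    return inputs, targets


# Hypothetical daily loads (arbitrary units), forecasting h = 2 steps ahead
loads = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]
X, y = make_supervised(loads, window=3, h=2)
print(X)  # [[10.0, 12.0, 11.0], [12.0, 11.0, 13.0]]
print(y)  # [14.0, 15.0]
```

Any regressor, such as the ensemble models discussed in this paper, could then be fit on `X` and `y`; larger values of `h` move the task toward the regular- and long-term horizons.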
Considering the recent advances in Machine Learning (ML) for electricity load forecasting, the literature offers a variety of approaches, most of them designed specifically for a particular case study of energy consumption. For instance, Qiu [20] proposed a Support Vector Regression (SVR) variant based on Particle Swarm Optimization (PSO) to forecast energy trading in the Taiwan electricity market. Hybrid methods based on well-established ML models, such as random forests, neural networks, and fuzzy logic, adjusted to gauge electric consumption were also proposed in [21], where the authors monitored the energy consumed by buildings at the Polytechnic University of Catalonia, in Spain. Hernandez et al. [22] described a hybrid decision-making tool to analyze and inspect the energy consumption of an industrial park, also in Spain. They combined self-organizing maps and k-means clustering into a cascade-based application to supervise the power consumption flow in the evaluated industrial park.
In contrast to the above case studies, covering larger territories requires large amounts of data from different sources to produce meaningful results. As a consequence, studies investigating larger areas, such as entire countries, constitute only a minority of the literature on electricity demand. One such study is that of Zhao et al. [23], whose method integrates a grey forecasting model with parameter optimization to assess electricity consumption in Mongolia. Zhao's method was later modified by Liang and Liang [24] to address the electricity demand of China between 2016 and 2020. Their method divides the forecasting task into two steps: two individual predictions are first produced, and both forecasts are then combined into a new, more accurate one that yields the definitive estimation. A rich discussion of the energy demand of different sectors in the U.S. was presented by Ameyaw and Yao [25], through a Recurrent Neural Network (RNN) designed as an assumption-free predictive model for regular-term predictions. RNNs were also the learning strategy of Bouktif et al. [26], who applied a specific RNN architecture, Long Short-Term Memory (LSTM), to predict the electric load in France.
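The two-step scheme attributed above to Liang and Liang [24] (two single forecasts followed by their combination) can be illustrated with a simple combination rule. The rule they actually used is not described here, so the sketch below assumes inverse-MSE weighting, a common textbook choice in which each forecast is weighted in inverse proportion to its mean squared error over a validation period; all function names and numbers are hypothetical.

```python
from typing import List, Sequence


def mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def combine_forecasts(
    f1: Sequence[float],
    f2: Sequence[float],
    val_f1: Sequence[float],
    val_f2: Sequence[float],
    val_actual: Sequence[float],
) -> List[float]:
    """Combine two forecasts with weights inversely proportional to their validation MSE."""
    e1 = mse(val_f1, val_actual)
    e2 = mse(val_f2, val_actual)
    # w1 = (1/e1) / (1/e1 + 1/e2) simplifies to e2 / (e1 + e2);
    # if both models are perfect on validation, fall back to a plain average.
    w1 = 0.5 if e1 + e2 == 0 else e2 / (e1 + e2)
    return [w1 * a + (1.0 - w1) * b for a, b in zip(f1, f2)]


# Hypothetical validation period: model 1 tracks the observed loads more closely
actual = [10.0, 12.0, 14.0]
model1_val = [11.0, 12.0, 13.0]  # MSE = 2/3
model2_val = [12.0, 14.0, 16.0]  # MSE = 4
combined = combine_forecasts([20.0], [27.0], model1_val, model2_val, actual)
print(combined)  # approximately [21.0], since w1 = 6/7
```

Weighting by validation error lets the more reliable single forecast dominate the final estimate without discarding the other one entirely.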
Finally, concerning the core works devoted to electricity demand in Brazil, which is the focus of our work, contributions have been made towards assessing energy consumption, but using only global indicators, such as Gross Domestic Product (GDP) and Population Growth Rate (PGR), as data sources. Maçaira et al. [27] presented different projections for Brazilian energy consumption using GDP as an independent variable in a dynamic regression approach. Although they projected consumption up to 2050, their forecasts were annual only, giving no detail on the energy demanded within each year. Another long-term forecasting study was conducted by Torrini et al. [28], who used a fuzzy logic-based methodology calibrated with GDP and PGR indices and compared their results with official projections for the sector, as provided by the Brazilian Energy Research Office (EPE, Empresa de Pesquisa Energética, in Portuguese: http://www.epe.gov.br/en). Similarly, a year-by-year estimation of the electric demand in Brazil was carried out by Trotter et al. [29], who modeled the uncertainty in the estimates of weather variables. Their approach combines basic features, such as population size and national income, with electricity demand data to obtain a multiple linear regression model that yields annual forecasts.
Other studies have also been published on electric demand in Brazil, most of them focused on annual estimations [30,31,32], which cannot deliver conclusive forecasts when the daily/monthly electricity load must be predicted at a finer resolution. Moreover, the literature lacks comprehensive studies that consider a wider range of data sources to determine which kinds of information are truly useful for predicting energy demand in Brazil. As motivating examples of works that perform robust data exploration to improve the predictability of ML approaches for energy demand in other countries, one may consider the following. Dai et al. [33] examined how ensemble models can improve Support Vector Machines (SVM) for predicting energy consumption in China. Utterbäck [34] investigated how weather data vary geographically in Scandinavia and whether geographical properties provide relevant information to the predictors. Assessing the impact of weather conditions on the forecasting task was also the goal of Zhang et al. [35], but from the perspective of how photovoltaic power systems can be drastically affected by abrupt weather changes over the course of a day. Facing a similar problem, Ceci et al. [36] applied entropy-based metrics for the online training of artificial neural networks to better exploit the non-linear dependencies between the feature space (weather conditions) and the target space (observed power production). Finally, feature space analysis was the key concern of Sarhani and Afia [37], who combined PSO and feature selection to improve their SVM variant for load forecasting, and of Liang et al. [38], who proposed a hybrid model that integrates several tuning strategies, such as empirical mode decomposition and minimal redundancy maximal relevance, into a regression neural network to produce the forecasts.
Contributions
In this study, the problem of predicting electricity demand in Brazil is addressed for both long- and regular-term time series. Different strategies to optimize the performance and accuracy of the presented ML approaches are discussed in detail, so as to provide a comprehensive analysis of the explored data while elucidating how electricity load predictions can be driven by data exploration and ensemble-based learning models. In contrast to other works that only provide annual forecasts of Brazilian electricity demand, this paper establishes a solid methodological pipeline for daily/monthly forecasts, introducing several variables related to the national electrical system instead of the macro-indices discussed above. As the accuracy of electric demand prediction depends on the amount of available data and on how the data are handled to build well-behaved predictive models, a new database composed of two national (and official) data repositories in Brazil is also presented and discussed, thus filling the gap left by the absence of a comprehensive and reliable dataset in the Brazilian context.
This paper is organized as follows. Section 2 introduces the data analysis apparatus, learning pipelines, and datasets used in our investigation, while Section 3 presents the results, main findings, and their discussion. Finally, Section 4 summarizes the conclusions of our research.
4. Conclusions
This study focused on electricity load prediction in the Brazilian Interconnected Power Grid by means of different machine learning strategies and data exploration tools. In contrast to most existing works, which give only annual/monthly estimations of electricity demand in Brazil, here three ML models were applied and then optimized as new ensemble-based predictors with optimal hyperparameters to provide accurate daily/monthly forecasts. As verified in the evaluation study, the best-performing predictive model was GB, which surpassed the other methods in terms of accuracy (tuned model: ) and MAPE/MAE (tuned model: and , respectively), attesting to the efficacy of GB in predicting electricity load demand in the Brazilian context.
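For reference, the two error metrics reported above follow the standard definitions $\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N} |y_t - \hat{y}_t|$ and $\mathrm{MAPE} = \frac{100\%}{N}\sum_{t=1}^{N} \left|\frac{y_t - \hat{y}_t}{y_t}\right|$, where $y_t$ is the observed load and $\hat{y}_t$ the forecast. The sketch below is a minimal implementation of these definitions on hypothetical values; it is not the evaluation code used in this study.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the load (e.g., MW)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any observed value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and predicted daily loads
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mae(observed, forecast))   # about 6.67
print(mape(observed, forecast))  # about 5.0
```

MAE is expressed in load units and is easy to interpret operationally, whereas MAPE is scale-free and therefore convenient for comparing series of different magnitudes.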
Knowledge Discovery in Databases (KDD), as conducted via the data analysis tools presented in Section 3, was also of paramount importance to reveal the statistical behavior and other intrinsic relationships of the collected data. Moreover, there was a substantial gain from the creation of new artificial variables, such as those delivered by the feature engineering scheme, which was crucial to the ensemble-based models and also improved the SVR, since the latter did not achieve satisfactory performance without proper parameter adjustment.
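The creation of new artificial variables mentioned above can be illustrated with a few generic calendar and lag features derived from a daily load series. The specific variables created in this work are not listed here, so the feature names below (`weekday`, `month`, `is_weekend`, `lag_1`), the function `calendar_features`, and the sample data are all hypothetical; the sketch uses only the Python standard library.

```python
import math
from datetime import date, timedelta
from typing import Dict, List, Sequence


def calendar_features(start: date, loads: Sequence[float]) -> List[Dict[str, float]]:
    """Derive simple calendar and lag features for consecutive daily loads."""
    rows: List[Dict[str, float]] = []
    for i, load in enumerate(loads):
        day = start + timedelta(days=i)
        rows.append({
            "weekday": day.weekday(),                      # 0 = Monday, ..., 6 = Sunday
            "month": day.month,                            # 1..12, captures seasonality
            "is_weekend": 1 if day.weekday() >= 5 else 0,  # Saturday or Sunday
            "lag_1": loads[i - 1] if i > 0 else math.nan,  # previous day's load
            "load": load,                                  # target value
        })
    return rows


# Hypothetical loads for Friday 5 January 2024 through Sunday 7 January 2024
rows = calendar_features(date(2024, 1, 5), [100.0, 90.0, 85.0])
for row in rows:
    print(row)
```

Rows whose lag is undefined (here, the first day) would typically be dropped before training.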
Finally, in addition to establishing new methodological pipelines to forecast energy demand in Brazil and to explore the acquired data in depth, this work makes available a complete collection of data taken from official Brazilian agencies, for industry and for anyone interested in studying load demand, especially in the Brazilian context.