1. Introduction
Electricity plays an important role in modern society. In recent years, energy consumption has grown continuously worldwide, driven by steady increases in population, economic growth, and weather factors. This is becoming more evident in developing countries. As consumption patterns change throughout the day and year, power systems face greater challenges in balancing supply and demand in real time, especially during peak periods when electricity demand is highest. Peak demand is itself growing because of load growth across customer sectors such as residential and commercial. A study conducted by Burillo et al., for example, found that peak demand is projected to grow in Los Angeles, California, due to higher penetration of air conditioning systems in residential and commercial buildings [1]. In many countries, electric vehicles (EVs) are also starting to connect to the grid, which is driving further demand. Modern power systems typically use a combination of expensive peaking power plants, small-scale energy storage systems (ESS), and demand response programs to mitigate peak demand. ESS, in particular, are being aggregated into virtual power plants to support the grid during periods of high demand.
Forecasting the peak load has become a major concern for power system planners, who must ensure that consumer demand is always met for the power grid to operate reliably. Accurate long-term forecasts allow planners to determine how much generation capacity is needed to meet future demand while complying with environmental policies.
However, forecasting the monthly peak demand is challenging because the series is highly nonlinear and noisy. Peak demand is often correlated with other variables that are themselves nonlinear and exhibit strong monthly patterns, so it is very sensitive to their variations. Conventional time series and regression-based models cannot learn the nonlinear relationship between the inputs and outputs and may therefore perform poorly. Moreover, time series data are now often large-scale as well as nonlinear and noisy. The energy sector is generating big data from Internet of Things devices (e.g., sensors and smart meters) that contain valuable information to support decision making. Analyzing these data is challenging and requires more advanced data mining algorithms.
In the last decade, time series forecasting has undergone a paradigm shift from statistically driven models to hybrids of statistical and machine learning models. As a result, machine learning and deep learning methods are rapidly emerging with better performance on time series problems. These methods consist of trained algorithms that can learn the nonlinear mapping between inputs and outputs in big data. Deep learning, in particular, uses deeper structures that can extract more meaningful information and learn complex patterns, which makes it more effective at handling high-dimensional data. Among the deep learning methods, convolutional neural networks (CNNs) are becoming more attractive for time series analysis since they can leverage spatial information in different dimensions. Long short-term memory (LSTM) is another popular method that has received increased attention since it models sequential data very well.
Peak load forecasting is an important research topic within the energy sector that has not been fully explored from a deep learning perspective. This leads to the main research question of how deep learning methods can better support problems that are highly nonlinear and exhibit complex patterns. The goal of this study is therefore to develop a deep learning forecasting model that can predict the monthly peak demand. Two deep learning methods, based on CNN and LSTM, are introduced, and a real case study on Panama’s power system is presented to validate the models. This paper is organized as follows:
Section 2 presents the literature review related to peak demand forecasting and the contributions of this study.
Section 3 provides a high-level overview of the framework.
Section 4 provides a more detailed description of the case study.
Section 5 describes the data exploration and transformation process.
Section 6 outlines the architecture of the models.
Section 7 describes the performance metrics used for model assessment.
Section 8 provides the experimental setup and results.
Section 9 discusses the results obtained.
Section 10 presents the conclusions and future directions for this study.
2. Related Works
A wide variety of models have been proposed in the literature for forecasting peak demand. These models can be classified into four main categories which are (1) regression models, (2) time series models, (3) machine learning models, and (4) deep learning models. Regression models include multiple linear regression (MLR) and simple linear regression. Time series models include autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), and Holt–Winters. These models are well established and are still being used due to their simplicity. Machine learning models include artificial neural networks (ANN), support vector machine (SVM), support vector regression (SVR), k-means clustering, decision trees, and ensembles. These models use shallow networks and tedious feature engineering for learning patterns. Deep learning models, on the other hand, use deeper networks of more than one hidden layer that can automatically learn and extract the features. These include CNN, LSTM, gated recurrent unit (GRU), and recurrent neural networks (RNNs). In recent years, machine learning and deep learning models have become a growing trend for handling time series data, due to their robustness and superior performance. Therefore, they have rapidly become the state-of-the-art methods for time series applications.
Most of the research conducted to date has focused on predicting the peak demand for a particular country or smart grid building in order to improve demand side management. Different forecasting horizons have been considered, including daily, weekly, monthly, and yearly. Many of the peak demand models proposed in the literature incorporate weather, population, and economic variables as main indicators. Carvallo et al. conducted a review of the energy forecasting methodologies produced by twelve major utility companies in the United States between 2003 and 2007 [2]. The study found that most models incorporated historical sales, weather variables such as Cooling Degree Days (CDD) and Heating Degree Days (HDD), as well as demographic projections of customer loads [2]. Another major finding was that most utilities overestimated peak demand because they did not anticipate slow economic growth [2].
Several time series regression models have been introduced in the literature. Rallapalli and Ghosh proposed a multiplicative SARIMA model to predict the monthly peak demand for five regions in India [3]. The model was compared with the official forecasts produced by the Central Authority. SARIMA achieved the best performance, since it captured the seasonality better. The mean absolute percentage error (MAPE) ranged between 0.93% and 3.94% for the five regions [3]. Sigauke and Chikobvu developed three time series regression models for forecasting daily peak electricity demand in South Africa, using historical data from 2000 to 2010. The best model incorporated maximum temperature, minimum temperature, and peak day temperature as significant predictors [4]. Khorsheed developed four MLR models to predict monthly peak energy demand for the Kingdom of Bahrain, using eight years of historical data. The best model, which incorporated the month and maximum temperature, had an R-squared value of 0.94 [5]. AL-Hamad and Qamber evaluated two models based on MLR and adaptive neuro-fuzzy inference systems (ANFIS) to predict long-term peak demand for six countries in the Gulf Cooperation Council region up to the year 2024 [6]. Both models incorporated gross domestic product (GDP) and population as input variables. ANFIS had the highest prediction accuracy, with an average percentage error of 0.53% [6]. Almazrouee et al. recently evaluated two methods, Prophet (Facebook’s forecasting tool) and Holt–Winters, for forecasting the long-term daily peak demand for Kuwait up to the year 2030, using ten years of historical data [7]. The study found that the Prophet model performed better than Holt–Winters, with a MAPE of 1.75% [7].
In recent years, an increasing number of studies have applied machine learning and deep learning approaches to peak load forecasting. Pannakkong et al. applied ensemble machine learning models based on ANN, SVM, and deep belief networks (DBN) for predicting monthly peak demand in Thailand [8]. The ensemble of ANN and DBN performed best for predicting demand one month ahead, with a MAPE of 1.44% [8]. Kwon et al. compared a deep learning LSTM model with MLR for predicting weekly peak load in Korea [9]. Three input features were used: weekly temperature, GDP, and previous peak demand. The study found that LSTM performed better than MLR, with MAPE of 2.16% and 2.67% for the years 2017 and 2018, respectively [9].
Elkamel et al. compared an MLR model with three different CNNs for predicting monthly electricity consumption in Florida, using data from 2010 to 2016 for training [10]. The CNN model outperformed MLR, with a MAPE of 2.5%. The authors found that the model performed even better when the dataset was enhanced [10].
Son and Kim proposed an LSTM model to predict monthly residential demand, a primary driver of peak demand in South Korea [11]. The model incorporated eight variables, mostly weather-related, such as CDD, maximum temperature, and wind speed. The proposed model was compared with four benchmark methods: SVR, ANN, ARIMA, and MLR. LSTM performed the best, with a MAPE of 0.07% [11]. Kim et al., on the other hand, proposed various tree ensembles (bagging and boosting) for forecasting the peak load of small industrial facilities [12]. The model incorporated the previous hourly load patterns as the main feature, and the isolation forest algorithm was used to identify outliers. They found that decision trees were, overall, not suitable for predicting the peak demand [12]. Lai et al. [13] recently evaluated deep neural networks (DNN) with enhanced historical loads to predict daily peak demand for Austria, the Czech Republic, and Italy. The model was benchmarked against LSTM, ARIMA, and deep residual networks (DRN). The study concluded that the DNN outperformed all the other models when tested on the three datasets [13]. Mehdipour et al. [14] performed an empirical comparison of four widely used machine learning methods, i.e., SVR, gradient boosting regression trees, ANN, and LSTM, for predicting the load demand of individual buildings. LSTM and gradient boosting were shown to detect the daily peak effectively. Shirzadi et al. [15] proposed a high-level framework based on machine learning and deep learning methods, i.e., SVM, random forest (RF), nonlinear autoregressive exogenous (NARX), and LSTM, to predict daily load for Bruce County, Canada. Temperature and wind speed were the main input features used for training the models. LSTM and NARX provided the best accuracy for forecasting the peaks. RF, on the other hand, overestimated the highest peaks but remained within the acceptable range, with a MAPE of 10.25%. Atef and Eltawil [16] compared deep bidirectional LSTM, unidirectional LSTM, and SVR for electricity forecasting and found that the bidirectional LSTM can stably predict demand without overfitting. SVR, on the other hand, was not able to follow the demand pattern. They also found that increasing the network depth did not improve predictive performance.
In addition to the works cited above, hybrid models have been proposed to improve forecasting accuracy. Laouafi et al. [17] introduced a hybrid model consisting of fuzzy c-means clustering and ANFIS for forecasting daily peak demand, where clustering was used to classify the daily peaks according to temperature. Dai et al. [18] evaluated the robustness of a hybrid model based on ensemble mode decomposition with adaptive noise (EMDAN) and SVM for daily peak load forecasting. Grey wolf optimization was adopted to improve convergence. The model was found to be reliable and outperformed SVM, with a MAPE of 0.25%. Park et al. [19] proposed a machine learning model that combined Holt–Winters and decision tree methods, i.e., RF and bagging, to accurately predict the peak load, which was then used to improve the ESS scheduling strategy. The hybrid model based on Holt–Winters and bagging trees was found to be superior, with a MAPE of 6.7%. Nepal and Yamaha [20] recently presented a hybrid learning approach based on k-means clustering and neural networks to predict the peak demand of university buildings. The data were grouped into six clusters with similar demand behavior. The model performed far better than neural networks alone, with a MAPE of 6.1%. Kazemzadeh et al. introduced a hybrid approach that combines ARIMA, ANN, and SVR to forecast the yearly peak load in Iran, using twenty-six years of historical data [21]. The study found that the hybrid method outperformed the individual ARIMA and ANN methods, with a MAPE of 0.97% [21]. Wu et al. [22] recently evaluated a combined GRU-CNN model for short-term load forecasting and found that it was able to follow the nonlinear load pattern effectively and predict the highest peaks.
Table 1 lists the accuracy of the best performing models obtained for the papers reviewed, in terms of MAPE.
Based on the discussion above, we identified some gaps in the literature. Although deep learning is becoming more popular for predicting peak electricity demand, CNNs have not been explored for this particular problem. Only one study [10] evaluated CNNs, and that was for predicting energy consumption. The contribution of this paper is therefore significant, since, to our knowledge, no similar study has been done previously. Furthermore, no previous work was found on Panama’s power system, making it a suitable case study.
3. Methodology
Figure 1 provides a high-level overview of the proposed framework. The framework consists of two deep learning methods, based on CNNs and LSTM, that were evaluated for forecasting the peak load one month ahead. A case study on Panama’s power system was used to validate the models. First, historical data on the peak demand from 2004 to 2019 were collected to analyze the electricity demand trend for Panama and determine which methods were most suitable for modeling this problem. In addition, monthly data for several input features were collected to understand how they were related to the peak demand. Next, the data were preprocessed and transformed into the appropriate input shape prior to training the models. The preprocessing consisted of two main steps: normalization and partitioning of the data into a training set and a test set. The training data were then fed to the corresponding CNN and LSTM models, which enabled feature extraction and sequential learning, respectively. The CNNs consisted of three different architectures, namely multivariate CNN, multivariate CNN-LSTM, and multihead CNN, which are explained in more depth in Section 6. The models were built in Keras, an open source library for neural networks and deep learning.
Section 6 also provides a more detailed description of the architecture of the models. For the CNN models, it was important to define the hyperparameters such as the number of convolutional layers and pooling layers, along with the number of filters and filter size. In the case of LSTM, we had to define the number of LSTM layers and hidden units.
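The preprocessing described in this section (normalization, a chronological train/test partition, and reshaping into input sequences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `min_max_scale` and `make_windows`, the toy values, and the choice of min-max scaling fitted on the training rows are all assumptions, since the text only states that the data were normalized and partitioned. The window of three previous time steps and four features follows the configuration reported in Section 9.

```python
def min_max_scale(rows, lo, hi):
    """Scale each column with the given minima and maxima
    (rows used to compute lo/hi map to the range [0, 1])."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def make_windows(rows, target_col, n_lags=3):
    """Build (samples, n_lags, n_features) inputs and one-step-ahead targets."""
    X, y = [], []
    for t in range(n_lags, len(rows)):
        X.append(rows[t - n_lags:t])   # the n_lags previous months
        y.append(rows[t][target_col])  # peak demand of the month to predict
    return X, y


# Toy table with hypothetical values: month, big-client consumption,
# residential consumption, peak demand.
data = [
    [1, 210.0, 250.0, 1500.0],
    [2, 215.0, 255.0, 1520.0],
    [3, 220.0, 262.0, 1555.0],
    [4, 218.0, 270.0, 1580.0],
    [5, 225.0, 268.0, 1575.0],
    [6, 230.0, 275.0, 1610.0],
    [7, 228.0, 280.0, 1625.0],
    [8, 235.0, 285.0, 1650.0],
]

# Chronological split (no shuffling): the test months come strictly later.
n_lags = 3
split = 6
train_rows = data[:split]

# Fit the scaling statistics on the training rows only (an assumption made
# here to avoid leaking information from the test period).
columns = list(zip(*train_rows))
lo = [min(c) for c in columns]
hi = [max(c) for c in columns]
scaled = min_max_scale(data, lo, hi)

X, y = make_windows(scaled, target_col=3, n_lags=n_lags)
n_train = split - n_lags  # windows whose target month lies in the training rows
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]
```

Each element of `X_train` is a 3 × 4 window, which matches the `(timesteps, features)` input shape that a 1D convolutional or recurrent layer would expect.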
To provide a fair comparison, the models were assessed in two steps: without and with Gaussian noise. The models were first trained without Gaussian noise to gain better insight into their performance, and were evaluated on the test set using four performance metrics that are explained in Section 7. Next, we evaluated the robustness of the CNN and LSTM models against Gaussian noise. Finally, we selected the best model for forecasting the peak demand.
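For readers who want to reproduce the evaluation step, the four metrics reported later in the Discussion (R2, MSE, MAPE, and MAE) can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed to match those given in Section 7; the function name `regression_metrics` and the sample values are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return R2, MSE, MAE, and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE assumes no true value is zero, which holds for peak demand.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MSE": mse, "MAE": mae, "MAPE": mape}


# Hypothetical monthly peaks and forecasts, for illustration only.
y_true = [1600.0, 1650.0, 1700.0, 1750.0]
y_pred = [1610.0, 1640.0, 1720.0, 1730.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MSE and MAE are expressed in the squared and original units of the target, respectively, so they are only comparable across models evaluated on the same test set, whereas MAPE and R2 are scale-free.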
3.1. Convolutional Neural Networks
Convolutional neural networks are a special type of deep learning network that has rapidly transformed the computer vision field, achieving state-of-the-art performance in image classification and object detection tasks. Currently, they are among the most established methods for spatial pattern recognition. Their success was first demonstrated in 1989 by LeCun, who developed the LeNet model for classifying handwritten digits [23]. In the last decade, CNNs have been actively researched in terms of representation learning, depth, and pooling operations [24], which has contributed to their remarkable progress. Recently, remote sensing and medical images have become increasingly available in big data repositories, which creates a need for novel methods that can leverage data structures with increased spatial–spectral dimensions. CNNs are therefore being more widely applied to image segmentation and classification.
CNNs are also becoming more attractive for solving time series forecasting problems such as wind speed prediction. Time series data collected from energy systems are very nonlinear and granular. They can exhibit not only sequential but also spatial patterns, which cannot be captured by RNNs. CNNs have therefore been extensively investigated to understand how they can better handle univariate and multivariate time series data. Depending on the complexity of the feature extraction, different CNN structures can be applied, such as one-dimensional (1D) CNN, two-dimensional (2D) CNN, and three-dimensional (3D) CNN. The 1D CNN, in which the filter moves in only one direction, has mainly been studied for time series analysis [25,26,27], human motion recognition, and signal processing applications. Many authors have also developed hybrid CNN-LSTM models to perform deeper feature extraction.
A CNN uses a hierarchical learning approach, which starts by extracting lower-level features in the first layers, followed by more abstract features in the last layers. Two main attributes distinguish CNNs from traditional neural networks: local connectivity and parameter sharing. One problem with fully connected networks is that a large number of parameters need to be learned. CNNs take the basic feed-forward ANN one step further by connecting the neurons only to local regions of the spatial input data. The weights of the neurons are shared across the space, which can help to segment regions that contain similar local features. Locally connected layers with shared weights can therefore significantly reduce the number of learnable parameters and the training time.
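The effect of local connectivity and parameter sharing on the parameter count can be illustrated with a short calculation. The sketch below compares a 1D convolutional layer with a fully connected layer that produces the same number of outputs; the input size (12 time steps, 4 features) is an arbitrary assumption chosen for illustration, while the 32 filters of size 2 follow the configuration reported in Section 9. The function names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv1d_params(kernel_size, in_channels, n_filters):
    """Weights plus biases of a 1D convolutional layer (shared across positions)."""
    return kernel_size * in_channels * n_filters + n_filters


timesteps, features = 12, 4      # illustrative input size (assumption)
kernel_size, n_filters = 2, 32   # filter configuration reported in Section 9

# A "valid" convolution yields one output per filter at each of these positions.
positions = timesteps - kernel_size + 1
n_outputs = positions * n_filters

conv = conv1d_params(kernel_size, features, n_filters)
dense = dense_params(timesteps * features, n_outputs)

print(f"convolutional layer: {conv} parameters")
print(f"fully connected layer with {n_outputs} outputs: {dense} parameters")
```

The convolutional count does not depend on the sequence length, because the same filter weights are reused at every position, whereas the fully connected count grows with both the input and the output size.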
Four main components and operations are core to a CNN: convolutional layers, nonlinearity, pooling layers, and fully connected layers.
Figure 2 shows an overview of the CNN architecture.
The first layer of the CNN is the convolutional layer, which consists of filters that are convolved with the inputs as they slide over the input data, generating a set of hierarchical feature maps of reduced size (Figure 3). These maps provide a more detailed representation of the different types of features extracted that are relevant for predicting the output.
The filters consist of weights that are learned during training, which determine what type of features are extracted. Next, the feature maps are passed through an activation function to introduce nonlinearity and learn the nonlinear input-to-output mapping. The rectified linear unit (ReLU) is a commonly used activation function because it mitigates the vanishing gradient problem and converges faster. The output feature maps are then further downsized by a subsequent pooling layer, which keeps only the dominant features. The max pooling layer, for example, takes the maximum value of each patch in the feature map. Finally, the pooled features pass through fully connected layers to predict the final output. The output of the convolutional layer is given in Equation (1):

$$y_{i,j}^{l} = \sigma\left(b_{j}^{l} + \sum_{m=1}^{M} w_{m,j}^{l}\, x_{i+m-1}\right) \qquad (1)$$

where $y_{i,j}^{l}$ is the output of the $j$th feature map at position $i$, $l$ is the convolutional layer index, $\sigma$ the sigmoid activation function, $w_{m,j}^{l}$ is the weight for the $j$th feature map, $M$ is the filter size, $x$ is the input vector convolved and $b_{j}^{l}$ is the bias.
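The sequence of operations described above (convolution, nonlinearity, max pooling) can be traced on a small example. The following is a plain-Python sketch for a single filter, written for clarity rather than speed; the function names, the toy input, and the filter weights are assumptions. ReLU is used as the nonlinearity here, as mentioned in the text, although any activation function could be substituted.

```python
def conv1d_valid(x, w, b):
    """Slide one filter over a multivariate sequence.

    x: list of T time steps, each a list of C feature values
    w: filter weights, K rows of C values
    b: scalar bias
    Returns the T - K + 1 pre-activation values of one feature map.
    """
    k = len(w)
    out = []
    for i in range(len(x) - k + 1):
        total = b
        for m in range(k):
            total += sum(wc * xc for wc, xc in zip(w[m], x[i + m]))
        out.append(total)
    return out


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the maximum of each non-overlapping patch of the given size."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy input with 5 time steps and 2 features, and one filter of size 2.
x = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0], [4.0, 0.0]]
w = [[1.0, -1.0], [0.5, 0.5]]
b = -1.0

feature_map = relu(conv1d_valid(x, w, b))  # [1.5, 1.5, 0.0, 1.0]
pooled = max_pool1d(feature_map, size=2)   # [1.5, 1.0]
print(feature_map, pooled)
```

A layer with 32 filters would repeat this computation with 32 independent weight sets, producing 32 feature maps that are then flattened and passed to the fully connected layers.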
3.2. Long Short-Term Memory Networks
The second model proposed is based on the long short-term memory (LSTM) network, which is benchmarked against the CNN models. LSTM consists of memory blocks that have a special feature known as the cell state $c_t$, which helps them to remember long-term information, as shown in Figure 4.
The cell state runs along the top of the block and stores information learned from previous time steps. This information is updated at each time step by three gates that control what information is added to or removed from the cell state: the forget gate, the input gate, and the output gate. At each time step, the LSTM block takes two entries, the previous hidden state $h_{t-1}$ and the current input $x_t$, which are processed by the gates to decide what information is useful for updating the cell state.
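The forget gate is the first gate in the LSTM and decides what information to forget or keep from the previous cell state, as given in Equation (2):

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \qquad (2)$$

where $f_t$ is the output of the forget gate; $x_t$ and $h_{t-1}$ are the current input and the previous hidden state; $W_f$, $U_f$ and $b_f$ represent the parameters (weights and bias); and $\sigma$ the sigmoid activation function.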
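The input gate decides what new information will be added to the cell state. This is accomplished in two simultaneous steps, each with a different activation function. The entries are passed through the sigmoid function in Equation (3), which outputs values between 0 and 1. They are also passed through the tanh function in Equation (4), which creates a vector of new candidate values for updating the cell state. The results of both activation functions are multiplied and then added to the retained part of the previous cell state to update the cell state, as in Equation (5). Lastly, the output gate determines what information from the cell state to output, based on Equations (6) and (7).

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \qquad (3)$$

where $i_t$ is the output of the input gate, and $W_i$, $U_i$, and $b_i$ are the parameters of the input gate.

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \qquad (4)$$

where $\tilde{c}_t$ are the new candidate values, computed with their own weights $W_c$, $U_c$ and bias $b_c$, and tanh is the activation function.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (5)$$

where $c_t$ is the updated cell state, $f_t$ is the output of the forget gate from Equation (2), and $\odot$ denotes element-wise multiplication.

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \qquad (6)$$

where $W_o$, $U_o$ and $b_o$ are the weights and bias of the output gate.

$$h_t = o_t \odot \tanh\left(c_t\right) \qquad (7)$$

where $h_t$ represents the output vector of the memory cell.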
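The gate operations described in this subsection can be combined into a single time-step update. The sketch below implements the standard LSTM formulation in plain Python, which is assumed to correspond to Equations (2) to (7); the function names, the parameter layout (one tuple of input weights, recurrent weights, and bias per gate), and the toy values are assumptions made for illustration.

```python
import math


def sigmoid(v):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bias
        for W_row, U_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'c', 'o' to (W, U, b)."""
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]          # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]          # input gate
    c_tilde = [math.tanh(v) for v in affine(*params["c"], x, h_prev)]  # candidates
    # Keep part of the old cell state and add the gated candidate values.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_tilde)]
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]          # output gate
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                   # hidden state
    return h, c


# Toy block with 2 input features and 1 hidden unit. All parameters are zero,
# so every gate outputs 0.5 and the candidate value is 0.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in "fico"}

h, c = lstm_step(x=[0.3, -0.2], h_prev=[0.0], c_prev=[2.0], params=params)
print(h, c)
```

With these zero parameters the forget gate halves the previous cell state, which makes the role of each gate easy to verify by hand; in a trained network the weights determine how much of the past is retained at each step.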
9. Discussion
This study proposed a deep learning framework to predict the monthly peak demand of Panama, and compared the performance of two deep learning methods, CNNs and LSTM, on this problem. First, a dataset of monthly peak demand was collected for the period 2004 to 2019. The data clearly exhibited highly nonlinear and noisy patterns. We then investigated a list of input features that could account for the upward nonlinear trend in peak demand (Table 2). The objective of this study was to implement 1D CNNs and LSTM to learn the historical patterns in the data and use them to project the future peak demand growth for Panama.
We first introduced a CNN approach consisting of three different architectures: multivariate CNN, multivariate CNN-LSTM, and multihead CNN. The CNNs were then benchmarked against the LSTM, a current state-of-the-art method for time series forecasting. Through the multivariate CNN, we found that an input sequence of the three previous time steps, with four input features, was significant for predicting the month-ahead peak demand. These features were the month, big-client consumption, residential consumption, and the previous peak demand. We therefore used the same features and time lag to build the rest of the models. All the models were evaluated on a test set covering October 2016 to December 2019, a period that was very unstable. The model assessment was conducted in two steps: without and with noise.
First, the models were built and trained without noise to determine how well they generalized to the test set. Our initial finding was that the CNNs performed far better than the LSTM. The multivariate CNN outperformed all the other models, with R² of 0.92, MSE of 1271.65, MAPE of 1.62%, and MAE of 27.86. The feature extraction process was guided by a convolutional layer of 32 filters (2 × 1), followed by a max pooling layer (2 × 1). The model followed the peak demand pattern fairly well, but it underestimated the highest peaks, which occurred in August and December 2019, indicating that the features used did not fully capture these variations. The year 2019, in particular, presented a drastic increase in electricity demand, which raised concerns. Some experts believe that the addition of new projects such as the Panama Metro Line, as well as the growth in the number of electricity clients, is contributing to the high demand. It is therefore important to consider other features that can account for this growth. One limitation of this study was that data on the monthly number of clients were only available from March 2013 onwards, so we were not able to incorporate this feature into the models.
With respect to the other models, the multivariate CNN-LSTM substantially overestimated the peaks. LSTM consistently struggled to follow the peak demand pattern between February 2019 and December 2019, underestimating the demand.
To validate our initial findings about the performance of the models, we evaluated their robustness against noise. Gaussian noise at a 30% intensity level was introduced into the training set. After running the models five times, we confirmed that the CNNs were far more robust than the LSTM. A possible reason is that the data exhibited spatial patterns, which the CNN was able to extract effectively, whereas LSTM is more suitable for sequential learning. The multivariate CNN and multihead CNN had similar performance, with average R² values of 0.90 and 0.89, respectively. The multivariate CNN-LSTM was slightly inferior, with an average R² of 0.86. The CNNs also required less training time, with an average of 23.4 s for the three models.
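A robustness test of this kind can be set up by perturbing the training inputs before fitting. The sketch below is one possible reading of the procedure, not the authors' implementation: it assumes that a 30% intensity level means zero-mean Gaussian noise whose standard deviation is 0.30 times the standard deviation of each feature, and the function name `add_gaussian_noise`, the seed handling, and the toy data are hypothetical.

```python
import random
import statistics


def add_gaussian_noise(rows, intensity=0.30, seed=0):
    """Return a noisy copy of a table (list of rows of feature values).

    Assumption: the noise standard deviation of each column equals
    `intensity` times that column's population standard deviation.
    """
    rng = random.Random(seed)  # seeded so that each run is reproducible
    columns = list(zip(*rows))
    sigmas = [intensity * statistics.pstdev(col) for col in columns]
    return [[v + rng.gauss(0.0, s) for v, s in zip(row, sigmas)] for row in rows]


# Hypothetical training table with two features.
train = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0], [4.0, 16.0]]

# Five noisy copies with different seeds, mirroring the five repeated runs.
noisy_runs = [add_gaussian_noise(train, intensity=0.30, seed=s) for s in range(5)]
print(noisy_runs[0])
```

Each noisy copy would be used to retrain a model, which is then scored on the unmodified test set; averaging the scores over the runs gives figures comparable to the average R² values quoted above.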
We benchmarked our best performing model against the models proposed in the literature. For monthly forecasts, the MAPE ranged between 0.07% and 3.94% (Table 1). Our model's MAPE of 1.62% falls within this range, so we are confident that it achieved state-of-the-art performance. Furthermore, while many authors found LSTM to be powerful, this study showed that LSTM on its own was not effective for this particular problem and that it performed better when combined in a hybrid CNN-LSTM architecture.
10. Conclusions and Future Work
As modern power systems continue to face increasing peak demand due to factors such as larger consumer loads, rapid adoption of EVs, and weather variations, the need for more accurate forecasting methods has never been greater. The peak demand exhibits a nonlinear and noisy trend, making it a very complex problem that needs to be addressed promptly. Deep learning is emerging with better performance for problems that are highly nonlinear, noisy, and present diverse patterns. CNNs, in particular, have received increased attention from the computer vision community due to their outstanding performance in image classification and object recognition. This trend has favored their adoption in other areas, such as time series forecasting, for leveraging spatial–temporal data. LSTM, on the other hand, has quickly become the state-of-the-art method for time series applications due to its capability for processing sequential information.
This paper introduced two deep learning methods, based on 1D CNNs and LSTM, to predict the monthly peak demand for Panama.
The main finding of this study was that CNNs achieved superior accuracy and robustness compared to LSTM, with short computational time. The reason was that CNNs were able to extract and learn the complex patterns from the input features. The implications of this study are significant, since we showed that CNNs can effectively support time series forecasting problems, especially when the data exhibit spatial patterns.
The multivariate CNN was the best performing model, providing state-of-the-art accuracy with R² of 0.92, MSE of 1271.65, MAPE of 1.62%, and MAE of 27.86. Although many studies have shown LSTM to be powerful for predicting the peak demand pattern, we did not find it to be effective for this particular problem. However, LSTM performed better when it was combined in a hybrid CNN-LSTM structure to decode the sequential patterns of the features extracted by the CNN.
The results of the multivariate CNN model are favorable for predicting the peak demand. However, we believe that there is still room for improvement. Our future work has several directions. First, we propose to improve the model by incorporating other variables that can better account for the highest peaks that occurred in the months of August and December 2019. This study suggests adding variables such as CDD and monthly number of tourists traveling to Panama. Many models have incorporated CDD to better capture the effect of temperature on peak demand. It is also important to investigate other sectors that could be contributing to the peak demand growth, such as the Panama Metro, which started operation in April 2019. Another interesting direction for this study will be to benchmark these models against ensemble machine learning methods such as gradient boosting and ANN-SVR, which are being more widely adopted in the literature to enhance forecasting accuracy. Lastly, we will consider CNNs to support other areas of the energy sector, such as short-term load forecasting (STLF) and renewable prediction which are becoming more prominent. STLF, in particular, has become an active area of research to help improve load smoothing and peak demand shaving of buildings.