Enhancing Short-Term Load Forecasting Accuracy in High-Volatility Regions Using LSTM-SCN Hybrid Models

Tang, Bingbing; Hu, Jie; Yang, Mei; Zhang, Chenglong; Bai, Qiang

doi:10.3390/app142411606


by Bingbing Tang ¹, Jie Hu ¹,²,*, Mei Yang ¹, Chenglong Zhang ³ and Qiang Bai ⁴

¹ College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China

² State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China

³ School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China

⁴ School of Mechanical Engineering, Guiyang University, Guiyang 550002, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(24), 11606; https://doi.org/10.3390/app142411606

Submission received: 19 November 2024 / Revised: 10 December 2024 / Accepted: 11 December 2024 / Published: 12 December 2024

(This article belongs to the Special Issue Advances in Neural Networks and Deep Learning)


Abstract

Short-Term Load Forecasting (STLF) is essential for the efficient management of power systems, as it improves forecasting accuracy while optimizing power scheduling efficiency. Despite significant recent advancements in STLF models, forecasting accuracy in high-volatility regions remains a key challenge. To address this issue, this paper introduces a hybrid load forecasting model that integrates the Long Short-Term Memory Network (LSTM) with the Stochastic Configuration Network (SCN). We first verify the Universal Approximation Property of SCN through experiments on two regression datasets. Subsequently, we reconstruct the features and input them into the LSTM for feature extraction. These extracted feature vectors are then used as inputs for SCN-based STLF. Finally, we evaluate the performance of the LSTM-SCN model against other baseline models using the Australian Electricity Load dataset. We also select five high-volatility regions in the test set to validate the LSTM-SCN model’s advantages in such scenarios. The results show that the LSTM-SCN model achieved an RMSE of 56.970, MAE of 43.033, and MAPE of 0.492% on the test set. Compared to the next best model, the LSTM-SCN model reduced errors by 6.016, 8.846, and 0.053% for RMSE, MAE, and MAPE, respectively. Additionally, the model consistently outperformed across all five high-volatility regions analyzed. These findings highlight its contribution to improved power system management, particularly in challenging high-volatility scenarios.

Keywords:

Short-Term Load Forecasting; high-volatility regions; Long Short-Term Memory; Stochastic Configuration Network; universal approximation property

1. Introduction

Power load forecasting is an important area of research within the field of power systems. The field has received increasing academic attention due to its critical role in the efficient and economical operation of power systems [1]. Electrical load forecasting can be divided into three categories based on the forecasting horizon [2]: Short-Term Load Forecasting (STLF), Medium-Term Load Forecasting (MTLF), and Long-Term Load Forecasting (LTLF). STLF, which ranges from one hour to one week, is significantly influenced by weather conditions and recent load data. These forecasts play a crucial role in real-time grid management. MTLF, typically spanning from one week to several months, is influenced by factors such as seasonal variations in electricity consumption, holidays, and working days. These forecasts are essential for maintenance scheduling and fuel reserve management. LTLF, which extends beyond one year, is influenced by factors such as demographic changes, economic growth, and energy policies. These forecasts are crucial for system planning and optimization. This paper focuses on STLF. The proposed model combines deep learning techniques with the Stochastic Configuration Network (SCN) to improve load forecasting accuracy.

With the growth in global electricity demand, improving the accuracy of load forecasting has become increasingly critical [3]. Many factors affect electricity load, such as regional differences, socio-economic activities, weather, and prices [4]. Therefore, power load data are characterized by randomness, volatility, periodicity, and diversity. Extracting the intrinsic patterns of load change from historical power load data and developing an accurate forecasting method are key to successful load forecasting [5]. Various power load forecasting methods have been proposed to address specific challenges, each offering unique advantages based on different technologies, algorithms, or data types. Currently, power load forecasting methods are mainly divided into four categories [6]: statistical methods, artificial intelligence techniques, knowledge-based expert systems, and hybrid approaches.

Statistical methods require the construction of explicit data models to represent the relationship between the electric load and contributing factors. Classical statistical methods include multiple regression analysis, exponential smoothing, and stochastic time series, among others. Krstonijević [7] proposed an adaptive load forecasting method based on the Generalized Additive Model (GAM) and big data estimation techniques. Shi et al. [8] introduced a very short-term bus load forecasting model that utilizes Phase Space Reconstruction (PSR) and a Deep Belief Network (DBN). Barta et al. [9] aimed to establish a national energy consumption forecasting framework using open-access data from the European Network of Transmission System Operators for Electricity (ENTSO-E). To construct the forecast density, they employed Gradient Boosting Regression Trees (GBRTs) and conducted benchmarking based on actual load data and forecasts provided by each country. Wijaya et al. [10] extended the Generalized Additive Model (GAM) to GAM2, where a second GAM is applied to the squared residuals. Many statistical methods rely on linear models, which limit their ability to handle nonlinear relationships and complex patterns. Therefore, the nonlinear characteristics of complex electric loads cannot be precisely characterized using traditional methods [11].

Artificial intelligence methods used in power load forecasting include Artificial Neural Networks (ANNs), fuzzy logic, neuro-fuzzy systems, and Support Vector Machines (SVMs) [12,13,14]. Andriopoulos et al. [15] leveraged Convolutional Neural Networks (CNNs) to optimize neural network hyperparameters for power load forecasting. Duan et al. [16] proposed a power load forecasting model based on the Sparrow Search Algorithm (SSA), Variational Mode Decomposition (VMD), attention mechanism, and Long Short-Term Memory (LSTM). Initially, the SSA is used to optimize VMD parameters; then, LSTM is employed for load forecasting, and an attention mechanism is introduced to enhance the model. Pavlatos et al. [17] applied Bidirectional Long Short-Term Memory (BiLSTM) to power load forecasting and proposed combining bidirectional memory with advanced neural network architectures. Shi et al. [18] applied a Pooling-Based Deep Recurrent Neural Network (PDRNN) to household load forecasting and suggested that adding more hidden layers to the neural network could improve forecasting performance. Chen et al. [19] applied an improved deep residual network to power load forecasting to enhance prediction results. Although the above models show good performance on their respective datasets, they suffer from limited applicability and poor generalization ability [1].

Expert systems, a notable achievement in artificial intelligence, rely on rule-based logic to mimic the decision-making processes of domain experts. These systems are commonly used as decision support tools. Qiu et al. [20] applied expert system theory to analyze load transfer in regional power grids during transformer, busbar, and line faults, providing load transfer schemes. A study utilized a knowledge-based reasoning expert system to assist decision-makers in selecting the most appropriate load forecasting model for long-term planning in power systems [21]. Additionally, researchers have developed a rule-driven method that incorporates prior knowledge from experts on load curves, integrating this knowledge into statistical models to enhance forecasting accuracy [22]. These research outcomes demonstrate that expert systems play a significant role in power load forecasting, enhancing the credibility and precision of forecasting results by combining expert insights with statistical analysis.

Electricity load data are characterized by temporal correlation, volatility, and uncertainty, with volatility leading to significant forecasting errors in models [23]. Statistical methods and traditional machine learning approaches often fail to account for these characteristics simultaneously, which limits load forecasting accuracy and leaves room for further improvement [24]. Relying on a single method to handle power load data can result in low computational efficiency, high computational complexity, and high error rates. Over the years, numerous scholars have developed hybrid load forecasting models with the goal of achieving higher forecasting accuracy and lower error rates. Shen et al. [23] proposed a hybrid model based on the Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN), combined with CNN, BiLSTM, and Self-Attention (SA) techniques, along with Wavelet Domain Denoising (WDD) to enhance data smoothing, applied to household electricity load data. This model leverages the spatial feature extraction capabilities of the CNN, the bidirectional temporal sequence extraction capabilities of BiLSTM, and the ability of SA to focus on key historical time points to reduce information loss. Ma et al. [25] proposed a hybrid model based on the CNN, Improved Chaotic Particle Swarm Optimization (ICPSO), and LSTM for electric load forecasting. CNNs are used for feature extraction, ICPSO is employed to optimize LSTM parameters, and LSTM is then used for load forecasting. Goh et al. [26] proposed a forecasting model that combines the CNN and LSTM, where CNNs are used to identify and extract local features from time series data, which are then input into LSTM for STLF. Chen et al. [27] proposed a hybrid load forecasting model integrating a Residual Neural Network (ResNet) and LSTM, using a ResNet for feature extraction and LSTM for STLF. Zhou et al. [28] proposed an improved ARIMA-LSTM model that allocates weights based on the forecasting errors of the two models in the training set. Shin et al. [29] proposed a hybrid approach combining Variational Mode Decomposition (VMD) to decompose complex data into Intrinsic Mode Functions (IMFs) and an RVFL network to predict each IMF, thereby improving load forecasting accuracy and robustness.

The SCN, a stochastic learning method, was proposed by Wang et al. [30] in 2017. Other common stochastic learning algorithms include Radial Basis Function (RBF) networks [31] and RVFL networks [32]. The SCN learning algorithm outperforms these methods because of its supervisory mechanism [33,34,35,36,37]. Without such a mechanism, and given the difficulty of setting appropriate parameter ranges, randomized learners can perform poorly [38], as demonstrated mathematically in [39]. The SCN exhibits faster convergence and superior forecasting ability compared to other stochastic networks [31]. The SCN has achieved great success in various fields, including software implementation, computer vision, medical data analysis, fault detection and diagnosis, and the modeling and forecasting of various systems. In the realm of power load forecasting, Wang et al. [40] developed a recurrent version of the SCN for time series problems. However, a comparison against only a single model does not sufficiently demonstrate its superiority on power load data. Our aim is to contribute an effective and accurate hybrid load forecasting model to the field.

The proposed LSTM-SCN approach offers distinct advantages over traditional statistical methods. Conventional statistical models typically assume linearity or stationarity in the data and focus on capturing dependencies and seasonal variations in time series, making them more suitable for simpler load forecasting scenarios. In contrast, LSTM-SCN, as a hybrid model, is capable of handling complex nonlinear relationships and data heterogeneity, enabling it to deliver more accurate predictions in challenging power load forecasting tasks. Compared to existing hybrid models, the LSTM-SCN approach introduces the SCN model innovatively to enhance predictive performance.

The main contributions of this paper are as follows: (1) We propose an innovative fusion method for STLF by combining LSTM and the SCN. We extract features using LSTM, which are then fed into the SCN for forecasting. (2) We introduce the SCN as an alternative to traditional forecasting methods, leveraging its Universal Approximation Property to enhance forecasting accuracy in STLF. (3) We contribute to the literature by comparing the LSTM-SCN model with other baseline models, particularly examining its performance in high-volatility regions, and thereby highlighting its superior predictive accuracy.

The rest of the paper is organized as follows: Section 2 introduces the principles of the SCN, LSTM, and the proposed LSTM-SCN hybrid model. Section 3 validates the Universal Approximation Property of the SCN using two regression datasets. Section 4 applies the LSTM-SCN model to the Australian electricity load dataset and analyzes its performance in comparison with other baseline models. Section 5 concludes the paper.

2. Methodologies

In this section, we provide an overview of the underlying principles of the SCN, LSTM, and LSTM-SCN models. Table 1 provides a glossary of the symbols and abbreviations used frequently throughout the paper.

2.1. Stochastic Configuration Network (SCN)

The SCN, an incremental neural network, was proposed by Wang et al. [30] in 2017. The network construction begins with a single node in the hidden layer. In a supervised manner, the input weights and biases of the hidden layer nodes are randomly initialized. The number of hidden layer nodes is then gradually increased, and the output weights are computed using least squares. The SCN consists of input, hidden, and output layers, as shown in Figure 1.

SCN’s construction process is illustrated in Figure 2. Section 2.1.1 and Section 2.1.2 detail the specific steps of the SCN construction, while Section 2.1.3 presents the Universal Approximation Theorem.

2.1.1. Related Work

Consider a training set $\{X, Y\}$, where $X = \{x_{1}, x_{2}, \dots, x_{N}\}$ represents the input data and $Y = \{y_{1}, y_{2}, \dots, y_{N}\}$ represents the corresponding labels. Specifically, $x_{i} = [x_{i,1}, x_{i,2}, \dots, x_{i,d}]^{T} \in \mathbb{R}^{d}$ and $y_{i} = [y_{i,1}, y_{i,2}, \dots, y_{i,m}]^{T} \in \mathbb{R}^{m}$ for $i = 1, 2, \dots, N$. Given a target function $f : \mathbb{R}^{d} \to \mathbb{R}^{m}$, suppose that an SCN with $L - 1$ hidden nodes has already been constructed. Set the error tolerance to $\varepsilon$ and the maximum number of hidden nodes to $L_{max}$. The output of the current network is calculated using Formula (1).

$$f_{L-1}(X) = \sum_{j=1}^{L-1} \beta_{j} g_{j}(\omega_{j}^{T} X + b_{j}), \quad L = 1, 2, \dots, L_{max}, \quad f_{0} = 0 \tag{1}$$

Here, $\beta_{j}$ denotes the output weight of the $j$-th hidden node, $g(\cdot)$ denotes the activation function, and $\omega_{j}$ and $b_{j}$ denote the input weights and bias, respectively, of the $j$-th hidden node.

The formula for calculating the residual vector of the current network is as follows:

$$\varepsilon_{L-1} = f - f_{L-1}(X) = [\varepsilon_{L-1,1}(X), \dots, \varepsilon_{L-1,m}(X)]^{T} \tag{2}$$

2.1.2. Network Model Construction

If $\|\varepsilon_{L-1}\|^{2} > \varepsilon$ and $L \leq L_{max}$, then add the $L$-th node. The input weights and bias of the new node are selected from random candidates under the supervisory constraint that underpins the Universal Approximation Property, using Equations (3) and (4):

$$h_{L} = [g_{L}(\omega_{L}^{T} x_{1} + b_{L}), g_{L}(\omega_{L}^{T} x_{2} + b_{L}), \dots, g_{L}(\omega_{L}^{T} x_{N} + b_{L})]^{T} \tag{3}$$

$$\xi_{L,q} = \frac{\langle \varepsilon_{L-1,q}, h_{L} \rangle^{2}}{h_{L}^{T} h_{L}} - (1 - r - \mu_{L}) \|\varepsilon_{L-1,q}\|^{2} \tag{4}$$

where $q = 1, 2, \dots, m$, $r \in (0, 1)$, and $h_{L}$ denotes the output of the $L$-th hidden node. Let $\omega_{L}$ and $b_{L}$ represent the candidate parameters of the $L$-th node, and let $\{\mu_{L}\}$ be a non-negative real number sequence with $\lim_{L \to +\infty} \mu_{L} = 0$ and $\mu_{L} \leq 1 - r$. Among the candidate nodes that satisfy the condition $\xi_{L} = \sum_{q=1}^{m} \xi_{L,q} \geq 0$, the one with the maximum value of $\xi_{L}$ supplies the parameters of the $L$-th node.

The output weights are determined through a least squares evaluation, as follows:

$$\beta = \arg\min_{\beta} \|H \beta - Y\|_{F}^{2} = H^{+} Y \tag{5}$$

where $H = [h_{1}, h_{2}, \dots, h_{L}]$ is the hidden layer output matrix, $\|\cdot\|_{F}$ denotes the Frobenius norm, and $H^{+}$ represents the Moore–Penrose generalized inverse matrix of $H$.
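The construction steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `train_scn` and `predict_scn`, the uniform sampling of candidates in $[-\lambda, \lambda]$, the sequence $\mu_{L} = (1 - r)/(L + 1)$, and the toy sine data are all assumptions made for the example. The default parameter values are the ones listed in Section 2.3.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument so exp() cannot overflow for large weight scales.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def train_scn(X, Y, L_max=200, tol=1e-3, T_max=100,
              lambdas=(0.5, 1, 5, 10, 30, 50, 100, 150, 200, 250),
              rs=(0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999), seed=0):
    """Incrementally build a single-hidden-layer SCN. X: (N, d), Y: (N, m)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = np.empty((d, 0))               # input weights, one column per node
    b = np.empty((1, 0))               # biases, one entry per node
    H = np.empty((N, 0))               # hidden layer output matrix
    beta = np.empty((0, Y.shape[1]))   # output weights
    E = Y.copy()                       # residual, since f_0 = 0
    while H.shape[1] < L_max and np.linalg.norm(E) ** 2 > tol:
        L = H.shape[1] + 1
        e_norm2 = np.sum(E ** 2, axis=0)            # squared residual norm per output
        chosen = None
        for lam in lambdas:                         # widen the sampling range if needed
            Wc = rng.uniform(-lam, lam, size=(d, T_max))
            bc = rng.uniform(-lam, lam, size=(1, T_max))
            Hc = sigmoid(X @ Wc + bc)               # candidate outputs h_L, Eq. (3)
            proj = (E.T @ Hc) ** 2 / np.sum(Hc ** 2, axis=0)
            for r in rs:                            # relax the constraint step by step
                mu = (1.0 - r) / (L + 1)            # assumed choice of mu_L
                xi = np.sum(proj - (1.0 - r - mu) * e_norm2[:, None], axis=0)  # Eq. (4)
                if np.any(xi >= 0):
                    k = int(np.argmax(xi))          # best admissible candidate
                    chosen = (Wc[:, [k]], bc[:, [k]], Hc[:, [k]])
                    break
            if chosen is not None:
                break
        if chosen is None:                          # no admissible candidate found
            break
        W = np.hstack([W, chosen[0]])
        b = np.hstack([b, chosen[1]])
        H = np.hstack([H, chosen[2]])
        beta = np.linalg.pinv(H) @ Y                # least squares output weights, Eq. (5)
        E = Y - H @ beta
    return W, b, beta

def predict_scn(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy regression problem (illustrative data only).
X_demo = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
Y_demo = np.sin(2.0 * np.pi * X_demo)
W, b, beta = train_scn(X_demo, Y_demo, L_max=30)
rmse_zero = float(np.sqrt(np.mean(Y_demo ** 2)))
rmse_fit = float(np.sqrt(np.mean((Y_demo - predict_scn(X_demo, W, b, beta)) ** 2)))
print(W.shape[1], rmse_zero, rmse_fit)
```

The loop stops as soon as either the residual falls below the tolerance or the node budget is exhausted, so the hidden layer size is an outcome of training rather than a hyperparameter tuned by hand.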

2.1.3. Universal Approximation Theorem

Let $\Gamma = \{g_{1}, g_{2}, g_{3}, \dots\}$ represent a set of real-valued functions, and let $\mathrm{span}(\Gamma)$ denote the function space spanned by $\Gamma$. Assume that $\mathrm{span}(\Gamma)$ is dense in the $L_{2}$ space and that $\forall g \in \Gamma$, $0 < \|g\| < b_{g}$ for some $b_{g} \in \mathbb{R}^{+}$.

For $L = 1, 2, 3, \dots$, define $\delta_{L} = \sum_{q=1}^{m} \delta_{L,q}$ with $\delta_{L,q} = (1 - r - \mu_{L}) \|\varepsilon_{L-1,q}\|^{2}$, where $0 < r < 1$, $\mu_{L} \leq 1 - r$, and $\lim_{L \to +\infty} \mu_{L} = 0$.

If the random basis function $g_{L}$ is generated so that the following inequality constraints are satisfied:

$$\langle \varepsilon_{L-1,q}, g_{L} \rangle^{2} \geq b_{g}^{2} \delta_{L,q}, \quad q = 1, 2, \dots, m \tag{6}$$

and the output weights of the hidden layer nodes are evaluated as follows:

$$\beta = [\beta_{1}, \beta_{2}, \dots, \beta_{L}] = \arg\min_{\beta} \left\| f - \sum_{j=1}^{L} \beta_{j} g_{j} \right\| \tag{7}$$

then $\lim_{L \to +\infty} \|f - f_{L}\| = 0$.
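A short sketch of why constraint (6) forces the residual to shrink may be helpful. It treats the simpler case in which only the newest output weight is set, $\beta_{L,q} = \langle \varepsilon_{L-1,q}, g_{L} \rangle / \|g_{L}\|^{2}$, with $\varepsilon_{L-1,q}$ the $q$-th component of the residual in Equation (2). This constructive weight is an assumption made for illustration; refitting all weights by least squares, as in Equation (7), cannot give a larger residual. With this weight, $\|\varepsilon_{L,q}\|^{2} = \|\varepsilon_{L-1,q}\|^{2} - \langle \varepsilon_{L-1,q}, g_{L} \rangle^{2} / \|g_{L}\|^{2}$, so that

$$
\begin{aligned}
\|\varepsilon_{L}\|^{2} - (r + \mu_{L}) \|\varepsilon_{L-1}\|^{2}
&= \sum_{q=1}^{m} \left[ (1 - r - \mu_{L}) \|\varepsilon_{L-1,q}\|^{2} - \frac{\langle \varepsilon_{L-1,q}, g_{L} \rangle^{2}}{\|g_{L}\|^{2}} \right] \\
&= \delta_{L} - \sum_{q=1}^{m} \frac{\langle \varepsilon_{L-1,q}, g_{L} \rangle^{2}}{\|g_{L}\|^{2}} \\
&\leq \delta_{L} - \sum_{q=1}^{m} \frac{\langle \varepsilon_{L-1,q}, g_{L} \rangle^{2}}{b_{g}^{2}} \leq 0 .
\end{aligned}
$$

The first inequality uses $\|g_{L}\| < b_{g}$ and the second uses constraint (6). Hence $\|\varepsilon_{L}\|^{2} \leq (r + \mu_{L}) \|\varepsilon_{L-1}\|^{2}$, and because $r < 1$ and $\mu_{L} \to 0$, the contraction factor is eventually bounded below one, which drives the residual to zero.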

2.2. Long Short-Term Memory (LSTM)

LSTM is a specialized type of Recurrent Neural Network (RNN) architecture that excels in time series analysis and forecasting. It was originally designed to address the issues of gradient vanishing and explosion that traditional RNNs encounter when processing long sequences. By introducing a special memory unit, LSTM can retain information from previous states, which allows it to capture long-term dependencies in sequential data more effectively than traditional RNNs.

The LSTM cell incorporates three gating mechanisms: the forget gate, the input gate, and the output gate. The role of the forget gate is to determine whether information should be discarded or retained in the cell state, based on the hidden state from the previous time step and the current input. The input gate consists of a sigmoid layer and a tanh layer. The sigmoid layer decides which values to update, and the tanh layer generates a new vector of candidate values for updating the cell state. The output gate determines the value of the next hidden state. The interaction of these three gates enables LSTM to efficiently capture long-term dependencies in time series data while mitigating the effects of noise. The LSTM unit is depicted in Figure 3. The mathematical representation of the LSTM is presented below.

$$
\begin{cases}
f_{t} = \sigma(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}) \\
i_{t} = \sigma(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}) \\
C_{t}^{'} = \tanh(W_{C} \cdot [h_{t-1}, x_{t}] + b_{C}) \\
C_{t} = f_{t} \cdot C_{t-1} + i_{t} \cdot C_{t}^{'} \\
O_{t} = \sigma(W_{O} \cdot [h_{t-1}, x_{t}] + b_{O}) \\
h_{t} = O_{t} \cdot \tanh(C_{t})
\end{cases}
\tag{8}
$$
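To make Equation (8) concrete, the following NumPy sketch performs the update for a single LSTM cell. It is illustrative only: the function name `lstm_step`, the dictionary of parameters, the layer sizes, and the random inputs are assumptions, and the products between gates and states are taken element-wise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of Equation (8) for a single sample."""
    z = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])     # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])     # input gate
    c_cand = np.tanh(params["W_C"] @ z + params["b_C"])  # candidate cell state C'_t
    c_t = f_t * c_prev + i_t * c_cand                    # new cell state C_t
    o_t = sigmoid(params["W_O"] @ z + params["b_O"])     # output gate
    h_t = o_t * np.tanh(c_t)                             # new hidden state h_t
    return h_t, c_t

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = {}
for name in ("f", "i", "C", "O"):
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
    params[f"b_{name}"] = np.zeros(n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # run over a sequence of five inputs
    h, c = lstm_step(x_t, h, c, params)
print(h)
```

Because the output gate lies in (0, 1) and the hyperbolic tangent lies in (-1, 1), every component of the hidden state stays strictly inside (-1, 1).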

2.3. LSTM-SCN

The LSTM-SCN model proposed in this paper consists of two main components: the LSTM, which acts as a feature extractor, and the SCN, which performs load forecasting using these features. The structure of the model is illustrated in Figure 4.

Data are preprocessed before being input into the model. Missing values, outliers, and noise in the data can affect the accuracy of the results, necessitating preprocessing operations. Additionally, data normalization can improve the performance of the LSTM model and prevent issues with vanishing or exploding gradients.

The proposed model employs an LSTM network for feature extraction. The LSTM network consists of two layers [27]: the first layer contains 1024 units, and the second layer contains 256 units. To mitigate overfitting, dropout is applied between the two LSTM layers. The output of the second layer represents the extracted features and serves as the input to the SCN.

This paper dispenses with the fully connected layers commonly used in most models and instead employs the SCN for forecasting. In traditional models that use fully connected layers for forecasting, the number of hidden layers must be set and adjusted frequently, a process that consumes time and requires significant computational resources. In contrast, the SCN structure is determined by setting a maximum number of hidden nodes and a tolerance level: the number of hidden nodes increases incrementally until the specified tolerance is achieved or the maximum node limit is reached. Because the SCN allows for a relatively large number of hidden nodes, the model can autonomously identify a suitable network structure within the permissible range of node configurations. The SCN is set with the following parameters: the maximum number of hidden nodes $L_{max}$ = 200, weight scale sequence Lambdas = [0.5, 1, 5, 10, 30, 50, 100, 150, 200, 250], training tolerance ε = 0.001, contractive sequence r = [0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999], and maximum number of candidate nodes $T_{max}$ = 100.

3. Experimental Confirmation of SCN’s Universal Approximation Property

In this section, we validate the universal approximation property of the SCN through experiments. The common predictive methods employed in existing models for power load data processing include Support Vector Regression (SVR) [41], Random Vector Functional Link Networks (RVFLs) [29], and Fully Connected Layers (FCs) [27]. In addition, we include Linear Regression (LR), Extreme Gradient Boosting (XGBoost), and Gradient Boosting Regression Tree (GBRT) models for comparison. Wang et al. [30] have demonstrated the universal approximation property of the SCN in their studies on three regression datasets: Stock, Concrete, and Compactiv. These three datasets are all from the KEEL (KEEL. Available online: http://www.keel.es/ (accessed on 22 June 2024)) database. In this study, we validate the universal approximation property of the SCN on the Stock and Concrete regression datasets, and compare the performance of various models on these datasets. Specifically, we randomly selected 80% of the samples in each dataset as the training set, and the remaining 20% as the test set.

3.1. Evaluation Metric

The performance of the proposed model is evaluated using three commonly utilized metrics: the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). These metrics are widely recognized for assessing the accuracy of forecasting models, and they are also applied in Section 4 to evaluate the load forecasting experiments. The RMSE quantifies the magnitude of discrepancies between predicted and actual values, the MAE calculates the mean of absolute differences, and the MAPE measures percentage errors. The formulas for these metrics are as follows:

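$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{9}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}| \tag{10}$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\% \tag{11}$$

where $y_{i}$ is the actual value, $\hat{y}_{i}$ is the predicted value, and $n$ is the number of samples.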
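For reference, the three metrics can be computed as in the sketch below. The function names and the three-point example are assumptions chosen for illustration, and the MAPE is returned in percent, which matches the way it is reported in this paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Square root of the mean squared error.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    # Mean of the absolute errors.
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent; y_true must be non-zero.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Illustrative values only: the errors are -10, 10, and 0.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 400.0])
print(rmse(y_true, y_pred))  # sqrt(200 / 3), about 8.165
print(mae(y_true, y_pred))   # 20 / 3, about 6.667
print(mape(y_true, y_pred))  # (10% + 5% + 0%) / 3 = 5.0
```

The RMSE exceeds the MAE whenever the errors are unequal in magnitude, which is why the two are reported together: a large gap between them signals a few large errors.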

3.2. Parameter Settings

In this section, we use the sigmoid activation function for the SCN, RVFL, and FC models. The parameter settings for each model are as follows [30]: for the FC model, we set the number of iterations to 200 and specify 200 nodes in the hidden layer. For the RVFL model, we also set the number of iterations to 200 and use 200 hidden nodes. For SVR, we determine the optimal parameter combinations through a grid search method, and then apply these optimal parameters to perform regression analysis. For GBRTs, we set the number of trees to 200.

3.3. Experimental Results

After conducting 50 independent experiments on each model, we obtained the average performance of these models on the two datasets, as shown in Table 2. The performance of the SVR model is based on the optimal parameters obtained by the grid search method.
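The averaging protocol can be sketched as a repeated random hold-out evaluation. The code below is a hedged illustration: it assumes the 80/20 split is redrawn in every run, uses ordinary least squares as a stand-in model, and runs on synthetic data. The names `repeated_holdout`, `fit_ols`, and `predict_ols` are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def repeated_holdout(X, y, fit, predict, n_runs=50, train_frac=0.8, seed=0):
    """Return the test RMSE of each of n_runs random train/test splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * len(X)))
    scores = []
    for _ in range(n_runs):
        order = rng.permutation(len(X))            # fresh random split per run
        train_idx, test_idx = order[:n_train], order[n_train:]
        model = fit(X[train_idx], y[train_idx])
        scores.append(rmse(y[test_idx], predict(model, X[test_idx])))
    return np.array(scores)

# Stand-in model: ordinary least squares with an intercept column.
def fit_ols(X, y):
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic data, for illustration only.
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * data_rng.normal(size=200)

scores = repeated_holdout(X_demo, y_demo, fit_ols, predict_ols)
print(scores.mean(), scores.std())
```

Keeping the per-run scores rather than only their mean also provides the distribution needed for box plots of each metric.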

The experimental results show that on the Stock dataset, the SCN model achieves the best performance in both the RMSE and MAPE metrics, which are 0.101 and 0.166% lower, respectively, than those of the second-best model. In terms of the MAE metric, the SCN ranks second, with a value only 0.054 higher than that of the top-performing model. On the Concrete dataset, the SCN achieves the best performance in terms of the MAE and MAPE metrics. Overall, the strong forecasting performance of the SCN supports its use as the forecasting component of the LSTM-SCN model proposed in this paper.

To visually highlight the advantages of the SCN in forecasting, we present the prediction plots, performance indicator boxplots, and loss function plots for the SCN and other models on the Stock dataset. Figure 5 compares the predicted and actual values of the SCN with those of the second-best performing model, SVR, in the test set. It is evident that the SCN forecasts the data more accurately than SVR.

Figure 6a–c presents the box plots of the RMSE, MAE, and MAPE results for each model over 50 experiments. Due to poor performance, the RVFL results are not displayed. The plots show that the SCN consistently achieves the lowest values across all metrics, indicating superior predictive accuracy and strong stability with an approximately symmetric distribution of its values.

Among all the models, the SCN, FC, and RVFL require iterative training to achieve optimal performance, and these three models are commonly used in hybrid electric load forecasting models. Therefore, we compared the training errors of the SCN with those of FC and RVFL. Figure 7 illustrates the error plots for the SCN, FC, and RVFL on the training and testing sets. The SCN model optimizes its network structure by progressively increasing the number of hidden layer nodes, where each addition of a hidden node corresponds to a training iteration. Therefore, for the SCN, the x-axis represents the number of hidden layer nodes. The experimental results indicate that the SCN and FC models exhibit roughly comparable convergence speeds during training, but the SCN achieves a lower loss function value at the end of training. This suggests that the SCN is more effective in finding the optimal network structure. In contrast, while the RVFL model has a simpler training process, it converges more slowly and ends with a higher loss function value than the SCN and FC models.

4. Experiments

This section utilizes the Queensland electricity load dataset (Australia Load. Available online: https://github.com/weiran4/AustraliaData (accessed on 22 June 2024)) to evaluate the performance of the proposed model. The effectiveness of the model in predicting the electricity load is also compared against other established baseline models.

4.1. Experimental Setup

The experiments presented in this paper were conducted on the Ubuntu 22.04 LTS platform within an experimental environment that included Python 3.10 and PyTorch 2.1.0. The precise specifications of the experimental setup are detailed in Table 3.

4.2. Dataset

The dataset contains electrical load data from Queensland, Australia, spanning from 1 January 2006 to 31 December 2010. Sampled at 30 min intervals, the dataset provides 48 sampling points per day and contains 87,649 entries. The dataset includes tariff, temperature, humidity, and power load information, as detailed in Table 4. In this section, data from 1 January 2006 to 30 June 2009 are used as the training set; data from 1 July 2009 to 31 December 2009 are used as the validation set; and data from the year 2010 are used as the test set.

4.3. Data Preprocessing

We preprocessed the dataset to ensure the accuracy and reliability of subsequent analyses. This involved cleaning the data to remove inconsistencies or errors, extracting time series features to capture temporal patterns, and normalizing the data to standardize the scale of variables.

4.3.1. Data Cleaning

In the data cleaning section, we inspected the data to confirm that there were no missing values in the dataset. Additionally, we identified potential outliers by calculating the interquartile range (IQR) and applying the 1.5 × IQR rule. For detected outliers, we replaced them with the mean of the preceding and succeeding data points. This approach reduces the impact of outliers on the overall data distribution while maintaining data continuity and stability.
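The cleaning rule can be sketched as follows. This is a simplified illustration, not the authors' code: the function name `replace_outliers_iqr` is hypothetical, the quartiles are computed over the whole series, and runs of consecutive outliers and series shorter than two points are not treated specially.

```python
import numpy as np

def replace_outliers_iqr(values, k=1.5):
    """Replace points outside [Q1 - k*IQR, Q3 + k*IQR] by the mean of their neighbours."""
    x = np.asarray(values, dtype=float).copy()
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = np.flatnonzero((x < lower) | (x > upper))
    for i in outliers:
        # At the series boundaries, fall back to the single available neighbour.
        left = x[i - 1] if i > 0 else x[i + 1]
        right = x[i + 1] if i < len(x) - 1 else x[i - 1]
        x[i] = 0.5 * (left + right)
    return x, outliers

# Illustrative series with one obvious spike at index 3.
series = [10, 11, 12, 100, 12, 11, 10, 11, 12, 11]
cleaned, outlier_idx = replace_outliers_iqr(series)
print(outlier_idx, cleaned[3])
```

In this example the quartiles are 11 and 12, so the admissible range is [9.5, 13.5]; only the value 100 falls outside it and is replaced by the mean of its two neighbours, 12.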

4.3.2. Time Series Feature Construction

It is recognized that power load data exhibit significant time series characteristics and interdependence between neighboring data points. Therefore, in addition to selecting environmental factors such as weather and humidity as features, we employ a sliding window method to construct features from the power load data. Specifically, we define a sliding window of 48 time units [28] that moves step by step along the time series. Consequently, the feature vector at each time point includes data from the previous 48 time units, capturing the dynamic changes in the historical data at that point. This process is illustrated in Figure 8. For instance, the feature vector for the $n$-th data point is $\alpha_{n} = [\mathrm{Load}_{n-48}, \mathrm{Load}_{n-47}, \dots, \mathrm{Load}_{n-1}]$. Using the sliding window method to construct features enhances the model’s ability to capture temporal dependencies and reveals potential patterns and trends in the data. This approach provides richer and more accurate feature information for electricity load forecasting. The feature vectors for the first 48 data points are incomplete due to missing values within the sliding window. Therefore, we remove the first 48 records from the dataset after feature construction to prevent any potential impact of these missing values on subsequent data analysis and model training. The 48 load features we constructed, combined with factors such as weather and humidity, result in a total of 55 features for each electricity load value.
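A minimal sketch of the sliding window construction is given below. The function name `build_lag_features` and the toy series are assumptions; only the 48 lagged load features are built here, and the remaining features would be appended as extra columns.

```python
import numpy as np

def build_lag_features(load, window=48):
    """Return (features, targets): row k holds the `window` values preceding targets[k]."""
    load = np.asarray(load, dtype=float)
    n_rows = len(load) - window                 # the first `window` points have no full history
    # Index matrix: row k selects load[k], load[k + 1], ..., load[k + window - 1].
    idx = np.arange(n_rows)[:, None] + np.arange(window)[None, :]
    features = load[idx]
    targets = load[window:]                     # the value immediately after each window
    return features, targets

# Toy series 0, 1, ..., 99 so that the alignment is easy to check.
toy_load = np.arange(100.0)
features, targets = build_lag_features(toy_load, window=48)
print(features.shape, targets.shape)            # (52, 48) (52,)
print(features[0, 0], features[0, -1], targets[0])
```

Dropping the first 48 records corresponds to starting the targets at index 48, so every remaining target has a complete history.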

4.3.3. Data Normalization

For the power load data, we normalize both the training and test sets before inputting them into the model. We use the min–max normalization method to scale the data to the [0, 1] range with the following formula:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{12}$$

where $X_{norm}$ is the normalized value, $X$ is the original value, and $X_{max}$ and $X_{min}$ are the maximum and minimum values in the dataset, respectively.

After forecasting, we apply inverse normalization to the model outputs to restore the data to their original scale. All model performance metrics are calculated on the inverse-normalized data, and the visualization of results is also based on these data, which provides a more intuitive view of the model’s forecasting performance. The inverse normalization formula is as follows:

X = X_{\mathrm{norm}} \times (X_{\max} - X_{\min}) + X_{\min}

(13)
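As an illustration of Equations (12) and (13), the following minimal sketch normalizes data to the [0, 1] range and maps values back to the original scale. The helper names are hypothetical. Taking the minimum and maximum from the training data only is an assumption made here to avoid leaking test-set statistics; the text does not state which data the extrema are computed from.

```python
import numpy as np

def fit_min_max(train):
    """Return the per-column minimum and maximum of the training data."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def normalize(x, x_min, x_max):
    """Equation (12): scale values to [0, 1] using the given extrema."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

def denormalize(x_norm, x_min, x_max):
    """Equation (13): map normalized values back to the original scale."""
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min
```

Because Equation (13) is the exact inverse of Equation (12), applying `denormalize` to the output of `normalize` recovers the original values up to floating-point error. Test values outside the training range fall slightly outside [0, 1] under this assumption.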

4.4. Experimental Results

We compared the proposed LSTM-SCN model with the baseline models; Table 5 presents the performance metrics of each model on the forecasting set. The LSTM-SCN model achieved the best result on every metric, with an RMSE of 56.970, an MAE of 43.033, and a MAPE of 0.492%. Relative to the CNN-LSTM model, which ranks second on RMSE and MAPE, the LSTM-SCN model reduces the RMSE by 6.016 and the MAPE by 0.053 percentage points. Relative to the GRU model, which ranks second on MAE, it reduces the MAE by 8.846.
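The reductions quoted above follow directly from Table 5. The short sketch below reproduces the arithmetic by finding, for each metric, the best baseline and its gap to the LSTM-SCN model. The values are copied from Table 5; the function and variable names are illustrative only.

```python
# Forecasting-set metrics copied from Table 5: (RMSE, MAE, MAPE in %).
TABLE_5 = {
    "LSTM": (93.797, 72.175, 0.819),
    "BiLSTM": (124.220, 100.635, 1.179),
    "GRU": (69.620, 51.879, 0.593),
    "SCN": (97.626, 74.071, 0.847),
    "CNN-LSTM": (62.986, 57.829, 0.545),
    "LSTM-RVFL": (187.695, 167.114, 1.866),
    "LSTM-SCN": (56.970, 43.033, 0.492),
}
METRICS = ("RMSE", "MAE", "MAPE")

def reductions_vs_runner_up(table, proposed="LSTM-SCN"):
    """For each metric, return (best baseline, error reduction of the proposed model)."""
    result = {}
    for i, metric in enumerate(METRICS):
        baselines = {name: vals[i] for name, vals in table.items() if name != proposed}
        runner_up = min(baselines, key=baselines.get)
        gap = round(baselines[runner_up] - table[proposed][i], 3)
        result[metric] = (runner_up, gap)
    return result

for metric, (model, gap) in reductions_vs_runner_up(TABLE_5).items():
    print(f"{metric}: second-best model {model}, reduction {gap}")
```

The runner-up differs by metric (CNN-LSTM for RMSE and MAPE, GRU for MAE), which is why the comparison is made per metric rather than against a single baseline.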

In the field of electricity load forecasting, most models achieve good forecasting performance in low-volatility regions, so differences between models are revealed mainly in high-volatility regions. To fully evaluate the LSTM-SCN model’s performance, we selected five high-volatility regions from the last month of the test set, each containing 40 data points. Figure 9 illustrates the selection of these regions, and Table 6 lists their data ranges. This detailed analysis enables us to more accurately assess the model’s applicability and accuracy in real-world scenarios with high volatility.
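The regional evaluation can be sketched as follows, assuming the ranges in Table 6 are half-open index ranges into the evaluated portion of the test series (the reference point of these indices is an assumption here). The metric helpers use the standard definitions of RMSE, MAE, and MAPE, and all names are illustrative.

```python
import numpy as np

# Half-open index ranges [start, stop) copied from Table 6.
REGIONS = {
    "Region 1": (200, 240),
    "Region 2": (540, 580),
    "Region 3": (875, 915),
    "Region 4": (1165, 1205),
    "Region 5": (1310, 1350),
}

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Expressed in percent; assumes the true load is never zero.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def regional_metrics(y_true, y_pred, regions=REGIONS):
    """Compute RMSE, MAE, and MAPE on each 40-point region."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    out = {}
    for name, (start, stop) in regions.items():
        yt, yp = y_true[start:stop], y_pred[start:stop]
        out[name] = {"RMSE": rmse(yt, yp), "MAE": mae(yt, yp), "MAPE": mape(yt, yp)}
    return out
```

Both inputs should already be on the original scale, consistent with the inverse normalization step described in Section 4.3.3.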

Table 7 reports the performance metrics of each model in the five selected high-volatility regions. The LSTM-SCN model achieves the best result on every metric in all five regions. Compared with the second-best model on each metric, the LSTM-SCN model reduces the RMSE by 37.901, the MAE by 29.781, and the MAPE by 0.391 percentage points in Region 1. The corresponding reductions are 39.923 in RMSE, 31.502 in MAE, and 0.389 percentage points in MAPE for Region 2; 42.201 in RMSE, 29.990 in MAE, and 0.420 percentage points in MAPE for Region 3; 34.951 in RMSE, 24.734 in MAE, and 0.332 percentage points in MAPE for Region 4; and 34.552 in RMSE, 24.074 in MAE, and 0.325 percentage points in MAPE for Region 5. These results highlight the advantage of the LSTM-SCN model in high-volatility regions.

To explain why the LSTM-SCN model exhibits optimal performance across the five high-volatility regions, we offer the following analysis:

(1): Adaptive Dynamic Network Structure: The LSTM-SCN model relies on a supervision mechanism during prediction, whereas other models depend on the backpropagation mechanism of fully connected layers. Backpropagation requires gradient optimization and relies on a fixed network structure, while the supervision mechanism can dynamically adjust the network structure based on data and task requirements. This adaptability allows the model to construct an optimal network architecture during learning, avoiding overfitting and unnecessary complexity.
(2): Robustness to Gradient Issues: Backpropagation in deep networks often faces challenges like vanishing or exploding gradients, which hinder model convergence and learning efficiency. In contrast, the supervision mechanism does not rely entirely on gradients but adjusts through a feedback-driven process, making the model more robust to these issues.
(3): Enhanced Interpretability: The supervision mechanism offers higher interpretability. Its dynamic structural adjustments can reveal the relationship between data features and model behavior, providing a more transparent and efficient learning process.
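To make the idea of a supervision mechanism that grows the network more concrete, the following is a simplified sketch of incremental SCN construction for a single output, using the symbols from Table 1 ($w_{j}$, $b_{j}$, $β_{j}$, $L_{\max}$, $T_{\max}$). It is not the authors' implementation. The acceptance score is a reduced form of the supervisory inequality in [30] as we understand it, and the adaptive search over the contraction factor and the weight scope is omitted. The parameters `lam`, `r`, `tol`, and `seed`, as well as the toy data in the usage note, are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_scn(X, y, L_max=20, T_max=50, tol=1e-2, lam=10.0, r=0.999, seed=0):
    """Grow a single-hidden-layer network one node at a time (simplified SCN)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.empty((d, 0))        # input weights w_j, one column per hidden node
    b = np.empty(0)             # input biases b_j
    H = np.empty((n, 0))        # hidden-node outputs on the training data
    beta = np.empty(0)          # output weights beta_j
    e = y.astype(float).copy()  # residual of the empty network
    while H.shape[1] < L_max and np.sqrt(np.mean(e ** 2)) > tol:
        # Draw T_max random candidate nodes with parameters in [-lam, lam].
        Wc = rng.uniform(-lam, lam, size=(d, T_max))
        bc = rng.uniform(-lam, lam, size=T_max)
        Hc = sigmoid(X @ Wc + bc)
        # Supervisory score: a candidate is admissible only if it can
        # reduce the current residual by a sufficient amount (score >= 0).
        xi = (e @ Hc) ** 2 / np.sum(Hc ** 2, axis=0) - (1.0 - r) * (e @ e)
        best = int(np.argmax(xi))
        if xi[best] < 0:
            break  # no admissible candidate; this sketch simply stops
        W = np.column_stack([W, Wc[:, best]])
        b = np.append(b, bc[best])
        H = np.column_stack([H, Hc[:, best]])
        # Re-solve all output weights by least squares and update the residual.
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        e = y - H @ beta
    return W, b, beta

def predict_scn(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta
```

In this sketch no gradient is propagated through the hidden layer: hidden nodes are accepted or rejected by the score, and only the output weights are solved in closed form. In the LSTM-SCN model the input `X` would be the feature vectors extracted by the LSTM; a toy input such as `X = np.linspace(0, 1, 200).reshape(-1, 1)` with `y = np.sin(2 * np.pi * X[:, 0])` is enough to watch the network grow until the tolerance or `L_max` is reached.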

Figure 10 illustrates the forecasting performance of each model in Region 1. In the forecast plots for Region 1, the LSTM-RVFL model exhibits a significant deficiency in forecasting performance compared to other models. The data in this region are highly volatile, which results in most models being unable to accurately predict the changing trend of the power load at the turning points. However, the LSTM-SCN model effectively captures the changes in power load, with its predicted curve closely matching the actual curve.

Figure 11 depicts the forecasting performance of each model in Region 2. Although the data volatility in Region 2 is lower than in Region 1, the forecast curves of all models except the LSTM-SCN deviate at the data turning points, indicating a deficiency in capturing the data trend. The forecasting curve of the LSTM-SCN model is essentially consistent with the actual curve, demonstrating its superior predictive capability.

Figures 12–14 illustrate the forecasting performance of each model in Regions 3, 4, and 5, each of which exhibits distinct high-volatility characteristics. In these regions, the LSTM-SCN model consistently demonstrates superior predictive performance, with its predicted curve closely aligned with the actual curve and effectively capturing the underlying data trends. This result corroborates the findings from the first two regions, providing further evidence of the LSTM-SCN model’s robustness and dependability in forecasting the power load in high-volatility regions. The model’s ability to reliably track fluctuations across diverse regions reinforces its applicability to complex, volatile power load scenarios.

5. Conclusions

With the growing demand for higher accuracy in STLF in power systems, particularly in high-volatility regions, this study introduces a hybrid forecasting model based on the LSTM-SCN. Instead of the traditional fully connected forecasting layer, the model adopts the SCN, known for its universal approximation property, for the forecasting process. This approach reduces the time and effort required to set up the network structure by eliminating repeated trial-and-error attempts, and it enables the identification of the optimal network structure among various configurations. The LSTM-SCN model uses the LSTM to extract data features, which are then used as inputs to the SCN for forecasting. Before the load forecasting experiments, we validated the universal approximation property of the SCN on two regression datasets.

We then evaluated the model on the Australian Electricity Load dataset using three metrics: the RMSE, MAE, and MAPE. The RMSE, MAE, and MAPE values of the LSTM-SCN model on this dataset are 56.970, 43.033, and 0.492%, respectively, outperforming the other models. To evaluate the predictive performance of the LSTM-SCN model in high-volatility regions, five high-volatility regions were selected from the test set for analysis. The LSTM-SCN model achieved the best performance metrics across all selected regions. In Region 1, the RMSE, MAE, and MAPE values for the LSTM-SCN model were 5.465, 4.851, and 0.062%, respectively. Compared to the second-best model, these values correspond to reductions of 37.901 in RMSE, 29.781 in MAE, and 0.391 percentage points in MAPE. Similarly, in Region 2, the LSTM-SCN model obtained RMSE, MAE, and MAPE values of 7.285, 5.095, and 0.062%, respectively, with reductions of 39.923 in RMSE, 31.502 in MAE, and 0.389 percentage points in MAPE compared to the second-best model. For the remaining three regions, the LSTM-SCN model consistently outperformed all other models across the three evaluation metrics.

In summary, the proposed LSTM-SCN hybrid model demonstrates strong performance in short-term power load forecasting, particularly in capturing data variation trends in high-volatility regions, thereby improving forecasting accuracy in these challenging areas. Additionally, the LSTM-SCN model offers a novel solution for STLF, with significant practical application value and promising development potential. This model has the potential to drive new breakthroughs in the field of power load forecasting.

Author Contributions

Conceptualization, B.T. and C.Z.; methodology, B.T. and Q.B.; software, B.T. and Q.B.; validation, B.T., C.Z., and M.Y.; data curation, B.T.; writing—original draft preparation, B.T.; writing—review and editing, B.T. and J.H.; visualization, B.T.; supervision, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guizhou Provincial Science and Technology Fund (Qian Kehe Basic-ZK [2021] General 337), the Fund of the State Key Laboratory of Public Big Data, Guizhou University (No. PBD2023-35), and the Graduate Program of Guizhou University of Finance and Economics (2022ZXSY036).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zeng, P.; Jin, M.; Elahe, M.d.F. Short-Term Power Load Forecasting Based on Cross Multi-Model and Second Decision Mechanism. IEEE Access 2020, 8, 184061–184072.
2. Wang, Y.; Chen, Q.; Zhang, N.; Wang, Y. Conditional Residual Modeling for Probabilistic Load Forecasting. IEEE Trans. Power Syst. 2018, 33, 7327–7330.
3. Mei, T.; Si, Z.; Yan, J.; Lu, L. Short-Term Power Load Forecasting Study Based on IWOA Optimized CNN-BiLSTM. In Proceedings of the International Conference on Intelligent Computing, Tianjin, China, 5–8 August 2024.
4. Li, K.; Huang, W.; Hu, G.; Li, J. Ultra-Short Term Power Load Forecasting Based on CEEMDAN-SE and LSTM Neural Network. Energy Build. 2023, 279, 112666.
5. Wan, A.; Chang, Q.; AL-Bukhaiti, K.; He, J. Short-Term Power Load Forecasting for Combined Heat and Power Using CNN-LSTM Enhanced by Attention Mechanism. Energy 2023, 282, 128274.
6. Wan, C.; Zhao, J.; Song, Y.; Xu, Z.; Hu, Z. Photovoltaic and Solar Power Forecasting for Smart Grid Energy Management. CSEE J. Power Energy Syst. 2015, 1, 38–46.
7. Krstonijević, S. Adaptive Load Forecasting Methodology Based on Generalized Additive Model with Automatic Variable Selection. Sensors 2022, 22, 7247.
8. Shi, T.; Mei, F.; Lu, J.; Lu, J.; Pan, Y.; Zhou, C.; Wu, J.; Zheng, J. Phase Space Reconstruction Algorithm and Deep Learning-Based Very Short-Term Bus Load Forecasting. Energies 2019, 12, 4349.
9. Barta, G.; Nagy, G.; Papp, G.; Simon, G. Forecasting Framework for Open Access Time Series in Energy. In Proceedings of the 2016 IEEE International Energy Conference (ENERGYCON), Leuven, Belgium, 4–8 April 2016; IEEE: Leuven, Belgium, 2016; pp. 1–6.
10. Wijaya, T.K. Forecasting Uncertainty in Electricity Demand. In Proceedings of the AAAI-15 Workshop on Computational Sustainability, Austin, TX, USA, 26 January 2015.
11. Farrag, T.A.; Elattar, E.E. Optimized Deep Stacked Long Short-Term Memory Network for Long-Term Load Forecasting. IEEE Access 2021, 9, 68511–68522.
12. Wen, Z.; Xie, L.; Fan, Q.; Feng, H. Long Term Electric Load Forecasting Based on TS-Type Recurrent Fuzzy Neural Network Model. Electr. Power Syst. Res. 2020, 179, 106106.
13. Guan, Y.; Li, D.; Xue, S.; Xi, Y. Feature-Fusion-Kernel-Based Gaussian Process Model for Probabilistic Long-Term Load Forecasting. Neurocomputing 2021, 426, 174–184.
14. Kazemzadeh, M.-R.; Amjadian, A.; Amraee, T. A Hybrid Data Mining Driven Algorithm for Long Term Electric Peak Load and Energy Demand Forecasting. Energy 2020, 204, 117948.
15. Andriopoulos, N.; Magklaras, A.; Birbas, A.; Papalexopoulos, A.; Valouxis, C.; Daskalaki, S.; Birbas, M.; Housos, E.; Papaioannou, G. Short Term Electric Load Forecasting Based on Data Transformation and Statistical Machine Learning. Appl. Sci. 2020, 11, 158.
16. Qinwei, D.; Xiangzhen, H.; Zhu, C.; Xuchen, T.; Zugang, L. Short-Term Power Load Forecasting Based on Sparrow Search Algorithm-Variational Mode Decomposition and Attention-Long Short-Term Memory. Int. J. Low-Carbon Technol. 2024, 19, 1089–1097.
17. Pavlatos, C.; Makris, E.; Fotis, G.; Vita, V.; Mladenov, V. Enhancing Electrical Load Prediction Using a Bidirectional LSTM Neural Network. Electronics 2023, 12, 4652.
18. Shi, H.; Xu, M.; Li, R. Deep Learning for Household Load Forecasting—A Novel Pooling Deep RNN. IEEE Trans. Smart Grid 2018, 9, 5271–5280.
19. Chen, K.; Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-Term Load Forecasting With Deep Residual Networks. IEEE Trans. Smart Grid 2019, 10, 3943–3952.
20. Qiu, X.; Zhao, Q.; Wang, Y.; Tian, J.; Ding, H.; Zhang, J.; Zhao, H. Load Transfer Analysis of Regional Power Grid Based on Expert System Theory. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 8–11 April 2021; pp. 557–561.
21. Kandil, M.S.; El-Debeiky, S.M.; Hasanien, N.E. Long-Term Load Forecasting for Fast Developing Utility Using a Knowledge-Based Expert System. IEEE Power Eng. Rev. 2002, 22, 78.
22. Arora, S.; Taylor, J.W. Short-Term Forecasting of Anomalous Load Using Rule-Based Triple Seasonal Methods. IEEE Trans. Power Syst. 2013, 28, 3235–3242.
23. Song, X.; Wang, Z.; Wang, H. Short-Term Load Prediction with LSTM and FCNN Models Based on Attention Mechanisms. J. Phys. Conf. Ser. 2024, 2741, 012026.
24. Liyun, P.; Wenjun, Z.; Sining, W.; Lu, H. Short-Term Load Forecasting Based on DenseNet-LSTM Fusion Model. In Proceedings of the 2021 IEEE International Conference on Energy Internet (ICEI), Southampton, UK, 27–29 September 2021; IEEE: Southampton, UK, 2021; pp. 84–89.
25. Ma, L.; Wang, L.; Zeng, S.; Zhao, Y.; Liu, C.; Zhang, H.; Wu, Q.; Ren, H. Short-Term Household Load Forecasting Based on Attention Mechanism and CNN-ICPSO-LSTM. Energy Eng. 2024, 121, 1473–1493.
26. Goh, H.H.; He, B.; Liu, H.; Zhang, D.; Dai, W.; Kurniawan, T.A.; Goh, K.C. Multi-Convolution Feature Extraction and Recurrent Neural Network Dependent Model for Short-Term Load Forecasting. IEEE Access 2021, 9, 118528–118540.
27. Chen, X.; Chen, W.; Dinavahi, V.; Liu, Y.; Feng, J. Short-Term Load Forecasting and Associated Weather Variables Prediction Using ResNet-LSTM Based Deep Learning. IEEE Access 2023, 11, 5393–5405.
28. Zhou, R.; Zhang, X. Short-Term Power Load Forecasting Based on ARIMA-LSTM. J. Phys. Conf. Ser. 2024, 2803, 012002.
29. Shin, S.M.; Rasheed, A.; Kil-Heum, P.; Veluvolu, K.C. Fast and Accurate Short-Term Load Forecasting with a Hybrid Model. Electronics 2024, 13, 1079.
30. Wang, D.; Li, M. Stochastic Configuration Networks: Fundamentals and Algorithms. IEEE Trans. Cybern. 2017, 47, 3466–3479.
31. Broomhead, D.S.; Lowe, D. Multivariable Functional Interpolation and Adaptive Networks. Complex Syst. 1988, 2, 321–355.
32. Pao, Y.-H.; Takefuji, Y. Functional-Link Net Computing: Theory, System Architecture, and Functionalities. Computer 1992, 25, 76–79.
33. Dai, W.; Li, D.; Zhou, P.; Chai, T. Stochastic Configuration Networks with Block Increments for Data Modeling in Process Industries. Inf. Sci. 2019, 484, 367–386.
34. Li, J.; Wang, D. 2D Convolutional Stochastic Configuration Networks. Knowl.-Based Syst. 2024, 300, 112249.
35. Wang, D. Editorial: Randomized Algorithms for Training Neural Networks. Inf. Sci. Int. J. 2016, 100, 126–128.
36. Wang, D.; Cui, C. Stochastic Configuration Networks Ensemble with Heterogeneous Features for Large-Scale Data Analytics. Inf. Sci. 2017, 417, 55–71.
37. Wang, D.; Li, M. Robust Stochastic Configuration Networks with Kernel Density Estimation for Uncertain Data Regression. Inf. Sci. 2017, 412–413, 210–222.
38. Li, M.; Wang, D. Insights into Randomized Algorithms for Neural Networks: Practical Issues and Common Pitfalls. Inf. Sci. 2017, 382–383, 170–178.
39. Gorban, A.N.; Tyukin, I.Y.; Prokhorov, D.V.; Sofeikov, K.I. Approximation with Random Bases: Pro et Contra. Inf. Sci. 2016, 364–365, 129–145.
40. Wang, D.; Dang, G. Recurrent Stochastic Configuration Networks for Temporal Data Analytics. arXiv 2024, arXiv:2406.16959.
41. Fan, G.F.; Peng, L.L.; Hong, W.C.; Sun, F. Electric Load Forecasting by the SVR Model with Differential Empirical Mode Decomposition and Auto Regression. Neurocomputing 2016, 173, 958–970.
42. Alghamdi, M.A.; AL-Malaise AL-Ghamdi, A.S.; Ragab, M. Predicting Energy Consumption Using Stacked LSTM Snapshot Ensemble. Big Data Min. Anal. 2024, 7, 247–270.
43. Wu, J.; Tang, X.; Zhou, D.; Deng, W.; Cai, Q. Application of Improved DBN and GRU Based on Intelligent Optimization Algorithm in Power Load Identification and Prediction. Energy Inform. 2024, 7, 36.

Figure 1. Structure of the SCN network.

Figure 2. Flowchart of SCN construction.

Figure 3. Cell structure of the LSTM model.

Figure 4. Flowchart of LSTM-SCN construction.

Figure 5. Forecasting plots for each model on the stock dataset.

Figure 6. Boxplots of model performance indicators. (a) Box plot of RMSE metrics for each model. (b) Box plot of MAE metrics for each model. (c) Box plot of MAPE metrics for each model.

Figure 7. Error plots for each model on the test set.

Figure 8. Feature acquisition using the sliding window method.

Figure 9. Selection process for five high-volatility regions.

Figure 10. Forecasting performance in Region 1.

Figure 11. Forecasting performance in Region 2.

Figure 12. Forecasting performance in Region 3.

Figure 13. Forecasting performance in Region 4.

Figure 14. Forecasting performance in Region 5.

Table 1. Glossary of symbols and abbreviations.

Symbol/Abbreviation	Definition
$w_{j}$	The input weight of the $j$-th node in the hidden layer
$b_{j}$	The input bias of the $j$-th node in the hidden layer
$f_{L - 1}(X)$	The network output when the hidden layer has $L - 1$ nodes
$β_{j}$	The output weight of the $j$-th hidden layer node
$g(\cdot)$	Activation function
$ε_{L - 1}$	The network residual vector when the hidden layer has $L - 1$ nodes
$ε$	Training tolerance
$L_{\max}$	The maximum number of nodes in the hidden layer
$h_{L}$	The output of the $L$-th hidden layer node
$T_{\max}$	The number of candidate nodes for the hidden layer
SVR	Support Vector Regression
RVFL	Random Vector Functional Link Networks
FC	Fully Connected Layer
LR	Linear Regression
XGBoost	Extreme Gradient Boosting
GBRT	Gradient Boosting Regression Trees
SCN	Stochastic Configuration Network
LSTM	Long Short-Term Memory
BiLSTM	Bidirectional Long Short-Term Memory
GRU	Gated Recurrent Unit

Table 2. Evaluation indicator values for each model.

Dataset	Model	RMSE	MAE	MAPE
Stock	SVR	0.641	0.486	1.042%
	RVFL	32.291	30.521	65.303%
	FC	1.645	1.645	2.759%
	LR	2.335	1.795	3.857%
	XGBoost	0.783	0.585	1.256%
	GBRT	0.900	0.685	1.461%
	SCN(our)	0.540	0.540	0.876%
Concrete	SVR	9.610	6.245	22.595%
	RVFL	10.793	8.345	30.080%
	FC	6.612	4.925	16.943%
	LR	2.615	2.032	4.411%
	XGBoost	0.893	0.655	1.412%
	GBRT	0.963	0.742	1.599%
	SCN(our)	1.698	0.194	0.568%

Table 3. Experimental environment configuration.

Component	Configuration
OS	Ubuntu 22.04
Development Environment	VS Code
CPU	Intel Xeon Silver 4116 CPU @ 2.10 GHz
Graphics Card Model	NVIDIA GeForce RTX 3090 (24 GB)
Programming Language	Python 3.10
Deep Learning Framework	PyTorch

Table 4. Dataset characteristics.

Feature Name	Unit	Description
Power Load	MW	The rate of electrical energy consumption of electrical devices
Dry Bulb Temperature	°C	The air temperature measured by a conventional thermometer
Dew Point Temperature	°C	The temperature at which water vapor in the air condenses into dew
Wet Bulb Temperature	°C	Wet bulb temperature indicates humidity and cooling potential
Humidity	%	The amount of water vapor in the air
Price	$/MWh	The cost per unit of electricity consumed by the user per hour

Table 5. Performance metrics for each model on the forecasting set.

Model	RMSE	MAE	MAPE
LSTM [42]	93.797	72.175	0.819%
BiLSTM [17]	124.220	100.635	1.179%
GRU [43]	69.620	51.879	0.593%
SCN	97.626	74.071	0.847%
CNN-LSTM [5]	62.986	57.829	0.545%
LSTM-RVFL [29]	187.695	167.114	1.866%
LSTM-SCN (our)	56.970	43.033	0.492%

Table 6. Data ranges for selected high-volatility regions.

Region Name	Region 1	Region 2	Region 3	Region 4	Region 5
Data Range	[200, 240)	[540, 580)	[875, 915)	[1165, 1205)	[1310, 1350)

Table 7. Comparative performance indicators of models across different regions.

Region Name	Model	RMSE	MAE	MAPE
Region 1	LSTM	69.503	56.357	0.723%
	BiLSTM	43.366	34.632	0.453%
	GRU	49.503	40.009	0.511%
	SCN	82.894	69.422	0.894%
	CNN-LSTM	48.532	36.188	0.454%
	LSTM-RVFL	120.833	111.067	1.391%
	LSTM-SCN (our)	5.465	4.851	0.062%
Region 2	LSTM	77.431	55.698	0.666%
	BiLSTM	63.474	46.654	0.566%
	GRU	47.208	36.597	0.451%
	SCN	58.148	47.428	0.591%
	CNN-LSTM	52.001	41.473	0.517%
	LSTM-RVFL	158.725	149.760	1.853%
	LSTM-SCN (our)	7.285	5.095	0.062%
Region 3	LSTM	66.255	48.467	0.632%
	BiLSTM	48.305	36.094	0.477%
	GRU	53.502	43.131	0.573%
	SCN	86.465	81.428	0.794%
	CNN-LSTM	53.183	41.249	0.542%
	LSTM-RVFL	154.871	146.592	1.947%
	LSTM-SCN (our)	6.104	6.104	0.057%
Region 4	LSTM	109.402	93.140	1.262%
	BiLSTM	56.048	37.506	0.506%
	GRU	71.450	54.467	0.733%
	SCN	93.136	82.856	1.492%
	CNN-LSTM	42.403	30.666	0.412%
	LSTM-RVFL	156.803	148.207	2.012%
	LSTM-SCN (our)	7.452	5.932	0.080%
Region 5	LSTM	95.331	84.480	1.166%
	BiLSTM	41.727	30.050	0.406%
	GRU	60.292	46.030	0.625%
	SCN	79.458	69.475	0.916%
	CNN-LSTM	58.118	45.542	0.617%
	LSTM-RVFL	112.343	104.097	1.418%
	LSTM-SCN (our)	7.175	5.976	0.081%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
