Figure 1.
Architecture of a BPNN with L layers. Typically, Layer 1 is the input layer, which feeds the data into the neural network, and the last layer, Layer L, is the output layer, which produces the predicted values. The weight matrix W between each pair of adjacent layers stores the learned knowledge of the network.
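For reference, the forward pass that Figure 1 depicts can be stated compactly; the activation function σ is an assumption here, since the figure does not fix one:

```latex
a^{(l+1)} = \sigma\!\left( W^{(l)} a^{(l)} + b^{(l)} \right), \qquad l = 1, \dots, L-1,
```

where a^(1) is the input vector and a^(L) is the vector of predicted values.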
Figure 2.
Architecture of a simple RNN: the inputs connect to each hidden neuron, the hidden neurons are connected to one another, and each hidden neuron produces an output.
Figure 3.
Unrolled architecture of an RNN over t time steps. The neurons at each time step constitute one layer of the unrolled network, so the unrolled RNN contains t layers; the output at time step t is o_t.
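The unrolling in Figure 3 corresponds to the standard RNN recurrence; the tanh activation and the linear output map are the conventional choices, not fixed by the caption itself:

```latex
h_t = \tanh\!\left( W_{xh} x_t + W_{hh} h_{t-1} + b_h \right), \qquad
o_t = W_{ho} h_t + b_o .
```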
Figure 4.
A cell of the LSTM. The cell receives the external input x_t and the previous cell state c_{t-1}, and outputs the updated cell state c_t and the current output h_t. The cell contains three gates: the input gate i_t, the output gate o_t, and the forget gate f_t.
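For completeness, the standard LSTM cell equations behind Figure 4 (this is the common formulation; the paper's exact variant may differ in minor details):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```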
Figure 5.
The cell of the gate-RNN. It contains two gates: G controls the input information, and a second gate controls the history information.
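The caption does not fix the gate-RNN update rule. A minimal sketch of a two-gate cell consistent with the description, where G_x gates the input and G_h gates the history (both symbols and the exact combination are assumptions, not the paper's stated equations):

```latex
\begin{aligned}
G_x &= \sigma(W_x x_t + U_x h_{t-1} + b_x), \\
G_h &= \sigma(W_h x_t + U_h h_{t-1} + b_h), \\
h_t &= \tanh\!\left( G_x \odot (W x_t) + G_h \odot (U h_{t-1}) \right).
\end{aligned}
```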
Figure 6.
The distribution of raw data values.
Figure 7.
The numbers of training and test samples at different time scales.
Figure 8.
The complete structures of the BP neural network and the RNNs. (a) The structure of the BP neural network: the input layer contains 10 neurons, the hidden layer contains 64 neurons, and the output layer contains one neuron. (b) The structure of the RNNs: the input layer contains one neuron that receives the current load value, the hidden layer contains four units, and the output layer contains one neuron that outputs the predicted load value. If each unit in (b) is a plain neuron, the figure shows the structure of the RNN; if each unit is an LSTM cell, it shows the structure of the LSTM; and if each unit is a gate-RNN cell, it shows the structure of the gate-RNN.
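Both structures in Figure 8 are small enough to sketch in code. A minimal, hypothetical PyTorch rendering (the class and variable names are ours, and the sigmoid activation is an assumption, since the figure does not name one):

```python
import torch.nn as nn

# Figure 8a: BP network with 10 input neurons, 64 hidden neurons, 1 output neuron.
bp_net = nn.Sequential(
    nn.Linear(10, 64),
    nn.Sigmoid(),        # activation is an assumption; the figure does not specify one
    nn.Linear(64, 1),
)

# Figure 8b: recurrent variant with 1 input neuron, 4 hidden units, 1 output neuron.
# Passing nn.LSTM instead of nn.RNN gives the LSTM structure; a custom gated cell
# in the same slot would play the role of the gate-RNN.
class LoadRNN(nn.Module):
    def __init__(self, cell=nn.RNN):
        super().__init__()
        self.rnn = cell(input_size=1, hidden_size=4, batch_first=True)
        self.head = nn.Linear(4, 1)

    def forward(self, x):               # x: (batch, seq_len, 1)
        h, _ = self.rnn(x)              # hidden states at every time step
        return self.head(h[:, -1, :])   # predict from the last time step
```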
Figure 9.
The prediction mechanism of the neural network.
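Figure 9 itself gives no numerical detail; however, the 10-neuron input layer in Figure 8a suggests a sliding-window scheme in which 10 past load values predict the next one. A minimal sketch of that window construction (the helper name and the exact windowing are assumptions):

```python
import numpy as np

def make_windows(series, width=10):
    """Build (X, y) pairs where `width` past load values predict the next value.
    width=10 matches the BP input layer in Figure 8a."""
    X = np.stack([series[i:i + width] for i in range(len(series) - width)])
    y = series[width:]
    return X, y
```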
Figure 10.
Visualization of the results of the four neural networks and the four comparative approaches at the 30-min time scale: (a) results of ARIMA; (b) results of DT; (c) results of RF; (d) results of SVR; (e) results of BP; (f) results of RNN; (g) results of LSTM; (h) results of gate-RNN. In each panel, the blue line shows the model's outputs and the orange line shows the true load values.
Figure 11.
Prediction error distributions of all eight methods at the 30-min time scale: (a) distribution of ARIMA; (b) distribution of DT; (c) distribution of RF; (d) distribution of SVR; (e) distribution of BP; (f) distribution of RNN; (g) distribution of LSTM; (h) distribution of gate-RNN.
Figure 12.
Neural networks' losses on the training and test datasets at the 5-min time scale: (a) loss of BPNN; (b) loss of RNN; (c) loss of LSTM; and (d) loss of gate-RNN. The blue line is the training loss curve and the orange line is the test loss curve.
Figure 13.
Neural networks' losses on the training and test datasets at the 20-min time scale: (a) loss of BPNN; (b) loss of RNN; (c) loss of LSTM; and (d) loss of gate-RNN. The blue line is the training loss curve and the orange line is the test loss curve.
Figure 14.
Neural networks' losses on the training and test datasets at the 30-min time scale: (a) loss of BPNN; (b) loss of RNN; (c) loss of LSTM; and (d) loss of gate-RNN. The blue line is the training loss curve and the orange line is the test loss curve.
Figure 15.
Neural networks' losses on the training and test datasets at the 40-min time scale: (a) loss of BPNN; (b) loss of RNN; (c) loss of LSTM; and (d) loss of gate-RNN. The blue line is the training loss curve and the orange line is the test loss curve.
Figure 16.
Visualization of the three criteria for the neural networks at different time scales: (a) RMSE; (b) MAE; (c) MAPE.
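The three criteria in Figure 16, and in Tables 2–9 below, follow their standard definitions; a direct NumPy transcription:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```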
Table 1.
The data sizes at different time scales.
Dataset | 5 min | 20 min | 30 min | 40 min |
---|---|---|---|---|
Train | 78,624 | 19,656 | 13,104 | 9,828 |
Test | 26,496 | 6,624 | 4,416 | 3,312 |
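The sizes in Table 1 divide exactly (78,624 / 4 = 19,656; 78,624 / 6 = 13,104; 78,624 / 8 = 9,828), which is consistent with aggregating the 5-min series to the coarser scales. A minimal pandas sketch with a stand-in series, assuming mean aggregation (the aggregation rule is not stated here):

```python
import numpy as np
import pandas as pd

# Stand-in for the real 5-min load series (hypothetical values and dates).
idx = pd.date_range("2024-01-01", periods=78624, freq="5min")
load = pd.Series(np.random.rand(78624), index=idx)

scales = {
    "20min": load.resample("20min").mean(),  # 19,656 samples
    "30min": load.resample("30min").mean(),  # 13,104 samples
    "40min": load.resample("40min").mean(),  #  9,828 samples
}
```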
Table 2.
Results for different hidden layer sizes on the 30-min time scale test set.
Number of Hidden Neurons | RMSE | MAPE (%) | MAE |
---|---|---|---|
BP_32 | 585.22 | 2.38 | 439.96 |
BP_64 | 529.23 | 2.12 | 393.11 |
BP_128 | 581.59 | 2.21 | 408.03 |
GATE_2 | 524.75 | 2.02 | 374.16 |
GATE_4 | 530.09 | 2.00 | 372.40 |
GATE_8 | 520.05 | 1.83 | 337.12 |
LSTM_2 | 1323.30 | 5.58 | 1041.50 |
LSTM_4 | 1443.63 | 6.16 | 1161.02 |
LSTM_8 | 1800.25 | 8.17 | 1467.79 |
RNN_2 | 518.58 | 1.97 | 365.87 |
RNN_4 | 506.58 | 1.85 | 341.61 |
RNN_8 | 518.02 | 1.87 | 343.78 |
Table 3.
Results of the eight methods on the 5-min time scale over the three-month test dataset.
Methods | RMSE | MAPE (%) | MAE |
---|---|---|---|
SVR [22] | 913.61 | 4.61 | 766.82 |
DT [9] | 145.26 | 0.63 | 133.68 |
ARIMA [48] | 131.42 | 0.53 | 131.43 |
RF [49] | 159.95 | 0.68 | 122.67 |
BP | 137.69 | 0.59 | 108.62 |
RNN | 130.69 | 0.55 | 100.12 |
LSTM | 667.49 | 2.75 | 507.67 |
gate-RNN | 116.94 | 0.49 | 89.36 |
Table 4.
Results of the eight methods on the 20-min time scale over the three-month test dataset.
Methods | RMSE | MAPE (%) | MAE |
---|---|---|---|
SVR [22] | 880.66 | 4.04 | 729.80 |
DT [9] | 386.30 | 1.63 | 301.47 |
ARIMA [48] | 402.06 | 1.47 | 269.82 |
RF [49] | 427.38 | 1.69 | 310.20 |
BP | 451.72 | 1.81 | 334.30 |
RNN | 403.14 | 1.65 | 304.58 |
LSTM | 828.09 | 3.37 | 626.32 |
gate-RNN | 337.00 | 1.31 | 242.75 |
Table 5.
Results of the eight methods on the 30-min time scale over the three-month test dataset.
Methods | RMSE | MAPE (%) | MAE |
---|---|---|---|
SVR [22] | 830.00 | 3.80 | 682.88 |
DT [9] | 517.70 | 2.14 | 394.55 |
ARIMA [48] | 670.24 | 2.41 | 442.21 |
RF [49] | 646.02 | 2.41 | 447.30 |
BP | 529.23 | 2.12 | 393.11 |
RNN | 506.58 | 1.85 | 341.61 |
LSTM | 1443.63 | 6.16 | 1161.02 |
gate-RNN | 530.09 | 2.00 | 372.40 |
Table 6.
Results of the eight methods on the 40-min time scale over the three-month test dataset.
Methods | RMSE | MAPE (%) | MAE |
---|---|---|---|
SVR [22] | 1155.91 | 5.65 | 979.10 |
DT [9] | 701.78 | 3.02 | 553.73 |
ARIMA [48] | 818.07 | 3.43 | 628.24 |
RF [49] | 943.62 | 3.48 | 647.44 |
BP | 715.68 | 2.85 | 527.99 |
RNN | 762.60 | 2.91 | 537.40 |
LSTM | 2056.35 | 8.49 | 1598.33 |
gate-RNN | 792.15 | 3.23 | 601.51 |
Table 7.
RMSE results for the different time scales.
Methods | 5 min | 20 min | 30 min | 40 min |
---|---|---|---|---|
BP | 137.69 | 451.72 | 529.23 | 715.68 |
RNN | 130.69 | 403.14 | 506.58 | 762.60 |
LSTM | 667.49 | 828.09 | 1443.63 | 2056.35 |
gate-RNN | 116.94 | 337.00 | 530.09 | 792.15 |
Table 8.
MAPE results for the different time scales.
Methods | 5 min | 20 min | 30 min | 40 min |
---|---|---|---|---|
BP | 0.59 | 1.81 | 2.12 | 2.85 |
RNN | 0.55 | 1.65 | 1.85 | 2.91 |
LSTM | 2.75 | 3.37 | 6.16 | 8.49 |
gate-RNN | 0.49 | 1.31 | 2.00 | 3.23 |
Table 9.
MAE results for the different time scales.
Methods | 5 min | 20 min | 30 min | 40 min |
---|---|---|---|---|
BP | 108.62 | 334.30 | 393.11 | 527.99 |
RNN | 100.12 | 304.58 | 341.61 | 537.40 |
LSTM | 507.67 | 626.32 | 1161.02 | 1598.33 |
gate-RNN | 89.36 | 242.75 | 372.40 | 601.51 |