1. Introduction
Since the beginning of the 21st century, global warming has gradually become one of the major issues terrorizing humanity. Greenhouse gases (GHG) are critical factors contributing to global warming [
1]. Excessive emissions of GHG have brought about many problems, such as glacier melting, sea level rising, species extinction, significant crop production decline, and the frequent occurrence of extreme weather [
2]. It can therefore be seen that the harm caused by global warming influences every aspect of human society. Thus, it is urgent and necessary to control greenhouse gas emissions, especially carbon emissions [
3].
The Kyoto protocol, which came into force in 2005, aimed at stabilizing the atmosphere’s GHG content at a suitable level to combat climate change. In the same year, the Emission Trading Scheme (ETS) was issued by the European Union (EU) [
4]. As the largest carbon dioxide emitter, China should take responsibility for curbing the tension of global warming. At the 75th United Nations General Assembly, China made a solemn commitment that the peak of carbon emissions will be achieved by 2030 and the carbon neutrality will be realized by 2060 [
5]. Carbon trading has been verified to be a significant way to propel carbon emission reduction [
6]. Since 2011, China has successfully conducted eight carbon trading projects in Beijing, Shanghai, Guangdong, Tianjin, Shenzhen, Fujian, Chongqing, and Hubei, which contribute to reducing greenhouse gas emissions. During the operation of carbon trading markets, the competition among market participants will inevitably occur. Therefore, the precise forecasting of CTP is significant in the formulation of competition strategies. Governments and policy makers can draw up scientific policies and reasonable trading mechanisms in accordance with the changes in CTP so as to promote the sustainable development of carbon trading markets [
7]. Enterprises need to assign an appropriate trading proportion of carbon emissions to participate in various market transactions based on CTP prediction results. Accurate CTP prediction can maximize revenue and decrease business risks. Therefore, it is particularly significant to increase the prediction accuracy of CTP.
Various methodologies have been employed to improve CTP prediction accuracy, which can be generally categorized into two kinds: econometric methods and artificial intelligence models. The econometric methods are simple, convenient, and fast, primarily containing the linear regression model, the autoregressive integrated moving average (ARIMA), the vector autoregression (VAR), the generalized autoregressive conditional heteroskedasticity (GARCH), etc. [
4,
8,
9]. Lin and Peng established a new CTP prediction model, which conducted the linear and nonlinear forecasting, employing ARIMA and stochastic forest algorithm to the disintegrated CTP data [
10]. Fu proposed the ARMA-EGARCH-SGED framework to estimate the impact elements of fluctuations, and the VAR method was employed to discuss results [
11].
The artificial intelligence models are superior in self-learning, generalization, associative memory, and calculation speed than the econometric methods, which are more adaptable for solving the nonlinear nexus of CTP data sequences. The commonly used artificial intelligence models in CTP prediction include the backpropagation artificial neural network (BPANN) [
2], the radial basis function neural network (RBFNN) [
12], the multilayer perceptron network (MLP) [
13], the extreme learning machine (ELM) (Seifert et al., 2007; Sun and Zhang, 2018), least-squares support vector machine (LSSVM) [
14], long short-term memory (LSTM) [
2], etc. These methods are also widely employed in climate change studies [
15], crop sciences [
16,
17], GHGs studies [
18], and renewable energy generation prediction studies [
19].
By summarizing the previous literature related to prediction research, we found that the ELM method is widely employed in this field. The ELM model is proposed on the basis of the feedforward neuron network, which is adaptable to supervised and unsupervised learning issues. At the training stage, we can obtain the only optimal solution by supposing the amount of hidden layer neurons [
20,
21]. Compared with the BPANN, the ELM model is superior in learning speed and generalization. While the ELM feature mapping is statistical and sensitive to noise, the stability and generalization ability of the ELM are decreased. Considering this, the MKELM is put forward by leading into the multi-kernel approach and regularization coefficient into the ELM model, the impact of stochastic mapping on the ELM are then effectively reduced, and the forecasting precision can be enhanced [
22]. Simultaneously, when employing the MKELM model, the parameter values in the model need to be determined. Setting the values of these parameters by subjective judgment always leads to low forecasting precision. Hence, this investigation employs a new bionic intelligent optimization algorithm, namely the SSA, to optimally select the parameters in the MKELM. To enhance the convergence ability of SSA, an ISSA is established in our research. ISSA brings the dynamic inertia weight and chaotic local searching method into the SSA to increase the searching speed and avert the plunging into the locally optimization.
Additionally, with the continuous exploration in CTP prediction, researchers found that the high complexity and nonlinear characteristics of CTP data sequences can greatly affect the prediction accuracy. Therefore, a few data decomposition methods are employed in preprocessing the initial CTP data sequence. The empirical mode decomposition (EMD) method was verified to be a valid method to decompose the initial CTP data with nonlinear and strong volatility features into several stable series, which can greatly improve the forecasting accuracy. Zhu preprocessed the initial CTP time sequence by EMD before forecasting, and the empirical results demonstrated that EMD could integrate the nonlinear and fluctuated initial CTP data into subsequences with high stability to improve the forecasting precision [
23]. Nevertheless, although EMD can effectively enhance the forecasting precision, it can still be greatly improved. Variational mode decomposition (VMD), ensemble empirical mode decomposition (EEMD), and CEEMDAN are the improved methods based on EMD to further enhance the prediction precision.
On the basis of the previous research, this paper established a hybrid system integrating the CEEMDAN, ISSA, and MKELM model to predict CTP. Firstly, the initial CTP data sequence is decomposed and constructed into several series by CEEMDAN and the sample entropy (SE) method. Then, the decomposed series are, respectively, fed into the MKELM model optimized by ISSA. At last, the CTP forecasting results are obtained by summing up all the decomposed series forecasting results. The primary contributions of our investigation are as below:
- (1)
Considering the complexity and fluctuation of the initial CTP data, the CEEMDAN and SE methods are applied to decompose and reconstruct the initial data in this investigation.
- (2)
In order to handle the issues of local optimization and low convergence speed, the ISSA is put forward by bringing the inertia weight mechanism and chaotic local searching method into the SSA.
- (3)
The MKELM is established by constituting the RBF and the poly kernel functions so as to combine the outstanding learning and the generalization abilities of them.
- (4)
The established hybrid CTP prediction model CEEMDAN-SE-ISSA-MKELM can significantly improve the CTP forecasting accuracy compared with 16 selected comparison models.
The rest of our investigation is outlined below. The methods combined in this research, including CEEMDAN, SE, ISSA, and MKELM, are introduced in
Section 2.
Section 3 summarizes the process of the proposed CEEMDAN-SE-ISSA-MKELM model in CTP prediction. Results are obtained in
Section 4. A comparison analysis is presented in
Section 5 through comparing the prediction performance of 16 selected comparison models, and conclusions are discussed in
Section 6.
2. Methodology
The proposed hybrid CTP prediction model consists of CEEMDAN, SE, ISSA, and MKELM methods. The principles of these methods are introduced in this section.
2.1. CEEMDAN and SE Method
In 1998, Huang et al. first established the EMD method (EMD), which is a self-adaptive orthogonal basis time-frequency signal processing method [
24]. Aiming at preprocessing nonlinear and nonstable time series, EMD can directly resolve such data sequences into multiple IMFs. However, once there exists the intermittent signal in the actual signal, a modal aliasing phenomenon occurs. Specifically, there either exists multiple scale elements in one IMF, or one scale element exists in plural IMFs simultaneously [
25]. To manage this problem, the EEMD method introducing the white Gaussian noise into the signal has been proposed [
26].
Even if the EEMD method can greatly decrease the emergence of mode mixing, it also leads to new problems. The EEMD with the white Gaussian noise should be repeatedly averaged, and the error mostly relies on the amount of integrations [
27]. Hence, the CEEMDAN method has been proposed by Torres et al., which can overcome this problem through adding adaptive white noise in finite times at each period [
28]. The procedure of CEEMDAN is illustrated below.
Step 1: Add a sequence of self-adaptive white Gaussian noise into the initial sequence
:
where
indicates the initial data sequence,
is the noise coefficient,
demonstrates the
-th addition of white Gaussian noise,
implies the time series after adding the
-th white Gaussian noise, and
represents the integration’s number.
Step 2: Decompose
by the EMD method and take the average value of the first IMF component
:
where
represents the first IMF component.
Remove
from the initial series
to obtain the first residual sequence (RS):
where
refers to the first residual sequence.
Step 3: Proceed to disintegrate
by the EMD method to obtain the second RS:
where
demonstrates the
j-th IMF disintegrated by the EMD approach.
Step 4: Conduct the below process repeatedly to obtain the rest of the IMFs:
where
K indicates the total amount of modes.
The approach ends when the RS is unable to be further disintegrated. The final RS is written as:
Finally, several decomposed sequences can be obtained. However, if these decomposed sequences are directly fed into the forecasting model, it will be time consuming as they are trained one by one. Therefore, Pincus proposed the approximate entropy (ApEn) method in 1991 to measure the correlation degree of two distinct time series and reconstruct the disintegrated sequences into new sequences [
29]. Nevertheless, the ApEn approach depends on the data length and the consistency of the result cannot be ensured [
30]. Thus, a new method called sample entropy (SE) was proposed to solve the above problems [
31]. The process of the SE methodology is described below.
Step 1: The series
is formed into an
m-dimensional vector represented as
:
where the vector demonstrates
m consecutive values of
x starting from point
i.
Step 2: The distance between
and
is defined and calculated in:
where
means the distance between
and
.
Step 3: The total number of
for every
i value is counted and the ratio with
N −
m + 1 is computed to attain
. Afterward, the average value of
is computed as:
where
is the ratio of the sum amount of
for every
value accounting for
, and
is the mean value of
.
Step 4: The dimension to
m + 1 is increased and steps 1–3 are repeatedly conducted to obtain the average value of
:
where
is the average value of
after increasing the dimension to
.
Step 5: The SE can be written as:
The value of SE is calculated as below when the time series length is
N:
In accordance with relevant references, m is supposed to be 2 and r is supposed at the 0.2 standard deviation range in this research.
2.2. Improved SSA Method
A new optimization algorithm named the SSA is established by Seyedali Mirjalili in 2017 [
32]. This method is carried out by simulating the behavior of the salp chain group in searching food.
For the SSA method, the initial population is classified into leader and follower. The salp chain is led by the leader. The concrete procedure of the SSA is described below:
- (1)
Parameter setting
In the SSA, the original population amount (Agents_no), the maximum number of iteration (Max_iteration), the variables amount (dim), the lower bound of variables (, and the upper bound of variables ( need to be set.
- (2)
Population initialization
The location of the original population is set stochastically, and the location matrix of the salp swarm S is shown as:
where
demonstrates the value of the j-th variable of the i-th salp,
,
.
are computed by Equation (16) with random distribution:
where
is the value of the
i-th row and
j-th column in the matrix
,
rand (
i,
j) demonstrates the random matrix,
ub(
i) and
lb(
i) are the upper and the lower bounds of the
i-th salp, respectively.
- (3)
Fitness function establishment
The fitness function is established as follows:
In the matrix OS, the variable F will be assigned to the salp with the best fitness value.
- (4)
Iteration
In the SSA, aiming at performing global search and avoiding local optimum, all salps will change their locations. Leaders adjust the position of food sources according to the following equation:
where
implies the location of the leader in the
j-th dimension,
represents the food source location,
and
illustrate the upper and lower bounds, respectively, and
c1,
c2, and
c3 are random constants.
c2 and
c3 are randomly generated in the range of [0, 1], and
c1 is a significant parameter which is depicted as:
where
l is the current iterations and
L is the maximum iterations.
The SSA utilizes Newton’s law of motion to renovate the follower’s location as:
where
represents the location of the
i-th follower in the
j-th dimension, t indicates the time, and
implies the initial velocity and
.
To enhance the optimization results of the SSA, this investigation brings dynamic inertia weight and chaotic local searching mechanism into the SSA, called improved SSA (ISSA). Therefore, the optimization searching speed of SSA can be greatly improved.
The inertia weight method is added based on Equation (18), and the new equation is written as:
where
w represents the inertia weight, controlling the impact of parent location on the new population, so that the searching trend can be adjusted and the searching scope can be expanded. Using a greater weight at the beginning of the iteration will be conducive to determine the boundary of the optimal solution quickly, and the convergence speed will be accelerated via decreasing the weight in the following procedure. The difficulties of making optimal adjustments in accordance with the overall optimization and iteration through employing a fixed inertia weight resulted in the dynamic inertia weight with nonlinear time-varying renovating to be utilized in our investigation to enhance the SSA.
where
t represents the iteration steps number of the current group.
With the increasing number of iterations, the inertia weight among individuals reduces from the original value in the following steps of optimization and expedites the overall convergence speed of the SSA.
When the SSA steps into the later stage, the group prefers to plunge into the local optimization. Chaotic motion is superior in regularity, which can traverse all states within a particular range without repeating. Hence, the chaotic local searching method is added to the SSA to decrease the disadvantage of plunging into local optimization and enhance the searching speed. The concrete procedure of adding the chaotic method into the SSA is shown below:
- (1)
The is transferred to chaotic variables as:
where
represents the chaotic variable, and
and
are the minimum and maximum values of the j-th variable, respectively.
- (2)
The chaotic sequence is obtained by logic mapping as:
- (3)
The chaotic sequence is transformed into the searching space of the target solution to produce a new location. A better value is then selected to update and finish the local searching procedure. This process is expressed as:
Thus, the basis of SSA has been improved, which has a superior convergence precision and better optimization result. Additionally, it can cause the MKELM forecasting framework to be able to easily find the optimal parameters to enhance the prediction precision.
2.3. MKELM Method
The ELM, a machine learning approach, is presented by Huang in 2006 [
33]. It is a single hidden-layer feedforward network and merely requires the assumption of the amount of hidden layer nodes without updating the bias values and input weights.
Suppose that there are N training samples, where
is the input value of the CTP samples,
is the expected output value of the CTP samples, and the number of hidden nodes of the ELM is
L, then, the output can be calculated by:
where
is the output weight from the hidden node to output node,
G illustrates the activation function,
and
demonstrate the parameters of the
i-th hidden layer, and
is the output vector of the hidden layer.
Moreover, it is written by a matrix expression of linear system as:
in Equation (27):
In the ELM, supposing that the activation function is infinitely derivable, it is unnecessary to change the bias value and input weight of the structure. Then, the ELM training issue is turned into handling with
. The least-square method is commonly utilized to compute the output weight of the ELM:
where
illustrates the Moore Penrose produced inverse of the hidden layer output matrix
.
When solving
, a multicollinearity problem may exist in the sample dataset, which will lead to non-singularity and then exert a negative influence on the forecasting results. Taking this into account,
1/
C is brought into diagonal matrix
to make the characteristic root of
diverge from zero. Thus, the weights
of the ELM is calculated via Formula (30):
The ELM model only needs to train a small amount of parameters in the training procedure, which can greatly prevent the issues of slow computation speed and the overfitting of the conventional neural network. Nevertheless, the procedure of choosing the amount of hidden nodes is complex and time-wasting, which requires countless experiments to choose the optimal amount of neurons, leading to the decrease in forecasting performance.
The KELM model is proposed to solve the above problems, which can avoid selecting a hidden layer. Hence, the output random fluctuation can be decreased, and the generalization ability and stability of the model can be enhanced [
33].
The expression of the kernel matrix in the KELM model can be written as:
where
implies the kernel matrix of the KELM model and
represents the kernel function.
Then, Equation (30) is written as:
Thus, the output of the KELM model is:
The KELM model contains multiple kinds of kernel functions, including poly kernel, Gaussian RBF kernel, etc. Selecting an optimal kernel function can enhance the forecasting precision of the CTP. Therefore, this investigation combines various kernel functions and makes the best of the merits of these functions in learning and generalization. Thus, a MKELM model is constructed.
Kernel functions can be classified into local kernel (such as poly kernel function) and global kernel (such as RBF kernel function), which can be expressed as:
where
and
represent the kernel parameters of the RBF and poly kernel function, respectively.
The RBF kernel function can make the near data influence the kernel value, and has a superior performance in local learning. The poly kernel function can distinguish the impact of remote data on the kernel value, and it has good performance in global generalization [
33]. As a combination of multiple affecting factors, the CTP is a dynamic evolution procedure of multiple data characteristics containing trend, volatility, and chaotic features. Hence, it is difficult to employ a single kernel function to conduct a high-accuracy regression forecasting. Hence, this research integrates the RBF and the poly kernel functions to utilize their outstanding learning and generalization abilities. The integrated kernel function is established as:
where
and
imply the weight coefficient of
and
in the integrated kernel function, resepctively.
Parameter values, including C, δ, d, and ρ, are determined by the ISSA optimization algorithm.
2.4. Evaluation Tests
To measure the forecasting results, several evaluation indicators are selected, which are the mean absolute percentage error (MAPE), the root-mean-square error (RMSE), and the coefficient of determination (R2). MAPE indicates the mean forecasting precision of the model, RMSE implies the divergence between forecasting values and actual values, and R2 demonstrates the degree of fitting to the actual values ranging from 0 to 1. Models with smaller MAPE and RMSE values have better forecasting performance. Additionally, the closer to 1 the R2 value, the more superior the prediction precision of the model. Their calculation equations are below:
where
,
, and
, respectively, represent the actual CTP, predictive CTP, and average CTP values, and
n is the length of the prediction.
3. The Process of the CEEMDAN-SE-ISSA-MKELM Model
The calculation procedure of the CEEMDAN-SE-ISSA-MKELM model is depicted in
Figure 1, which is conducted by MATLAB software. As can be seen from
Figure 1, specific steps of the established model are as below:
Step 1: Initial CTP data sequence decomposition.
The parameters of the CEEMDAN method are set as: and the initial CTP data can be disintegrated into several data sequences.
Step 2: Decomposed data sequence reconstruction.
The SE value of each decomposed data sequence needs to be calculated. According to the SE values, some of the decomposed data sequences can be reconstructed into new series, which will be input into the ISSA-MKELM prediction model.
Step 3: Input data sequence normalization.
The ELM method is sensitive to data scale; thus, the input data sequences obtained based on SE values will be normalized as Equation (40). Therefore, they are calculated in the same dimension:
Step 4: Parameter setting of the ISSA and the optimal value determination for the parameters in the MKELM.
Parameters in the ISSA need to be set, including the original population amount, Agents_no, the maximum number of iteration, Max_iteration, the variables amount, dim, the lower bound of variables, , and the upper bound of variables, . This paper sets Agents_no = 100, Max_iteration = 1000, dim = 2, , and . Then, the parameters, including C, δ, d and ρ, in the MKELM model can be optimally determined through employing the ISSA optimization algorithm. The RMSE values of the forecasting results are employed as the fitness function of the ISSA, which is expressed as Equation (38).
Additionally, the group position of the ISSA can be replaced based on Equation (21). If the termination condition is realized, the process is finished, and the optimal values of parameters are finally identified.
Step 5: CTP forecasting
The decomposed data sequences obtained according to SE values will be divided into training sample and testing sample. The training sample will be substituted into the MKELM model, and then the forecasting results can be obtained. Adding all the forecasting results can finally result in the CTP prediction values.
Step 6: Prediction performance evaluation.
After obtaining the prediction results, the tests for MAPE, RMSE, and R2 are employed to measure the forecasting performance, which can be calculated according to Equations (37)–(39).
4. Numerical Analysis
4.1. Data Sources
Up until now, eight pilot carbon markets have been established in Guangdong, Hubei, Tianjin, Shenzhen, Shanghai, Beijing, Chongqing, and Fujian. Among them, Guangdong has ranked first in China’s carbon trading market five times since it was officially established in 2013.
Figure 2 illustrates the trading turnover structure and trading volume structure of China’s carbon market. As can be seen in
Figure 2, the carbon trading turnovers of the carbon markets in Guangdong, Shanghai, and Hubei occupy about 60% and the carbon trading volumes account for approximately 70% of the total market up until 31 December 2019. Hence, these three carbon markets can better represent the circumstances of China’s carbon market. Furthermore, as is shown in
Figure 3, the trend of the transaction prices of Guangdong, Shanghai, and Hubei’s carbon trading markets present a high degree of nonlinearity and fluctuation. Thus, we employ the transaction prices in the carbon trading markets in Guangdong, Shanghai, and Hubei to verify the application and feasibility of the established CEEMDAN-SE-ISSA-MKELM model in CTP prediction.
Daily transaction prices of these markets are selected for numerical analysis from 7 January 2019 to 31 December 2019. The selected data are classified into two sets; one is the training set from 7 January 2019 to 11 October 2019 used to construct the prediction model. The other part is a testing set from 14 October 2019 to 31 December 2019 applied to prove the forecasting performance of the established model. All the relevant data are collected from
http://k.tanjiaoyi.com/ (accessed on 1 January 2021).
4.2. Data Decomposition and Reconstruction
Since the original CTP data sequences of Guangdong, Shanghai, and Hubei are nonlinear and nonstationary, to improve prediction accuracy, they are decomposed into several series employing the CEEMDAN approach, and the decomposed series are reconstructed into several new sequences through the SE value of each sequence. The decomposed CTP data sequences of Guangdong, Shanghai, and Hubei are illustrated in
Figure 4, which are obtained according to Equations (1)–(7). As is shown in
Figure 4, each sequence after disintegration indicates strong regularity, and its nonstationarity is much more apparent, hence resulting in a large amount of IMFs after decomposing the original data sequence. If the ISSA-MKELM model is employed to predict each decomposed sequence, the calculation scale will be increased and time-consuming. Aiming at predicting the CTP effectively and to save time, the principle of SE is employed to analyze the complexity of each IMF. Additionally, the SE values of each decomposed data sequence are listed in
Figure 5, which are obtained in accordance with Equations (8)–(14).
As can be seen from
Figure 5, according to the approximate SE values, the decomposed data sequence IMFs are reconstructed into new IMFs [
34,
35]. Taking the SE value of each decomposed data sequence of original CTP data in Guangdong as an example, IMF1–IMF3 and IMF4–IMF5 have similar complicacy according to SE values, as shown in
Figure 5, so that they are reconstructed as new IMFs. Similarly, the reconstruction of new IMFs are demonstrated in
Table 1. Thus, the problem of over-decomposition and time-consuming computation can be solved, and the new reconstructed IMFs can be fed into the ISSA-MKELM prediction model.
4.3. Forecasting Results of the Established CEEMDAN-SE-ISSA-MKELM
After obtaining the reconstructed new IMFs, the input data sequences into the ISSA-MKELM can be determined. Then, the forecasting results from 14 October 2019 to 31 December 2019 can be obtained. To verify the forecasting performance of the established CEEMDAN-SE-ISSA-MKELM model, the forecasting CTP results should be compared with the actual CTP data. Evaluation indicators, including MAPE, RMSE, and R2, are selected to verify the prediction accuracy, which are obtained via Equations (37)–(39), and the results are illustrated in
Table 2. As listed in
Table 2, the values of R2 for Guangdong, Shanghai, and Hubei’s carbon markets are all 0.99, which implies that the established forecasting model has a superior prediction performance. The values of MAPE of Guangdong, Shanghai, and Hubei’s carbon markets are 0.76%, 1.65%, and 1.33%, respectively. According to the measuring criterion of MAPE [
36], as the values of the MAPE of three carbon markets are all smaller than 10%, and the performance of the established model in CTP forecasting is excellent. The values of RMSE of Guangdong, Shanghai, and Hubei carbon markets are 0.53, 0.77, and 0.67, respectively, which are relatively high prediction accuracy values.
5. Comparison Analysis
To further verify the validity of the established CEEMDAN-SE-ISSA-MKELM model in CTP forecasting, the single model, including Markov forecasting model (MFM) [
37], ARIMA [
38], ELM [
39], LSSVM [
40], LSTM [
38], poly kernel ELM (pELM), RBF kernel ELM (rELM), and MKELM (combining poly kernel and RBF kernel), as well as the combined model, including CEEMDAN-MKELM, EMD-MKELM [
41], CEEMD-MKELM [
42], CEEMDAN-SE-MKELM, EMD-SE-MKELM, CEEMD-SE-MKELM, CEEMDAN-SE-FOA-MKELM [
43], CEEMDAN-SE-PSO-MKELM [
44], and CEEMDAN-SE-SSA-MKELM, are selected as the comparison model. The values of MAPE, RMSE, and R2 for the forecasted results in Guangdong’s carbon market are illustrated in
Figure 6. Moreover, the values of MAPE, RMSE, and R2 for the forecasted results in Shanghai’s carbon market are illustrated in
Figure 7. Values of MAPE, RMSE, and R2 for the forecasted results in Hubei’s carbon market are depicted in
Figure 8.
With the aim of comparing the role of CEEMDAN and the SE method in improving the forecasting accuracy of CTP, a comparison analysis among MKELM, CEEMDAN-MKELM, EMD-MKELM, and CEEMD-MKELM has been discussed. Moreover, the results imply that the performance of the decomposed data sequences are better than those of undecomposed data. Furthermore, in order to simplify the input sequences and save prediction time, the SE method is applied to reconstruct the decomposed data sequences to obtain new IMFs, which will be fed into the prediction model. Through a comparison of the performance of CEEMDAN-MKELM, EMD-MKELM, CEEMD-MKELM, CEEMDAN-SE-MKELM, EMD-SE-MKELM, and CEEMD-SE-MKELM, the superiority of the SE approach can be verified, as the combined models with SE approach have relatively low MAPE and RMSE values and high R2 values. Therefore, it is necessary to employ the disintegration method to preprocess the initial CTP data sequences to enhance the prediction precision and actual fitting ability. Additionally, the CEEMDAN-SE approach can greatly improve the prediction precision.
By comparing the prediction performance of single models, including MFM, ARIMA, ELM, LSSVM, LSTM, pELM, rELM, and MKELM, it can be found that the forecasting performance of the MKELM model is superior than other single models, of which the values of MAPE, RMSE, and R2 in Guangdong’s carbon market are 2.01%, 2.96, and 0.9701, respectively. Thus, the MKELM is selected as the basic model instead of other models in this investigation.
Additionally, the parameters of single MKELM model can also be optimally determined by optimization algorithms. In this investigation, by comparing the forecasting performance among CEEMDAN-SE-MKELM, CEEMDAN-SE-FOA-MKELM, CEEMDAN-SE-PSO-MKELM, and CEEMDAN-SE-SSA-MKELM models, it can be discovered that the combined models with optimization algorithms have a higher prediction precision than that without the optimization algorithm. Specifically, the MAPE, RMSE, and R2 values of the CEEMDAN-SE-SSA-MKELM model in Guangdong’s carbon market are 1.65%, 1.33, and 0.9886, respectively, while the MAPE, RMSE, and R2 values of CEEMDAN-SE-PSO-MKELM model in Guangdong’s carbon market are 1.79%, 2.01, and 0.9789, respectively, and the MAPE, RMSE, and R2 values of the CEEMDAN-SE-MKELM model in Guangdong’s carbon market are 1.88%, 2.81, and 0.9766, respectively. Furthermore, aiming at further improving the optimization performance of the basic SSA, the chaotic local searching strategy and inertia weight mechanism are added to the SSA to establish the ISSA model. Through a comparison of the forecasting performance of CEEMDAN-SE-SSA-MKELM and CEEMDAN-SE-ISSA-MKELM models, we can verify that the established model with ISSA (for Guangdong’s carbon market, MAPE = 0.76%, RMSE = 0.53, and R2 = 0.9967) is superior than that with the basic SSA method. Therefore, the ISSA method proposed in this investigation can further improve the forecasting performance in CTP.
For the forecasting results of Guangdong, Shanghai, and Hubei’s carbon markets, the established CEEMDAN-SE-ISSA-MKELM model has the best prediction precision, with the least MAPE values of 0.76%, 1.65%, and 1.33%, respectively, the smallest RMSE values of 0.53, 0.77, and 0.67, respectively, and the largest R2 values of 0.9967, 0.9934, and 0.9956, respectively. This demonstrates that the complicated hybrid model established in this investigation has high-prediction accuracy, is feasible, and is superior in CTP forecasting. However, the performance of the ARIMA model is the worst, with the highest MAPE values of 5.67%, 6.02%, and 5.38%, respectively, the highest RMSE values of 6.23, 4.52, and 4.96, respectively, and the lowest R2 values of 0.9537, 0.9412, and 0.9547, respectively.
Through multi-angles comparison, the respective contributions of CEEMDAN, SE, ISSA, and MKELM models in CTP prediction are verified. Additionally, the established CEEMDAN-SE-ISSA-MKELM model has a superior feasibility in CTP forecasting in different carbon trading markets.
6. Conclusions
Since China has promised to realize the peak of carbon emissions by 2030 and carbon neutrality by 2060, carbon trading attracts much attention as it has been verified as a valid way to promote carbon emission reduction. Considering that the competition among market participants will inevitably occur through the operation of carbon trading, the accurate prediction of CTP has become a primary factor in the competition strategy formulation by all market participants. Therefore, this investigation proposed a hybrid CTP prediction model combining CEEMDAN, SE, ISSA, and MKELM methods to enhance the prediction precision. By analyzing the initial CTP data sequences, it can be determined that the CTP data are volatile and unstable. Hence, the CEEMDAN method has been employed to disintegrate the initial CTP data sequences into several IMFs and a residual sequence. In order to save calculation time, the SE method has been utilized to reconstruct the IMFs and the residual sequence into new IMFs. Then, the new IMFs are fed into the MKELM model, combining RBF and the poly kernel functions to utilize the outstanding learning and generalization abilities of them. The parameters of the MKELM model are optimized by ISSA, which combines dynamic inertia weight and chaotic local searching mechanism into the SSA to enhance the searching speed, convergence precision, and global searching ability of the basic SSA.
CTP data in Guangdong, Shanghai, and Hubei are selected to prove the robustness of the established CEEMDAN-SE-ISSA-MKELM model, as the carbon trading turnovers and volumes of the carbon markets in Guangdong, Shanghai, and Hubei occupy more than 60% of the total market up until the end of December 2019, which is representative of these three carbon markets. The single model, including MFM, ARIMA, ELM, LSSVM, LSTM, pELM, rELM, and MKELM (combining poly kernel and RBF kernel), as well as the combined model, including CEEMDAN-MKELM, EMD-MKELM, CEEMD-MKELM, CEEMDAN-SE-MKELM, EMD-SE-MKELM, CEEMD-SE-MKELM, CEEMDAN-SE-FOA-MKELM, CEEMDAN-SE-PSO-MKELM, and CEEMDAN-SE-SSA-MKELM, are selected as the multi-angle comparison models. Additionally, by comparing the values of MAPE, RMSE, and R2, the following conclusions are drawn:
- (1)
The established CEEMDAN-SE-ISSA-MKELM model performs the best among all other comparison models, which demonstrates that the integration of these methods can remarkably increase the forecasting accuracy of CTP.
- (2)
The forecasting performance of the MKELM model is superior than other single models. Thus, the MKELM is selected as the basic model instead of other models in this investigation.
- (3)
The performance of the decomposed data sequences are better than those of undecomposed data. Therefore, it is necessary to employ a decomposition approach to preprocess the initial data with high uncertainty and volatility. Moreover, the comparison analysis has proven that the performance of the CEEMDAN approach exceeds other decomposition methods. Additionally, the SE method is applied to rebuild the decomposed data sequences, which can save calculation time and improve prediction accuracy.
- (4)
The optimization performance of the SSA method is better than other optimization algorithms, including FOA and PSO. Furthermore, the ISSA has great improvement in searching speed, convergence precision, and global searching ability, which further improves the forecasting accuracy. Therefore, the ISSA proposed in this research is of great significance.
- (5)
The carbon markets selected for the experiment herein represent the circumstances of China’s current carbon trading markets; the established CEEMDAN-SE-ISSA-MKELM model has high universality and generality in CTP prediction. The established CTP prediction model can not only be widely applied in CTP prediction, but also offer reference and guidance for decision-makers in reality.
However, this research only forecasted future CTP by historical data. CTP can be influenced by other factors, including oil price, coal price, gas price, related policy, etc. In future study, a further investigation will be conducted to research the influences of these factors on the CTP. Moreover, the development of an improved Markov forecasting method to better predict the carbon trading price is another avenue of research which we plan to explore in the future.