1. Introduction
PM2.5 is one of the main contributors to air pollution in China and a major problem that must be solved for the prevention and control of air pollution, the protection of public health, the promotion of ecological civilization construction, and the sustainable development of the economy and society [1]. With increasing attention being paid to environmental issues in China, air pollution control has become an important topic across the country. Research on PM2.5 concentration prediction helps provide timely and complete environmental quality information, enabling the government to formulate corresponding strategies and act in time on environmental changes, which is of great significance [2].
With the development of computer technology, deep learning models can provide higher numerical prediction accuracy due to their nonlinear mapping and adaptive capabilities; among them, recurrent neural networks are widely used for air pollutant concentration prediction because of their ability to store and learn information from existing time-series data. Many scholars have used the long short-term memory network (LSTM), gated recurrent unit (GRU), and other models to perform short-term prediction of PM2.5 in various places [3,4,5,6,7,8,9,10,11,12]. For instance, Zhao Yanming constructed an LSTM model with spatiotemporal correlation to predict PM2.5 concentration for 36 h in the Beijing–Tianjin–Hebei region [5]. On this basis, some scholars have also constructed combined models to attempt long-term prediction [13,14]. For instance, Huang Jie used a stacking strategy to build an RNN–CNN (recurrent neural network–convolutional neural network) model and predicted PM2.5 concentration for 2 months [14], but the time scale was too coarse: only eight time points were predicted, and the effect was mediocre. Some scholars have introduced transformer-based time-series models [15] into PM2.5 concentration prediction. For instance, Liu Enhai et al. combined the transformer model to predict PM2.5 concentration for 23 h in the city of Shijiazhuang [16], and Dong Hao et al. used the informer model to predict PM2.5 concentration for 1–6 h in the city of Beijing [17].
The above research methods have improved the accuracy, processing efficiency, and feature expression of PM2.5 concentration prediction; however, most of the research focuses on short-term prediction (a few hours), which makes it difficult to reveal the future trends of PM2.5 changes and thus has limitations. The long-term prediction of PM2.5 concentration is therefore necessary and valuable. Long-term prediction (weeks, months) can be achieved by learning from longer time-series data while ensuring the refinement and accuracy of the time scale. It can provide sufficient lead time to cope with air pollution, support reasonable environmental management decisions, evaluate current and future air quality conditions, and monitor and warn of heavy pollution events. At the same time, it can reveal the long-term trend and periodicity of PM2.5 concentration changes; analyze their complex relationships with various factors, such as meteorological factors, pollution sources, and spatial characteristics; and provide a scientific basis for exploring PM2.5 formation mechanisms and control strategies.
However, long-term-series data cover a long period, have refined time scales and large data volumes, and are highly cyclical, with obvious seasonality and large variations in local data characteristics. In addition, it is not reliable to detect temporal dependencies directly from a long time series, because the dependencies may be masked by entangled temporal patterns. Long-time-series data are also highly non-stationary, and capturing their overall patterns with conventional periodic and seasonal fitting methods is difficult [18]. This makes it harder for conventional models to learn and predict a long time series. Consequently, there are few studies on the long-time-series prediction of PM2.5 concentration; some experimental results involving long-time-series prediction are mediocre, and most time-series prediction models are constrained by the complexity of the characteristics of a long time series and the quantity of data, making them difficult to apply to long-time-series prediction.
Modal decomposition algorithms, owing to their adaptivity to non-stationary sequences, can effectively reduce the complexity and instability of long-term data, and some scholars have applied them in the field of air pollution control with good results. For example, Xiao used the ICEEMDAN (improved complete ensemble empirical mode decomposition with adaptive noise) algorithm to fully extract the complex characteristics of short-term atmospheric pollutant concentrations on different time scales [19]. Song decomposed experimentally monitored N2O and CO time series, and the decomposed bands showed complete regularity, except for the residual band [20]. This method is also widely used in feature extraction and signal denoising. For example, Ali used a modal decomposition method to decompose the significant wave height along the eastern coast of Australia [21], while Liang used the ICEEMDAN modal decomposition method to decompose an air passenger transport time series [22]. Thanks to its decomposition structure and autocorrelation mechanism, the autoformer [23] is good at handling the global, seasonal, and cyclic characteristics of longer sequences; it can therefore cope with long time series that have long periods and strong cyclicity, and the longer the series, the more obvious the effect, which suits the needs of long-time-series PM2.5 prediction very well. Therefore, to address the difficulty of long-term PM2.5 concentration prediction, the ICEEMDAN modal decomposition algorithm [24] and the autoformer model were introduced into this study, with the former integrated into the autoformer model to improve it, thus establishing the modal autoformer model. To increase the rationality of the method, a multi-factor information flow causal analysis was added to screen the relevant influencing factors. The feasibility of the method for long-term prediction was explored by taking three representative cities (Zhengzhou, Luoyang, and Zhumadian) in Henan Province as examples, and control validation models were constructed to quantitatively compare the long-term prediction performance in these regions; on this basis, a further qualitative analysis was carried out, which showed that the constructed model overcomes the prediction difficulties of long time series and realizes the long-term prediction of PM2.5 concentration. The constructed model is thus highly relevant.
4. Implementation and Verification of Modal Autoformer
4.1. Optimized Multi-Step Time-Series Decomposition to Implement Modal Autoformer
Based on the processing method in the previous section, this paper organizes the normalized original data into a dataset consisting of the screened atmospheric pollutant factors, meteorological-related influencing factors, and the PM2.5 sequence.
This paper proposes an optimized multi-step time-series decomposition method to deal with the complex characteristics of the dataset, which combines the ICEEMDAN modal decomposition algorithm and the autoformer model. The ICEEMDAN modal decomposition algorithm is integrated into the autoformer model as a module, and a modal autoformer model is constructed, which can fully utilize and enhance the decomposition capabilities of both. Unlike the traditional approach, in which the ICEEMDAN modal decomposition algorithm decomposes the original data outside the model at the data processing stage, inputs each component into the model for prediction, and then superimposes the results, this paper embeds the ICEEMDAN modal decomposition algorithm into the data loading part of the autoformer model and optimizes the processing of each component according to its particular characteristics. The structure of the modal autoformer model is shown in Figure 6.
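The decompose–predict–superimpose flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_decompose` stands in for the ICEEMDAN module (which yields several IMF bands plus a residual), and `predict_component` stands in for the per-band autoformer.

```python
import numpy as np

def toy_decompose(x, window=24):
    """Stand-in for ICEEMDAN: split a series into a smooth trend
    (moving average) and an oscillatory remainder."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    return [x - trend, trend]          # [oscillatory band, trend band]

def predict_component(comp, horizon):
    """Placeholder per-component predictor. In the paper each band is fed
    to an autoformer whose batch size and moving-average window are matched
    to the band's waveform; here we persist the last value to keep the
    sketch self-contained."""
    return np.full(horizon, comp[-1])

def modal_forecast(x, horizon):
    """Decompose, predict each band independently, then superimpose
    the per-band forecasts to obtain the final PM2.5 prediction."""
    bands = toy_decompose(x)
    return sum(predict_component(b, horizon) for b in bands)

t = np.arange(500)
series = 50 + 20 * np.sin(2 * np.pi * t / 100) \
         + np.random.default_rng(0).normal(0, 2, 500)
forecast = modal_forecast(series, horizon=72)
print(forecast.shape)  # (72,)
```

Predicting each band separately and summing the results is what lets the model tune its parameters per waveform, as described next.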
The ICEEMDAN modal decomposition algorithm requires several parameters: Nstd, the noise standard deviation ratio, ranging from 0 to 1; NR, the number of noise additions, ranging from 50 to 100; and MaxIter, the maximum number of iterations. In the module of this paper, MaxIter is set to inf, i.e., the decomposition process is never cut short, so as to achieve the optimal decomposition effect. To ensure that the subsequent secondary decomposition (performed by the autoformer) can extract the characteristics and regularity of the sequence, the primary decomposition (performed by ICEEMDAN) needs to generate as many regular components as possible; therefore, the other two parameters are determined with the Grey Wolf optimization algorithm [27], using the minimum permutation entropy as the fitness function (the smaller the value, the stronger the regularity), as shown in Figure 7.
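A compact sketch of this parameter search is given below. Since ICEEMDAN itself is not reproduced here, the fitness uses a simple Nstd-controlled smoothing as a stand-in for "decompose, then score the bands' regularity"; the mapping inside `fitness` and all numeric settings are illustrative assumptions, while the search ranges for Nstd and NR follow the text above.

```python
import numpy as np
from itertools import permutations

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy: lower values mean a more regular series."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - delay * (order - 1)):
        window = x[i:i + delay * order:delay]
        pat = tuple(int(v) for v in np.argsort(window))
        counts[patterns.index(pat)] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(len(patterns))

def fitness(params, signal):
    """Stand-in fitness: in the paper, (Nstd, NR) parameterize ICEEMDAN and
    the permutation entropy of the resulting bands is minimized; here Nstd
    controls a moving-average window to emulate that effect."""
    nstd, nr = params
    w = max(2, int(2 + nstd * 46))     # illustrative mapping from Nstd
    trend = np.convolve(signal, np.ones(w) / w, mode="valid")
    return permutation_entropy(trend)

def grey_wolf(signal, n_wolves=6, iters=20, rng=np.random.default_rng(1)):
    """Compact Grey Wolf optimizer over Nstd in [0, 1] and NR in [50, 100]."""
    lo, hi = np.array([0.0, 50.0]), np.array([1.0, 100.0])
    wolves = rng.uniform(lo, hi, size=(n_wolves, 2))
    for it in range(iters):
        scores = np.array([fitness(w, signal) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(scores)[:3]]  # three leaders
        a = 2 - 2 * it / iters                               # decreasing coefficient
        for i in range(n_wolves):
            moves = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(2), rng.random(2)
                A, C = 2 * a * r1 - a, 2 * r2
                moves.append(leader - A * np.abs(C * leader - wolves[i]))
            wolves[i] = np.clip(np.mean(moves, axis=0), lo, hi)
    scores = np.array([fitness(w, signal) for w in wolves])
    return wolves[np.argmin(scores)]                         # best (Nstd, NR)

signal = 50 + 20 * np.sin(np.linspace(0, 12 * np.pi, 400))
nstd, nr = grey_wolf(signal)
print(round(nstd, 3), round(nr, 1))
```

Note that a perfectly monotonic series has permutation entropy 0, which is why minimizing it drives the decomposition toward regular bands.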
The model design and its operation process are as follows. After receiving the screened PM2.5 composite dataset, the data loading part of the model reads the column names to split the PM2.5 data column from the other relevant influencing factor data; the influencing factor data are temporarily stored in the variable XG (a matrix that originally stores the time series of the influencing factors). The PM2.5 data column, together with the NR and Nstd parameters obtained by the parameter optimization algorithm, is input into the ICEEMDAN module for modal decomposition, and the normalized generated bands and the number of bands m are recorded sequentially. For each band, waveform recognition is performed: the number of peaks and troughs and the size differences of the local extremes are recorded, and from these a specific key-value sequence corresponding to each band is generated. When initialized, the sequence reserves two positions for storing two key values: the first is used to match the batch size, and the second is used to match the moving-average window. Each processed band is spliced horizontally onto the end column of XG, yielding m new components with the shape (number of correlation factors + 1, length of the PM2.5 sequence), which are fed sequentially into the main function in a nested loop together with the corresponding key-value sequences of the bands. For the training of each component, an automatic learning rate optimization method is also written inside the model to cope with the different waveforms of the components during training, which fully demonstrates the advantages of decomposition and step-by-step processing.
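The waveform recognition and key-value matching step can be sketched as below. The thresholds and the contents of the two dictionaries are illustrative assumptions, not the paper's actual settings; only the mechanism (count extremes, derive two key values, look up batch size and moving-average window) follows the description above.

```python
import numpy as np

def waveform_keys(band):
    """Waveform recognition: count peaks and troughs and measure the spread
    of the local extremes, then map these statistics to two key values --
    one to select the batch size, one to select the moving-average window."""
    d = np.diff(band)
    peaks = np.sum((d[:-1] > 0) & (d[1:] <= 0))     # rising-to-falling turns
    troughs = np.sum((d[:-1] < 0) & (d[1:] >= 0))   # falling-to-rising turns
    spread = band.max() - band.min()
    # first key: how oscillatory the band is; second key: how large its swings are
    k1 = "dense" if peaks + troughs > len(band) / 40 else "sparse"
    k2 = "wide" if spread > 4 * np.std(band) else "narrow"
    return [k1, k2]

# pre-set dictionaries matched against the key-value sequence (illustrative values)
BATCH_SIZE = {"dense": 32, "sparse": 64}
MOVING_AVG = {"wide": 25, "narrow": 13}

def match_parameters(keys):
    """Match the two reserved key values to concrete model parameters."""
    return {"batch_size": BATCH_SIZE[keys[0]], "moving_avg": MOVING_AVG[keys[1]]}

band = np.sin(np.linspace(0, 40 * np.pi, 1000))     # a highly oscillatory band
print(match_parameters(waveform_keys(band)))        # {'batch_size': 32, 'moving_avg': 13}
```

Deriving the parameters from the waveform rather than fixing them globally is what lets each band be trained under settings suited to its own regularity.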
When the main function of the autoformer is executed, it passes the kth component (k is initialized to 1 on the first call) and the corresponding key-value sequence to its internal parameter module. In this module, the batch size and the moving-average window are stored in pre-set dictionaries that can be matched against the key-value sequence; once both have acquired the key-value sequence, the matching is performed, and new batch size and moving-average window values are generated and added to the model parameters. When the model has obtained all the required parameters, it can be initialized for training; the kth component is divided into training samples and true values for prediction in the proportions of 85% and 15%. During training, the model learns by continually iterating the comparison between its own predicted values and the labels, and it fully extracts the seasonal features and periodic patterns by decomposing the components in multiple decomposition modules, achieving the expected effect of the secondary decomposition. After training and prediction are complete, the generated predicted component is recorded, numbered (predicted component k), and evaluated. If the count of numbered components is not equal to the number of components m, k is incremented to k + 1 and the main function is re-entered for the next round of the loop; if the count equals m, the loop ends. The recorded predicted components are then superimposed to obtain the final prediction result of the modal autoformer.
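The loop logic just described, with parameter matching, the 85%/15% split, one training-and-prediction pass per component, and the final superposition, might be skeletonized as follows; `train_fn` is a placeholder for the autoformer call, and the naive stand-in used in the usage line is only there to keep the sketch runnable.

```python
import numpy as np

def run_modal_autoformer(components, key_seqs, train_fn):
    """Sketch of the main-function loop: one pass per component k = 1..m."""
    predicted = []
    for comp, keys in zip(components, key_seqs):
        # the internal parameter module matches the two reserved key values
        params = {"batch_size": keys[0], "moving_avg": keys[1]}
        split = int(len(comp) * 0.85)                # 85% training samples
        train, truth = comp[:split], comp[split:]    # 15% true values to predict
        predicted.append(train_fn(train, len(truth), params))  # predicted component k
    return np.sum(predicted, axis=0)                 # superimpose all m components

# usage with a naive "persist the last value" stand-in for the autoformer
naive = lambda train, horizon, params: np.full(horizon, train[-1])
components = [np.linspace(0, 10, 200), np.sin(np.linspace(0, 8 * np.pi, 200))]
keys = [[32, 25], [64, 13]]
final = run_modal_autoformer(components, keys, naive)
print(final.shape)  # (30,)
```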
4.2. Result Verification
To verify the effectiveness and feasibility of the method, this paper takes Zhengzhou as an example, applies the above method to obtain the prediction comparison results for Zhengzhou, and plots them as shown in Figure 8 and Figure 9.
As can be seen from the figures, the modal autoformer can better predict the trend in the PM2.5 series. To further illustrate the prediction accuracy, this paper uses the mean absolute error (MAE), mean absolute percentage error (MAPE), coefficient of determination (R2), and root mean square error (RMSE) as verification indicators. The results are shown in Table 3.
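For reference, the four indicators can be computed with a minimal helper such as the one below (the MAPE form assumes no zero true values; the sample arrays are illustrative, not data from Table 3).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the four verification indicators: MAE, MAPE, RMSE, R2."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100       # percentage error
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

y_true = np.array([40.0, 55.0, 70.0, 65.0])          # illustrative concentrations
y_pred = np.array([42.0, 50.0, 68.0, 66.0])
print(evaluate(y_true, y_pred))
```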
5. Results and Analysis
5.1. Comparison of Indicators in Each Study Area
To verify whether the method in this paper is superior and generalizable in long-term prediction, the common GRU model, the informer model (which shows excellent short-term prediction performance) [28], the autoformer model, and empirical mode decomposition combined with the GRU model (referred to as modal GRU) were used as control groups in comparative prediction experiments.
In this paper, the completed dataset is divided into an 85% training set and a 15% test set. Both the models without modal decomposition and the models using modal decomposition incorporate an automatic parameter adjustment function; the difference is that, for the models without modal decomposition, the adjustment occurs after each complete prediction process, whereas for the models using modal decomposition it occurs after the prediction of each component. The initial hyperparameters are set according to the characteristics of the different models and fine-tuned according to the training and validation results; the initial hyperparameters of the five adopted models are shown in Table 4.
The five models, GRU, informer, autoformer, modal GRU, and modal autoformer, were trained using multi-feature data from 1 January 2016 to 12 June 2020 for Zhengzhou, Luoyang, and Zhumadian, respectively. The PM2.5 concentrations in the three cities from 12 June 2020 to 5 December 2020 were predicted, and the MAE, MAPE, R2, and RMSE indicators were calculated for accuracy evaluation, as shown in Figure 10. The results are shown in Table 5, Table 6 and Table 7.
In Zhengzhou, the modal autoformer has the best indicators. Among the other models, the modal GRU has excellent indicators, second only to the modal autoformer; the MAPE of the autoformer is slightly higher, and the MAPE and R2 of the informer are relatively poor. The predictive performance of the GRU is too weak to have any application value.
In Luoyang, the performance of the modal autoformer is far superior to that of the other models; the performance of the autoformer is comparable to that of the modal GRU, with a certain gap to the modal autoformer. Among the informer's indicators, the MAE is the closest to those of the above models, indicating that the real error gap is not large, but the other indicators differ greatly, revealing the model's instability and poor fit in Luoyang. The predictive performance of the GRU is too poor to have application value.
In Zhumadian, the modal autoformer and the modal GRU performed well, but the MAPE of the modal GRU was higher than that of the modal autoformer. The accuracy of the autoformer and the informer was similar to that of the above two models, with the autoformer performing slightly better than the informer. The predictive performance of the GRU was too poor to have practical value.
The above experimental results show that the ICEEMDAN modal decomposition algorithm plays an obvious role in long-term prediction: the modal autoformer and the modal GRU, which both use it, show a great improvement in accuracy. The GRU model is not applicable to long-term prediction in the three regions, and the accuracy of the informer leaves much room for improvement. The performance of the autoformer and the modal GRU fluctuates across regions.
In contrast, the training effect of modal autoformer is the most stable in the three research areas, and the accuracy evaluation index results are the best in the long-term prediction of the three research areas, indicating the effectiveness and feasibility of modal autoformer in solving the problem of long-term prediction.
5.2. Detailed Contrast Experiments of Examples
Given the quantitative comparison of model performance above, further experiments are conducted, taking the city of Zhengzhou as an example, to verify in detail the effectiveness of the model improvement achieved by the method in this paper and to provide a more visual comparison of each model's PM2.5 prediction performance. The training and prediction results of each model in Zhengzhou are shown in Figure 11 and Figure 12.
Compared to the informer, the modal autoformer predicts better over the whole period, and its overall prediction effect is superior. However, the informer better predicts the densely distributed intervals of PM2.5 concentration, while its prediction of the sparsely distributed intervals is poor. This reflects the differences in structure and attention mechanism between the two. The modal autoformer, using a decomposition structure and an autocorrelation mechanism, can extract the trend of the time series and adjust for the season, which makes it better at predicting strongly varying parts affected by temporal and seasonal characteristics. The informer, using a sparse attention mechanism, can learn periodic change characteristics, but its ability to learn the large fluctuations in long-term prediction is insufficient. Compared with the autoformer, the modal autoformer, which introduces the modal decomposition algorithm, greatly alleviates the obvious over-prediction and over-fitting phenomena in the autoformer predictions and shows flexible adaptability across intervals with different waveforms, reducing the impact of the strong temporal complexity and multiple influencing factors of the PM2.5 sequence in long-term prediction. Compared with the modal GRU, the predictions of the modal autoformer are closer to the true values; apart from some local extreme values that cannot be accurately predicted, they are consistent with the true values. Both models using modal decomposition algorithms adapt well to long-term prediction, but the modal GRU shows a certain time lag in its predictions, and its trend in some periods does not match the true values.
To verify the differences between the modal autoformer and the modal GRU in time-series processing, the PM2.5 interval with large fluctuations within the prediction range is selected to analyze the prediction performance of the two models, as shown in Figure 13.
As shown in Figure 13, the predictions of the modal autoformer deviate little from the true values, and there is almost no prediction lag, reflecting its ability to decompose and extract the features of different components at the temporal feature level. In contrast, because the modal GRU lacks this decomposition and feature extraction ability at the temporal feature level, its predictions deviate considerably from the true values in many small intervals and exhibit prediction lag.
To verify the different effects of the modal autoformer and the modal GRU on the components obtained by the modal decomposition of the original sequence, the prediction results for their intrinsic mode function (IMF) components are compared, and the four components with the largest differences, IMF11, IMF12, IMF13, and the residual component, are selected for analysis. The results are shown in Figure 14.
As can be seen from Figure 14, the modal autoformer predicts well for the regular IMF11 and IMF12 components, the fluctuating IMF13 component, and the smooth residual component, and the predicted values are consistent with the true values, reflecting its adaptability to components with different features. In contrast, because the modal GRU lacks the ability to adapt the model to different component features, its prediction trends for the IMF11, IMF12, and IMF13 components are roughly in line with the true values, but the predicted values of IMF11 and IMF12 are slightly lower than the true values overall, and the predicted values of IMF13 fluctuate and deviate from the true values to a certain degree. The predicted values of the residual component are far from the true values, and the predicted trend does not match the actual change over time, resulting in a significant deviation.
In summary, the modal autoformer applies the modal decomposition method to perform a preliminary decomposition of the original sequence and adjusts the model accordingly for the different features of each component during processing. Additionally, the model's decomposition structure further decomposes each component along the time series. The modal autoformer thus addresses the limitations of traditional models in long-term prediction and achieves significant improvements, enabling the solution of complex long-term prediction problems.
6. Conclusions
This paper integrates the ICEEMDAN modal decomposition algorithm into the autoformer to construct a modal autoformer model and verifies its feasibility and effectiveness using three representative cities in Henan Province (Zhengzhou, Luoyang, and Zhumadian) as examples. For the selection of relevant factors, the multi-factor information flow method is adopted to select appropriate factors for each region and verify their reliability. To verify the superiority and robustness of the proposed method, GRU, informer, autoformer, and modal GRU models are constructed to compare and analyze the long-term prediction of PM2.5 concentration in the three cities. The results show that:
The optimized multi-step time-series decomposition method can handle the variable features of the long PM2.5 sequence well, decomposing the complex single band into a collection of multiple bands that are strongly regular, strongly periodic, or smooth with a single change trend, which effectively reduces the difficulty of model learning and prediction. Both the decomposition of the sequence into multiple components and the decomposition at the temporal feature level work well.
By integrating the modal decomposition algorithm into the autoformer model and improving the model matching, the decomposition abilities of the two can be fully utilized and combined, and the stability of the model can be improved.
The modal autoformer can handle the complex temporal patterns, large data volume, and strong periodicity and seasonality of long-term PM2.5 sequence prediction well, and has strong spatial adaptability, maintaining stable prediction performance across different regions.
The long-term series prediction of PM2.5 concentration has wide application value and research potential, and the rapid development of deep learning provides favorable conditions for it: the organic combination of more targeted algorithms and models greatly improves the feasibility and accuracy of prediction, bringing the accuracy ever closer to the expected level. Given the strong spatial correlation of the PM2.5 concentration series, we will enhance the model's reliability by including additional relevant influencing factors, particularly spatial ones. We will also conduct experiments in a broader study area to verify the model's generalizability. In future methodological work, we will optimize the methodology and research techniques and introduce new techniques for the long-term prediction of PM2.5 concentration series.