1. Introduction
PM2.5 is one of the main contributors to air pollution in China and a major problem that must be solved for the prevention and control of air pollution, the protection of public health, the promotion of ecological civilization construction, and the sustainable development of the economy and society [1]. With increasing attention being paid to environmental issues in China, air pollution control has become an important topic across the country. Research on PM2.5 concentration prediction helps provide timely and complete environmental quality information, enabling the government to formulate corresponding strategies and act in time on environmental changes, which is of great significance [2].
With the development of computer technology, deep learning models can provide higher numerical prediction accuracy due to their nonlinear mapping and adaptive capabilities; among them, recurrent neural networks are widely used for air pollutant concentration prediction because of their ability to store and learn information from existing time-series data. Many scholars have used the long short-term memory network (LSTM), gated recurrent unit (GRU), and other models to perform short-term prediction of PM2.5 in various places [3,4,5,6,7,8,9,10,11,12]. For instance, Zhao Yanming constructed an LSTM model with spatiotemporal correlation to predict PM2.5 concentration for 36 h in the Beijing–Tianjin–Hebei region [5]. On this basis, some scholars have also constructed combined models to attempt long-term prediction [13,14]. For instance, Huang Jie used a stacking strategy to build an RNN–CNN (recurrent neural network–convolutional neural network) model and predicted PM2.5 concentration for 2 months [14], but the time scale was too coarse: only eight time points were predicted, and the effect was mediocre. Some scholars have introduced transformer-based time-series models [15] into PM2.5 concentration prediction. For instance, Liu Enhai et al. combined the transformer model to predict PM2.5 concentration for 23 h in the city of Shijiazhuang [16], and Dong Hao et al. used the informer model to predict PM2.5 concentration for 1–6 h in the city of Beijing [17].
The above research methods have improved the accuracy, processing efficiency, and feature expression of PM2.5 concentration prediction; however, most of the research focuses on short-term prediction (a few hours), which makes it difficult to reveal the future trends of PM2.5 changes and thus has limitations. The long-term prediction of PM2.5 concentration is therefore necessary and valuable. Long-term prediction (weeks, months) can be achieved by learning from longer time-series data while ensuring the refinement and accuracy of the time scale. It can provide sufficient lead time to cope with air pollution, support reasonable environmental management decisions, evaluate current and future air quality conditions, and monitor and warn of heavy pollution events. At the same time, it can reveal the long-term trend and periodicity of PM2.5 concentration changes; analyze their complex relationships with various factors, such as meteorological factors, pollution sources, and spatial characteristics; and provide a scientific basis for exploring PM2.5 formation mechanisms and control strategies.
However, long-term-series data cover a long period, have refined time scales and large data volumes, and are highly cyclical, with obvious seasonality and large variations in local data characteristics. In addition, it is not reliable to detect temporal dependencies directly from a long time series, because the dependencies may be masked by entangled temporal patterns. Long-time-series data are also highly non-stationary, and capturing their overall patterns with conventional periodic and seasonal fitting methods is difficult [18]. This makes it harder for conventional models to learn and predict a long time series. Consequently, there are few studies on the long-time-series prediction of PM2.5 concentration; some experimental results involving long-time-series prediction are mediocre, and most time-series prediction models are constrained by the complexity of the characteristics of a long time series and the quantity of data, making them difficult to apply to long-time-series prediction.
Modal decomposition algorithms, owing to their adaptivity to non-stationary sequences, can effectively reduce the complexity and instability of long-term data, and some scholars have applied them in the field of air pollution control with good results. For example, Xiao used the ICEEMDAN (improved complete ensemble empirical mode decomposition with adaptive noise) algorithm to fully extract the complex characteristics of short-term atmospheric pollutant concentrations on different time scales [19]. Song decomposed experimentally monitored N2O and CO time series, and the decomposed bands showed complete regularity, except for the residual band [20]. This method is also widely used in feature extraction and signal denoising. For example, Ali used a modal decomposition method to decompose the significant wave height along the eastern coast of Australia [21], while Liang used the ICEEMDAN modal decomposition method to decompose an air passenger transport time series [22]. Thanks to its decomposition structure and autocorrelation mechanism, the autoformer [23] is good at handling the global, seasonal, and cyclic characteristics of longer sequences; it can therefore cope with long time series that have long periods and strong cyclicity, and the longer the series, the more obvious the effect, which suits the needs of long-time-series PM2.5 prediction very well. Therefore, to address the difficulty of long-term PM2.5 concentration prediction, the ICEEMDAN modal decomposition algorithm [24] and the autoformer model were introduced into this study, with the former integrated into the autoformer model to improve it, thus establishing the modal autoformer model. To increase the rationality of the method, a multi-factor information flow causal analysis was added to screen the relevant influencing factors. The feasibility of the method for long-term prediction was explored by taking three representative cities (Zhengzhou, Luoyang, and Zhumadian) in Henan Province as examples, and control validation models were constructed to quantitatively compare the long-term prediction performance in these regions; on this basis, a further qualitative analysis was carried out, which showed that the constructed model overcomes the prediction difficulties of long time series and realizes the long-term prediction of PM2.5 concentration. The constructed model is thus highly relevant.
4. Implementation and Verification of Modal Autoformer
4.1. Optimized Multi-Step Time-Series Decomposition to Implement Modal Autoformer
Based on the processing method in the previous section, this paper organizes the normalized original data into a dataset consisting of the screened atmospheric pollutant factors, meteorological-related influencing factors, and the PM2.5 sequence.
This paper proposes an optimized multi-step time-series decomposition method to deal with the complex characteristics of the dataset, which combines the ICEEMDAN modal decomposition algorithm and the autoformer model. The ICEEMDAN modal decomposition algorithm is integrated into the autoformer model as a module, and a modal autoformer model is constructed, which can fully utilize and enhance the decomposition capabilities of both. Unlike the traditional approach, in which the ICEEMDAN modal decomposition algorithm decomposes the original data outside the model at the data processing stage, inputs each component into the model for prediction, and then superimposes the results, this paper embeds the ICEEMDAN modal decomposition algorithm into the data loading part of the autoformer model and optimizes the processing of each component according to its particular characteristics. The structure of the modal autoformer model is shown in Figure 6.
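The decompose–predict–superimpose flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_decompose` stands in for the ICEEMDAN module (which yields several IMF bands plus a residual), and `predict_component` stands in for the per-band autoformer.

```python
import numpy as np

def toy_decompose(x, window=24):
    """Stand-in for ICEEMDAN: split a series into a smooth trend
    (moving average) and an oscillatory remainder."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")
    return [x - trend, trend]          # [oscillatory band, trend band]

def predict_component(comp, horizon):
    """Placeholder per-component predictor. In the paper each band is fed
    to an autoformer whose batch size and moving-average window are matched
    to the band's waveform; here we persist the last value to keep the
    sketch self-contained."""
    return np.full(horizon, comp[-1])

def modal_forecast(x, horizon):
    """Decompose, predict each band independently, then superimpose
    the per-band forecasts to obtain the final PM2.5 prediction."""
    bands = toy_decompose(x)
    return sum(predict_component(b, horizon) for b in bands)

t = np.arange(500)
series = 50 + 20 * np.sin(2 * np.pi * t / 100) \
         + np.random.default_rng(0).normal(0, 2, 500)
forecast = modal_forecast(series, horizon=72)
print(forecast.shape)  # (72,)
```

Predicting each band separately and summing the results is what lets the model tune its parameters per waveform, as described next.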
The ICEEMDAN modal decomposition algorithm requires several parameters: Nstd, the noise standard deviation ratio, ranging from 0 to 1; NR, the number of noise additions, ranging from 50 to 100; and MaxIter, the maximum number of iterations. In the module of this paper, MaxIter is set to inf, i.e., the decomposition process is never cut short, so as to achieve the optimal decomposition effect. To ensure that the subsequent secondary decomposition (performed by the autoformer) can extract the characteristics and regularity of the sequence, the primary decomposition (performed by ICEEMDAN) needs to generate as many regular components as possible; therefore, the other two parameters are determined with the Grey Wolf optimization algorithm [27], using the minimum permutation entropy as the fitness function (the smaller the value, the stronger the regularity), as shown in Figure 7.
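A compact sketch of this parameter search is given below. Since ICEEMDAN itself is not reproduced here, the fitness uses a simple Nstd-controlled smoothing as a stand-in for "decompose, then score the bands' regularity"; the mapping inside `fitness` and all numeric settings are illustrative assumptions, while the search ranges for Nstd and NR follow the text above.

```python
import numpy as np
from itertools import permutations

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy: lower values mean a more regular series."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - delay * (order - 1)):
        window = x[i:i + delay * order:delay]
        pat = tuple(int(v) for v in np.argsort(window))
        counts[patterns.index(pat)] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(len(patterns))

def fitness(params, signal):
    """Stand-in fitness: in the paper, (Nstd, NR) parameterize ICEEMDAN and
    the permutation entropy of the resulting bands is minimized; here Nstd
    controls a moving-average window to emulate that effect."""
    nstd, nr = params
    w = max(2, int(2 + nstd * 46))     # illustrative mapping from Nstd
    trend = np.convolve(signal, np.ones(w) / w, mode="valid")
    return permutation_entropy(trend)

def grey_wolf(signal, n_wolves=6, iters=20, rng=np.random.default_rng(1)):
    """Compact Grey Wolf optimizer over Nstd in [0, 1] and NR in [50, 100]."""
    lo, hi = np.array([0.0, 50.0]), np.array([1.0, 100.0])
    wolves = rng.uniform(lo, hi, size=(n_wolves, 2))
    for it in range(iters):
        scores = np.array([fitness(w, signal) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(scores)[:3]]  # three leaders
        a = 2 - 2 * it / iters                               # decreasing coefficient
        for i in range(n_wolves):
            moves = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(2), rng.random(2)
                A, C = 2 * a * r1 - a, 2 * r2
                moves.append(leader - A * np.abs(C * leader - wolves[i]))
            wolves[i] = np.clip(np.mean(moves, axis=0), lo, hi)
    scores = np.array([fitness(w, signal) for w in wolves])
    return wolves[np.argmin(scores)]                         # best (Nstd, NR)

signal = 50 + 20 * np.sin(np.linspace(0, 12 * np.pi, 400))
nstd, nr = grey_wolf(signal)
print(round(nstd, 3), round(nr, 1))
```

Note that a perfectly monotonic series has permutation entropy 0, which is why minimizing it drives the decomposition toward regular bands.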
The model design and its operation process are as follows. After receiving the screened PM2.5 composite dataset, the data loading part of the model reads the column names to split the PM2.5 data column from the other relevant influencing factor data; the influencing factor data are temporarily stored in the variable XG (a matrix that originally stores the time series of the influencing factors). The PM2.5 data column, together with the NR and Nstd parameters obtained by the parameter optimization algorithm, is input into the ICEEMDAN module for modal decomposition, and the normalized generated bands and the number of bands m are recorded sequentially. For each band, waveform recognition is performed: the number of peaks and troughs and the size differences of the local extremes are recorded, and from these a specific key-value sequence corresponding to each band is generated. When initialized, the sequence reserves two positions for storing two key values: the first is used to match the batch size, and the second is used to match the moving-average window. Each processed band is spliced horizontally onto the end column of XG, yielding m new components with the shape (number of correlation factors + 1, length of the PM2.5 sequence), which are fed sequentially into the main function in a nested loop together with the corresponding key-value sequences of the bands. For the training of each component, an automatic learning rate optimization method is also written inside the model to cope with the different waveforms of the components during training, which fully demonstrates the advantages of decomposition and step-by-step processing.
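The waveform recognition and key-value matching step can be sketched as below. The thresholds and the contents of the two dictionaries are illustrative assumptions, not the paper's actual settings; only the mechanism (count extremes, derive two key values, look up batch size and moving-average window) follows the description above.

```python
import numpy as np

def waveform_keys(band):
    """Waveform recognition: count peaks and troughs and measure the spread
    of the local extremes, then map these statistics to two key values --
    one to select the batch size, one to select the moving-average window."""
    d = np.diff(band)
    peaks = np.sum((d[:-1] > 0) & (d[1:] <= 0))     # rising-to-falling turns
    troughs = np.sum((d[:-1] < 0) & (d[1:] >= 0))   # falling-to-rising turns
    spread = band.max() - band.min()
    # first key: how oscillatory the band is; second key: how large its swings are
    k1 = "dense" if peaks + troughs > len(band) / 40 else "sparse"
    k2 = "wide" if spread > 4 * np.std(band) else "narrow"
    return [k1, k2]

# pre-set dictionaries matched against the key-value sequence (illustrative values)
BATCH_SIZE = {"dense": 32, "sparse": 64}
MOVING_AVG = {"wide": 25, "narrow": 13}

def match_parameters(keys):
    """Match the two reserved key values to concrete model parameters."""
    return {"batch_size": BATCH_SIZE[keys[0]], "moving_avg": MOVING_AVG[keys[1]]}

band = np.sin(np.linspace(0, 40 * np.pi, 1000))     # a highly oscillatory band
print(match_parameters(waveform_keys(band)))        # {'batch_size': 32, 'moving_avg': 13}
```

Deriving the parameters from the waveform rather than fixing them globally is what lets each band be trained under settings suited to its own regularity.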
When the main function of the autoformer is executed, it passes the kth component (k is initialized to 1 on the first call) and the corresponding key-value sequence to its internal parameter module. In this module, the batch size and the moving-average window are stored in pre-set dictionaries that can be matched against the key-value sequence; once both have acquired the key-value sequence, the matching is performed, and new batch size and moving-average window values are generated and added to the model parameters. When the model has obtained all the required parameters, it can be initialized for training; the kth component is divided into training samples and true values for prediction in the proportions of 85% and 15%. During training, the model learns by continually iterating the comparison between its own predicted values and the labels, and it fully extracts the seasonal features and periodic patterns by decomposing the components in multiple decomposition modules, achieving the expected effect of the secondary decomposition. After training and prediction are complete, the generated predicted component is recorded, numbered (predicted component k), and evaluated. If the count of numbered components is not equal to the number of components m, k is incremented to k + 1 and the main function is re-entered for the next round of the loop; if the count equals m, the loop ends. The recorded predicted components are then superimposed to obtain the final prediction result of the modal autoformer.
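The loop logic just described, with parameter matching, the 85%/15% split, one training-and-prediction pass per component, and the final superposition, might be skeletonized as follows; `train_fn` is a placeholder for the autoformer call, and the naive stand-in used in the usage line is only there to keep the sketch runnable.

```python
import numpy as np

def run_modal_autoformer(components, key_seqs, train_fn):
    """Sketch of the main-function loop: one pass per component k = 1..m."""
    predicted = []
    for comp, keys in zip(components, key_seqs):
        # the internal parameter module matches the two reserved key values
        params = {"batch_size": keys[0], "moving_avg": keys[1]}
        split = int(len(comp) * 0.85)                # 85% training samples
        train, truth = comp[:split], comp[split:]    # 15% true values to predict
        predicted.append(train_fn(train, len(truth), params))  # predicted component k
    return np.sum(predicted, axis=0)                 # superimpose all m components

# usage with a naive "persist the last value" stand-in for the autoformer
naive = lambda train, horizon, params: np.full(horizon, train[-1])
components = [np.linspace(0, 10, 200), np.sin(np.linspace(0, 8 * np.pi, 200))]
keys = [[32, 25], [64, 13]]
final = run_modal_autoformer(components, keys, naive)
print(final.shape)  # (30,)
```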
4.2. Result Verification
To verify the effectiveness and feasibility of the method, this paper takes Zhengzhou as an example, applies the above method to obtain the prediction comparison results for Zhengzhou, and plots them as shown in Figure 8 and Figure 9.
As can be seen from the figures, the modal autoformer can better predict the trend in the PM2.5 series. To further illustrate the prediction accuracy, this paper uses the mean absolute error (MAE), mean absolute percentage error (MAPE), coefficient of determination (R2), and root mean square error (RMSE) as verification indicators. The results are shown in Table 3.
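For reference, the four indicators can be computed with a minimal helper such as the one below (the MAPE form assumes no zero true values; the sample arrays are illustrative, not data from Table 3).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the four verification indicators: MAE, MAPE, RMSE, R2."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100       # percentage error
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

y_true = np.array([40.0, 55.0, 70.0, 65.0])          # illustrative concentrations
y_pred = np.array([42.0, 50.0, 68.0, 66.0])
print(evaluate(y_true, y_pred))
```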
5. Results and Analysis
5.1. Comparison of Indicators in Each Study Area
To verify whether the method in this paper is superior and generalizable in long-term prediction, the common GRU model, the informer model (which shows excellent short-term prediction performance) [28], the autoformer model, and empirical mode decomposition combined with the GRU model (referred to as modal GRU) were used as control groups in comparative prediction experiments.
In this paper, the completed dataset is divided into an 85% training set and a 15% test set. Both the models without modal decomposition and the models using modal decomposition incorporate an automatic parameter adjustment function; the difference is that, for the models without modal decomposition, the adjustment occurs after each complete prediction process, whereas for the models using modal decomposition it occurs after the prediction of each component. The initial hyperparameters are set according to the characteristics of the different models and fine-tuned according to the training and validation results; the initial hyperparameters of the five adopted models are shown in Table 4.
The five models, GRU, informer, autoformer, modal GRU, and modal autoformer, were trained using multi-feature data from 1 January 2016 to 12 June 2020 for Zhengzhou, Luoyang, and Zhumadian, respectively. The PM2.5 concentrations in the three cities from 12 June 2020 to 5 December 2020 were predicted, and the MAE, MAPE, R2, and RMSE indicators were calculated for accuracy evaluation, as shown in Figure 10. The results are shown in Table 5, Table 6 and Table 7.
In Zhengzhou, the modal autoformer has the best indicators. Among the other models, the modal GRU has excellent indicators, second only to the modal autoformer; the MAPE of the autoformer is slightly higher, and the MAPE and R2 of the informer are relatively poor. The predictive performance of the GRU is too weak to have any application value.
In Luoyang, the performance of the modal autoformer is far superior to that of the other models; the performance of the autoformer is comparable to that of the modal GRU, with a certain gap to the modal autoformer. Among the informer's indicators, the MAE is the closest to those of the above models, indicating that the real error gap is not large, but the other indicators differ greatly, revealing the model's instability and poor fit in Luoyang. The predictive performance of the GRU is too poor to have application value.
In Zhumadian, the modal autoformer and the modal GRU performed well, but the MAPE of the modal GRU was higher than that of the modal autoformer. The accuracy of the autoformer and the informer was similar to that of the above two models, with the autoformer performing slightly better than the informer. The predictive performance of the GRU was too poor to have practical value.
The above experimental results show that the ICEEMDAN modal decomposition algorithm plays an obvious role in long-term prediction: the modal autoformer and the modal GRU, which both use it, show a great improvement in accuracy. The GRU model is not applicable to long-term prediction in the three regions, and the accuracy of the informer leaves much room for improvement. The performance of the autoformer and the modal GRU fluctuates across regions.
In contrast, the training effect of modal autoformer is the most stable in the three research areas, and the accuracy evaluation index results are the best in the long-term prediction of the three research areas, indicating the effectiveness and feasibility of modal autoformer in solving the problem of long-term prediction.
5.2. Detailed Contrast Experiments of Examples
Given the quantitative comparison of model performance above, further experiments are conducted, taking the city of Zhengzhou as an example, to verify in detail the effectiveness of the model improvement achieved by the method in this paper and to provide a more visual comparison of each model's PM2.5 prediction performance. The training and prediction results of each model in Zhengzhou are shown in Figure 11 and Figure 12.
Compared to the informer, the modal autoformer predicts better over the whole period, and its overall prediction effect is superior. However, the informer better predicts the densely distributed intervals of PM2.5 concentration, while its prediction of the sparsely distributed intervals is poor. This reflects the differences in structure and attention mechanism between the two. The modal autoformer, using a decomposition structure and an autocorrelation mechanism, can extract the trend of the time series and adjust for the season, which makes it better at predicting strongly varying parts affected by temporal and seasonal characteristics. The informer, using a sparse attention mechanism, can learn periodic change characteristics, but its ability to learn the large fluctuations in long-term prediction is insufficient. Compared with the autoformer, the modal autoformer, which introduces the modal decomposition algorithm, greatly alleviates the obvious over-prediction and over-fitting phenomena in the autoformer predictions and shows flexible adaptability across intervals with different waveforms, reducing the impact of the strong temporal complexity and multiple influencing factors of the PM2.5 sequence in long-term prediction. Compared with the modal GRU, the predictions of the modal autoformer are closer to the true values; apart from some local extreme values that cannot be accurately predicted, they are consistent with the true values. Both models using modal decomposition algorithms adapt well to long-term prediction, but the modal GRU shows a certain time lag in its predictions, and its trend in some periods does not match the true values.
To verify the differences between the modal autoformer and the modal GRU in time-series processing, the PM2.5 interval with large fluctuations within the prediction range is selected to analyze the prediction performance of the two models, as shown in Figure 13.
As shown in Figure 13, the predictions of the modal autoformer deviate little from the true values, and there is almost no prediction lag, reflecting its ability to decompose and extract the features of different components at the temporal feature level. In contrast, because the modal GRU lacks this decomposition and feature extraction ability at the temporal feature level, its predictions deviate considerably from the true values in many small intervals and exhibit prediction lag.
To verify the different effects of the modal autoformer and the modal GRU on the components obtained by the modal decomposition of the original sequence, the prediction results for their intrinsic mode function (IMF) components are compared, and the four components with the largest differences, IMF11, IMF12, IMF13, and the residual component, are selected for analysis. The results are shown in Figure 14.
As can be seen from Figure 14, the modal autoformer predicts well for the regular IMF11 and IMF12 components, the fluctuating IMF13 component, and the smooth residual component, and the predicted values are consistent with the true values, reflecting its adaptability to components with different features. In contrast, because the modal GRU lacks the ability to adapt the model to different component features, its prediction trends for the IMF11, IMF12, and IMF13 components are roughly in line with the true values, but the predicted values of IMF11 and IMF12 are slightly lower than the true values overall, and the predicted values of IMF13 fluctuate and deviate from the true values to a certain degree. The predicted values of the residual component are far from the true values, and the predicted trend does not match the actual change over time, resulting in a significant deviation.
In summary, the modal autoformer applies the modal decomposition method to perform a preliminary decomposition of the original sequence and adjusts the model accordingly for the different features of each component during processing. Additionally, the model's decomposition structure further decomposes each component along the time series. The modal autoformer thus addresses the limitations of traditional models in long-term prediction and achieves significant improvements, enabling the solution of complex long-term prediction problems.
6. Conclusions
This paper integrates the ICEEMDAN modal decomposition algorithm into the autoformer to construct a modal autoformer model and verifies its feasibility and effectiveness using three representative cities in Henan Province (Zhengzhou, Luoyang, and Zhumadian) as examples. For the selection of relevant factors, the multi-factor information flow method is adopted to select appropriate factors for each region and verify their reliability. To verify the superiority and robustness of the proposed method, GRU, informer, autoformer, and modal GRU models are constructed to compare and analyze the long-term prediction of PM2.5 concentration in the three cities. The results show that:
The optimized multi-step time-series decomposition method can handle the variable features of the long PM2.5 sequence well, decomposing the complex single band into a collection of multiple bands that are strongly regular, strongly periodic, or smooth with a single change trend, which effectively reduces the difficulty of model learning and prediction. Both the decomposition of the sequence into multiple components and the decomposition at the temporal feature level work well.
By integrating the modal decomposition algorithm into the autoformer model and improving the model matching, the decomposition abilities of the two can be fully utilized and combined, and the stability of the model can be improved.
The modal autoformer can handle the complex temporal patterns, large data volume, and strong periodicity and seasonality of long-term PM2.5 sequence prediction well, and has strong spatial adaptability, maintaining stable prediction performance across different regions.
The long-term series prediction of PM2.5 concentration has wide application value and research potential, and the rapid development of deep learning provides favorable conditions for it: the organic combination of more targeted algorithms and models greatly improves the feasibility and accuracy of prediction, bringing the accuracy ever closer to the expected level. Given the strong spatial correlation of the PM2.5 concentration series, we will enhance the model's reliability by including additional relevant influencing factors, particularly spatial ones. We will also conduct experiments in a broader study area to verify the model's generalizability. In future methodological work, we will optimize the methodology and research techniques and introduce new techniques for the long-term prediction of PM2.5 concentration series.