An Intelligent Time Series Model Based on Hybrid Methodology for Forecasting Concentrations of Significant Air Pollutants
Abstract
1. Introduction
- (1) Applied five variable selection methods to filter out the important variables for the collected datasets and used the integrated variable selection method (IVSM) to find the key variables.
- (2) Used four rule-based classifiers to classify air quality and generate classification rules, from which we identified the top three pollutants (PM2.5, PM10, and O3).
- (3) Deleted collinear variables and added lag periods of variables by the autoregressive distributed lag (ARDL) test.
- (4) Forecast concentrations of PM2.5, PM10, and O3 by using four intelligent time-series forecast methods based on IVSM-selected variables, ARDL-selected variables, and full variables.
- (5) Provided explanations of the results so that stakeholders can use them as a reference when enacting countermeasures for dynamic environmental factors.
2. Related Works
2.1. Air Pollution
2.2. Variable Selection
- (1) Correlation-based feature selection (CFS)
- (2) Correlation
- (3) Information gain (IG)
- (4) Gain ratio (GR)
- (5) ReliefF
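The idea of integrating the outputs of several variable selection methods can be sketched as a simple vote. This is a minimal illustration only: the integration rule (`min_votes`), the function name `integrate_selections`, and the example selections are assumptions made here, not the IVSM rule or the results of the paper.

```python
from collections import Counter


def integrate_selections(selections, min_votes):
    """Keep variables chosen by at least `min_votes` selection methods.

    `selections` maps a method name to the variables that method kept.
    The voting rule is a hypothetical stand-in for the paper's IVSM.
    """
    votes = Counter(
        variable
        for chosen in selections.values()
        for variable in set(chosen)  # count each method at most once
    )
    return sorted(v for v, count in votes.items() if count >= min_votes)


# Made-up selections for illustration (not results from the paper).
example = {
    "CFS": ["PM2.5", "PM10", "O3"],
    "Correlation": ["PM2.5", "PM10", "RH"],
    "IG": ["PM2.5", "O3", "TEMP"],
    "GR": ["PM2.5", "PM10", "O3"],
    "ReliefF": ["PM10", "O3", "WS_HR"],
}
key_variables = integrate_selections(example, min_votes=3)
print(key_variables)
```

Raising `min_votes` makes the integrated set stricter; setting it to 1 returns the union of all selections.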
2.3. Autoregressive Distributed Lag (ARDL) Model
2.4. Machine Learning Techniques
- (1) Decision tree (DT)
- (2) Random tree (RT)
- (3) Random forest (RF)
- (4) Extra tree (ET)
- (5) Support vector regression (SVR)
- (6) Multilayer perceptron regression (MLPR)
3. Proposed Method
3.1. Proposed Computational Procedure
- Step 1. Data collection
- Step 2. Preprocessing
- (1) Integrate pollutant and weather data into a single dataset.
- (2) Impute missing data.
- (3) Calculate AQI and set AQI classes.
- Step 3. Select variables.
- Step 4. Carry out classification and rule generation.
- Step 5. Perform ARDL test.
- Step 6. Construct the forecasting model.
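Step 2 (3) can be illustrated with a short sketch that turns pollutant sub-indices into an overall AQI and then into one of the four classes used in this study (A: Good, B: Moderate, C: Unhealthy for Sensitive Groups, D: Unhealthy). The breakpoints (50, 100, 150, 200) follow the commonly used AQI bands and are an assumption here, as are the function names; values above 200 fall outside the four classes and are rejected.

```python
def overall_aqi(sub_indices):
    """The overall AQI is the largest of the pollutant sub-indices."""
    return max(sub_indices.values())


def aqi_class(aqi):
    """Map an AQI value to class A-D (assumed breakpoints: 50/100/150/200)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    if aqi <= 50:
        return "A"  # Good
    if aqi <= 100:
        return "B"  # Moderate
    if aqi <= 150:
        return "C"  # Unhealthy for Sensitive Groups
    if aqi <= 200:
        return "D"  # Unhealthy
    raise ValueError("AQI above 200 is outside the four classes considered")


# Made-up sub-indices for one hourly record (illustration only).
record = {"PM2.5": 112, "PM10": 64, "O3": 48}
label = aqi_class(overall_aqi(record))
print(label)
```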
3.2. Evaluation Metrics
- 1. Accuracy: Accuracy is the most commonly used metric for classification performance [59] because it is simple to compute and easy to interpret. With the confusion matrix counts tp, fp, fn, and tn, accuracy is calculated as

$$\text{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn}$$

- 2. AUC: AUC is the area under the receiver operating characteristic (ROC) curve. According to [60], the AUC summarizes classification performance; an excellent classifier has an AUC near 1.0.
- 3. Precision: This criterion is also called positive predictive value and is calculated as

$$\text{Precision} = \frac{tp}{tp + fp}$$

- 4. Recall (sensitivity): This measures the proportion of correctly identified positives and is also called the true positive rate; it is calculated by Equation (6):

$$\text{Recall} = \frac{tp}{tp + fn} \tag{6}$$

- 5. F1-score: This metric is the harmonic mean of precision and recall and is calculated by Equation (7). F1 is usually more useful than accuracy, especially for class-imbalanced data.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7}$$
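As a small worked example of the count-based metrics above, the following sketch computes accuracy, precision, recall, and F1 from the four confusion-matrix counts (tp, fp, fn, tn). The function name and the example counts are illustrative assumptions. For the four AQI classes, the counts would have to be formed per class (for example one-vs-rest), which is also an assumption rather than something stated here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```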
4. Experiment and Comparison
4.1. Experimental Environment and Parameter Setting
4.2. Finding Significant Pollutants
- (A) Select key variables by IVSM.
- (B) Classify air quality.
- (C) Generate rules.
- (D) Determine the significant pollutants.
4.3. Forecast and Evaluation
- (A) Collinearity diagnosis
- (B) ARDL test of variable lag periods
- (C) Forecasting of concentrations of main pollutants
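The collinearity diagnosis in (A) relies on the variance inflation factor (VIF). A minimal sketch of how VIFs can be computed is given below, using the identity that the VIF of each variable is the corresponding diagonal entry of the inverse correlation matrix. It is written in plain Python for self-containment; the function names and the example data are assumptions, and the cutoff of 10 mentioned in the comment is a common rule of thumb, not necessarily the criterion applied in this study.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


# Made-up data: two nearly proportional columns give large VIFs
# (a common rule of thumb flags VIF > 10 as collinear).
collinear = vif([[1, 2, 3, 4, 5], [2, 4, 6, 8, 11]])
independent = vif([[1, 2, 3, 4], [1, -1, -1, 1]])
print(collinear, independent)
```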
4.4. Discussion
- (A) Finding air pollutants
- (1) Traffic air pollutants: Rapid economic development and urbanization have led to a rapid increase in vehicle ownership and usage, which has caused traffic-related air pollution problems. Vehicle emissions are a major source of CO, HC, THC, NOx, and PM, and these pollutants pose a serious threat to the environment and people’s health [64,65]. For the traffic monitoring station (Fengshan), as shown in Table 4, we found that the pollutants O3, PM2.5, PM10, CO, HC, THC, SO2, NO2, and NOx impact air quality; these pollutants are also covered in [64,65].
- (2) Industrial air pollutants: The AQI covers the concentrations of O3, PM10, PM2.5, NO2, SO2, and CO [19]. In addition to these six air quality indicators, industry releases other pollutants; NOx, SO2, PM, CO, and CO2 are the most commonly released substances [66]. For the industrial monitoring station (Table 4), we found that PM2.5, PM10, O3, SO2, CO, NOx, and NO2 impact air quality, and these pollutants are listed in [19,66]. In the PM10 forecast for the Mailiao dataset (Table 8), AR(p) performs best, with an MAE of 11.55, which suggests that the industrial air pollutant PM10 is related mainly to its own lag periods rather than to climatic factors and related pollutants. Further, Figure 3 shows that PM10 depends on the day (weekday or weekend). These experimental results suggest that PM10 is caused by the operation of factories in industrial areas.
- (3) Differences between traffic and industrial pollutants: In the current study, the main difference between the traffic and industrial pollutants is HC and THC, which are produced by incomplete combustion in vehicles (mobile sources). That is, the traffic monitoring station records two more pollutants (HC and THC) than the industrial monitoring station.
- (B) Interaction of pollutants and related variables
- (1) From the patterns of PM2.5 and PM10 versus six key variables in the traffic dataset, we note the following: (a) PM2.5 and PM10 levels are lower at 00:00–03:00, and the air quality is unhealthy at other times because Fengshan is a nightlife district; (b) PM2.5 levels are lower in June–July, and PM10 levels are lower in May–July (the lower PM2.5 level occurs in the third season, and the lower PM10 level occurs in the second season); and (c) the weather variables show that PM2.5 and PM10 are negatively correlated with wind speed, RH, and air temperature, indicating that higher values of these three variables are associated with lower concentrations of air pollutants.
- (2) Based on the patterns of PM2.5 and PM10 versus the six key variables in the industrial dataset, we find the following: (a) peak PM2.5 and PM10 levels occur during work hours (08:00–17:00) because Mailiao is a high-pollutant district with a naphtha cracking plant; (b) the lowest PM2.5 levels occur in July, and the lowest PM10 levels occur in June (the lower PM2.5 and PM10 levels occur in the second season); and (c) the weather variables also show that PM2.5 and PM10 levels are negatively correlated with wind speed, RH, and air temperature, indicating that higher values of these three variables are associated with lower concentrations of air pollutants.
- (3) Every year in Taiwan, the northeast monsoon carries dust and haze from China, reducing the influence of the Pacific subtropical high pressure and the vertical diffusion capacity of the atmosphere [68]. The severely polluted seasons are winter and spring, whereas high temperatures can reduce the concentration of air pollutants in the summer; these results agree with those in [23,24]. In the summer, vertical convection in the atmosphere is enhanced and the vertical diffusion capacity of the atmosphere is better, which reduces the amount of pollutants. Therefore, better air quality in Taiwan occurs in the summer season, which is consistent with pattern analysis points (1) and (2).
- (C) Forecast improvement
5. Conclusions
- (1) Previous research [25,26,27] did not integrate the features selected by different feature selection methods to obtain the key features; hence, we synthesized the key features using the proposed integrated variable selection method. For researchers, novel methods could be proposed to further improve integrated variable selection.
- (2) The classification rules were generated by DT, which gave the best classification results, and from these rules the top three pollutants (PM2.5, PM10, and O3) were determined. We suggest applying different algorithms to find the important air pollutants in future work.
- (3) We forecast the top three pollutants (PM2.5, PM10, and O3) after three variable-screening steps: selecting variables with IVSM, deleting collinear variables, and obtaining the lag periods of the dependent and independent variables with the ARDL test. These three screening steps can improve forecast performance. Therefore, a combined feature selection method is an important process for air quality prediction.
- (4) The advantage of ARDL-selected variables is that ARDL runs only once for all variables and finds the lag periods of all of them, whereas ACF and PACF would need to test lag periods 31 times because this study has 31 variables.
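Once the ARDL test has assigned lag periods to each variable, the forecasting inputs can be assembled by shifting each series by its selected lags. The sketch below shows one way to build such a lagged design matrix; the function name `build_lagged_rows`, the column naming scheme, and the toy series are assumptions for illustration, and the lag sets are made up rather than taken from the study.

```python
def build_lagged_rows(series, lags, target):
    """Build rows of lagged predictors and the aligned target values.

    `series` maps a variable name to its time-ordered values;
    `lags` maps a variable name to the lag periods to include
    (lag 0 means the value at the same time step as the target).
    """
    max_lag = max(max(ks) for ks in lags.values())
    n = len(series[target])
    names = [f"{name}_lag{k}" for name, ks in lags.items() for k in ks]
    rows, y = [], []
    for t in range(max_lag, n):  # skip steps without a full lag history
        rows.append([series[name][t - k] for name, ks in lags.items() for k in ks])
        y.append(series[target][t])
    return names, rows, y


# Toy series and made-up lag sets for illustration.
toy = {"PM10": [1, 2, 3, 4, 5], "WS_HR": [10, 20, 30, 40, 50]}
names, rows, y = build_lagged_rows(
    toy, {"PM10": [1, 2], "WS_HR": [0]}, target="PM10"
)
print(names)
print(rows, y)
```

The resulting rows can be passed to any regressor (for example SVR or RF) as predictors, with `y` as the target.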
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. International Energy Agency (IEA). Global Energy & CO2 Status Report, The LATEST Trends in Energy and Emissions in 2018, Flagship Report. March 2019. Available online: https://www.iea.org/reports/global-energy-co2-status-report-2019/emissions (accessed on 19 February 2021).
2. TAQI. Taiwan Air Quality Annual Report. 2018. Available online: https://www.epa.gov.tw/DisplayFile.aspx?FileID=9FDF33456FA1DB1F (accessed on 24 February 2021).
3. Taiwan PM2.5. Main Pollution Sources of PM2.5 in Taiwan, Reported on 14 September 2018. Available online: https://www.fpg.com.tw/tw/issue/1/115 (accessed on 19 February 2021).
4. Leeuwen, F.X.R.V. A European perspective on hazardous air pollution. Toxicology 2002, 181, 355–359.
5. Nagel, G.; Stafoggia, M.; Pedersen, M.; Andersen, Z.J.; Galassi, C.; Munkenast, J.; Jaensch, A.; Sommar, J.; Forsberg, B.; Olsson, D.; et al. Air pollution and incidence of cancers of the stomach and the upper aerodigestive tract in the European Study of Cohorts for Air Pollution Effects (ESCAPE). Int. J. Cancer 2018, 143, 1632–1643.
6. WHO. Fact Sheet—Ambient Air Quality and Health. Updated May 2018. Available online: https://www.who.int/health-topics/air-pollution#tab=tab_1 (accessed on 19 February 2021).
7. Hoek, G.; Krishnan, R.M.; Beelen, R.; Peters, A.; Ostro, B.; Brunekreef, B.; Kaufman, J.D. Long-term air pollution exposure and cardio-respiratory mortality: A review. Environ. Health 2013, 12, 43.
8. Brook, R.D.; Newby, D.E.; Rajagopalan, S. Air Pollution and Cardiometabolic Disease: An Update and Call for Clinical Trials. Am. J. Hypertens. 2017, 31, 1–10.
9. Global Burden of Disease Study Risk Factors Collaborators. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 2018, 392, 1923–1994.
10. WHO. 2018. Available online: https://www.who.int/news/item/29-10-2018-more-than-90-of-the-worlds-children-breathe-toxic-air-every-day (accessed on 19 February 2021).
11. Núñez-Alonso, D.; Pérez-Arribas, L.V.; Manzoor, S.; Caceres, J. Statistical Tools for Air Pollution Assessment: Multivariate and Spatial Analysis Studies in the Madrid Region. J. Anal. Methods Chem. 2019, 2019, 9753927.
12. Šimić, I.; Lovrić, M.; Godec, R.; Kröll, M.; Bešlić, I. Applying machine learning methods to better understand, model and estimate mass concentrations of traffic-related pollutants at a typical street canyon. Environ. Pollut. 2020, 263, 114587.
13. Akbal, Y.; Ünlü, K. A deep learning approach to model daily particular matter of Ankara: Key features and forecasting. Int. J. Environ. Sci. Technol. 2021, 19, 5911–5927.
14. Remeseiro, B.; Bolon-Canedo, V. A review of feature selection methods in medical applications. Comput. Biol. Med. 2019, 112, 103375.
15. Philinis, C.; Seinfeld, J.H. Development and evaluation of an Eulerian photochemical gas-aerosol model. Atmos. Environ. 1988, 22, 1985–2001.
16. Grace, R.K.; Manju, S. A comprehensive review of wireless sensor networks based air pollution monitoring systems. Wirel. Pers. Commun. 2019, 108, 2499–2515.
17. EPA. Report on the Environment, Outdoor Air Quality. 2019. Available online: https://www.epa.gov/report-environment/outdoor-air-quality (accessed on 24 February 2021).
18. Heidarinejad, Z.; Kavosi, A.; Mousapour, H.; Daryabor, M.R.; Radfard, M.; Abdolshahi, A. Data on evaluation of AQI for different season in Kerman, Iran, 2015. Data Brief 2018, 20, 1917–1923.
19. Tan, X.; Han, L.; Zhang, X.; Zhou, W.; Li, W.; Qian, Y. A review of current air quality indexes and improvements under the multi-contaminant air pollution exposure. J. Environ. Manag. 2021, 279, 111681.
20. TEPA. 2021. Available online: https://airtw.epa.gov.tw/CHT/TaskMonitoring/Traffic/TrafficIntro.aspx (accessed on 19 February 2021).
21. Yao, X.; Chan, C.K.; Fang, M.; Cadle, S.; Chan, T.; Mulawa, P.; He, K.; Ye, B. The water-soluble ionic composition of PM2.5 in Shanghai and Beijing, China. Atmos. Environ. 2002, 36, 4223–4234.
22. Glavas, S.D.; Nikolakis, P.; Ambatzoglou, D.; Mihalopoulos, N. Factors affecting the seasonal variation of mass and ionic composition of PM2.5 at a central Mediterranean coastal site. Atmos. Environ. 2008, 42, 5365–5373.
23. Arnfield, A.J. Two decades of urban climate research: A review of turbulence, exchanges of energy and water, and the urban heat island. Int. J. Climatol. 2003, 23, 1–26.
24. Fallmann, J.; Forkel, R.; Emeis, S. Secondary effects of urban heat island mitigation measures on air quality. Atmos. Environ. 2016, 125, 199–211.
25. Sethi, J.K.; Mittal, M. A new feature selection method based on machine learning technique for air quality dataset. J. Stat. Manag. Syst. 2019, 22, 697–705.
26. Chen, B.; Zhu, G.; Ji, M.; Yu, Y.; Zhao, J.; Liu, W. Air Quality Prediction Based on Kohonen Clustering and ReliefF Feature Selection. Comput. Mater. Contin. 2020, 64, 1039–1049.
27. Kumar, K.; Pande, B.P. Air pollution prediction with machine learning: A case study of Indian cities. Int. J. Environ. Sci. Technol. 2022.
28. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
29. Hall, M.A. Correlation Based Feature Selection for Machine Learning. Ph.D. Thesis, University of Waikato, Hamilton, New Zealand, 1999.
30. Ghiselli, E.E. Theory of Psychological Measurement; McGraw Hill: New York, NY, USA, 1964.
31. Rodriguez-Lujan, I.; Huerta, R.; Elkan, C.; Cruz, C.S. Quadratic Programming Feature Selection. J. Mach. Learn. Res. 2010, 11, 1491–1516.
32. Lai, C.M.; Yeh, W.C.; Chang, C.Y. Gene selection using information gain and improved simplified swarm optimization. Neurocomputing 2016, 218, 331–338.
33. Jadhav, S.; He, H.; Jenkins, K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Appl. Soft Comput. 2018, 69, 541–553.
34. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
35. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann Publishers: Burlington, ON, Canada, 2011.
36. Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182.
37. Kira, K.; Rendell, L.A. A Practical Approach to Feature Selection. Mach. Learn. Proc. 1992, 1992, 249–256.
38. Robnik-Šikonja, M.; Kononenko, I. Theoretical and Empirical Analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
39. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2015.
40. Judge, G.G.; Griffiths, W.E.; Hill, R.C.; Lütkepohl, H.; Lee, T.-C. The Theory and Practice of Econometrics; John Wiley & Sons: New York, NY, USA, 1980.
41. Pesaran, H.; Shin, Y. An Autoregressive Distributed Lag Modeling Approach to Co-integration Analysis. In Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium; Strom, S., Ed.; Cambridge University Press: Cambridge, UK, 1995; Volume 31.
42. Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers: San Francisco, CA, USA, 1993.
43. Guggari, S.; Kadappa, V.; Umadevi, V. Non-sequential partitioning approaches to decision tree classifier. Future Comput. Inform. J. 2018, 3, 275–285.
44. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
45. Mishra, A.K.; Ratha, B.K. Study of Random Tree and Random Forest Data Mining Algorithms for Microarray Data Analysis. Int. J. Adv. Electr. Comput. Eng. 2016, 3, 5–7.
46. Yarveicy, H.; Ghiasi, M.M. Modeling of gas hydrate phase equilibria: Extremely randomized trees and LSSVM approaches. J. Mol. Liq. 2017, 243, 533–541.
47. Pinto, A.; Pereira, S.; Rasteiro, D.; Silva, C.A. Hierarchical brain tumour segmentation using extremely randomized trees. Pattern Recognit. 2018, 82, 105–117.
48. Markuš, N.; Frljak, M.; Pandžić, I.S.; Ahlberg, J.; Forchheimer, R. Eye pupil localization with an ensemble of randomized trees. Pattern Recognit. 2014, 47, 578–587.
49. Shipway, N.J.; Barden, T.J.; Huthwaite, P.; Lowe, M.J.S. Automated defect detection for Fluorescent Penetrant Inspection using Random Forest. NDT E Int. 2018, 101, 113–123.
50. Khalilia, M.; Chakraborty, S.; Popescu, M. Predicting disease risks from highly imbalanced data using random forest. BMC Med. Inform. Decis. Mak. 2011, 11, 51.
51. Hu, Z.; Wang, Y.; Zhang, X.; Zhang, M.; Yang, Y.; Liu, X.; Zheng, H.; Liang, D. Super-resolution of PET image based on dictionary learning and random forests. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2019, 927, 320–329.
52. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
53. John, V.; Liu, Z.; Guo, C.; Mita, S.; Kidono, K. Real-time Lane Estimation Using Deep Features and Extra Trees Regression. Image Video Technol. 2016, 9431, 721–733.
54. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
55. Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Am. J. Psychol. 1963, 76, 705–707.
56. Lee, S.-J.; Tseng, C.-H.; Lin, G.T.-R.; Yang, Y.; Yang, P.; Muhammad, K.; Pandey, H.M. A dimension-reduction based multilayer perception method for supporting the medical decision making. Pattern Recognit. Lett. 2020, 131, 15–22.
57. Fan, F.M.; Collischonn, W.; Meller, A.; Botelho, L.C.M. Ensemble streamflow forecasting experiments in a tropical basin: The São Francisco river case study. J. Hydrol. 2014, 519, 2906–2919.
58. Ballabio, D.; Grisoni, F.; Todeschini, R. Multivariate comparison of classification performance measures. Chemom. Intell. Lab. Syst. 2018, 174, 33–44.
59. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1.
60. Melo, F. Area under the ROC Curve. In Encyclopedia of Systems Biology; Dubitzky, W., Wolkenhauer, O., Cho, K.-H., Yokota, H., Eds.; Springer: New York, NY, USA, 2013; pp. 38–39.
61. Chang, C.-C.; Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
62. Kutner, M.H.; Nachtsheim, C.J.; Neter, J. Applied Linear Regression Models, 4th ed.; McGraw-Hill Irwin: New York, NY, USA, 2004.
63. Kripfganz, S.; Schneider, D.C. ARDL: Estimating Autoregressive Distributed Lag and Equilibrium Correction Models. In Proceedings of the 2018 London Stata Conference, London, UK, 6–7 September 2018.
64. Oduro, S.D.; Metia, S.; Duc, H.; Hong, G.; Ha, Q.P. Multivariate adaptive regression splines models for vehicular emission prediction. Vis. Eng. 2015, 3, 13.
65. Zhang, L.; Long, R.; Chen, H.; Geng, J. A review of China’s road traffic carbon emissions. J. Clean. Prod. 2019, 207, 569–581.
66. Eslami, S.; Sekhavatjou, M.S. Introducing an application method for industries air pollutants emission control planning by preparing environmental flow diagram maps. J. Clean. Prod. 2018, 178, 768–775.
67. Liu, Y.; Zhou, Y.; Lu, J. Exploring the relationship between air pollution and meteorological conditions in China under environmental governance. Sci. Rep. 2020, 10, 14518.
68. Griffith, S.M.; Huang, W.S.; Lin, C.C.; Chen, Y.C.; Chang, K.-E.; Lin, T.H.; Wang, S.H.; Lin, N.H. Long-range air pollution transport in East Asia during the first week of the COVID-19 lockdown in China. Sci. Total Environ. 2020, 741, 140214.

Table 1. Confusion matrix.

Prediction | Actual Situation: True | Actual Situation: False |
---|---|---|
Positive | True positive (tp) | False positive (fp) |
Negative | False negative (fn) | True negative (tn) |

Table 2. Class distribution of the two datasets.

Dataset | Class A: Good | Class B: Moderate | Class C: Unhealthy for Sensitive Groups | Class D: Unhealthy |
---|---|---|---|---|
Fengshan (traffic) | 1260 | 5023 | 1982 | 495 |
Mailiao (industry) | 2450 | 3584 | 2182 | 544 |

Table 3. Parameter settings of the algorithms.

Algorithm | Parameter | Reference |
---|---|---|
RF | bagsize = 100; iterations = 100 | [44] |
RT | minimal variance proportion = 0.001 | [44] |
ET | iterations = 10 | [52] |
DT | confidence factor = 0.25 | [42] |
SVR | kernel function: RBF; epsilon = 0.001; gamma = 1/n (n = number of variables); C = 1 | [61] |
MLPR | activation function: sigmoid; loss function: square error; learning rate = 0.001; hidden layer sizes = 200 | [56] |

Table 4. Key variables of the two datasets.

Dataset | Key Variables |
---|---|
Fengshan Traffic Monitoring Station | PM2.5, PM10, TEMP, RH, O3-8 h, O3-ontime, SO2-24 h-ave, O3, CH4, WS_HR, THC, month, PM2.5-ave, NO2-ontime, NO2, NOx, CO-8 h-ave, day, highPul, PM10-ave, season |
Mailiao Industrial Monitoring Station | PM2.5, PM10, TEMP, RH, O3-8 h, O3-ontime, SO2-24 h-ave, O3, season, WS_HR, CO, month, PM2.5-ave, NO2-ontime, NO2, NOx, CO-8 h-ave, day, highPul, PM10-ave |

Table 5. Classification results for full and selected variables.

Dataset | Metric | DT | RT | RF | ET |
---|---|---|---|---|---|
Fengshan (Full var.) | accuracy (%) | 99.98 (0.06) | 96.56 (1.67) | 99.82 (0.15) | 99.65 (0.23) |
Fengshan (Full var.) | AUC | 1.00 (0.00) | 0.96 (0.02) | 1.00 (0.00) | 1.00 (0.00) |
Fengshan (Full var.) | recall | 1.00 (0.00) | 0.94 (0.03) | 1.00 (0.00) | 1.00 (0.00) |
Fengshan (Full var.) | precision | 1.00 (0.00) | 0.94 (0.03) | 0.99 (0.01) | 0.99 (0.01) |
Fengshan (Full var.) | F1 | 1.00 (0.00) | 0.94 (0.03) | 1.00 (0.00) | 0.99 (0.00) |
Fengshan (Selected var.) | accuracy (%) | 99.98 (0.06) | 98.43 (1.03) | 99.93 (0.09) | 99.87 (0.13) |
Fengshan (Selected var.) | AUC | 1.00 (0.00) | 0.98 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Fengshan (Selected var.) | recall | 1.00 (0.00) | 0.97 (0.02) | 1.00 (0.00) | 1.00 (0.00) |
Fengshan (Selected var.) | precision | 1.00 (0.00) | 0.97 (0.02) | 1.00 (0.00) | 1.00 (0.00) |
Fengshan (Selected var.) | F1 | 1.00 (0.00) | 0.97 (0.02) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Full var.) | accuracy (%) | 99.92 (0.10) | 96.42 (1.66) | 99.68 (0.19) | 99.49 (0.26) |
Mailiao (Full var.) | AUC | 1.00 (0.00) | 0.98 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Full var.) | recall | 1.00 (0.00) | 0.98 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Full var.) | precision | 1.00 (0.00) | 0.98 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Full var.) | F1 | 1.00 (0.00) | 0.98 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Selected var.) | accuracy (%) | 99.92 (0.10) | 98.41 (0.96) | 99.81 (0.14) | 99.76 (0.17) |
Mailiao (Selected var.) | AUC | 1.00 (0.00) | 0.99 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Selected var.) | recall | 1.00 (0.00) | 0.99 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Selected var.) | precision | 1.00 (0.00) | 0.99 (0.01) | 1.00 (0.00) | 1.00 (0.00) |
Mailiao (Selected var.) | F1 | 1.00 (0.00) | 0.99 (0.01) | 1.00 (0.00) | 1.00 (0.00) |

Table 6. VIF of the independent variables (IV) for each dependent variable (DV).

IV (Fengshan, DV: O3) | VIF | IV (Fengshan, DV: PM2.5) | VIF | IV (Fengshan, DV: PM10) | VIF | IV (Mailiao, DV: O3) | VIF | IV (Mailiao, DV: PM2.5) | VIF | IV (Mailiao, DV: PM10) | VIF |
---|---|---|---|---|---|---|---|---|---|---|---|
month | 1.948 | month | 1.948 | month | 1.944 | month | 2.267 | month | 2.264 | month | 2.257 |
day | 1.042 | day | 1.042 | day | 1.041 | day | 1.037 | day | 1.037 | day | 1.037 |
TEMP | 2.771 | TEMP | 2.770 | TEMP | 2.736 | TEMP | 2.173 | TEMP | 2.173 | TEMP | 2.159 |
CH4 | 6.852 | CH4 | 6.841 | CH4 | 6.819 | CO | 4.976 | CO | 4.457 | CO | 4.975 |
NOx | 11.436 | NOx | 11.429 | NOx | 11.424 | NOx | 11.678 | NOx | 11.597 | NOx | 11.599 |
PM10 | 8.757 | PM10 | 7.002 | PM2.5 | 6.952 | PM10 | 4.107 | PM10 | 3.473 | PM2.5 | 6.272 |
PM2.5 | 8.693 | RH | 1.922 | RH | 1.927 | PM2.5 | 7.417 | RH | 1.951 | RH | 1.969 |
RH | 1.927 | THC | 10.335 | THC | 10.345 | RH | 1.971 | WS_HR | 2.105 | WS_HR | 2.084 |
THC | 10.356 | WS_HR | 1.547 | WS_HR | 1.549 | WS_HR | 2.111 | O3-8 h | 3.290 | O3-8 h | 3.254 |
WS_HR | 1.549 | O3-8 h | 2.579 | O3-8 h | 2.632 | O3-8 h | 3.307 | O3-ontime | 3.783 | O3-ontime | 3.687 |
O3-8 h | 2.643 | O3-ontime | 2.954 | O3-ontime | 3.009 | O3-ontime | 3.790 | PM2.5-8 h | 2.892 | PM2.5-8 h | 7.406 |
O3-ontime | 3.019 | PM2.5-8 h | 5.437 | PM2.5-8 h | 10.809 | PM2.5-8 h | 8.337 | PM10-8 h | 5.063 | PM10-8 h | 2.716 |
PM2.5-8 h | 12.486 | PM10-8 h | 11.611 | PM10-8 h | 5.856 | PM10-8 h | 5.363 | CO-8 h | 4.076 | CO-8 h | 4.338 |
PM10-8 h | 13.224 | CO-8 h | 2.998 | CO-8 h | 3.028 | CO-8 h | 4.349 | SO2-24 h | 1.496 | SO2-24 h | 1.499 |
CO-8 h | 3.028 | SO2-24 h | 1.386 | SO2-24 h | 1.381 | SO2-24 h | 1.500 | NO2-ontime | 12.033 | NO2-ontime | 12.080 |
SO2-24 h | 1.388 | NO2-ontime | 13.302 | NO2-ontime | 13.338 | NO2-ontime | 12.170 | highPul | 1.621 | highPul | 1.620 |
NO2-ontime | 13.382 | highPul | 2.448 | highPul | 2.445 | highPul | 1.621 | season | 1.699 | season | 1.700 |
highPul | 2.448 | season | 1.761 | season | 1.763 | season | 1.700 | | | | |
season | 1.763 | | | | | | | | | | |

Table 7. Independent variables and lag periods selected by the ARDL test.

Dataset | DV | Independent Variables and Lag Periods |
---|---|---|
Fengshan | O3 | O3 (1, 2, 3, 4, 5); TEMP (0, 1, 3, 4, 5); NO2 (0, 1, 2, 5); NOx (0, 1); PM10 (0); CH4 (1); THC (0); WS_HR (0); O3-8 h (0, 1, 2, 3, 4); CO-8 h (0, 1); highPul (2, 3) |
Fengshan | PM2.5 | PM2.5 (1, 2, 3, 4, 5); TEMP (0); O3 (2, 4); THC (2); PM10 (0, 1, 2, 3, 4, 5); O3-8 h (2, 3); PM2.5-8 h (0, 1, 2, 3, 4, 5); CO-8 h (1, 2, 3); PM10-8 h (0, 1, 2, 5) |
Fengshan | PM10 | PM10 (1, 2, 3, 4, 5); TEMP (3); NO2 (0); O3 (0); PM2.5 (0, 1, 2, 3, 4, 5); RH (0); THC (3); THC (4); WS_HR (2, 5); highPul (2, 3); PM2.5 (0, 1, 2, 3, 5); PM10-8 h (0, 1, 2, 3, 4, 5); CO-8 h (3, 4, 5); season (4) |
Mailiao | O3 | O3 (3, 4); TEMP (1); CO (1); NO2 (0, 1); O3-8 h (0, 1); O3-ontime (0, 3, 4); NOx (0); CO-8 h (5) |
Mailiao | PM2.5 | PM2.5 (1, 2, 3, 4, 5); TEMP (1, 5); CO (1); NOx (3); PM10 (0); WS_HR (0); RH (0, 1, 2); O3-8 h (0, 1, 4, 5); PM2.5-8 h (0, 1, 2, 4, 5); PM10-8 h (0, 1, 2, 4, 5) |
Mailiao | PM10 | PM10 (1, 2, 3, 4, 5); NOx (3); PM2.5 (0, 1, 2, 3, 4, 5); WS_HR (0); RH (0, 1); O3-8 h (2) |

Table 8. Forecast results for the three pollutants.

Dataset | Metric | Target | MLPR | RF | ET | SVR | AR(p) |
---|---|---|---|---|---|---|---|
Fengshan (ARDL) | RMSE | PM2.5 | 10.68 | 7.07 | 7.24 | 1.52 | 5.60 |
Fengshan (ARDL) | RMSE | PM10 | 6.76 | 12.18 | 12.9 | 2.25 | 9.10 |
Fengshan (ARDL) | RMSE | O3 | 11.14 | 4.54 | 5.09 | 2.29 | 6.90 |
Fengshan (IVSM) | RMSE | PM2.5 | 39.94 | 9.04 | 9.56 | 8.61 | 5.60 |
Fengshan (IVSM) | RMSE | PM10 | 50.08 | 17.53 | 18.45 | 19.79 | 9.10 |
Fengshan (IVSM) | RMSE | O3 | 0.42 | 1.58 | 1.90 | 0.10 | 6.90 |
Fengshan (Full) | RMSE | PM2.5 | 47.8 | 9.61 | 10.16 | 8.46 | 5.60 |
Fengshan (Full) | RMSE | PM10 | 65.85 | 18.48 | 18.68 | 18.44 | 9.10 |
Fengshan (Full) | RMSE | O3 | 0.37 | 2.30 | 2.98 | 0.06 | 6.90 |
Mailiao (ARDL) | RMSE | PM2.5 | 6.16 | 3.73 | 3.88 | 0.91 | 4.88 |
Mailiao (ARDL) | RMSE | PM10 | 70.21 | 22.4 | 25.23 | 21.41 | 22.38 |
Mailiao (ARDL) | RMSE | O3 | 0.61 | 0.65 | 1.22 | 0.09 | 12.03 |
Mailiao (IVSM) | RMSE | PM2.5 | 43.73 | 5.16 | 5.61 | 5.09 | 4.88 |
Mailiao (IVSM) | RMSE | PM10 | 99.82 | 23.19 | 30.43 | 27.03 | 22.38 |
Mailiao (IVSM) | RMSE | O3 | 0.94 | 0.83 | 1.24 | 0.08 | 12.03 |
Mailiao (Full) | RMSE | PM2.5 | 44.2 | 5.07 | 5.88 | 5.04 | 4.88 |
Mailiao (Full) | RMSE | PM10 | 88.17 | 22.82 | 25.63 | 26.75 | 22.38 |
Mailiao (Full) | RMSE | O3 | 0.75 | 1.54 | 1.78 | 0.12 | 12.03 |
Fengshan (ARDL) | MAE | PM2.5 | 2.03 | 5.06 | 5.3 | 1.14 | 4.23 |
Fengshan (ARDL) | MAE | PM10 | 2.67 | 8.06 | 8.78 | 1.65 | 6.32 |
Fengshan (ARDL) | MAE | O3 | 1.55 | 3.4 | 3.74 | 1.72 | 4.98 |
Fengshan (IVSM) | MAE | PM2.5 | 6.67 | 39.13 | 42.09 | 38.71 | 4.23 |
Fengshan (IVSM) | MAE | PM10 | 15.66 | 36.24 | 38.38 | 39.14 | 6.32 |
Fengshan (IVSM) | MAE | O3 | 0.06 | 6.99 | 8.57 | 0.53 | 4.98 |
Fengshan (Full) | MAE | PM2.5 | 7.91 | 58.08 | 45.4 | 37.78 | 4.23 |
Fengshan (Full) | MAE | PM10 | 19.01 | 54.86 | 40.58 | 40.05 | 6.32 |
Fengshan (Full) | MAE | O3 | 0.06 | 31.04 | 16.11 | 0.32 | 4.98 |
Mailiao (ARDL) | MAE | PM2.5 | 0.67 | 2.56 | 2.79 | 0.68 | 3.23 |
Mailiao (ARDL) | MAE | PM10 | 14.72 | 14.23 | 15.79 | 12.72 | 11.55 |
Mailiao (ARDL) | MAE | O3 | 0.04 | 0.45 | 0.84 | 0.07 | 9.36 |
Mailiao (IVSM) | MAE | PM2.5 | 4.26 | 32.9 | 35.87 | 33.75 | 3.23 |
Mailiao (IVSM) | MAE | PM10 | 22.96 | 50.59 | 61.12 | 53.58 | 11.55 |
Mailiao (IVSM) | MAE | O3 | 0.09 | 6.35 | 9.1 | 0.51 | 9.36 |
Mailiao (Full) | MAE | PM2.5 | 4.35 | 67.56 | 41.78 | 35.84 | 3.23 |
Mailiao (Full) | MAE | PM10 | 24.32 | 109.07 | 52.33 | 54.61 | 11.55 |
Mailiao (Full) | MAE | O3 | 0.06 | 22.36 | 14.6 | 0.95 | 9.36 |
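
The two error measures named in the forecast table, RMSE and MAE, can be computed as in the following minimal sketch. The definitions are the standard ones; the function names and the toy values are illustrative assumptions and are not data from the study.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Toy values for illustration only.
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 6.0]
print(rmse(observed, forecast), mae(observed, forecast))
```
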

Table 9. Range, mean, and standard deviation of the three pollutants.

Pollutant | Range | Mean | Standard Deviation |
---|---|---|---|
PM2.5 | 120 | 32.52 | 15.63 |
PM10 | 540.4 | 80.89 | 43.42 |
O3 | 122 | 31.94 | 19.10 |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, C.-H.; Tsai, M.-C. An Intelligent Time Series Model Based on Hybrid Methodology for Forecasting Concentrations of Significant Air Pollutants. Atmosphere 2022, 13, 1055. https://doi.org/10.3390/atmos13071055