A Machine Learning-Based Ensemble Framework for Forecasting PM2.5 Concentrations in Puli, Taiwan
Abstract
1. Introduction
2. Related Work and Contributions
2.1. Related Work
2.2. Research Trends and Contributions of This Paper
3. Proposed Methods
3.1. Framework Architecture
3.2. Feature Selection
3.3. Long-Term Learning
3.3.1. Cluster Linear Regression
3.3.2. Multilayer Perceptron
3.4. Short-Term Learning
3.4.1. Fourier Series Descriptor
3.4.2. Multilayer Perceptron
3.5. Multi-Model Integration Strategies
- Averaged strategy. This strategy takes the final forecast to be the mean of the four individual forecasts, i.e., $\hat{y} = \frac{1}{4}\sum_{i=1}^{4} \hat{y}_i$, where $\hat{y}_i$ is the forecast made by the $i$th model.
- Weighted strategy. This strategy takes the final forecast to be the weighted sum of the forecasts made by the individual models. Let $e_i$ be the RMSE estimated for the $i$th model during its training phase. The normalized reciprocal error is adopted as the weight for the model, i.e., $w_i = \frac{1/e_i}{\sum_{j=1}^{4} 1/e_j}$. The final forecast is then calculated by $\hat{y} = \sum_{i=1}^{4} w_i \hat{y}_i$, where $\hat{y}_i$ is the forecast made by the $i$th model.
- Max_Avg_Min strategy. This strategy determines the final forecast according to the PM2.5 value range into which the test instance is classified. If the instance is classified into the high (low) range, the maximal (minimal) forecast made by the individual models is output as the final forecast. If it is classified into the middle range, the strategy outputs the same forecast as the Averaged strategy. In other words, with $\hat{y}_i$ denoting the forecast of the $i$th model, the final forecast is $\hat{y} = \begin{cases} \max_i \hat{y}_i, & \text{high range} \\ \frac{1}{4}\sum_{i=1}^{4} \hat{y}_i, & \text{middle range} \\ \min_i \hat{y}_i, & \text{low range.} \end{cases}$
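- Max_Wgt_Min strategy. This strategy resembles the Max_Avg_Min strategy in that it outputs the maximal or minimal forecast made by the individual models when the classified range is high or low, respectively. However, if the test instance is classified into the middle range, the strategy outputs the same forecast as the Weighted strategy, i.e., with $\hat{y}_i$ and $w_i$ denoting the forecast and the normalized reciprocal-error weight of the $i$th model, $\hat{y} = \begin{cases} \max_i \hat{y}_i, & \text{high range} \\ \sum_{i=1}^{4} w_i \hat{y}_i, & \text{middle range} \\ \min_i \hat{y}_i, & \text{low range.} \end{cases}$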
- Adpt_Wgt strategy. This strategy adopts an adaptive weighting scheme to produce the final forecast. Specifically, the set of individual forecasts that fall within the classified PM2.5 range is identified, and the final forecast is the sum of the weighted forecasts contained in that set. In this way, the Adpt_Wgt strategy adapts to the models whose forecasts are validated by the random forest. The Adpt_Wgt strategy can be realized by the following formula.
- Lower bound. To assess how well our multi-model strategies combine multiple forecasts, a lower bound on the forecasting error is calculated for comparison. The lower bound is the best root mean square error (RMSE) or mean absolute error (MAE) that could possibly be obtained by selecting a single model for each prediction. That is, for each instance in the test set, the forecast among the four models that is nearest to the actual PM2.5 value is selected. After the best forecasts for all test instances have been selected, their RMSE and MAE are calculated and designated as the lower bounds. Note that this lower bound refers only to the optimal performance achievable by model selection. It is not intended to indicate a global lower bound for all forms of multi-model hybridization.
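The integration strategies and the lower bound listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the string labels for the classified range, and the `bounds` argument (the numeric limits of the classified PM2.5 range) are hypothetical. Two details are assumptions that the text does not specify: in `adpt_wgt` the weights are renormalized over the forecasts that fall inside the classified range, and the function falls back to the Weighted forecast when no forecast falls inside that range.

```python
import numpy as np

def reciprocal_error_weights(train_rmse):
    """Normalized reciprocal training RMSE, one weight per model."""
    inv = 1.0 / np.asarray(train_rmse, dtype=float)
    return inv / inv.sum()

def averaged(forecasts):
    """Averaged strategy: plain mean of the individual forecasts."""
    return float(np.mean(forecasts))

def weighted(forecasts, weights):
    """Weighted strategy: sum of weight times forecast."""
    return float(np.dot(weights, forecasts))

def max_avg_min(forecasts, pm_range):
    """Max_Avg_Min: max for 'high', min for 'low', mean otherwise."""
    if pm_range == "high":
        return float(np.max(forecasts))
    if pm_range == "low":
        return float(np.min(forecasts))
    return averaged(forecasts)

def max_wgt_min(forecasts, weights, pm_range):
    """Max_Wgt_Min: max for 'high', min for 'low', weighted sum otherwise."""
    if pm_range == "high":
        return float(np.max(forecasts))
    if pm_range == "low":
        return float(np.min(forecasts))
    return weighted(forecasts, weights)

def adpt_wgt(forecasts, weights, bounds):
    """Adpt_Wgt: weighted sum over forecasts inside the classified range.

    Assumptions (not stated in the text): weights are renormalized over the
    selected forecasts, and the Weighted forecast is used if none is selected.
    """
    f = np.asarray(forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    low, high = bounds
    inside = (f >= low) & (f < high)
    if not inside.any():
        return weighted(f, w)
    return float(np.dot(w[inside], f[inside]) / w[inside].sum())

def lower_bound_errors(forecast_matrix, actual):
    """RMSE and MAE when the forecast nearest to the truth is picked per instance."""
    F = np.asarray(forecast_matrix, dtype=float)  # shape (n_instances, n_models)
    y = np.asarray(actual, dtype=float)
    nearest = np.abs(F - y[:, None]).argmin(axis=1)
    err = F[np.arange(len(y)), nearest] - y
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))

if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    w = reciprocal_error_weights([8.0, 8.0, 16.0, 16.0])
    f = [10.0, 20.0, 30.0, 40.0]
    print(averaged(f), weighted(f, w), max_wgt_min(f, w, "high"))
    print(adpt_wgt(f, w, bounds=(15.0, 35.0)))
```

Because the lower bound selects the best model per instance using the actual values, it cannot be computed at forecasting time; it serves only as a reference point for the other strategies.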
4. Proposed Experimental Results and Comparative Performance
4.1. Dataset Description and Forecast Performance Measures
4.2. Performance of Single Models
4.3. Performance of Short-Term and Long-Term Learning Ensembles
4.4. Performance of Various Multi-Model Strategies
4.5. Comparative Performances on Delhi Dataset
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lee, W.C.; Shen, L.; Catalano, P.J.; Mickley, L.J.; Koutrakis, P. Effects of future temperature change on PM2.5 infiltration in the Greater Boston area. Atmos. Environ. 2017, 150, 98–105.
- Liang, C.S.; Duan, F.K.; He, K.B.; Ma, Y.L. Review on recent progress in observations, source identifications and countermeasures of PM2.5. Environ. Int. 2016, 86, 150–170.
- Hwang, S.L.; Lin, Y.C.; Guo, S.E.; Chi, M.C.; Chou, C.T.; Lin, C.M. Emergency room visits for respiratory diseases associated with ambient fine particulate matter in Taiwan in 2012: A population-based study. Atmos. Pollut. Res. 2017, 8, 465–473.
- Song, C.; He, J.; Wu, L.; Jin, T.; Chen, X.; Li, R.; Ren, P.; Zhang, L.; Mao, H. Health burden attributable to ambient PM2.5 in China. Environ. Pollut. 2017, 223, 575–586.
- Chen, Y.C.; Chiang, H.C.; Hsu, C.Y.; Yang, T.T.; Lin, T.Y.; Chen, M.J.; Chen, N.T.; Wu, Y.S. Ambient PM2.5-bound polycyclic aromatic hydrocarbons (PAHs) in Changhua County, Central Taiwan: Seasonal variation, source apportionment and cancer risk assessment. Environ. Pollut. 2016, 218, 372–382.
- WHO Media Centre. Ambient (Outdoor) Air Quality and Health. 2016. Available online: http://www.who.int/mediacentre/factsheets/fs313/en/ (accessed on 16 December 2021).
- Di, Q.; Koutrakis, P.; Schwartz, J. A hybrid prediction model for PM2.5 mass and components using a chemical transport model and land use regression. Atmos. Environ. 2016, 131, 390–399.
- Wang, P.; Zhang, H.; Qin, Z.; Zhang, G. A novel hybrid-Garch model based on ARIMA and SVM for PM2.5 concentrations forecasting. Atmos. Pollut. Res. 2017, 8, 850–860.
- Zhang, B.; Li, X.; Zhao, Y.; Li, Y.; Wang, X. Air quality PM2.5 prediction based on multi-model fusion. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019.
- Pew, R.W.; Mavor, A.S. (Eds.) Human-System Integration in The System Development Process: A New Look; National Academy Press: Washington, DC, USA, 2007.
- Shylesh, S. A Study of Software Development Life Cycle Process Models; Elsevier SSRN: Amsterdam, The Netherlands, 2017.
- Vlachogianni, A.; Kassomenos, P.; Karppinen, A.; Karakitsios, S.; Kukkonen, J. Evaluation of a multiple regression model for the forecasting of the concentrations of NOx and PM10 in Athens and Helsinki. Sci. Total Environ. 2011, 409, 1559–1571.
- Cobourn, W.G. An enhanced PM2.5 air quality forecast model based on nonlinear regression and back-trajectory concentrations. Atmos. Environ. 2010, 44, 3015–3023.
- Baker, K.R.; Foley, K.M. A nonlinear regression model estimating single source concentrations of primary and secondarily formed PM2.5. Atmos. Environ. 2011, 45, 3758–3767.
- Yin, Q.; Wang, J.; Hu, M.; Wong, H. Estimation of daily PM2.5 concentration and its relationship with meteorological conditions in Beijing. J. Environ. Sci. 2016, 48, 161–168.
- Guo, Y.; Tang, Q.; Gong, D.Y.; Zhang, Z. Estimating ground-level PM2.5 concentrations in Beijing using a satellite-based geographically and temporally weighted regression model. Remote Sens. Environ. 2017, 198, 140–149.
- Moisan, S.; Herrera, R.; Clements, A. A dynamic multiple equation approach for forecasting PM2.5 pollution in Santiago, Chile. Int. J. Forecast. 2018, 34, 566–581.
- Zhang, T.; Liu, P.; Sun, X.; Zhang, C.; Wang, M.; Xu, J.; Pu, S.; Huang, L. Application of an advanced spatiotemporal model for PM2.5 prediction in Jiangsu Province, China. Chemosphere 2020, 246, 125563.
- Ni, X.Y.; Huang, H.; Du, W.P. Relevance analysis and short-term prediction of PM2.5 concentrations in Beijing based on multi-source data. Atmos. Environ. 2017, 150, 146–161.
- Ausati, S.; Amanollahi, J. Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5. Atmos. Environ. 2016, 142, 465–474.
- Niu, M.; Gan, K.; Sun, S.; Li, F. Application of decomposition-ensemble learning paradigm with phase space reconstruction for day-ahead PM2.5 concentration forecasting. J. Environ. Manag. 2017, 196, 110–118.
- Mao, X.; Shen, T.; Feng, X. Prediction of hourly ground level PM2.5 concentrations 3 days in advance using neural networks with satellite data in eastern China. Atmos. Pollut. Res. 2017, 8, 1005–1015.
- Di, Q.; Amini, H.; Shi, L.; Kloog, I.; Silvern, R.; Kelly, J.; Sabath, M.B.; Choirat, C.; Koutrakis, P.; Lyapustin, A.; et al. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environ. Int. 2019, 130, 104909.
- Xiao, F.; Yang, M.; Fan, H.; Fan, G.; Al-Qaness, M.A.A. An improved deep learning model for predicting daily PM2.5 concentration. Sci. Rep. 2020, 10, 20988.
- Qin, D.; Yun, J.; Zou, G.; Yong, R.; Zhao, Q.; Zhang, B. A novel combined prediction scheme based on CNN and LSTM for urban PM2.5 concentration. IEEE Access 2019, 7, 20050–20059.
- Zhu, H.; Fan, L. PM2.5 forecasting based on artificial neural network and genetic algorithm. Int. J. Simul. Syst. Sci. Technol. 2015, 16, 10.1–10.5.
- Zhang, C.J.; Dai, L.J.; Ma, L.M. Rolling forecasting model of PM2.5 concentration based on support vector machine and particle swarm optimization. In Proceedings of the International Symposium on Optoelectronic Technology and Application 2016, Beijing, China, 9–11 May 2016; p. 101561I.
- Sun, W.; Sun, J. Daily PM2.5 concentration prediction based on principal component analysis and LSSVM optimized by cuckoo search algorithm. J. Environ. Manag. 2017, 188, 144–152.
- Dhyani, R.; Sharma, N.; Maity, A.K. Prediction of PM2.5 along urban highway corridor under mixed traffic conditions using CALINE4 model. J. Environ. Manag. 2017, 198, 24–32.
- Tsai, Y.I.; Sopajaree, K.; Kuo, S.C.; Yu, S.P. Potential PM2.5 impacts of festival related burning and other inputs on air quality in an urban area of southern Taiwan. Sci. Total Environ. 2015, 527–528, 65–79.
- Reff, A.; Eberly, S.I.; Bhave, P.V. Receptor modeling of ambient particulate matter data using positive matrix factorization: Review of existing methods. J. Air Waste Manag. Assoc. 2007, 57, 146–154.
- Kumar, S.; Mishra, S.; Singh, S.K. A machine learning-based model to estimate PM2.5 concentration levels in Delhi’s atmosphere. Heliyon 2020, 6, e05618.
- Boehm, B.W. A spiral model of software development and enhancement. IEEE Comput. 1988, 21, 61–72.
- Boehm, B.W. Spiral Development: Experience, Principles, and Refinements; Special Report; CMU/SEI-2000-SR-008; Software Engineering Institute: Pittsburgh, PA, USA, 2000.
- Hsu, C.H.; Cheng, F.Y. Classification of weather patterns to study the influence of meteorological characteristics on PM2.5 concentrations in Yunlin County, Taiwan. Atmos. Environ. 2016, 144, 397–408.
- Govindaraju, R.S. ASCE Task Committee on Application of Artificial Neural Networks in Hydrology 2000. Artificial neural networks in hydrology. II: Hydrology applications. J. Hydrol. Eng. 2000, 5, 124–137.
- Derrac, J.; García, S.; Herrera, F. A survey on evolutionary instance selection and generation. Int. J. Appl. Metaheuristic Comput. 2010, 1, 60–92.
Regression or Autoregression | Artificial Neural Network (ANN) | Machine Learning (ANN Excluded) | Meta-Evolution | Receptor Model | Time Units of Forecasting | Country of Studied Area | |
---|---|---|---|---|---|---|---|
Zhu and Fan, 2015 | ● | day | China | ||||
Tsai et al., 2015 | ● | day | Taiwan | ||||
Yin et al., 2016 | ● | day | China | ||||
Ausati and Amanollahi, 2016 | ● | day | Iran | ||||
Di et al., 2016 * | ● | ● | day | USA | |||
Zhang et al., 2016 | ● | hour | China | ||||
Ni et al., 2017 | ● | hour/day | China | ||||
Niu et al., 2017 | ● | day | China | ||||
Wang et al., 2017 * | ● | ● | hour | China | |||
Sun and Sun, 2017 | ● | day | China | ||||
Mao et al., 2017 | ● | hour | China | ||||
Dhyani et al., 2017 | ● | hour | India | ||||
Guo et al., 2017 | ● | day | China | ||||
Moisan et al., 2018 | ● | hour | Chile | ||||
Zhang et al., 2019 * | ● | ● | hour | China | |||
Di et al., 2019 | ● | day/season/year | USA | ||||
Qin et al., 2019 | ● | hour | China | ||||
Zhang et al., 2020 | ● | week/season/year | China | ||||
Xiao et al., 2020 | ● | day | China | ||||
Kumar et al., 2020 * | ● | ● | hour | India |
Initial Variables | Variable Descriptions | Removal Order | Finally Retained |
---|---|---|---|
(1) Periodic variables | |||
sin_d | Sine of ordinal hour in a day | 15 | √ |
cos_d | Cosine of ordinal hour in a day | 16 | √ |
sin_y | Sine of ordinal hour in a year | 3 | |
cos_y | Cosine of ordinal hour in a year | 6 | |
(2) Meteorological variables | |||
Temp | Temperature | 14 | √ |
RH | Relative humidity | 8 | |
Prep | Precipitation | 7 | |
WS | Wind speed | 10 | √ |
sin_w | Sine of wind direction | 2 | |
cos_w | Cosine of wind direction | 5 | |
(3) Short-term history meteorological variables | |||
ST_Temp | Mean temperature in prior six hours | 17 | √ |
ST_RH | Mean relative humidity in prior six hours | 19 | √ |
ST_Prep | Mean precipitation in prior six hours | 9 | √ |
ST_WS | Mean wind speed in prior six hours | 13 | √ |
sinw_WS | Sum of product of sin_w and WS in prior six hours | 1 | |
cosw_WS | Sum of product of cos_w and WS in prior six hours | 4 | |
ST_WB | Square root of the sum of squared sinw_WS and squared cosw_WS | 18 | √ |
(4) Short-term history autoregression variables | |||
L_PM2.5 | Last hour PM2.5 in the preceding day | 20 | √ |
D1_PM2.5 | Mean hourly PM2.5 in the preceding day | 11 | √ |
D2_PM2.5 | Mean hourly PM2.5 in the day 24 h ahead | 12 | √ |
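
As an illustration of how some of the variables in the table above could be derived from hourly records, the following Python sketch computes the periodic variables, a prior-six-hour mean, and ST_WB. It is a sketch under stated assumptions rather than the authors' procedure: the $2\pi$ scaling of the ordinal hour, the 8760-hour year, the wind direction given in degrees, and all function names are hypothetical choices that the table does not specify.

```python
import numpy as np

def periodic_features(hour_of_day, hour_of_year, hours_per_year=8760):
    """sin/cos encodings of the ordinal hour in a day and in a year.

    Assumption: the ordinal hour is mapped onto a full circle (2*pi).
    """
    ang_d = 2.0 * np.pi * np.asarray(hour_of_day, dtype=float) / 24.0
    ang_y = 2.0 * np.pi * np.asarray(hour_of_year, dtype=float) / hours_per_year
    return {
        "sin_d": np.sin(ang_d), "cos_d": np.cos(ang_d),
        "sin_y": np.sin(ang_y), "cos_y": np.cos(ang_y),
    }

def prior_window(values, k=6, reduce=np.mean):
    """Apply `reduce` to the k values strictly before each hour.

    The first k entries are NaN because a full prior window is unavailable.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    for t in range(k, len(values)):
        out[t] = reduce(values[t - k:t])
    return out

def wind_balance(wind_dir_deg, wind_speed, k=6):
    """ST_WB: sqrt(sinw_WS**2 + cosw_WS**2) over the prior k hours.

    Assumption: wind direction is recorded in degrees.
    """
    theta = np.deg2rad(np.asarray(wind_dir_deg, dtype=float))
    ws = np.asarray(wind_speed, dtype=float)
    sinw_ws = prior_window(np.sin(theta) * ws, k, np.sum)
    cosw_ws = prior_window(np.cos(theta) * ws, k, np.sum)
    return np.hypot(sinw_ws, cosw_ws)

if __name__ == "__main__":
    # Hypothetical hourly records, for illustration only.
    temp = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
    print(prior_window(temp))                    # ST_Temp-like feature
    print(wind_balance([90.0] * 7, [2.0] * 7))   # ST_WB-like feature
```

Under these assumptions, ST_WB is large when the wind blows steadily from one direction over the prior six hours and small when the hourly wind vectors cancel out.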
Models | RMSE | MAE | MAPE |
---|---|---|---|
MLP(8) | 8.63 | 6.36 | 0.42 |
MLP(10) | 7.84 | 5.74 | 0.40 |
MLP(12) | 7.82 | 5.71 | 0.38 |
MLP(12, 12, 12) | 8.14 | 5.84 | 0.40 |
Models | RMSE | MAE | MAPE |
---|---|---|---|
FSD(30) | 10.19 | 7.69 | 0.61 |
FSD(40) | 11.89 | 9.23 | 0.73 |
FSD(50) | 10.14 | 7.74 | 0.62 |
FSD(60) | 9.59 | 7.35 | 0.60 |
FSD(70) | 12.95 | 9.66 | 0.75 |
FSD(80) | 9.59 | 7.36 | 0.59 |
FSD(90) | 12.65 | 9.50 | 0.73 |
Models | RMSE | MAE | MAPE |
---|---|---|---|
CLR | 8.94 | 6.29 | 0.43 |
LMLP | 7.82 | 5.71 | 0.38 |
FSD | 9.59 | 7.35 | 0.60 |
SMLP | 10.73 | 8.04 | 0.60 |
Mean | 9.27 | 6.85 | 0.50 |
Models | RMSE | MAE | MAPE |
---|---|---|---|
Long-term ensemble | 7.86 | 5.67 | 0.38 |
Short-term ensemble | 10.42 | 7.83 | 0.58 |
Models | RMSE | MAE | MAPE | R2 |
---|---|---|---|---|
Averaged | 8.02 | 5.92 | 0.44 | 0.54 |
Weighted | 7.70 | 5.64 | 0.41 | 0.57 |
Max_Avg_Min | 8.01 | 5.91 | 0.44 | 0.54 |
Max_Wgt_Min | 7.69 | 5.63 | 0.41 | 0.57 |
Adpt_Wgt | 8.31 | 6.20 | 0.51 | 0.50 |
Mean | 7.95 | 5.86 | 0.44 | 0.54 |
Lower Bound | 4.73 | 2.92 | 0.21 | 0.84 |
LSTM | 8.09 | 5.50 | 0.40 | 0.59 |
CNN | 9.20 | 6.97 | 0.58 | 0.39 |
Sources | Models | RMSE | MAE | MAPE |
---|---|---|---|---|
Single models | CLR | 28.23 | 19.73 | 0.41 |
Single models | LMLP | 29.43 | 21.45 | 0.41 |
Single models | FSD | 40.51 | 25.38 | 0.48 |
Single models | SMLP | 29.32 | 19.12 | 0.40 |
Ensembles | Long-term ensemble | 27.16 | 19.17 | 0.38 |
Ensembles | Short-term ensemble | 27.13 | 18.82 | 0.38 |
Ensembles | Averaged | 26.75 | 17.81 | 0.34 |
Ensembles | Weighted | 25.26 | 16.93 | 0.32 |
Ensembles | Max_Avg_Min | 26.95 | 18.17 | 0.35 |
Ensembles | Max_Wgt_Min | 25.52 | 17.34 | 0.33 |
Ensembles | Adpt_Wgt | 25.36 | 17.38 | 0.34 |
Ensembles | Lower Bound | 16.58 | 9.30 | 0.16 |
Kumar et al. (2020) | Decision trees (DT) | 38.13 | 22.18 | − |
Kumar et al. (2020) | Random forest (RF) | 25.83 | 15.21 | − |
Kumar et al. (2020) | Extra trees (ET) | 25.37 | 15.04 | − |
Kumar et al. (2020) | DT + AdaBoost | 25.40 | 14.46 | − |
Kumar et al. (2020) | RF + AdaBoost | 25.30 | 14.99 | − |
Kumar et al. (2020) | ET + AdaBoost | 25.11 | 14.79 | − |
Kumar et al. (2020) | LSTM | 28.97 | 16.66 | − |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yin, P.-Y.; Yen, A.Y.; Chao, S.-E.; Day, R.-F.; Bhanu, B. A Machine Learning-Based Ensemble Framework for Forecasting PM2.5 Concentrations in Puli, Taiwan. Appl. Sci. 2022, 12, 2484. https://doi.org/10.3390/app12052484