Article

Outdoor Particulate Matter Correlation Analysis and Prediction Based Deep Learning in the Korea

1
Department of Computer Science, Soonchunhyang University, Asan 31538, Korea
2
Department of Computer Software Engineering, Soonchunhyang University, Asan 31538, Korea
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(7), 1146; https://doi.org/10.3390/electronics9071146
Submission received: 26 May 2020 / Revised: 29 June 2020 / Accepted: 30 June 2020 / Published: 15 July 2020
(This article belongs to the Special Issue Smart Processing for Systems under Uncertainty or Perturbation)

Abstract

Particulate matter (PM) has become a problem worldwide, with many deleterious health effects such as worsened asthma, effects on the lungs, and various toxin-induced cancers. The International Agency for Research on Cancer (IARC) under the World Health Organization (WHO) has designated PM as a group 1 carcinogen. The Korea Environment Corporation forecasts the status of outdoor PM four times a day, based on whichever is higher among PM10 and PM2.5. However, because it forecasts only the stage of PM, it remains difficult to predict the value of PM when going out. We analyzed the correlation between PM in Korea and air quality data, weather data, the 24 solar terms, and address format. We evaluated performance according to the sequence length and batch size and found the best outcome with a sequence length of 7 days and a batch size of 96. We performed PM prediction using the Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU) models. The CNN model suffered the limitation of predicting only the training data and not the test data. The LSTM and GRU models generated similar prediction results. We confirmed that the LSTM model has higher accuracy than the other two models.

1. Introduction

Recently, PM has become a problem worldwide. The causes of PM include industry [1]. PM10 and PM2.5 denote particles with a diameter of 10 μm or less and 2.5 μm or less, respectively. Air pollution causes 7 million deaths annually worldwide, including children [2]. Air pollution causes respiratory symptoms and breathing problems and decreases respiratory function [3,4,5,6,7,8,9,10,11,12]. In particular, PM worsens asthma and affects the lungs [13,14,15,16]. In 2013, the IARC under the WHO designated PM as a group 1 carcinogen. In 2018, to prevent the deterioration of health caused by PM, Korea strengthened its definition of PM levels to match that of advanced countries (Table 1). PM is generated by power plants and vehicles [17]. Power plant filters reduce PM10 but not PM2.5 [17]. The causes of PM in Korea are both domestic and international, the latter from neighboring countries such as China [17,18,19]. The Government of South Korea estimates that 30 to 50% of the ultrafine dust that causes air pollution comes from China [20]. The Deputy Mayor for Climate & Environment estimates that 50–70% of ultrafine dust is generated within Korea [21]. According to a report by the Korea Environmental Industry & Technology Institute (KEITI) [22], an estimated 51% of ultrafine dust in Seoul is generated domestically. High concentrations of PM occur every day in Korea [21].
In addition, the Korea Environment Corporation forecasts the status of PM, based on whichever is higher among PM10 and PM2.5, four times a day (5 AM, 11 AM, 5 PM, and 11 PM) according to four stages: Good, Normal, Bad, and Very bad (Table 2). Because the forecast gives only the stage of PM, it is difficult to predict the value of PM when going out. Recently, the prediction of PM in Korea has been studied [23,24,25].
Yi et al. [26] used a ConvLSTM model on air quality data, PM data, and location information, which were encoded together in an image using a grid pattern. However, the method was applied to only one city, and only air quality data and PM data were used. PM is correlated with weather data, but Yi et al. did not consider weather data. Their study also suffered the limitation of needing pre-processing to create a grid image from time series data. Vong et al. [27] used Support Vector Machine (SVM) and Extreme Learning Machine (ELM) models to predict PM in Macau, China, using air quality data and weather data. With 11 variables, the accuracy of the ELM model was approximately 81% and that of the SVM model approximately 79%. With 52 variables, the accuracy of the ELM model was approximately 75% and that of the SVM model approximately 78%. PM2.5 has more adverse effects on health, but Vong et al. predicted only PM10. Xayasouk et al. [25] studied PM in Seoul, Korea, with Deep AutoEncoder (DAE) and LSTM models; the LSTM model was more accurate than the DAE model. Accordingly, we consider location information, air quality data, and weather data, and predict PM2.5 as well as PM10. We consider Recurrent Neural Network (RNN) models such as the LSTM model. We analyze the correlation between PM and address format, air quality data, and weather data, and develop a PM prediction model.

2. Materials

2.1. Air Pollution Stations

In Korea, air pollution stations are installed in eight cities and nine provinces (Table 3). Korea has seven metropolitan cities, one self-governing city, eight provinces, and one self-governing province. We used air quality data from 2015 to 2019, collected by AirKorea [28]. The collected air quality data include PM10, PM2.5, O3, CO, SO2, and NO2 and were measured every hour; however, some data were missing due to communication problems in the air pollution stations.

2.2. Weather Stations

In Korea, weather stations are installed in eight cities and nine provinces (Table 4). We used only the weather data measured at stations with the Automated Surface Observing System (ASOS) installed. We used weather data from 2015 to 2019, collected by the Korea Meteorological Administration [29]. The collected weather data include temperature, humidity, wind speed, wind direction, snowfall, precipitation, and atmospheric pressure and were measured every hour; however, some data were missing due to communication problems in the weather stations.

2.3. Data Description

Air quality data and weather data are important for predicting PM. We used Korea's air quality data and weather data from 1 January 2015 to 30 November 2019, because the air quality data provided by AirKorea did not have final measurement data as of December 2019. We added the weather data for each region to the air quality data, combining the two based on address and time. Locations where air quality data were available but weather data were not were deleted.

2.3.1. Weather Data

Weather data are meaningful for predicting PM. Temperature affects air quality [30]. Low humidity and low temperature are related to the PM2.5 concentration [31]. The wind carries PM and thus affects the PM concentration [30,31]. In addition, rain removes air pollution [32,33].

2.3.2. Air Quality Data

NO2 and CO are related to automobile emissions [33], and SO2 and PM are related to factory emissions. O3 is produced when NO2 and NO are present in the atmosphere [34]. SO2, NO2, and O3 have a high correlation with PM [34], whereas CO has a low correlation with PM [35].

2.3.3. Address Format

PM is affected by the surrounding cities because it moves with the wind [34,36]. We used three address formats. The first format is divided into (1) city name, (2) district, county, or ward name, and (3) dong, eup, or myeon name. The second format is divided into (1) province name, (2) city or county name, and (3) dong, eup, or myeon name. The third format is the address of the station, from which we removed the building name. Accordingly, each station is described by the first or the second format together with the third. We converted the address formats using one-hot encoding.

2.3.4. Twenty-Four Solar Terms

The 24 solar terms subdivide the seasons of the year. They are significant in time series data affected by weather [37]. We assigned the 24 solar terms according to the dates provided by the Korea Meteorological Administration [38] and converted them using one-hot encoding.
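The source names only "the one-hot encoding library", so the following is a minimal plain-Python sketch of the idea rather than the authors' implementation. The function name `one_hot_encode` and the sample labels are assumptions made for illustration; the same approach would apply to the address fields in Section 2.3.3.

```python
def one_hot_encode(values):
    """Map each distinct category to a 0/1 indicator vector.

    Returns the sorted list of categories and one vector per input value.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per record
        vectors.append(row)
    return categories, vectors


# Hypothetical sample: solar-term labels attached to four hourly records.
terms = ["Ipchun", "Usu", "Ipchun", "Gyeongchip"]
categories, encoded = one_hot_encode(terms)
```

With all 24 solar terms present in the data, each record would receive a 24-element vector with a single 1.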

2.3.5. Wind_x, Wind_y

The wind is expressed in terms of wind speed and wind direction. The wind direction is an angle from 0 to 360 degrees, and the wind speed is the distance the wind travels per second. The accuracy is higher when using wind_x and wind_y than when using wind speed and wind direction directly [39]. The components are obtained by scaling the cosine and sine of the wind direction by the wind speed, as in Equations (1) and (2).
$$\mathrm{wind}_x = \text{wind speed} \times \cos(\text{wind direction}) \tag{1}$$
$$\mathrm{wind}_y = \text{wind speed} \times \sin(\text{wind direction}) \tag{2}$$
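The wind decomposition in this section can be sketched as follows. This is a minimal sketch that assumes the usual convention in which the wind speed scales the cosine and sine of the direction angle, and that the direction is given in degrees and must be converted to radians; the function name `wind_components` is hypothetical.

```python
import math


def wind_components(speed, direction_deg):
    """Return (wind_x, wind_y) from a wind speed and a direction in degrees."""
    theta = math.radians(direction_deg)  # math.cos and math.sin expect radians
    return speed * math.cos(theta), speed * math.sin(theta)


# A wind of speed 3.0 at 90 degrees lies almost entirely along the y axis.
wind_x, wind_y = wind_components(3.0, 90.0)
```

One reason such an encoding can help a model is that directions of 0 and 360 degrees, which are numerically far apart, map to the same point in the (wind_x, wind_y) plane.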

2.3.6. Data Parameters

We predict PM10 and PM2.5 using air quality data and weather data. Air quality data include SO2, O3, CO, and NO2. Weather data include temperature, atmospheric pressure, humidity, wind speed, wind direction, precipitation, and snowfall. We performed deep learning using four air quality variables, seven weather variables, wind_x, wind_y, the 24 solar terms, and the address format. Table 5 shows the input data, measurement units, and ranges. We used data from 2015 to 2019: 70% as training data, 24% as validation data, and 6% as test data. We set the missing values of precipitation and snowfall to zero. Missing values of SO2 and CO were corrected using the average, and resampling was performed for NO2, O3, temperature, atmospheric pressure, wind speed, wind direction, and humidity (Section 4.2). SO2 has 631,319 missing values, CO has 636,586, O3 has 474,455, and NO2 has 492,315. Temperature has 470 missing values, wind speed has 5242, wind direction has 19,926, humidity has 6052, and atmospheric pressure has 1554. IsRain is 1 if it rains and 0 otherwise; IsSnow is 1 if it snows and 0 otherwise.
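The 70/24/6 split can be illustrated with a short sketch. The source does not say whether the split was chronological or random; the version below assumes a chronological split, which is a common choice for time series, and the function name `split_dataset` is hypothetical.

```python
def split_dataset(records, train_percent=70, validation_percent=24):
    """Split records into training, validation, and test parts, keeping order.

    The test part receives whatever remains (6% with the default arguments).
    """
    n = len(records)
    train_end = n * train_percent // 100
    validation_end = train_end + n * validation_percent // 100
    return (
        records[:train_end],
        records[train_end:validation_end],
        records[validation_end:],
    )


# 100 placeholder records standing in for time-ordered hourly rows.
train_part, validation_part, test_part = split_dataset(list(range(100)))
```

Keeping the original order means the test part contains the most recent records, so the model is evaluated on data that follows everything it was trained on.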
We analyzed the correlation among the input data (Figure 1). The correlation analysis excludes Address1, Address2, Address3, Full Address, IsRain, and IsSnow, which are categorical values expressed as integers, because correlation is meaningful only for continuous time-series data.
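The source does not state which correlation coefficient was used for Figure 1. As an illustration only, the sketch below assumes the Pearson coefficient, and the sample readings are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical hourly readings, for illustration only.
pm10 = [30.0, 45.0, 60.0, 80.0]
no2 = [0.010, 0.015, 0.020, 0.030]
r = pearson(pm10, no2)
```

A coefficient near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.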

3. Methods

In a previous study [40], the CNN-LSTM hybrid model showed better performance than the CNN and LSTM models. We applied the CNN-LSTM and CNN-GRU hybrid models to our data but excluded them because of their low accuracy. We therefore used the CNN model, the LSTM model, and the GRU model, which is similar to the LSTM model.

3.1. Long Short-Term Memory

The LSTM model is a type of RNN model. The RNN model takes its previous results as input, which causes the long-term dependency problem (Figure 2).
The LSTM model is designed to solve the long-term dependency problem of the RNN model. Figure 3 shows the structure of the LSTM model. The LSTM model has high accuracy for the prediction of time-series data.
Table 6 shows the variables used in the LSTM model. The LSTM model uses the logistic sigmoid function given in Equation (3).
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
The LSTM model uses Equation (4) to determine which data to keep and which to delete.
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{4}$$
The LSTM model determines the data to be added to long-term memory through Equations (5) and (6) and updates long-term memory through Equation (7).
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{5}$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \tag{6}$$
$$c_t = f_t \times c_{t-1} + i_t \times \tilde{c}_t \tag{7}$$
The LSTM model calculates the results through Equations (8) and (9).
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{8}$$
$$h_t = o_t \times \tanh(c_t) \tag{9}$$
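To make the gate equations concrete, the following NumPy sketch runs a single LSTM time step. It assumes the standard LSTM formulation, in which the hidden state is the output gate multiplied by the tanh of the updated cell state; the parameter dictionary, the layer sizes, and the random inputs are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, Equation (3)."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step in the standard formulation."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate memory
    c_t = f_t * c_prev + i_t * c_hat                      # updated cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # new hidden state
    return h_t, c_t


# Illustrative sizes and random weights (assumptions, not the paper's values).
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + inputs))
    params[f"b_{gate}"] = np.zeros(hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):  # five time steps of dummy input
    h, c = lstm_step(x_t, h, c, params)
```

In practice a deep learning framework would supply this layer; the sketch only shows how the cell state `c` carries long-term memory from one step to the next.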

3.2. Gated Recurrent Unit

The GRU model simplifies the equations of the LSTM model, which improves the computation speed. Figure 4 shows the structure of the GRU model, which consists of a reset gate and an update gate.
Table 7 shows the variables used in the GRU model. The GRU model uses the logistic sigmoid function given in Equation (3). The reset gate $r_t$, obtained through Equation (10), determines how much of the previous hidden state is used when computing the candidate state $\tilde{h}_t$. In this way, the long-term memory modification of the LSTM model is simplified.
$$r_t^j = \sigma([W_r x]_j + [U_r h_{t-1}]_j) \tag{10}$$
The update gate $z_t$, obtained through Equation (11), determines whether the GRU model keeps $h_{t-1}$ or uses $\tilde{h}_t$. The candidate state $\tilde{h}_t$ is computed through Equation (12), and the predicted value is output through Equation (13).
$$z_t^j = \sigma([W_z x]_j + [U_z h_{t-1}]_j) \tag{11}$$
$$\tilde{h}_t^j = \tanh([W x]_j + [U (r_t \odot h_{t-1})]_j) \tag{12}$$
$$h_t^j = z_t^j h_{t-1}^j + (1 - z_t^j) \tilde{h}_t^j \tag{13}$$
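The GRU update can likewise be sketched as a single time step in NumPy. This sketch assumes the bias-free formulation with an element-wise product between the reset gate and the previous hidden state; the parameter names, sizes, and random inputs are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, Equation (3)."""
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU time step with a reset gate and an update gate."""
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)        # reset gate
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)        # update gate
    h_hat = np.tanh(p["W"] @ x_t + p["U"] @ (r_t * h_prev))  # candidate state
    return z_t * h_prev + (1.0 - z_t) * h_hat                # new hidden state


# Illustrative sizes and random weights (assumptions, not the paper's values).
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
p = {name: rng.normal(scale=0.1, size=(hidden, inputs)) for name in ("W_r", "W_z", "W")}
p.update({name: rng.normal(scale=0.1, size=(hidden, hidden)) for name in ("U_r", "U_z", "U")})

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):  # five time steps of dummy input
    h = gru_step(x_t, h, p)
```

Compared with an LSTM step, there is no separate cell state and one gate fewer, which is where the reduction in computation comes from.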

3.3. Convolutional Neural Network

The CNN model is used to extract hidden patterns through the convolution and pooling stages.
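The source does not describe the CNN architecture, so the following is only a generic illustration of the two stages it names. It assumes a one-dimensional convolution over a time series followed by non-overlapping max pooling; the function names, the kernel, and the sample series are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1-D cross-correlation, as used in CNN layers."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
features = conv1d(series, [1.0, -1.0])  # each value minus its successor
pooled = max_pool1d(features, 2)        # keep the largest response per pair
```

In a trained network the kernel values are learned rather than fixed, and many kernels are applied in parallel.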

4. Experiments and Results

4.1. Evaluation Methods

We evaluated the performance using the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). In deep learning, regression is verified via errors. MAE and RMSE use the difference between actual and predicted values to measure how much error has occurred; the lower the value, the higher the accuracy. MAE and RMSE are calculated using Equations (14) and (15), respectively. Table 8 lists the variables used in MAE and RMSE.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{14}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{15}$$
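Both metrics are short enough to compute directly. The sketch below is a plain-Python illustration; the sample values are invented for the example and are not results from the paper.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: the average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more than MAE does."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical PM10 readings and predictions, for illustration only.
actual = [30.0, 42.0, 55.0, 38.0]
predicted = [28.0, 45.0, 50.0, 38.0]
error_mae = mae(actual, predicted)    # errors of 2, 3, 5, 0 give 2.5
error_rmse = rmse(actual, predicted)  # sqrt(9.5), roughly 3.08
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions are far off.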

4.2. Data Preparation

We corrected outliers and missing values in the collected data. Values of PM2.5, PM10, humidity, temperature, precipitation, and snowfall that fell outside the ranges in Table 5 were replaced with the average. Missing values of precipitation and snowfall were set to zero, because a missing value means that there was no rain or snow at that time. To process the missing values of the air quality data and weather data, we resampled them using the LSTM model. When resampling the air pollution information, we first corrected the missing values using the average value (Table 9). Because the air quality values are small in their units, it was difficult to judge from MAE and RMSE alone whether they were predicted appropriately.
Accordingly, we also evaluated the air pollution information using comparison graphs of the actual values (blue line) and predicted values (orange line) (Figure 5). NO2 and O3 were predicted closely, but SO2 and CO were not. We therefore resampled the missing values of NO2 and O3 using the predicted values, and corrected the missing values of SO2 and CO using the average.
When resampling the weather data, missing values were first corrected using the average value. We then predicted the weather data for evaluation (Table 10) and compared the predicted values (orange line) with the actual values (blue line) (Figure 6). Atmospheric pressure, temperature, wind speed, wind direction, and humidity were all predicted closely, so we resampled the missing values of the weather data using the predicted values.
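The simpler corrections described in this section (zero filling, mean filling, and replacing out-of-range values) can be sketched as follows. The helper names, the use of `None` to mark a missing reading, the choice to average only in-range values, and the sample readings are all assumptions for illustration; the LSTM-based resampling is not shown.

```python
def fill_missing_with_zero(values):
    """Treat a missing reading (None) as zero, e.g. no rain or snow recorded."""
    return [0.0 if v is None else v for v in values]


def fill_missing_with_mean(values):
    """Replace missing readings (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def replace_out_of_range(values, low, high):
    """Replace readings outside [low, high] with the mean of in-range readings."""
    valid = [v for v in values if low <= v <= high]
    mean = sum(valid) / len(valid)
    return [v if low <= v <= high else mean for v in values]


# Hypothetical readings, for illustration only.
precipitation = fill_missing_with_zero([0.0, None, 1.5])
co = fill_missing_with_mean([0.5, None, 1.5])
humidity = replace_out_of_range([40.0, 250.0, 60.0], low=0.0, high=100.0)
```

Mean filling keeps the overall level of a variable but flattens its short-term variation, which is one reason model-based resampling was preferred where the model predicted the variable well.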

4.3. Comparison of PM Prediction Accuracy According to SO2

As a baseline, we first predicted PM10 and PM2.5 using only weather data (Table 11, Figure 7). The contribution of each air quality variable was then analyzed by comparison with the results in Table 11. The orange line is the predicted value, and the blue line is the actual value.
We analyzed the relationship between SO2 and PM by comparing predictions made with weather data alone against predictions made with weather data plus SO2. Adding SO2 lowered the RMSE and MAE of PM2.5 and PM10 (Table 12), and hence SO2 is related to PM2.5 and PM10. The actual and predicted values of PM10 and PM2.5 were compared (Figure 8). The orange line is the predicted value, and the blue line is the actual value. We confirmed that SO2 improves the accuracy of PM prediction.

4.4. Comparison of PM Prediction Accuracy According to CO

We analyzed the relationship between CO and PM by comparing predictions made with weather data alone against predictions made with weather data plus CO. Adding CO lowered the RMSE and MAE of PM2.5 and PM10 (Table 13), and hence CO is related to PM2.5 and PM10. The actual and predicted values of PM10 and PM2.5 were compared (Figure 9). The orange line is the predicted value, and the blue line is the actual value. We confirmed that CO improves the accuracy of PM prediction.

4.5. Comparison of PM Prediction Accuracy According to NO2

We analyzed the relationship between NO2 and PM by comparing predictions made with weather data alone against predictions made with weather data plus NO2. Adding NO2 lowered the RMSE and MAE of PM2.5 and PM10 (Table 14), and hence NO2 is related to PM2.5 and PM10. The actual and predicted values of PM10 and PM2.5 were compared (Figure 10). The orange line is the predicted value, and the blue line is the actual value. We confirmed that NO2 improves the accuracy of PM prediction.

4.6. Comparison of PM Prediction Accuracy According to O3

We analyzed the relationship between O3 and PM by comparing predictions made with weather data alone against predictions made with weather data plus O3. Adding O3 lowered the RMSE and MAE of PM10 but raised the MAE of PM2.5 (Table 15), and hence O3 is related to PM10. The actual and predicted values of PM10 and PM2.5 were compared (Figure 11). The orange line is the predicted value, and the blue line is the actual value. We confirmed that O3 improves the accuracy of PM10 prediction.

4.7. Comparison of PM Prediction Accuracy According to Atmospheric pressure

We analyzed the relationship between atmospheric pressure and PM by comparing predictions made without atmospheric pressure against predictions made with it. Adding atmospheric pressure lowered the RMSE and MAE of PM10 but raised the RMSE and MAE of PM2.5 (Table 16), and hence atmospheric pressure is related to PM10. The actual and predicted values of PM10 and PM2.5 were compared (Figure 12). The orange line is the predicted value, and the blue line is the actual value. We confirmed that atmospheric pressure improves the accuracy of PM10 prediction.

4.8. Comparison of PM Prediction Accuracy According to Wind_x, Wind_y

We evaluated the effect of Equations (1) and (2), discussed in Section 2.3.5, by removing wind speed and wind direction from the inputs and adding wind_x and wind_y (Table 17). We confirmed that wind_x and wind_y improve the accuracy of PM prediction (Figure 13).

4.9. Comparison of PM Prediction Accuracy According to Address Format

We used the address format discussed in Section 2.3.3. The results were compared with Table 9 and confirmed to be improved (Table 18), which suggests that, with the city and road separated, the model captures the influence of adjacent cities and provinces. Figure 14 shows the actual and predicted values of PM10 and PM2.5. The orange line is the predicted value, and the blue line is the actual value. We confirmed that the address format improves the accuracy of PM prediction.

4.10. Comparison of PM Prediction Accuracy According to 24 solar terms

We used the 24 solar terms discussed in Section 2.3.4. The results were compared with Table 7 and confirmed to be improved (Table 19). We confirmed that the 24 solar terms improve the accuracy of PM prediction (Figure 15).

4.11. Comparison of PM Prediction Accuracy According to Sequence Length

We evaluated the performance according to the sequence length with the data parameters in Table 5. Because the data are hourly, one day corresponds to a sequence length of 24. The best result was obtained with a sequence length of 7 days (Table 20), which generated the lowest values for PM2.5 MAE, PM2.5 RMSE, PM10 MAE, and PM10 RMSE. The reduced MAE and RMSE values are highlighted in the table.
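A sequence length of 7 days on hourly data corresponds to 7 × 24 = 168 time steps per input window. The sketch below shows one common way to build such windows; the function name `make_windows` is hypothetical, and the one-step-ahead prediction horizon is an assumption because the source does not state how far ahead the model predicts.

```python
def make_windows(series, sequence_length, horizon=1):
    """Pair each window of sequence_length steps with the value horizon steps later."""
    inputs, targets = [], []
    last_start = len(series) - sequence_length - horizon
    for start in range(last_start + 1):
        end = start + sequence_length
        inputs.append(series[start:end])          # the model's input window
        targets.append(series[end + horizon - 1])  # the value to predict
    return inputs, targets


HOURS_PER_DAY = 24
sequence_length = 7 * HOURS_PER_DAY  # 168 hourly steps for a 7-day window
series = list(range(200))            # stand-in for an hourly PM series
X, y = make_windows(series, sequence_length)
```

Longer windows give the model more context but reduce the number of training samples and increase the cost of each step, which is why the sequence length is tuned experimentally.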

4.12. Comparison of PM Prediction Accuracy According to Batch Size

We evaluated the performance according to the batch size and found the best result with a batch size of 96 (Table 21). The reduced MAE and RMSE values are highlighted in the table.

4.13. Comparison of the LSTM, CNN, and GRU Models

Based on the experimental results, we evaluated the performance of the LSTM, CNN, and GRU models with a sequence length of 7 days and a batch size of 96 (Table 22). The LSTM model appeared to be highly accurate on the training data (Figure 16), whereas on the test data the predictions of the CNN model did not resemble the actual values (Figure 17). The gray line is the actual value, the red line is the GRU prediction, the green line is the CNN prediction, and the blue line is the LSTM prediction. The lowest MAE and RMSE values are highlighted in Table 22.
For each model, the difference between the actual and predicted values differed between the training data and the test data. In particular, the CNN model showed some similarity on the training data but could not predict the test data. We confirmed that the highest accuracy was obtained using the LSTM model (Table 22).

5. Conclusions

We reviewed other PM prediction models and analyzed the correlation between PM and address format, air quality data, and weather data. We developed a PM prediction model using the 24 solar terms and weather data provided by the Korea Meteorological Administration and the air quality data provided by AirKorea. Our paper makes the following key contributions to the literature. (1) The address format improves accuracy when developing prediction models for quantities that are affected by region, such as PM and weather. (2) The 24 solar terms improve accuracy when developing forecasting models for quantities that are affected by the seasons, such as PM and weather. (3) Air quality data improve the accuracy of the PM prediction model. (4) Atmospheric pressure improves the accuracy of PM10 prediction. For future research, the authors will apply visualization such as a map, and will apply hybrid models that combine the LSTM model with other deep learning models (e.g., RNN, CNN, GRU, DAE, Q-Networks) to improve the accuracy of PM prediction.

Author Contributions

Conceptualization, H.L.; methodology, M.C. and S.H.; software, M.C.; validation, M.C., S.H. and H.L.; formal analysis, M.C.; investigation, H.L.; resources, M.C.; data curation, M.C. and S.H.; writing (original draft preparation), M.C.; writing (review and editing), H.L.; visualization, M.C.; supervision, H.L.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2019-2015-0-00403) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation) and by the Soonchunhyang Research Fund.

Acknowledgments

We appreciate the air quality indices data provided by the Korea Environment Corporation (http://www.airkorea.or.kr/) and the weather data provided by the Korea Meteorological Administration (http://www.kma.go.kr). This manuscript was proofread and edited by the professional English editors at HARRISCO (http://en.harrisco.net).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Park, D.U.; Kwon, C.H. Characteristics of PM10, PM2.5, CO2 and CO monitored in interiors and platforms of subway train in Seoul, Korea. Environ. Int. 2008, 34, 629–634.
  2. Ten Threats to Global Health in 2019. Available online: https://www.who.int/news-room/feature-stories/ten-threats-to-global-health-in-2019 (accessed on 3 February 2020).
  3. Ebi, K.L.; McGregor, G. Climate change, tropospheric ozone and particulate matter, and health impacts. Environ. Health Perspect. 2008, 116, 1449–1455.
  4. Samet, J.; Buist, S.; Bascom, R.; Garcia, J.; Lipsett, M.; Mauderly, J.; Mannino, D.; Rand, C.; Romieu, I.; Utell, M.; et al. What constitutes an adverse health effect of air pollution. Am. J. Respir. Crit. Care Med. 2000, 161, 665–673.
  5. Mannucci, P.M.; Harari, S.; Martinelli, I.; Franchini, M. Effects on health of air pollution: A narrative review. Intern. Emerg. Med. 2015, 10, 657–662.
  6. Janssen, N.A.H.; Fischer, P.; Marra, M.; Ameling, C.; Cassee, F.R. Short-term effects of PM2.5, PM10 and PM2.5–10 on daily mortality in the Netherlands. Sci. Total Environ. 2013, 463, 20–26.
  7. Gauderman, W.J.; Avol, E.; Gilliland, F.; Vora, H.; Thomas, D.; Berhane, K.; McConnell, R.; Kuenzli, N.; Lurmann, F.; Rappaport, E.; et al. The effect of air pollution on lung development from 10 to 18 years of age. N. Engl. J. Med. 2004, 351, 1057–1067.
  8. Terada, N.; Maesako, K.I.; Hiruma, K.; Hamano, N.; Houki, G.; Konno, A.; Ikeda, T.; Sai, M. Diesel exhaust particulates enhance eosinophil adhesion to nasal epithelial cells and cause degranulation. Int. Arch. Allergy Immunol. 1997, 114, 167–174.
  9. Gent, J.F.; Triche, E.W.; Holford, T.R.; Belanger, K.; Bracken, M.B.; Beckett, W.S.; Leaderer, B.P. Association of low-level ozone and fine particles with respiratory symptoms in children with asthma. JAMA 2003, 290, 1859–1867.
  10. Holz, O.; Mücke, M.; Paasch, K.; Böhme, S.; Timm, P.; Richter, K.; Magnussen, H.; Jörres, R.A. Repeated ozone exposures enhance bronchial allergen responses in subjects with rhinitis or asthma. Clin. Exp. Allergy 2002, 32, 681–689.
  11. McDonnell, W.F.; Abbey, D.E.; Nishino, N.; Lebowitz, M.D. Long-term ambient ozone concentration and the incidence of asthma in nonsmoking adults: The AHSMOG Study. Environ. Res. 1999, 80, 110–121.
  12. D’amato, G.; Liccardi, G.; D’Amato, M. Environmental risk factors (outdoor air pollution and climatic changes) and increased trend of respiratory allergy. J. Investig. Allergol. Clin. Immunol. 2000, 10, 123–128.
  13. Salvador, P.; Artíñano, B.; Querol, X.; Alastuey, A. A combined analysis of backward trajectories and aerosol chemistry to characterise long-range transport episodes of particulate matter: The Madrid air basin, a case study. Sci. Total Environ. 2008, 390, 495–506.
  14. Delfino, R.J.; Quintana, P.J.E.; Floro, J.; Gastañaga, V.M.; Samimi, B.S.; Kleinman, M.T.; Liu, L.J.S.; Bufalino, C.; Wu, C.F.; McLaren, C.E. Association of FEV1 in asthmatic children with personal and microenvironmental exposure to airborne particulate matter. Environ. Health Perspect. 2004, 112, 932–941.
  15. Riedl, M.; Sanchez, D.D. Biology of diesel exhaust effects on respiratory function. J. Allergy Clin. Immunol. 2005, 115, 221–228.
  16. Behndig, A.F.; Mudway, I.S.; Brown, J.L.; Stenfors, N.; Helleday, R.; Duggan, S.T.; Wilson, S.J.; Boman, C.; Cassee, F.R.; Frew, A.J.; et al. Airway antioxidant and inflammatory responses to diesel exhaust exposure in healthy humans. Eur. Respir. J. 2006, 27, 359–365.
  17. Song, G.J.; Moon, Y.H.; Joo, J.H.; Lee, A.Y.; Lee, J.B. Distribution of Hazardous Heavy Metal in TSP, PM10 and PM2.5 Emitted from Coal-fired Power Plants. J. Korean Soc. Environ. Anal. 2018, 21, 172–180.
  18. Kim, Y.P. Air pollution in Seoul caused by aerosols. J. Korean Soc. Atmos. Environ. 2006, 22, 535–553.
  19. Kim, Y.W.; Lee, H.S.; Jang, Y.J.; Lee, H.J. How does media construct particulate matter risks?: A news frame and source analysis on particulate matter risks. Korean J. Commun. Stud. 2015, 59, 121–154.
  20. Oh, Y.M. Countermeasures for Fine Dust Management. The Government of the Korea: Seoul, Korea, 2013.
  21. Lee, J.S. Climate and Environment Headquarters Plan; Deputy Mayor for Climate & Environment in the Korea: Seoul, Korea, 2015.
  22. Park, S.B. Cause of fine Dust and Response Policy Issues; Korea Environmental Industry & Technology Institute: Seoul, Korea, 2018.
  23. Lee, D.K.; Lee, S.W. Hourly Prediction of Particulate Matter (PM2.5) Concentration Using Time Series Data and Random Forest. Kips Trans. Softw. Data Eng. 2020, 9, 129–136.
  24. Yang, G.; Lee, H.M.; Lee, G.Y. A Hybrid Deep Learning Model to Forecast Particulate Matter Concentration Levels in Seoul, South Korea. Atmosphere 2020, 11, 348.
  25. Xayasouk, T.; Lee, H.M.; Lee, G.Y. Air Pollution Prediction Using Long Short-Term Memory (LSTM) and Deep Autoencoder (DAE) Models. Sustainability 2020, 12, 2570.
  26. Yi, H.; Bui, K.H.N.; Seon, C.N. A Deep Learning LSTM Framework for Urban Traffic Flow and Fine Dust Prediction. Korean Inst. Inf. Sci. Eng. 2020, 47, 292–297.
  27. Vong, C.M.; Ip, W.F.; Wong, P.K.; Chiu, C.C. Predicting minority class for suspended particulate matters level by extreme learning machine. Neurocomputing 2014, 128, 136–144.
  28. Airkorea. Available online: http://www.airkorea.or.kr (accessed on 3 February 2020).
  29. Home-Korea Meteorological Administration. Available online: http://www.kma.go.kr/ (accessed on 3 February 2020).
  30. Papanastasiou, D.K.; Melas, D.; Kioutsioukis, I. Development and assessment of neural network and multiple regression models in order to predict PM10 levels in a medium-sized Mediterranean city. Water Air Soil Pollut. 2007, 182, 325–334.
  31. Tai, A.P.K.; Mickley, L.J.; Jacob, D.J. Correlations between fine particulate matter (PM2.5) and meteorological variables in the United States: Implications for the sensitivity of PM2.5 to climate change. Atmos. Environ. 2010, 44, 3976–3984.
  32. Yoo, Y.M.; Lee, Y.R.; Kim, D.; Jeong, M.J.; Stockwell, W.R.; Kundu, P.K.; Oh, S.M.; Shin, D.B.; Lee, S.J. New indices for wet scavenging of air pollutants (O3, CO, NO2, SO2, and PM10) by summertime rain. Atmos. Environ. 2014, 82, 226–237.
  33. Yuxuan, W.; Brendon, M.M.; William, M.J.; Jiming, H.; Hong, M.; Chris, N.; Yaosheng, C. Variations of O3 and CO in Summertime at a Rural Site Near Beijing. Atmos. Chem. Phys. 2008, 8, 6355–6363.
  34. Grivas, G.; Chaloulakou, A. Artificial neural network models for prediction of PM10 hourly concentrations, in the Greater Area of Athens, Greece. Atmos. Environ. 2006, 40, 1216–1229.
  35. Xie, Y.; Zhao, B.; Zhang, L.; Luo, R. Spatiotemporal variations of PM2.5 and PM10 concentrations between 31 Chinese cities and their relationships with SO2, NO2, CO and O3. Particuology 2015, 20, 141–149.
  36. Sayegh, A.S.; Munir, S.; Habeebullah, T.M. Comparing the performance of statistical models for predicting PM10 concentrations. Aerosol Air Qual. Res. 2014, 14, 653–665.
  37. Xie, J.; Hong, T. Load forecasting using 24 solar terms. J. Mod. Power Syst. Clean Energy 2018, 6, 208–214.
  38. Weather Data Open Portal [Climate Statistics Analysis: Meteorological Events: 24 Solar Terms]. Available online: https://data.kma.go.kr/climate/solarTerms/solarTerms.do (accessed on 24 February 2020).
  39. Feng, X.; Li, Q.; Zhu, Y.; Hou, J.; Jin, L.; Wang, J. Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmos. Environ. 2015, 107, 118–128.
  40. Chae, M.S.; Han, S.W.; Lee, H.W. CNN and LSTM models for predicting particulate matter in the Korea. In Proceedings of the 8th International Conference on Information, System and Convergence Applications, Ho Chi Minh, Vietnam, 12–15 February 2020.
Figure 1. The correlation analysis result.
Figure 2. Structure of the RNN model.
Figure 3. Structure of the LSTM model.
Figure 4. Structure of the GRU model.
Figure 5. A comparison graph of actual and predicted values for NO2, SO2, O3, and CO. (a) Predicted and actual values of NO2; (b) predicted and actual values of SO2; (c) predicted and actual values of O3; (d) predicted and actual values of CO.
Figure 6. A comparison graph of actual and predicted values for atmospheric pressure, temperature, wind speed, wind direction, and humidity. (a) Predicted and actual values of humidity; (b) predicted and actual values of wind speed; (c) predicted and actual values of atmospheric pressure; (d) predicted and actual values of wind direction; (e) predicted and actual values of temperature.
Figure 7. A graph of the actual and predicted values of PM10 and PM2.5. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 8. A graph of the actual and predicted values of PM10 and PM2.5 using weather data with SO2. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 9. A graph of the actual and predicted values of PM10 and PM2.5 using weather data with CO. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 10. A graph of the actual and predicted values of PM10 and PM2.5 using weather data with NO2: (a) predicted and actual values of PM2.5, and (b) predicted and actual values of PM10.
Figure 11. A graph of the actual and predicted values of PM10 and PM2.5 using weather data with O3. (a) predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 12. A graph of the actual and predicted values of PM10 and PM2.5 using weather data with atmospheric pressure. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 13. A graph of the actual and predicted values of PM10 and PM2.5 using weather data with wind_x and wind_y: (a) predicted and actual values of PM2.5, and (b) predicted and actual values of PM10.
Figure 14. A graph of the actual and predicted values of PM10 and PM2.5. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 15. A graph of the actual and predicted values of PM10 and PM2.5. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 16. A graph of actual and predicted values of PM10 and PM2.5 using training data. (a) Predicted and actual values of PM2.5; (b) predicted and actual values of PM10.
Figure 17. A graph of actual and predicted values of PM10 and PM2.5 using test data: (a) predicted and actual values of PM2.5, (b) predicted and actual values of PM10.
Table 1. Air quality standards.

| Country | 1-h average PM10 (μg/m³) | 24-h average PM10 (μg/m³) | Annual average PM10 (μg/m³) | 24-h average PM2.5 (μg/m³) | Annual average PM2.5 (μg/m³) |
|---|---|---|---|---|---|
| Korea | - | 100 | 50 | 35 | 15 |
| USA | - | 150 | - | 35 | 15 |
| Japan | 200 | 100 | - | 35 | 15 |
| China | - | 150 | 70 | 75 | 35 |
| WHO | - | 50 | 20 | 25 | 10 |
| EU | - | 50 | 40 | - | 25 |
Table 2. Stages of PM in Korea.

| Status | PM10 (μg/m³) | PM2.5 (μg/m³) |
|---|---|---|
| Good | 0–30 | 0–15 |
| Normal | 31–80 | 16–35 |
| Bad | 81–150 | 36–75 |
| Very bad | Over 150 | 76 or over |
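The stage boundaries in Table 2 can be expressed as a small lookup. The following is a minimal sketch, not the forecasting procedure of Korea Environment Corporation: the function names `pm_stage` and `overall_stage` are hypothetical, and treating each upper bound as inclusive (so that a fractional reading such as 30.5 falls into the next stage) is an assumption, since the table only lists integer ranges. Taking the worse of the two stages follows the "whichever is higher among PM10 and PM2.5" rule mentioned in the abstract.

```python
# Inclusive upper bounds (μg/m3) for each stage, copied from Table 2.
PM10_UPPER_BOUNDS = [(30, "Good"), (80, "Normal"), (150, "Bad")]
PM25_UPPER_BOUNDS = [(15, "Good"), (35, "Normal"), (75, "Bad")]

# Stages ordered from best to worst, used to compare two stages.
STAGE_ORDER = ["Good", "Normal", "Bad", "Very bad"]


def pm_stage(value, upper_bounds):
    """Return the Table 2 stage for one concentration value in μg/m3."""
    if value < 0:
        raise ValueError("concentration must be non-negative")
    for upper, label in upper_bounds:
        if value <= upper:
            return label
    # Anything above the last bound is the worst stage.
    return "Very bad"


def overall_stage(pm10, pm25):
    """Return the worse of the PM10 and PM2.5 stages (assumed reading of
    'whichever is higher')."""
    stage10 = pm_stage(pm10, PM10_UPPER_BOUNDS)
    stage25 = pm_stage(pm25, PM25_UPPER_BOUNDS)
    return max(stage10, stage25, key=STAGE_ORDER.index)


if __name__ == "__main__":
    print(pm_stage(45, PM10_UPPER_BOUNDS))  # PM10 of 45 μg/m3
    print(overall_stage(25, 40))            # PM10 = 25, PM2.5 = 40
```

Keeping the bounds in plain lists makes it easy to swap in another standard from Table 1 without touching the lookup logic.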
Table 3. The number of air pollution stations in each city and province.

| City name | Number of stations | Province name | Number of stations |
|---|---|---|---|
| Seoul | 40 | Gyeonggi-do | 108 |
| Incheon | 25 | Gangwon-do | 25 |
| Daejeon | 12 | Chungcheongnam-do | 34 |
| Daegu | 15 | Chungcheongbuk-do | 29 |
| Ulsan | 17 | Gyeongsangbuk-do | 42 |
| Busan | 29 | Gyeongsangnam-do | 38 |
| Gwangju | 11 | Jeollabuk-do | 32 |
| Sejong | 4 | Jeollanam-do | 38 |
| | | Jeju Island | 88 |
Table 4. The number of weather stations in each city and province.

| City name | Number of stations | Province name | Number of stations |
|---|---|---|---|
| Seoul | 2 | Gyeonggi-do | 5 |
| Incheon | 3 | Gangwon-do | 15 |
| Daejeon | 1 | Chungcheongnam-do | 6 |
| Daegu | 2 | Chungcheongbuk-do | 5 |
| Ulsan | 1 | Gyeongsangbuk-do | 14 |
| Busan | 1 | Gyeongsangnam-do | 10 |
| Gwangju | 1 | Jeollabuk-do | 16 |
| Sejong | 1 | Jeollanam-do | 14 |
| | | Jeju Island | 4 |
Table 5. Input data.

| Variable | Unit | Range | Variable | Unit | Range |
|---|---|---|---|---|---|
| Address1 | Integer | ≥ 0 | PM10 | μg/m³ | 0–400 |
| Address2 | Integer | ≥ 0 | temperature | °C | −25–45 |
| Address3 | Integer | ≥ 0 | precipitation | mm | ≥ 0 |
| Full Address | Integer | ≥ 0 | wind_x | Float | −12–12 |
| Solar Terms | Integer | 0–23 | wind_y | Float | −12–12 |
| SO2 | ppm | ≥ 0 | humidity | % | 0–100 |
| CO | ppm | ≥ 0 | atmospheric pressure | hPa | ≥ 0 |
| O3 | ppm | ≥ 0 | snowfall | cm | ≥ 0 |
| NO2 | ppm | ≥ 0 | IsRain | Integer | 0 or 1 |
| PM2.5 | μg/m³ | 0–180 | IsSnow | Integer | 0 or 1 |
Table 6. Variables of the LSTM model.

| Variable | Description |
|---|---|
| $\sigma$ | Logistic sigmoid function. |
| $t$ | Current state. $t-1$ means the previous state. |
| $i$ | Unit of the input gate. $i_t$ means the unit of the input gate at the current state. |
| $f$ | Unit of the forget gate. $f_t$ means the unit of the forget gate at the current state. |
| $o$ | Unit of the output gate. $o_t$ means the unit of the output gate at the current state. |
| $x$ | Input value. $x_t$ means the input value at the current state. |
| $b$ | Bias value. $b_f$ means the bias of the forget gate. |
| $W$ | Weight matrix. $W_f$ means the weight matrix of the forget gate. |
| $c$ | Cell for long-term memory. $c_t$ means the cell for long-term memory at the current state. |
| $h$ | Hidden state for short-term memory. $h_t$ means the hidden state for short-term memory at the current state. |
Table 7. Variables of the GRU model.

| Variable | Description |
|---|---|
| $\sigma$ | Logistic sigmoid function. |
| $t$ | Current state. $t-1$ means the previous state. |
| $j$ | Index of the hidden unit. |
| $r$ | Unit of the reset gate. $r_t^j$ means the unit of the reset gate at the current state when calculating the $j$-th hidden unit. It decides whether the previous hidden state is ignored. |
| $z$ | Unit of the update gate. $z_t^j$ means the unit of the update gate at the current state when calculating the $j$-th hidden unit. It selects whether the hidden state is to be updated with a new hidden state. |
| $U$ | Weight matrix. $U_r$ means a weight matrix of the reset gate. |
| $W$ | Weight matrix. $W_r$ means a weight matrix of the reset gate. |
| $\tilde{h}$ | New hidden state. $\tilde{h}_t^j$ means the new hidden state at the current state when calculating the $j$-th hidden unit. |
| $h$ | Hidden state. $h_t^j$ means the hidden state at the current state when calculating the $j$-th hidden unit. The output is the hidden state. |
Table 8. Variables.

| Variable | Description |
|---|---|
| $n$ | The number of data points |
| $i$ | The index of an element |
| $y_i$ | Real value |
| $\hat{y}_i$ | Predicted value |
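The symbols in Table 8 are the ones used by the two error measures reported in the tables that follow, the mean absolute error (MAE) and the root mean square error (RMSE). As a minimal sketch, assuming the standard textbook definitions of both measures (the function names `mae` and `rmse` are hypothetical and this is not the authors' code), they could be computed as follows:

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |y_i - y_hat_i| over n elements."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error: square root of the average squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared / n)


if __name__ == "__main__":
    real = [1.0, 2.0, 3.0]       # made-up values for illustration only
    forecast = [2.0, 2.0, 5.0]
    print(mae(real, forecast), rmse(real, forecast))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and it penalizes large individual errors more heavily.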
Table 9. Results of MAE and RMSE of air quality data.

| Evaluation method | SO2 | CO | NO2 | O3 |
|---|---|---|---|---|
| MAE | 0.0019767759774657213 | 0.13297266375870512 | 0.004597223937342291 | 0.00520195667252871 |
| RMSE | 0.002783199108899503 | 0.19131779740725147 | 0.006546173996365833 | 0.007070109851593688 |
Table 10. Results of MAE and RMSE of weather data.

| Evaluation method | Atmospheric pressure | Temperature | Wind speed | Wind direction | Humidity |
|---|---|---|---|---|---|
| MAE | 9.709005228526323 | 2.452505929222188 | 0.9497748647253894 | 69.97904782589953 | 4.971006381407262 |
| RMSE | 12.06379300207477 | 3.124994324296823 | 1.1238794798143996 | 94.41206318590864 | 6.291656963869701 |
Table 11. Evaluation of the predicted PM using weather data.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.424509950830299 | 7.386990585039382 | 7.810868049063216 | 5.192483483811323 |
Table 12. Evaluation of the predicted PM using weather data with SO2.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.150591658178262 | 7.212869946039611 | 7.266045101108083 | 5.150813651956774 |
Table 13. Evaluation of the predicted PM using weather data with CO.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 10.876065670272803 | 6.952868615163375 | 6.646710365161875 | 4.482592977233096 |
Table 14. Evaluation of the predicted PM using weather data with NO2.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 10.997660334582932 | 7.11836606573322 | 6.981789199218698 | 4.811013752937683 |
Table 15. Evaluation of the predicted PM using weather data with O3.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.609210552459992 | 7.574959231025852 | 6.802652715247799 | 4.745732709395488 |
Table 16. Evaluation of the predicted PM using weather data with atmospheric pressure.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.418551994654374 | 7.6519155502608305 | 7.099408392719633 | 5.154189156530305 |
Table 17. Evaluation of the predicted PM using weather data with wind_x and wind_y.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.2317932705464 | 7.316346260707396 | 6.877921416196969 | 4.751750217462248 |
Table 18. Evaluation of the predicted PM using weather data with the address format.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.188162098223694 | 7.312010079856926 | 7.047761546009768 | 4.76084091141095 |
Table 19. Evaluation of the predicted PM using weather data with 24 solar terms.

| PM2.5 RMSE | PM2.5 MAE | PM10 RMSE | PM10 MAE |
|---|---|---|---|
| 11.378345949349278 | 7.2755643802599455 | 6.75767487545403 | 4.663318307737441 |
Table 20. Performance evaluation results according to the sequence length.

| Sequence length | PM2.5 MAE | PM2.5 RMSE | PM10 MAE | PM10 RMSE |
|---|---|---|---|---|
| 1 day | 6.8723638736498796 | 10.872907681846236 | 4.889641018450285 | 7.023668509637945 |
| 2 days | 7.07283570669722 | 10.922789343087404 | 4.752256456662703 | 6.864557932586567 |
| 3 days | 8.010269844202515 | 11.565469091024756 | 5.419169106578719 | 7.81227016304491 |
| 4 days | 7.193962617691575 | 11.139297312830582 | 5.014263292572295 | 7.125953867868762 |
| 5 days | 7.0712063786053045 | 10.885098257926721 | 4.683081878761333 | 6.8245960495513325 |
| 6 days | 6.950957594294552 | 10.877398546601832 | 4.8897148765323655 | 7.075202116985077 |
| 7 days | 6.748574488524663 | 10.666811784105274 | 4.576222601673498 | 6.706854128280908 |
| 8 days | 8.311487394638375 | 12.524398614693366 | 6.037762553211165 | 9.065500584144592 |
| 9 days | 7.45985392084471 | 11.087943511761466 | 4.869146536925853 | 6.957451893754408 |
| 10 days | 7.068727429495773 | 10.9847680613304 | 5.32557239845155 | 8.05212499374417 |
| 11 days | 7.486351519908187 | 11.437778947196147 | 4.839375773430798 | 7.135375508314753 |
| 12 days | 7.060299407943606 | 11.125659651922488 | 6.116040476671633 | 9.085695877353128 |
| 13 days | 7.35881634959586 | 11.374645196090356 | 6.7099764973762905 | 10.786191153184818 |
| 14 days | 7.057825475352884 | 10.854893642056501 | 4.967073756780628 | 6.993131797131667 |
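To make the role of the sequence length in Table 20 concrete, the sketch below shows one common way of turning a time series into fixed-length input windows for a recurrent model. It is illustrative only and rests on assumptions that the table does not state: the helper `make_windows` and its `horizon` parameter are hypothetical, the data are assumed to be hourly (so that a 7-day sequence would be 168 steps), and each window is paired with the value one step after it.

```python
def make_windows(series, sequence_length, horizon=1):
    """Split a series into (input window, target) pairs.

    Each input window holds `sequence_length` consecutive values; the
    target is the value `horizon` steps after the end of the window.
    """
    if sequence_length < 1 or horizon < 1:
        raise ValueError("sequence_length and horizon must be positive")
    pairs = []
    last_start = len(series) - sequence_length - horizon
    for start in range(last_start + 1):
        end = start + sequence_length
        pairs.append((series[start:end], series[end + horizon - 1]))
    return pairs


if __name__ == "__main__":
    HOURS_PER_DAY = 24  # assumption: one observation per hour
    hourly_pm10 = [float(v % 50) for v in range(24 * 10)]  # synthetic data
    pairs = make_windows(hourly_pm10, sequence_length=7 * HOURS_PER_DAY)
    print(len(pairs), len(pairs[0][0]))
```

Longer sequences give the model more history per sample but leave fewer training windows from the same series, which is one reason the best length is usually found empirically, as in Table 20.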
Table 21. Performance evaluation results according to the batch size.

| Batch size | PM2.5 MAE | PM2.5 RMSE | PM10 MAE | PM10 RMSE |
|---|---|---|---|---|
| 64 | 6.8723638736498796 | 10.872907681846236 | 4.889641018450285 | 7.023668509637945 |
| 96 | 6.915248555917008 | 10.872228585931706 | 4.5699693535437085 | 6.6472971974889905 |
| 128 | 7.290778784389358 | 11.358100746070686 | 5.090586659374237 | 7.652284409114551 |
| 160 | 7.195339101694315 | 10.9563082562599 | 4.820049060024904 | 6.8492290512882485 |
| 192 | 7.0080317971032 | 10.925634792582278 | 5.250878940846936 | 7.90720787936262 |
| 256 | 7.627937784358025 | 11.255332451776745 | 4.7917052769598865 | 6.9397277896506875 |
| 512 | 6.833282805161927 | 10.740158388237752 | 4.691628873916771 | 6.85637822970579 |
Table 22. Performance evaluation results according to each model.

| Model | PM2.5 MAE | PM2.5 RMSE | PM10 MAE | PM10 RMSE |
|---|---|---|---|---|
| CNN | 10.21723294956235 | 13.501415151238904 | 9.280525187180793 | 13.293511747301356 |
| LSTM | 7.201965056645242 | 10.964201214885806 | 4.672849256285137 | 6.92387169617792 |
| GRU | 7.796999000699376 | 11.986331910014759 | 6.552875746664709 | 9.338689775858853 |

Share and Cite

MDPI and ACS Style

Chae, M.; Han, S.; Lee, H. Outdoor Particulate Matter Correlation Analysis and Prediction Based Deep Learning in the Korea. Electronics 2020, 9, 1146. https://doi.org/10.3390/electronics9071146

Chicago/Turabian Style

Chae, Minsu, Sangwook Han, and HwaMin Lee. 2020. "Outdoor Particulate Matter Correlation Analysis and Prediction Based Deep Learning in the Korea" Electronics 9, no. 7: 1146. https://doi.org/10.3390/electronics9071146
