Air Quality Class Prediction Using Machine Learning Methods Based on Monitoring Data and Secondary Modeling
Abstract
1. Introduction
- (1)
- Potential forecasting methods have been proposed. By summarizing the weather conditions observed during past pollution events, mathematical methods can quantitatively describe the likelihood of certain changes in future weather conditions [10]. This approach is often used in severe weather forecasting and weather modification operations [11]. While these methods are convenient and simple, their outcomes are dictated exclusively by weather conditions and meteorological parameters. The influence of actual emissions on the forecasts is therefore disregarded, which compromises forecast accuracy.
- (2)
- Regression statistical models require extensive analysis to establish a complex linear or nonlinear relationship between identified impact factors and pollutant concentrations [12]. Future trends can then be inferred from the input and output patterns related to air pollution. However, this relationship is difficult to describe with a definite mathematical model. Although these methods have low input data requirements, the predicted outcomes typically pertain only to point air quality data, which fall short of elucidating the underlying causes of pollution.
- (3)
- Numerical weather predictions are quantitative, objective predictions based on physicochemical processes. Numerical predictions can clearly reflect the air quality at all grid points in a region, determine the causes of pollution, and offer strong interpretability [13]. However, the precision of numerical forecasting depends on establishing a relatively accurate numerical model, which requires high-performance computing resources, comprehensive data on pollution-source emission parameters, and detailed meteorological information. Fulfilling these prerequisites is challenging, and the associated analysis costs are substantial.
2. Methods and Models
2.1. Data Source and Processing
2.1.1. Data Cleaning
2.1.2. Data Normalization
2.1.3. Data Preprocessing
- (A)
- Test for normal distribution
- (B)
- Autocorrelation analysis of variables
2.2. Classification and Characteristic Analysis of Meteorological Conditions
2.2.1. Univariate Significance Analysis
2.2.2. Ranking of Variable Influence Degree Based on a Random Forest Model
- (1)
- The importance of each meteorological condition factor was calculated and sorted in descending order.
- (2)
- Based on the feature importance ranking in (1), the factor proportions of the independent variables to be eliminated were determined, and a new feature set was obtained.
- (3)
- The above process was repeated with the new independent variable features until there were m remaining features (m is the set value).
- (4)
- From the feature sets obtained in the above process and their corresponding out-of-bag (OOB) error rates, the feature set with the lowest out-of-bag error rate was selected.
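The four steps above amount to a recursive feature-elimination loop driven by random forest importances and the out-of-bag error. A minimal scikit-learn sketch, using synthetic stand-in data and assumed values for the elimination fraction and the stopping size m (neither is specified here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 10 meteorological features, binary pollution class.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
features = list(range(X.shape[1]))
drop_frac, m = 0.2, 3          # fraction removed per round and stopping size (assumed)
best_feats, best_oob = None, np.inf

while len(features) >= m:
    rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(X[:, features], y)
    oob_err = 1.0 - rf.oob_score_          # out-of-bag error rate of this feature set
    if oob_err < best_oob:                 # step (4): keep the best set seen so far
        best_oob, best_feats = oob_err, list(features)
    # steps (1)-(3): rank by importance (descending), drop the weakest fraction, repeat
    order = np.argsort(rf.feature_importances_)[::-1]
    keep = max(m, int(len(features) * (1 - drop_frac)))
    if keep == len(features):
        keep -= 1                          # guarantee progress toward m
    features = [features[i] for i in order[:keep]]

print(len(best_feats), round(best_oob, 3))
```

The feature set returned is the one whose random forest had the lowest OOB error across all elimination rounds.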
2.2.3. Analysis of the Seasonal Characteristics of Pollutant Concentrations and Meteorological Conditions
2.3. Air Quality Prediction Using a Prediction Model with Mixed Monitoring Sites
2.3.1. Data Preprocessing
2.3.2. Multiclassification Model of Primary Pollutants Based on Machine Learning
- (1)
- Weighted model: a multiple weighted regression prediction model [40].
- (2)
- LightGBM model: This model is a distributed lightweight gradient boosting framework based on the gradient boosting decision tree algorithm [41]. The LightGBM has the advantages of simple operation, strong expansibility, high accuracy, and strong robustness.
- (3)
- LR model: This model is used to express the likelihood of a target event [42]. The LR model is also used for discrete-variable classification and probability prediction.
- (4)
- RF model: The RF model is a supervised learning algorithm based on a decision tree, and the selection of random features is further considered [43]. Classification prediction is achieved based on N decision tree classifications, and the final result is obtained through voting.
Model | Highlights | Advantages | Disadvantages |
---|---|---|---|
Weighted model | Essentially a non-parametric learning algorithm | Adapts well to the data themselves | Requires a large amount of computation |
LightGBM model | Adopted a leaf-wise splitting strategy | Supports parallel learning, enabling more efficient processing of large datasets | Consumes a substantial amount of memory |
LR model | Essentially a linear classifier | The model is clear and has probabilistic significance | Yields inferior predictive performance |
RF model | Introduced stochastic feature selection | Typically converges to a lower generalization error | Inferior initial performance and prone to overfitting |
2.3.3. Air Quality Regression Prediction Model Based on a Neural Network
- (1)
- The BP neural network algorithm back-propagates the error computed by the objective function of a feedforward neural network, adjusting the network parameters according to the error between the output-layer value and the expected value in order to reduce it [45]. The BP neural network has three layers: an input layer, a hidden layer, and an output layer, and each layer only affects the next. If the prediction deviates too far from the expected value, the parameters are adjusted through back propagation until the most appropriate parameters are obtained to establish the model.
- (2)
- The LSTM model is a variant of the recurrent neural network (RNN) proposed to improve on it. By adding an input gate, a forget gate, and an output gate, LSTM can change the weights of its self-loops, alleviating the problems of vanishing and exploding gradients during model training [46]. In addition, LSTM has marked advantages in dealing with nonlinear time-series data.
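The gating mechanism described above can be made concrete with a single LSTM cell's forward pass. This is a minimal NumPy sketch with assumed layer sizes and random weights, not the trained model from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 4, 8            # e.g. 4 meteorological inputs, 8 hidden units (assumed)
# One weight matrix per gate (input, forget, output, candidate),
# each acting on the concatenation [h_prev, x_t].
W = {g: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for g in "ifoc"}
b = {g: np.zeros(n_hid) for g in "ifoc"}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])          # input gate: admit new information
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate: decay the cell state
    o = sigmoid(W["o"] @ z + b["o"])          # output gate: expose the cell state
    c_hat = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c = f * c_prev + i * c_hat                # additive update eases gradient flow
    h = o * np.tanh(c)
    return h, c

# Run a short hourly sequence through the cell.
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(24):
    h, c = lstm_step(rng.standard_normal(n_in), h, c)
print(h.shape, c.shape)
```

The additive cell-state update (`f * c_prev + i * c_hat`) is what mitigates gradient vanishing relative to a plain RNN's purely multiplicative recurrence.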
2.3.4. Evaluation Indices
3. Results and Discussion
3.1. Data Preprocessing Result
3.1.1. Test for Normal Distribution
3.1.2. Autocorrelation Analysis of the Dependent Variables
3.1.3. Autocorrelation Analysis of the Independent Variables
3.2. Classification and Analysis of Meteorological Conditions
3.2.1. Univariate Significance Analysis
3.2.2. Ranking of the Variable Influence Degree Based on the Random Forest Model
3.2.3. Analysis of the Seasonal Characteristics of Pollutant Concentrations and Meteorological Conditions
3.3. Air Quality Secondary Forecast Results
3.3.1. Multiclassification Prediction of Primary Pollutants Based on Machine Learning
3.3.2. Air Quality Prediction Based on Machine Learning
4. Conclusions
- (1)
- Through univariate and multivariate significance analysis, alongside a random forest-based method for multivariate importance ranking, we categorized and prioritized ten meteorological variables based on their impact on various pollutant concentrations. This approach enables a nuanced understanding of environmental factors influencing air quality.
- (2)
- We examined the seasonal distribution patterns of six key pollutants and analyzed the relationships between these pollutants and ten meteorological factors across different seasons. Our analysis uncovered that temperature, humidity, air pressure, and atmospheric conditions have a significant seasonal influence on pollutant concentrations, highlighting the necessity of incorporating seasonal dynamics into air quality forecasting models.
- (3)
- The evaluation of machine learning-based classification prediction models revealed the superior performance of the LightGBM classifier, achieving an accuracy of 97.5% and an F1 score of 93.3%. This finding underscores the effectiveness of the LightGBM model in air quality classification tasks.
- (4)
- In terms of AQI prediction, the LSTM model emerged as the most effective, demonstrating a high goodness-of-fit. The model achieved a 91.37% fit for AQI prediction, 90.46% for O3 prediction, and a perfect 100% for forecasting concentrations of primary pollutants in the test set. These results highlight the LSTM model’s potential in providing accurate air quality forecasts.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Acronyms | Meanings | Acronyms | Meanings |
---|---|---|---|
T | Measured temperature | R1p | The first forecast of rainfall |
H | Measured humidity | C1p | The first forecast of cloud amount |
AP | Measured air pressure | BH1p | The first forecast of the boundary layer height |
WS | Measured wind speed | AP1p | The first forecast of the air pressure |
WD | Measured wind direction | SHF1p | The first forecast of the sensible heat flux |
T1p | The first forecast of the temperature at 2 m above the ground | LHF1p | The first forecast of the latent heat flux |
K1p | The first forecast of the land surface temperature | OLR1p | The first forecast of the long-wave radiation |
SH1p | The first forecast of the specific humidity | SWR1p | The first forecast of the shortwave radiation |
H1p | The first forecast of the humidity | SSR1p | The first forecast of the surface solar radiation |
WS1p | The first forecast of the wind speed at 2 m above the ground | SO2(1p) | The first forecast of the hourly mean SO2 concentration |
WD1p | The first forecast of the wind direction at 2 m above the ground | NO2(1p) | The first forecast of the hourly mean NO2 concentration |
O3(1p) | The first forecast of hourly mean O3 concentration | PM2.5(1p) | The first forecast of hourly mean PM2.5 concentration |
CO1p | The first forecast of hourly mean CO concentration | PM10(1p) | The first forecast of hourly mean PM10 concentration |
References
- Suriano, D. Preface to State-of-the-Art in Real-Time Air Quality Monitoring through Low-Cost Technologies. Atmosphere 2023, 14, 554. [Google Scholar] [CrossRef]
- Li, X.; Hu, Z.; Cao, J.; Xu, X. The impact of environmental accountability on air pollution: A public attention perspective. Energy Policy 2022, 161, 112733. [Google Scholar] [CrossRef]
- Liu, Z.; Chen, Y.; Gu, X.; Yeoh, J.K.; Zhang, Q. Visibility classification and influencing-factors analysis of airport: A deep learning approach. Atmos. Environ. 2022, 278, 119085. [Google Scholar] [CrossRef]
- Kumari, S.; Jain, M.K. A critical review on air quality index. In Environmental Pollution: Select Proceedings of ICWEES-2016; Springer: Singapore, 2018; pp. 87–102. [Google Scholar]
- Zhu, Z.; Qiao, Y.; Liu, Q.; Lin, C.; Dang, E.; Fu, W.; Wang, G.; Dong, J. The impact of meteorological conditions on Air Quality Index under different urbanization gradients: A case from Taipei. Environ. Dev. Sustain. 2021, 23, 3994–4010. [Google Scholar] [CrossRef]
- Liu, H.; Sun, Y.; Tan, C.; Ho, C.; Zhao, L.; Hove, A. Toward the Development of an Empirical Model of Air Pollution Impact on Solar PV Output for Industry Use. IEEE J. Photovolt. 2023, 13, 991–997. [Google Scholar] [CrossRef]
- Singh, K.P.; Gupta, S.; Kumar, A.; Shukla, S.P. Linear and nonlinear modeling approaches for urban air quality prediction. Sci. Total Environ. 2012, 426, 244–255. [Google Scholar] [CrossRef] [PubMed]
- Kimura, R. Numerical weather prediction. J. Wind Eng. Ind. Aerodyn. 2002, 90, 1403–1414. [Google Scholar] [CrossRef]
- Wang, A.; Xu, J.; Tu, R.; Saleh, M.; Hatzopoulou, M. Potential of machine learning for prediction of traffic related air pollution. Transp. Res. Part D Transp. Environ. 2020, 88, 102599. [Google Scholar] [CrossRef]
- Wu, Q.; Lin, H. A novel optimal-hybrid model for daily air quality index prediction considering air pollutant factors. Sci. Total Environ. 2019, 683, 808–821. [Google Scholar] [CrossRef]
- Penza, M.; Suriano, D.; Pfister, V.; Prato, M.; Cassano, G. Urban Air Quality Monitoring with Networked Low-Cost Sensor-Systems. Proceedings 2017, 1, 573. [Google Scholar] [CrossRef]
- Dėdelė, A.; Miškinytė, A. The statistical evaluation and comparison of ADMS-Urban model for the prediction of nitrogen dioxide with air quality monitoring network. Environ. Monit. Assess. 2015, 187, 578. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Bai, M.; Zhang, Y.; Liu, J.; Yu, D. Multivariable space-time correction for wind speed in numerical weather prediction (NWP) based on ConvLSTM and the prediction of probability interval. Earth Sci. Inform. 2023, 16, 1953–1974. [Google Scholar] [CrossRef]
- Azid, A.; Juahir, H.; Toriman, M.E.; Kamarudin, M.K.A.; Saudi, A.S.M.; Hasnam, C.N.C.; Aziz, N.A.A.; Azaman, F.; Latif, M.T.; Zainuddin, S.F.M. Prediction of the level of air pollution using principal component analysis and artificial neural network techniques: A case study in Malaysia. Water Air Soil Pollut. 2014, 225, 2063. [Google Scholar] [CrossRef]
- Mishra, D.; Goyal, P.; Upadhyay, A. Artificial intelligence based approach to forecast PM2.5 during haze episodes: A case study of Delhi, India. Atmos. Environ. 2015, 102, 239–248. [Google Scholar] [CrossRef]
- Su, Y.; Xie, H. Prediction of aqi by bp neural network based on genetic algorithm. In Proceedings of the 2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE), Dalian, China, 19–20 September 2020; pp. 625–629. [Google Scholar]
- Kow, P.-Y.; Chang, L.-C.; Lin, C.-Y.; Chou, C.C.-K.; Chang, F.-J. Deep neural networks for spatiotemporal PM2.5 forecasts based on atmospheric chemical transport model output and monitoring data. Environ. Pollut. 2022, 306, 119348. [Google Scholar] [CrossRef] [PubMed]
- Bai, L.; Wang, J.; Ma, X.; Lu, H. Air pollution forecasts: An overview. Int. J. Environ. Res. Public Health 2018, 15, 780. [Google Scholar] [CrossRef]
- Zhen, M.; Yi, M.; Luo, T.; Wang, F.; Yang, K.; Ma, X.; Cui, S.; Li, X. Application of a Fusion Model Based on Machine Learning in Visibility Prediction. Remote Sens. 2023, 15, 1450. [Google Scholar] [CrossRef]
- Zhang, G.; Martens, J.; Grosse, R.B. Fast convergence of natural gradient descent for over-parameterized neural networks. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Liu, Z.; Gu, X.; Yang, H.; Wang, L.; Chen, Y.; Wang, D. Novel YOLOv3 model with structure and hyperparameter optimization for detection of pavement concealed cracks in GPR images. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22258–22268. [Google Scholar] [CrossRef]
- Wang, H.; Guo, L. Research on face recognition based on deep learning. In Proceedings of the 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM), Manchester, UK, 23–25 October 2021; pp. 540–546. [Google Scholar]
- Liu, Z.; Gu, X.; Chen, J.; Wang, D.; Chen, Y.; Wang, L. Automatic recognition of pavement cracks from combined GPR B-scan and C-scan images using multiscale feature fusion deep neural networks. Autom. Constr. 2023, 146, 104698. [Google Scholar] [CrossRef]
- Feng, D.; Feng, M.Q. Computer vision for SHM of civil infrastructure: From dynamic response measurement to damage detection—A review. Eng. Struct. 2018, 156, 105–117. [Google Scholar] [CrossRef]
- Wang, D.; Liu, Z.; Gu, X.; Wu, W. Feature extraction and segmentation of pavement distress using an improved hybrid task cascade network. Int. J. Pavement Eng. 2023, 24, 2266098. [Google Scholar] [CrossRef]
- Liu, Z.; Yeoh, J.K.; Gu, X.; Dong, Q.; Chen, Y.; Wu, W.; Wang, L.; Wang, D. Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN. Autom. Constr. 2023, 146, 104689. [Google Scholar] [CrossRef]
- Almaliki, A.H.; Derdour, A.; Ali, E. Air Quality Index (AQI) Prediction in Holy Makkah Based on Machine Learning Methods. Sustainability 2023, 15, 13168. [Google Scholar] [CrossRef]
- Liang, Y.-C.; Maimury, Y.; Chen, A.H.-L.; Juarez, J.R.C. Machine learning-based prediction of air quality. Appl. Sci. 2020, 10, 9151. [Google Scholar] [CrossRef]
- Ma, J.; Ding, Y.; Cheng, J.C.; Jiang, F.; Tan, Y.; Gan, V.J.; Wan, Z. Identification of high impact factors of air quality on a national scale using big data and machine learning techniques. J. Clean. Prod. 2020, 244, 118955. [Google Scholar] [CrossRef]
- Guo, Y.; Li, K.; Zhao, B.; Shen, J.; Bloss, W.J.; Azzi, M.; Zhang, Y. Evaluating the real changes of air quality due to clean air actions using a machine learning technique: Results from 12 Chinese mega-cities during 2013–2020. Chemosphere 2022, 300, 134608. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Zhu, Q.; Yao, D.; Xu, W. Forecasting urban air quality via a back-propagation neural network and a selection sample rule. Atmosphere 2015, 6, 891–907. [Google Scholar] [CrossRef]
- Zhu, H.; Lu, X. The prediction of PM2.5 value based on ARMA and improved BP neural network model. In Proceedings of the 2016 International Conference on Intelligent Networking and Collaborative Systems (INCoS), Ostrava, Czech Republic, 7–9 September 2016; pp. 515–517. [Google Scholar]
- Pardo, E.; Malpica, N. Air quality forecasting in Madrid using long short-term memory networks. In Biomedical Applications Based on Natural and Artificial Computing, Proceedings of the International Work-Conference on the Interplay between Natural and Artificial Computation, Corunna, Spain, 19–23 June 2017; Springer: Cham, Switzerland, 2021; pp. 232–239. [Google Scholar]
- Du, S.; Li, T.; Yang, Y.; Horng, S.-J. Deep air quality forecasting using hybrid deep learning framework. IEEE Trans. Knowl. Data Eng. 2019, 33, 2412–2424. [Google Scholar] [CrossRef]
- Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524. [Google Scholar] [CrossRef]
- Liu, Z.; Cui, B.; Yang, Q.; Gu, X. Sensor-Based Structural Health Monitoring of Asphalt Pavements with Semi-Rigid Bases Combining Accelerated Pavement Testing and a Falling Weight Deflectometer Test. Sensors 2024, 24, 994. [Google Scholar] [CrossRef]
- Liu, Z.; Yang, Q.; Wang, A.; Gu, X. Vehicle Driving Safety of Underground Interchanges Using a Driving Simulator and Data Mining Analysis. Infrastructures 2024, 9, 28. [Google Scholar] [CrossRef]
- Bradter, U.; Altringham, J.D.; Kunin, W.E.; Thom, T.J.; O’Connell, J.; Benton, T.G. Variable ranking and selection with random forest for unbalanced data. Environ. Data Sci. 2022, 1, e30. [Google Scholar] [CrossRef]
- Perlmutt, L.; Stieb, D.; Cromar, K. Accuracy of quantification of risk using a single-pollutant Air Quality Index. J. Expo. Sci. Environ. Epidemiol. 2017, 27, 24–32. [Google Scholar] [CrossRef] [PubMed]
- Lu, B.; Harris, P.; Charlton, M.; Brunsdon, C. The GWmodel R package: Further topics for exploring spatial heterogeneity using geographically weighted models. Geo-Spat. Inf. Sci. 2014, 17, 85–101. [Google Scholar] [CrossRef]
- Liu, X.; Zhao, K.; Liu, Z.; Wang, L. PM2.5 Concentration Prediction Based on LightGBM Optimized by Adaptive Multi-Strategy Enhanced Sparrow Search Algorithm. Atmosphere 2023, 14, 1612. [Google Scholar] [CrossRef]
- Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; Van Calster, B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 2019, 110, 12–22. [Google Scholar] [CrossRef] [PubMed]
- Sheridan, R.P. Using random forest to model the domain applicability of another random forest model. J. Chem. Inf. Model. 2013, 53, 2837–2850. [Google Scholar] [CrossRef] [PubMed]
- Singh, J.; Sandhu, J.K.; Kumar, Y. An analysis of detection and diagnosis of different classes of skin diseases using artificial intelligence-based learning approaches with hyper parameters. Arch. Comput. Methods Eng. 2023, 32, 1051–1078. [Google Scholar] [CrossRef]
- Moshkbar-Bakhshayesh, K.; Ghofrani, M.B. Development of an efficient identifier for nuclear power plant transients based on latest advances of error back-propagation learning algorithm. IEEE Trans. Nucl. Sci. 2014, 61, 602–610. [Google Scholar] [CrossRef]
- Chen, H.; Guan, M.; Li, H. Air quality prediction based on integrated dual LSTM model. IEEE Access 2021, 9, 93285–93297. [Google Scholar] [CrossRef]
- Chen, M.-R.; Zeng, G.-Q.; Lu, K.-D.; Weng, J. A two-layer nonlinear combination method for short-term wind speed prediction based on ELM, ENN, and LSTM. IEEE Internet Things J. 2019, 6, 6997–7010. [Google Scholar] [CrossRef]
- Parmezan, A.R.S.; Souza, V.M.; Batista, G.E. Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Inf. Sci. 2019, 484, 302–337. [Google Scholar] [CrossRef]
Parameter | num_leaves | boosting_type | n_estimators | max_depth | learning_rate | colsample_bytree | reg_alpha | reg_lambda | subsample |
---|---|---|---|---|---|---|---|---|---|
Values | 4402 | dart | 185 | 3 | 0.410088 | 0.92867 | 2.5477 | 4.63762 | 0.5363 |
Factor Analysis of Independent Variables | Data Distribution Normality Test | ||||||
---|---|---|---|---|---|---|---|
Statistical Analysis | Shapiro–Wilk Test | ||||||
Skewness | Standard Error of Skewness | Kurtosis | Standard Error of Kurtosis | Statistic | Degrees of Freedom | Significance |
T | −0.69 | 0.13 | −0.10 | 0.26 | 0.949 | 352 | 0.000 |
H | −0.99 | 0.13 | 1.32 | 0.26 | 0.933 | 352 | 0.000 |
AP | 0.22 | 0.13 | −0.85 | 0.26 | 0.975 | 352 | 0.000 |
WS | 0.98 | 0.13 | 1.49 | 0.26 | 0.945 | 352 | 0.000 |
WD | 0.20 | 0.13 | −0.69 | 0.26 | 0.983 | 352 | 0.000 |
T1p | −0.72 | 0.13 | −0.14 | 0.26 | 0.938 | 352 | 0.000 |
K1p | −0.64 | 0.13 | −0.30 | 0.26 | 0.944 | 352 | 0.000 |
SH1p | −0.34 | 0.13 | −0.65 | 0.26 | 0.755 | 352 | 0.000 |
H1p | −1.26 | 0.13 | 2.28 | 0.26 | 0.916 | 352 | 0.000 |
WS1p | 0.42 | 0.13 | 0.31 | 0.26 | 0.988 | 352 | 0.004 |
WD1p | −0.11 | 0.13 | −0.70 | 0.26 | 0.981 | 352 | 0.000 |
R1p | 4.37 | 0.13 | 29.02 | 0.26 | 0.521 | 352 | 0.000 |
C1p | −0.03 | 0.13 | −1.02 | 0.26 | 0.971 | 352 | 0.000 |
BP1p | −0.09 | 0.13 | −0.25 | 0.26 | 0.996 | 352 | 0.509 |
AP1p | 0.11 | 0.13 | −0.98 | 0.26 | 0.972 | 352 | 0.000 |
SHF1p | −0.12 | 0.13 | −0.77 | 0.26 | 0.984 | 352 | 0.001 |
LHF1p | −0.09 | 0.13 | −1.19 | 0.26 | 0.957 | 352 | 0.000 |
OLR1p | −0.74 | 0.13 | −0.29 | 0.26 | 0.916 | 352 | 0.000 |
SWR1p | −0.47 | 0.13 | 0.32 | 0.26 | 0.978 | 352 | 0.000 |
SSR1p | −0.69 | 0.13 | −0.10 | 0.26 | 0.949 | 352 | 0.000 |
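The skewness, kurtosis, and Shapiro–Wilk statistics reported above can be reproduced with SciPy. A sketch on synthetic stand-in series (one near-normal, one heavily right-skewed like R1p), not the study's data; a Shapiro–Wilk significance below 0.05 rejects normality, matching the table's interpretation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins with the same sample size as the table (n = 352).
series = {
    "near_normal": rng.normal(size=352),
    "skewed": rng.exponential(size=352) ** 2,   # extreme right skew, like rainfall
}

results = {}
for name, x in series.items():
    w, p = stats.shapiro(x)                     # Shapiro-Wilk statistic and p-value
    results[name] = (w, p)
    print(f"{name}: skew={stats.skew(x):.2f}, kurtosis={stats.kurtosis(x):.2f}, "
          f"W={w:.3f}, p={p:.3f}")
```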
Preliminary Screening Index | Strong Correlation Indicators (r > 0.8) | Collinearity Diagnostics (Take O3 as an Example) | |||||
---|---|---|---|---|---|---|---|
Collinearity Statistics | Collinearity Diagnostics | ||||||
Model | Tolerance | VIF | Model | Eigenvalue | Condition Index | ||
T | T1p (r = 0.98) LHF1p (r = 0.88) OLR1p (r = 0.88) | constant | 1 | 4.938 | 1.000 | ||
T | 0.034 | 29.374 | 2 | 0.048 | 10.111 | ||
T1p | 0.016 | 63.686 | 3 | 0.012 | 20.686 | ||
LHF1p | 0.215 | 4.655 | 4 | 0.002 | 46.873 | ||
OLR1p | 0.093 | 10.788 | 5 | 0.000 | 110.910 | ||
H | H1p (r = 0.93) OLR1p (r = 0.82) | constant | 1 | 3.966 | 1.000 | ||
H | 0.153 | 6.536 | 2 | 0.025 | 12.490 | ||
H1p | 0.197 | 5.068 | 3 | 0.006 | 25.954 | ||
OLR1p | 0.463 | 2.159 | 4 | 0.003 | 36.456 | ||
AP | AP1p (r = 1.00) K1p (r = 0.98) SH1p (r = 0.91) | constant | 1 | 4.857 | 1.000 | ||
AP | 0.013 | 79.543 | 2 | 0.143 | 5.829 | ||
AP1p | 0.011 | 87.110 | 3 | 0.000 | 208.564 | ||
K1p | 0.151 | 6.635 | 4 | 3.065 × 10−6 | 1258.909 | ||
SH1p | 0.217 | 4.617 | 5 | 2.312 × 10−7 | 4582.906 | ||
WS | WS1p (r = 0.82) | constant | 1 | 2.922 | 1.000 | ||
WS | 0.384 | 2.604 | 2 | 0.058 | 7.075 | ||
WS1p | 0.384 | 2.604 | 3 | 0.020 | 12.097 | ||
SHF1p | LHF1p (r = 0.91) SSR1p (r = 0.90) SWR1p (r = 0.90, eliminate) | constant | 1 | 3.935 | 1.000 | ||
SHF1p | 0.137 | 7.315 | 2 | 0.047 | 9.151 | ||
LHF1p | 0.212 | 4.723 | 3 | 0.012 | 18.160 | ||
SSR1p | 0.240 | 4.175 | 4 | 0.006 | 26.284 | ||
WD | None | None | |||||
WD1p | None | ||||||
R1p | None | ||||||
C1p | None | ||||||
BP1p | None |
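The tolerance and VIF columns above follow the standard definitions: regress each predictor on the remaining predictors, then tolerance = 1 − R² and VIF = 1/tolerance. A sketch with synthetic predictors, where one pair is nearly collinear (analogous to AP vs. AP1p); the variable names and data are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 352
# Hypothetical predictors: x2 nearly duplicates x0, x1 is independent.
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = x0 + 0.05 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    v = vif(X, j)
    print(f"x{j}: tolerance={1.0 / v:.3f}, VIF={v:.1f}")
```

The nearly duplicated pair produces very large VIFs (a VIF above 10 is the usual collinearity warning threshold, consistent with the T/T1p and AP/AP1p rows above), while the independent predictor stays near 1.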
Model Dimension | Eigenvalue | Condition Index | Proportion of Variance (%) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Constant | T | H | AP | WS | WD | WD1p | R1p | C1p | BP1p | SHF1p | |||
1 | 9.50 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2 | 0.84 | 3.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.48 | 0.00 | 0.00 | 0.00 |
3 | 0.28 | 5.85 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.06 | 0.05 | 0.04 | 0.16 | 0.00 | 0.00 |
4 | 0.16 | 7.63 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.23 | 0.00 | 0.02 | 0.27 | 0.01 | 0.01 |
5 | 0.11 | 9.19 | 0.00 | 0.00 | 0.01 | 0.00 | 0.15 | 0.39 | 0.05 | 0.14 | 0.10 | 0.00 | 0.00 |
6 | 0.04 | 14.92 | 0.00 | 0.00 | 0.04 | 0.00 | 0.21 | 0.00 | 0.15 | 0.01 | 0.30 | 0.00 | 0.03 |
7 | 0.03 | 18.43 | 0.00 | 0.01 | 0.00 | 0.00 | 0.31 | 0.27 | 0.52 | 0.03 | 0.01 | 0.07 | 0.12 |
8 | 0.02 | 23.52 | 0.00 | 0.00 | 0.63 | 0.00 | 0.11 | 0.02 | 0.03 | 0.12 | 0.09 | 0.13 | 0.04 |
9 | 0.01 | 26.75 | 0.00 | 0.31 | 0.00 | 0.00 | 0.10 | 0.01 | 0.01 | 0.05 | 0.01 | 0.00 | 0.45 |
10 | 0.01 | 34.68 | 0.00 | 0.16 | 0.30 | 0.00 | 0.00 | 0.02 | 0.06 | 0.09 | 0.02 | 0.78 | 0.35 |
Correlation | Zero-order | 0.21 | −0.21 | −0.05 | −0.31 | −0.05 | 0.13 | −0.25 | −0.36 | 0.17 | 0.12 | ||
 | Partial | 0.31 | −0.32 | 0.09 | −0.37 | −0.10 | −0.09 | −0.18 | −0.15 | −0.05 | 0.07 | ||
 | Part | 0.25 | −0.25 | 0.07 | −0.30 | −0.08 | −0.07 | −0.14 | −0.11 | −0.04 | 0.06 | ||
Collinearity statistics | Tolerance | 0.15 | 0.46 | 0.14 | 0.58 | 0.66 | 0.28 | 0.55 | 0.63 | 0.30 | 0.28 | ||
 | VIF | 6.85 | 2.18 | 7.23 | 1.71 | 1.51 | 3.56 | 1.82 | 1.60 | 3.35 | 3.61 | ||
Variables | T | H | AP | WS | WD | WD1p | R1p | C1p | BP1p | SHF1p | |
---|---|---|---|---|---|---|---|---|---|---|---|
SO2 | R2 | 0.273 | 0.49 | 0.258 | 0.233 | 0.334 | 0.362 | 0.132 | 0.284 | 0.329 | 0.405 |
F | 1.191 | 2.966 | 1.074 | 1.286 | 1.498 | 2.359 | 1.374 | 1.149 | 1.58 | 2.132 | |
Significance | 0.151 | 0 | 0.331 | 0.084 | 0.008 | 0 | 0.084 | 0.201 | 0.003 | 0 | |
NO2 | R2 | 0.521 | 0.419 | 0.443 | 0.437 | 0.31 | 0.431 | 0.142 | 0.315 | 0.555 | 0.536 |
F | 3.46 | 2.22 | 2.454 | 3.293 | 1.34 | 3.15 | 1.492 | 1.333 | 4.025 | 3.619 | |
Significance | 0 | 0 | 0 | 0 | 0.04 | 0 | 0.041 | 0.042 | 0 | 0 | |
CO | R2 | 0.556 | 0.367 | 0.532 | 0.37 | 0.292 | 0.352 | 0.207 | 0.265 | 0.443 | 0.417 |
F | 3.98 | 1.787 | 3.504 | 2.493 | 1.233 | 2.262 | 2.351 | 1.045 | 2.564 | 2.237 | |
Significance | 0 | 0 | 0 | 0 | 0.105 | 0 | 0 | 0.388 | 0 | 0 | |
O3 | R2 | 0.293 | 0.393 | 0.225 | 0.243 | 0.271 | 0.166 | 0.128 | 0.364 | 0.252 | 0.266 |
F | 1.317 | 1.996 | 0.896 | 1.363 | 1.114 | 0.828 | 1.324 | 1.661 | 1.085 | 1.135 | |
Significance | 0.053 | 0 | 0.722 | 0.045 | 0.257 | 0.823 | 0.111 | 0.001 | 0.31 | 0.226 | |
PM2.5 | R2 | 0.558 | 0.386 | 0.482 | 0.432 | 0.338 | 0.284 | 0.248 | 0.263 | 0.439 | 0.456 |
F | 4.014 | 1.939 | 2.863 | 3.228 | 1.528 | 1.653 | 2.977 | 1.037 | 2.523 | 2.625 | |
Significance | 0 | 0 | 0 | 0 | 0.006 | 0.003 | 0 | 0.405 | 0 | 0 | |
PM10 | R2 | 0.361 | 0.253 | 0.331 | 0.163 | 0.277 | 0.232 | 0.184 | 0.331 | 0.253 | 0.227 |
F | 1.797 | 1.042 | 1.524 | 0.826 | 1.144 | 1.254 | 2.03 | 1.436 | 1.092 | 0.917 | |
Significance | 0 | 0.395 | 0.006 | 0.825 | 0.21 | 0.106 | 0.001 | 0.015 | 0.299 | 0.676 |
Pollutant Sources | Order of Influence Degree of Meteorological Conditions |
---|---|
SO2 | H > WD1p > WS > SHF1p > T > R1p > AP > BP1p > WD > C1p |
NO2 | WS > SHF1p > WD1p > T > AP > BP1p > H > R1p > C1p > WD |
CO | WS > AP > SHF1p > T > H > BP1p > C1p > R1p > WD > WD1p |
O3 | H > WS > T > C1p > SHF1p > BP1p > AP > R1p > WD > WD1p |
PM2.5 | WS > AP > H > SHF1p > T > R1p > WD1p > BP1p > C1p > WD |
PM10 | T > R1p > WD1p > SHF1p > AP > BP1p > WS > C1p > H > WD |
Model | Precision (P) | Accuracy | Recall Rate (R) | F1 Score |
---|---|---|---|---|
LGBMClassifier | 97.5% | 92.5% | 89.5% | 93.3% |
WEIGHTEDClassifier | 95.6% | 91.5% | 87.5% | 91.4% |
LRClassifier | 96.5% | 91.4% | 87.5% | 88.2% |
RFClassifier | 95.6% | 85.8% | 81.8% | 83.5% |
Prediction Results of Key Indicators | LSTM Model | RF Model | ARIMA Model | WEIGHTED Model | LR Model | BP Neural Network | LGBM Model | |
---|---|---|---|---|---|---|---|---|
Prediction results of AQI | MAE | 5.4473 | 7.0214 | 7.7041 | 7.5150 | 8.4125 | 9.9681 | 10.5125 |
MSE | 51.0266 | 69.5979 | 65.7058 | 79.4924 | 75.9084 | 90.6030 | 118.1841 | |
RMSE | 7.1433 | 8.3425 | 8.1059 | 8.9159 | 8.7125 | 9.5186 | 10.8713 | |
R2 | 91.37% | 88.25% | 84.53% | 79.54% | 77.51% | 72.31% | 68.12% | |
Prediction results of O3 | MAE | 11.2485 | 13.5961 | 16.0979 | 20.9681 | 24.0815 | 28.8185 | 27.5191 |
MSE | 273.0674 | 363.1590 | 573.9719 | 833.4249 | 1153.2884 | 1083.6276 | 1164.8501 | |
RMSE | 16.5248 | 19.0567 | 23.9577 | 28.8691 | 33.9601 | 32.9185 | 34.1299 | |
R2 | 90.46% | 85.33% | 82.18% | 77.54% | 76.51% | 75.58% | 64.33% | |
Prediction of major pollutant | 20 (100%) | 18 (90%) | 15 (75%) | 16 (80%) | 16 (80%) | 15 (75%) | 15 (75%) |
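The MAE, MSE, RMSE, and R² entries in the table follow the usual regression-evaluation definitions. A sketch with a small hypothetical AQI series (the values are illustrative, not taken from the study):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical observed AQI values and model predictions.
y_true = np.array([80.0, 95.0, 110.0, 72.0, 88.0, 130.0])
y_pred = np.array([78.0, 99.0, 104.0, 75.0, 90.0, 124.0])

mae = mean_absolute_error(y_true, y_pred)   # mean |error|
mse = mean_squared_error(y_true, y_pred)    # mean squared error
rmse = np.sqrt(mse)                         # same units as the AQI itself
r2 = r2_score(y_true, y_pred)               # goodness-of-fit, reported as % above
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.2%}")
```

Lower MAE/MSE/RMSE and higher R² indicate a better fit, which is the basis for ranking the seven models in the table.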
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, Q.; Cui, B.; Liu, Z. Air Quality Class Prediction Using Machine Learning Methods Based on Monitoring Data and Secondary Modeling. Atmosphere 2024, 15, 553. https://doi.org/10.3390/atmos15050553