Optimizing Regression Models for Predicting Noise Pollution Caused by Road Traffic
Abstract
1. Introduction
- The dataset generated in this study was designed specifically for predicting noise-pollution levels in public parks. The researchers collected data on landscape configurations within the parks, such as the presence of trees, water bodies, and built structures, and used these data to predict the parks' noise-pollution levels. The dataset is a valuable resource for future studies because it provides a comprehensive and diverse representation of public-park landscapes and their associated noise-pollution levels;
- Hyperparameters are parameters of a machine-learning model that are set before training; they shape the learning process and can strongly affect model performance. In this study, the researchers explored different hyperparameter options to optimize the noise-pollution prediction models, trying various combinations to determine which settings produced the best results. The aim was to improve the accuracy of the noise-pollution predictions and to identify the optimal set of hyperparameters for this problem (a minimal sketch of such a search is given after this list);
- Optimized and non-optimized regression models for predicting noise pollution in public parks. Regression is a class of machine-learning algorithms used to predict numerical values. The non-optimized regression models were built without adjusting the hyperparameters, while the optimized models were built by tuning the hyperparameters based on the options explored earlier in the study. Comparing the performance of the optimized and non-optimized models provided valuable insight into the impact of hyperparameter tuning on the accuracy of the noise-pollution predictions.
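The following is a minimal sketch of the kind of hyperparameter search described above, written with scikit-learn rather than the tool used in the study; the synthetic data, the choice of a regression tree, and the search range for the minimum leaf size are illustrative assumptions.

```python
# Hedged sketch: cross-validated search over one hyperparameter of a regression tree.
# The data and the search grid are placeholders, not the study's actual setup.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))                              # placeholder predictors
y = 60 - 10 * X[:, 0] + rng.normal(scale=1.0, size=500)     # placeholder noise level (dB)

# Try several values of the minimum leaf size and keep the one with the
# lowest cross-validated mean squared error.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"min_samples_leaf": [1, 2, 4, 8, 16, 32]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```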
2. Method
2.1. Experimental Setup
2.2. Dataset Description
2.3. The Regression Models
- a. Fine Trees
- b. Support Vector Machines
- Kernel function. Training an SVM involves applying a nonlinear transformation to the data, and the choice of this transformation is determined by the kernel function. Four kernel options were considered in the optimization: Gaussian, linear, quadratic, and cubic;
- Box-constraint mode. The box-constraint parameter controls how strongly the model penalizes observations with large residuals. A higher box-constraint value makes the model more flexible, while a lower value makes it more rigid and less prone to overfitting. The choice of box constraint is therefore a trade-off between model flexibility and simplicity, and the optimal value depends on the specific dataset and learning task;
- Epsilon mode. The epsilon (ε) value sets the smallest prediction error that is treated as non-zero: estimation errors smaller than ε are neglected and counted as zero. A smaller ε makes the model more flexible, since it can capture smaller deviations from the predicted values, but it can also lead to overfitting, as the model may start to fit the noise in the data rather than the underlying trend;
- Kernel scale mode. The kernel scale determines over what distance between predictor values the kernel varies significantly. A smaller kernel scale yields a more flexible model that can capture more intricate relationships between the predictors, whereas a larger kernel scale yields a smoother model;
- Standardized data. Standardization transforms the predictor variables so that they have a mean of 0 and a standard deviation of 1. This removes the dependence on the arbitrary scales of the predictors and generally improves performance: the effect of each variable is not distorted by differences in scale, and all variables enter the model with equal importance. Standardization is particularly useful when variables have different units or magnitudes. A minimal code sketch of these SVM hyperparameters follows this list.
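The sketch below shows how the hyperparameters listed above map onto scikit-learn's SVR, which is an assumption for illustration (the study's tool is not named here): the box constraint corresponds to C, epsilon to epsilon, the kernel scale roughly to the inverse of gamma for the Gaussian (RBF) kernel, and standardization to a StandardScaler step. The data and parameter values are placeholders.

```python
# Hedged sketch: SVM regression with the hyperparameters discussed above.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))                              # placeholder predictors
y = 60 - 10 * X[:, 0] + rng.normal(scale=1.0, size=500)     # placeholder noise level (dB)

# Gaussian (RBF) kernel; a larger C (box constraint) makes the model more flexible,
# a larger epsilon ignores larger residuals, and gamma acts as an inverse kernel scale.
model = make_pipeline(
    StandardScaler(),                                       # standardize data: mean 0, std 1
    SVR(kernel="rbf", C=100.0, epsilon=0.5, gamma=0.1),
)
model.fit(X, y)
print(model.predict(X[:5]))
```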
- c. Gaussian Process Regression Models
- Basis function. The basis function specifies the form of the prior mean function of a Gaussian process regression model. It can take one of three options: zero, constant, or linear;
- Kernel function. The kernel function measures the correlation between response values as a function of the distance between the corresponding predictor values. Five kernel functions are available: rational quadratic, squared exponential, Matérn 5/2, Matérn 3/2, and exponential. Each can be used as an isotropic kernel, in which all predictors share the same correlation-length scale, or as a nonisotropic kernel, in which each predictor variable has its own correlation-length scale. A nonisotropic kernel can improve model accuracy, but it also slows down the fitting process;
- Sigma mode. Sigma refers to the standard deviation of the observation noise in the model. The app normally optimizes this parameter starting from an initial value; to use a fixed value instead, the user can uncheck the "optimize numeric parameters" option in the advanced settings. When sigma mode is set to automatic, as in the non-optimized model, the app chooses the initial value of the noise standard deviation using a heuristic procedure;
- Kernel scale and standardize data. Same as in the SVM. A brief sketch of these GPR settings follows this list.
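The sketch below expresses these GPR settings in scikit-learn, which is an assumed stand-in for the study's tool: the zero prior mean is the default, the kernel scale corresponds to the Matérn length_scale (a per-feature vector gives the nonisotropic/ARD form), and sigma is represented through a WhiteKernel noise term. Data and values are placeholders.

```python
# Hedged sketch: Gaussian process regression with a Matérn 3/2 kernel and noise term.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                              # placeholder predictors
y = 60 - 10 * X[:, 0] + rng.normal(scale=0.5, size=200)     # placeholder noise level (dB)

# Matern(nu=1.5) is the Matérn 3/2 kernel; a vector length_scale gives a
# nonisotropic (ARD) kernel, a scalar gives the isotropic version.
kernel = Matern(length_scale=[1.0, 1.0, 1.0], nu=1.5) + WhiteKernel(noise_level=0.25)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)
gpr.fit(X, y)
mean, std = gpr.predict(X[:5], return_std=True)
print(mean, std)
```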
2.4. Model Evaluation
3. Results and Analysis
4. Discussion of Major Findings
- Optimizing regression models improves noise-pollution prediction accuracy: In this study, optimization was performed on three regression models: regression trees, support vector machines, and Gaussian process regression. The objective was to determine the hyperparameter values that would maximize each model's prediction performance. The optimization significantly improved the accuracy of the noise-pollution predictions, as confirmed by the improvement in performance measures such as MSE, RMSE, and R2. Optimization allowed the models to better capture the underlying relationships between the predictors and the response, yielding more accurate predictions of noise-pollution levels;
- The optimized Gaussian process regression (GPR) model emerges as the best performer: Among the three regression models, the optimized GPR model achieved the highest accuracy in terms of MSE, RMSE, and R2. It effectively captured the relationships between the predictors and the response and produced highly accurate predictions of noise-pollution levels. Although it was slower than the other models in terms of computation speed, its superior prediction accuracy makes it an ideal choice for addressing the problem of noise pollution;
- The optimization of the GPR model consisted of determining the optimal values of its hyperparameters, the parameters that control the shape of the regression function and therefore have a significant impact on prediction accuracy. The search identified the following optimal values: a basis function of zero, a nonisotropic Matérn 3/2 kernel function, a kernel scale of 43.0096, and a sigma parameter of 0.42787. With these hyperparameters, the GPR model delivered highly accurate predictions, outperforming the other regression models, and provided an effective tool for mitigating the impact of noise pollution in open areas near main roads (a sketch expressing these settings, together with the evaluation metrics used, is given after this list).
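As a rough illustration, the sketch below expresses the reported optimal GPR hyperparameters in scikit-learn and evaluates the fit with the study's metrics (MSE, RMSE, R2). The synthetic data, the train/test split, and the mapping of the reported kernel scale and sigma onto Matérn length scales and a WhiteKernel noise level are assumptions, not a reproduction of the study's implementation.

```python
# Hedged sketch: GPR configured with the reported optimal hyperparameters
# (zero mean, nonisotropic Matérn 3/2, kernel scale 43.0096, sigma 0.42787),
# evaluated with MSE, RMSE, and R2 on placeholder data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 7))                              # 7 predictors, as in the dataset
y = 60 - 10 * X[:, 0] + rng.normal(scale=0.4, size=400)     # placeholder noise level (dB)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Nonisotropic (ARD) Matérn 3/2: one length scale per predictor, initialized at the
# reported optimum; the noise standard deviation sigma enters as noise_level = sigma**2.
kernel = Matern(length_scale=[43.0096] * X.shape[1], nu=1.5) + WhiteKernel(
    noise_level=0.42787 ** 2
)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X_train, y_train)

y_pred = gpr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"MSE={mse:.3f}  RMSE={np.sqrt(mse):.3f}  R2={r2_score(y_test, y_pred):.3f}")
```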
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
GPR | Gaussian process regression
MAE | Mean absolute error
ML | Machine learning
MSE | Mean square error
R2 | R squared (coefficient of determination)
RMSE | Root mean square error
SVM | Support vector machine
References
- Polson, N.G.; Sokolov, V.O. Deep learning for short-term traffic flow prediction. Transp. Res. Part C Emerg. Technol. 2017, 79, 1–17.
- De Coensel, B.; Brown, A.L.; Tomerini, D. A road traffic noise pattern simulation model that includes distributions of vehicle sound power levels. Appl. Acoust. 2016, 111, 170–178.
- Björkman, M. Community noise annoyance: Importance of noise levels and the number of noise events. J. Sound Vib. 1991, 151, 497–503.
- Sato, T.; Yano, T.; Björkman, M.; Rylander, R. Road traffic noise annoyance in relation to average noise level, number of events and maximum noise level. J. Sound Vib. 1999, 223, 775–784.
- Costa, L.G.; Cole, T.B.; Dao, K.; Chang, Y.C.; Coburn, J.; Garrick, J.M. Effects of air pollution on the nervous system and its possible role in neurodevelopmental and neurodegenerative disorders. Pharmacol. Ther. 2020, 210, 107523.
- Sørensen, M.; Poulsen, A.H.; Hvidtfeldt, U.A.; Brandt, J.; Frohn, L.M.; Ketzel, M.; Christensen, J.H.; Im, U.; Khan, J.; Münzel, T.; et al. Air pollution, road traffic noise and lack of greenness and risk of type 2 diabetes: A multi-exposure prospective study covering Denmark. Environ. Int. 2022, 170, 107570.
- Klompmaker, J.O.; Hoek, G.; Bloemsma, L.D.; Marra, M.; Wijga, A.H.; van den Brink, C.; Brunekreef, B.; Lebret, E.; Gehring, U.; Janssen, N.A. Surrounding green, air pollution, traffic noise exposure and non-accidental and cause-specific mortality. Environ. Int. 2020, 134, 105341.
- WHO. Noise EURO; WHO: Geneva, Switzerland, 2019.
- Torija, A.J.; Ruiz, D.P. Automated classification of urban locations for environmental noise impact assessment on the basis of road-traffic content. Expert Syst. Appl. 2016, 53, 1–13.
- Botteldooren, D.; Verkeyn, A.; Lercher, P. Noise Annoyance Modelling using Fuzzy Rule Based Systems. Noise Health 2002, 15, 27–44.
- Fallah-Shorshani, M.; Yin, X.; McConnell, R.; Fruin, S.; Franklin, M. Estimating traffic noise over a large urban area: An evaluation of methods. Environ. Int. 2022, 170, 107583.
- Adulaimi, A.A.A.; Pradhan, B.; Chakraborty, S.; Alamri, A. Traffic Noise Modelling Using Land Use Regression Model Based on Machine Learning, Statistical Regression and GIS. Energies 2021, 14, 5095.
- Yin, X.; Fallah-Shorshani, M.; McConnell, R.; Fruin, S.; Franklin, M. Predicting Fine Spatial Scale Traffic Noise Using Mobile Measurements and Machine Learning. Environ. Sci. Technol. 2020, 54, 12860–12869.
- Nourani, V.; Gökçekuş, H.; Umar, I.K. Artificial intelligence based ensemble model for prediction of vehicular traffic noise. Environ. Res. 2020, 180, 108852.
- Givargis, S.; Karimi, H. A basic neural traffic noise prediction model for Tehran’s roads. J. Environ. Manag. 2010, 91, 2529–2534.
- Xu, D.; Wei, C.; Peng, P.; Xuan, Q.; Guo, H. GE-GAN: A novel deep learning framework for road traffic state estimation. Transp. Res. Part C Emerg. Technol. 2020, 117, 102635.
- Can, A.; Chevallier, E.; Nadji, M.; Leclercq, L. Dynamic Traffic Modeling for Noise Impact Assessment of Traffic Strategies. Acta Acust. United Acust. 2010, 96, 482–493.
- De Coensel, B.; De Muer, T.; Yperman, I.; Botteldooren, D. The influence of traffic flow dynamics on urban soundscapes. Appl. Acoust. 2005, 66, 175–194.
- W Group. IMMI—Noise Prediction Software|Air Pollution Calculation Software. Available online: https://www.immi.eu/ (accessed on 7 January 2023).
- Gelbart, M.A.; Snoek, J.; Adams, R.P. Bayesian optimization with unknown constraints. arXiv 2014, arXiv:1403.5607.
- Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 2000.
- Bravo-Moncayo, L.; Lucio-Naranjo, J.; Chávez, M.; Pavón-García, I.; Garzón, C. A machine learning approach for traffic-noise annoyance assessment. Appl. Acoust. 2019, 156, 262–270.
- Chen, W.; An, J.; Li, R.; Fu, L.; Xie, G.; Bhuiyan, M.Z.A.; Li, K. A novel fuzzy deep-learning approach to traffic flow prediction with uncertain spatial–temporal data features. Futur. Gener. Comput. Syst. 2018, 89, 78–88.
- Li, M.; Wang, Y.; Wang, Z.; Zheng, H. A deep learning method based on an attention mechanism for wireless network traffic prediction. Ad Hoc Netw. 2020, 107, 102258.
- Lee, S.Y.; Le, T.H.M.; Kim, Y.M. Prediction and detection of potholes in urban roads: Machine learning and deep learning based image segmentation approaches. Dev. Built Environ. 2023, 13, 100109.
Features | Values | Number of Observations | Target Variable
---|---|---|---
Distance | 15, 30, 45, 60, 75, 90 (m) | 6480 | Noise
Time | Day, Night | |
Landscape | None, Tree, Wall | |
Road surface | Asphaltic concrete, Uneven surface | |
Vehicles/h | 10, 20, 40, 50, 100, 200, 400, 500, 1000, 2000 | |
Speed limit | 60, 80, 100, 120, 140 (km/h) | |
Percentage of heavy vehicles | 5, 10, 20 (%) | |
Model Type | Hyperparameter | Without Optimization | With Optimization (Range of Parameter Values)
---|---|---|---
Fine tree | Minimum leaf size | 4 | 1–3240
SVM | Kernel function | Linear | Gaussian, Linear, Quadratic, Cubic
 | Kernel scale | Automatic | 0.001–1000
 | Box constraint | Automatic | 0.001–1000
 | Epsilon | Automatic | 0.012042–1204.2254
 | Standardize data | True | True, False
GPR | Basis function | Constant | Constant, Zero, Linear
 | Kernel function | Rational Quadratic | Nonisotropic and Isotropic versions of: Rational Quadratic, Squared Exponential, Matérn 5/2, Matérn 3/2, Exponential
 | Kernel scale | Automatic | 1.99–1990
 | Sigma | Automatic | 0.0001–113.9076
 | Standardize data | True | True, False
Category | Model | Model Type | RMSE | R2 | MSE | Prediction Speed (obs/s) | Training Time (s) | Training Time (min)
---|---|---|---|---|---|---|---|---
Non-optimizable models | Model 1 | Fine tree | 1.59 | 0.98 | 2.52 | 130,000 | 3.15 | 0.05
 | Model 2 | SVM | 3.84 | 0.89 | 14.78 | 53,000 | 7.19 | 0.12
 | Model 3 | GPR | 1.41 | 0.98 | 1.98 | 8100 | 429.11 | 7.15
Optimizable models | Model 4 | Fine tree | 1.57 | 0.98 | 2.48 | 420,000 | 21.20 | 0.35
 | Model 5 | SVM | 1.65 | 0.98 | 2.74 | 260,000 | 1206.60 | 20.11
 | Model 6 | GPR | 0.19 | 1.00 | 0.04 | 4100 | 8373.70 | 139.56
Model Type | Hyperparameter | Range (With Optimization) | Optimal Value
---|---|---|---
Fine tree | Minimum leaf size | 1–3240 | 3
SVM | Kernel function | Gaussian, Linear, Quadratic, Cubic | Gaussian
 | Kernel scale | 0.001–1000 | 78.5289
 | Box constraint | 0.001–1000 | 588.2126
 | Epsilon | 0.012042–1204.2254 | 1.9145
 | Standardize data | True, False | False
GPR | Basis function | Constant, Zero, Linear | Zero
 | Kernel function | Nonisotropic and Isotropic versions of: Rational Quadratic, Squared Exponential, Matérn 5/2, Matérn 3/2, Exponential | Nonisotropic Matérn 3/2
 | Kernel scale | 1.99–1990 | 43.0096
 | Sigma | 0.0001–113.9076 | 0.42787
 | Standardize data | True, False | False
Reference | Prediction Model | MSE | RMSE | R2
---|---|---|---|---
[1] | Deep-Learning Media Filter Preprocessing (DLM8L) | 7.7 | - | 0.85
[22] | Artificial Neural Networks (ANN) | - | 1.91 | 0.33
[11] | eXtreme Gradient Boosting (XGB) | 0.65 | - | -
[23] | Fuzzy Deep Learning (FDCN) | - | 0.30 | -
[24] | Spatio-Temporal Convolutional Network (LA-ResNet) | - | 4.5 | -
[25] | Gaussian Process Regression (GPR) | 0.21 | 0.36 | 0.58
This study | Optimizable Gaussian Process Regression (GPR), the best-performing model | 0.04 | 0.19 | 1.00