Prediction of High-Speed Traffic Flow around City Based on BO-XGBoost Model
Abstract
1. Introduction
- (1) Entry and exit stations are densely distributed along the highway in both travel directions, and traffic flow varies significantly within short spatial distances.
- (2) Owing to the uneven economic development of central cities and differing regional functional positioning, the composition of traffic and the traffic flow differ significantly between stations.
- (3) Traffic flow on highways around a city is easily influenced by policies and important activities in the central city, so the flow at the same station varies considerably at different times.
- (1) First, a dataset of high-speed traffic flow around a city was constructed, and the data were dimensionalized to expand the data features. Combining the inherent randomness, continuity, periodicity, and volatility of such traffic flow, the data were visualized and their spatiotemporal distribution characteristics were studied at different spatial granularities (individual stations and the entire high-speed line) and temporal granularities (monthly and annual), revealing the annual periodicity and intra-year variability of the dataset.
- (2) Second, different feature matrices were constructed for the traffic flow data and input into a symmetric bidirectional long short-term memory (Bi-LSTM) model and an integrated extreme gradient boosting (XGBoost) model. Feeding the time series into the ensemble learning model after a monthly feature-matrix analysis yielded higher prediction accuracy than feeding it directly into the symmetric Bi-LSTM model (a minimal sketch of the feature-matrix construction follows this list). To further improve the prediction accuracy of the integrated XGBoost model, a Bayesian algorithm was used to optimize the model and obtain the optimal network parameters.
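As a minimal sketch of the monthly feature-matrix construction mentioned in point (2) above — the lag depth of 12 months and the column names are illustrative assumptions, not the paper's exact design:

```python
import pandas as pd

def build_monthly_feature_matrix(df: pd.DataFrame, n_lags: int = 12) -> pd.DataFrame:
    """Turn a monthly traffic-flow series into a supervised-learning matrix."""
    df = df.sort_values(["Year", "Month"]).reset_index(drop=True)
    for lag in range(1, n_lags + 1):
        # The flow observed `lag` months earlier becomes one feature column.
        df[f"TF_lag_{lag}"] = df["TF"].shift(lag)
    df["month"] = df["Month"].astype(int)  # keeps the intra-year periodicity visible
    return df.dropna().reset_index(drop=True)  # drop rows without a full lag window
```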
2. Literature Review
3. Materials
3.1. Data Characteristic Analysis
3.2. Data Feature Processing
3.2.1. Removing Invalid Features
3.2.2. Removing Strongly Correlated Features
3.2.3. Fusing Features
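Sections 3.2.1 and 3.2.2 name two standard cleaning steps: dropping invalid (constant) features and dropping one member of each strongly correlated pair. A minimal sketch of both, assuming a pandas DataFrame and an illustrative correlation threshold of 0.95 (the paper's own threshold is not restated here):

```python
import numpy as np
import pandas as pd

def drop_invalid_features(df: pd.DataFrame) -> pd.DataFrame:
    """Remove constant (zero-variance) columns, which carry no predictive signal."""
    return df.loc[:, df.nunique() > 1]

def drop_strongly_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Among pairs with |Pearson r| above the threshold, keep only the first feature."""
    corr = df.corr(numeric_only=True).abs()
    # Keep the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```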
4. Methodology
4.1. XGBoost
4.2. Bayesian Optimization Algorithm
4.3. BO-XGBoost Model
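A minimal sketch of the BO-XGBoost workflow this section describes, using the open-source `bayes_opt` package and cross-validated R² as the objective; the search bounds shown are illustrative assumptions, not the paper's exact ranges:

```python
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def make_objective(X, y):
    """Return a function scoring one hyperparameter combination by 5-fold CV R^2."""
    def objective(max_depth, gamma, min_child_weight, subsample,
                  colsample_bytree, learning_rate, n_estimators):
        model = XGBRegressor(
            max_depth=int(max_depth),          # BO proposes floats; trees need ints
            gamma=gamma,
            min_child_weight=min_child_weight,
            subsample=subsample,
            colsample_bytree=colsample_bytree,
            learning_rate=learning_rate,
            n_estimators=int(n_estimators),
        )
        return cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    return objective

def tune(X, y):
    bounds = {  # illustrative search ranges, not the paper's exact bounds
        "max_depth": (3, 10), "gamma": (0.0, 1.0), "min_child_weight": (1, 10),
        "subsample": (0.5, 1.0), "colsample_bytree": (0.5, 1.0),
        "learning_rate": (0.01, 0.3), "n_estimators": (100, 500),
    }
    opt = BayesianOptimization(f=make_objective(X, y), pbounds=bounds, random_state=42)
    opt.maximize(init_points=5, n_iter=30)  # 5 random probes, then 30 GP-guided steps
    return opt.max  # best score and parameter combination found
```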
5. Traffic Data Prediction Based on the BO-XGBoost
5.1. Experimental Setup
5.1.1. Experimental Datasets
5.1.2. Experimental Environment
5.1.3. Evaluating Indicator
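The result tables in Section 5.2 report R², MAE, and MAPE. Assuming the standard scikit-learn definitions of these indicators match the paper's, a minimal evaluation helper:

```python
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)

def evaluate(y_true, y_pred) -> dict:
    """R^2 (goodness of fit), MAE (absolute error), MAPE (relative error, %)."""
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE%": 100 * mean_absolute_percentage_error(y_true, y_pred),
    }
```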
5.2. Result Comparison and Analysis
5.3. Feature Importance Analysis Based on XGBoost
6. Conclusions
- (1) A dataset of high-speed traffic flow around a city was constructed, and the data were dimensionalized to expand the data features. Combining the inherent randomness, continuity, periodicity, and volatility of the traffic flow data, the data were visualized and their spatiotemporal distribution characteristics were studied. The dataset showed apparent annual periodicity and intra-year variability.
- (2) We constructed different feature matrices for the traffic flow data and input them into the symmetric Bi-LSTM and integrated XGBoost models. Feeding the time series into the ensemble learning model after the monthly feature-matrix analysis yielded higher prediction accuracy than feeding it directly into the symmetric Bi-LSTM model. To further improve the prediction accuracy of the integrated XGBoost model, a Bayesian algorithm was used to optimize the model and obtain the optimal network parameters. Experiments showed that the improved XGBoost model achieved better accuracy, with test-set R² values of 0.90 for TF and 0.87 for PCU.
- Collecting more data on urban highways around central cities and analyzing the impact of urban traffic flow on the economic development of central cities;
- Developing a hyperparameter optimization method with better accuracy and efficiency to determine the optimal parameter combination within the searchable range.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rajeh, T.M.; Li, T.; Li, C.; Javed, M.H.; Luo, Z.; Alhaek, F. Modeling multi-regional temporal correlation with gated recurrent unit and multiple linear regression for urban traffic flow prediction. Knowl. Based Syst. 2023, 262, 110237.
- Guo, S.; Lin, Y.; Li, S.; Chen, Z.; Wan, H. Deep Spatial–Temporal 3D Convolutional Neural Networks for Traffic Data Forecasting. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3913–3926.
- Zhan, A.; Du, F.; Chen, Z.; Yin, G.; Wang, M.; Zhang, Y. A traffic flow forecasting method based on the GA-SVR. J. High Speed Netw. 2022, 28, 97–106.
- Zhou, T.; Huang, B.; Li, R.; Liu, X.; Huang, Z. An attention-based deep learning model for citywide traffic flow forecasting. Int. J. Digit. Earth 2022, 15, 323–344.
- Kashyap, A.A.; Raviraj, S.; Devarakonda, A.; Nayak, K.S.R.; KV, S.; Bhat, S.J. Traffic flow prediction models—A review of deep learning techniques. Cogent Eng. 2022, 9, 2010510.
- Wang, S.; Shao, C.; Zhang, J.; Zheng, Y.; Meng, M. Traffic flow prediction using bi-directional gated recurrent unit method. Urban Inform. 2022, 1, 16.
- Zhang, X.; Yu, G.; Shang, J.; Zhang, B. Short-term Traffic Flow Prediction with Residual Graph Attention Network. Eng. Lett. 2022, 30, 4.
- Wen, Y.; Xu, P.; Li, Z.; Xu, W.; Wang, X. RPConvformer: A novel Transformer-based deep neural networks for traffic flow prediction. Expert Syst. Appl. 2023, 218, 119587.
- He, R.; Xiao, Y.; Lu, X.; Zhang, S.; Liu, Y. ST-3DGMR: Spatio-temporal 3D grouped multiscale ResNet network for region-based urban traffic flow prediction. Inf. Sci. 2023, 624, 68–93.
- Razali, N.A.M.; Shamsaimon, N.; Ishak, K.K.; Ramli, S.; Amran, M.F.M.; Sukardi, S. Gap, techniques and evaluation: Traffic flow prediction using machine learning and deep learning. J. Big Data 2021, 8, 152.
- Cengil, E.; Çınar, A.; Yıldırım, M. A hybrid approach for efficient multi-classification of white blood cells based on transfer learning techniques and traditional machine learning methods. Concurr. Comput. Pract. Exp. 2021, 34, e6756.
- Polson, N.G.; Sokolov, V.O. Deep Learning for Short-Term Traffic Flow Prediction. Transp. Res. Part C 2017, 79, 1–17.
- Alajali, W.; Zhou, W.; Wen, S.; Wang, Y. Intersection Traffic Prediction Using Decision Tree Models. Symmetry 2018, 10, 386.
- Chen, Y.; Wang, W.; Chen, X.M. Bibliometric methods in traffic flow prediction based on artificial intelligence. Expert Syst. Appl. 2023, 228, 120421.
- Sun, Z.; Li, Y.; Pei, L.; Li, W.; Hao, X. Classification of Coarse Aggregate Particle Size Based on Deep Residual Network. Symmetry 2022, 14, 349.
- Xianglong, L.; Liyao, N.; Shengrui, Z. An Algorithm for Traffic Flow Prediction Based on Improved SARIMA and GA. KSCE J. Civ. Eng. 2018, 22, 4107–4115.
- Marcelino, P.; de Lurdes Antunes, M.; Fortunato, E.; Gomes, M.C. Transfer learning for pavement performance prediction. Int. J. Pavement Res. Technol. 2020, 13, 154–167.
- Wang, X.; Xiao, J.; Fan, M.; Gang, C.; Tang, Y. Short-Term Traffic Flow Prediction Based on GA-BP Neural Network. Adv. Comput. Signals Syst. 2022, 6, 75–82.
- Zhang, L.; Alharbe, N.R.; Luo, G.; Yao, Z.; Li, Y. A hybrid forecasting framework based on support vector regression with a modified genetic algorithm and a random forest for traffic flow prediction. Tsinghua Sci. Technol. 2018, 23, 479–492.
- Wang, D.; Wang, C.; Xiao, J.; Xiao, Z.; Chen, W.; Havyarimana, V. Bayesian optimization of support vector machine for regression prediction of short-term traffic flow. Intell. Data Anal. 2019, 23, 481–497.
- Lu, B.; Gan, X.; Jin, H.; Fu, L.; Wang, X.; Zhang, H. Make More Connections: Urban Traffic Flow Forecasting with Spatiotemporal Adaptive Gated Graph Convolution Network. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–25.
- Chai, C.; Ren, C.; Yin, C.; Xu, H.; Meng, Q.; Teng, J.; Gao, G. A Multifeature Fusion Short-Term Traffic Flow Prediction Model Based on Deep Learnings. J. Adv. Transp. 2022, 2022, 1702766.
- Fang, W.; Zhuo, W.; Song, Y.; Yan, J.; Zhou, T.; Qin, J. [formula omitted]-LSTM: An error distribution free deep learning for short-term traffic flow forecasting. Neurocomputing 2023, 526, 180–190.
- Lan, T.; Zhang, X.; Qu, D.; Yang, Y.; Chen, Y. Short-Term Traffic Flow Prediction Based on the Optimization Study of Initial Weights of the Attention Mechanism. Sustainability 2023, 15, 1374.
- Zhang, Z. Prediction of traffic flow based on deep learning. Int. J. Adv. Comput. Technol. 2020, 9, 5–11.
- Hao, X.; Liu, Y.; Pei, L.; Li, W.; Du, Y. Atmospheric Temperature Prediction Based on a BiLSTM-Attention Model. Symmetry 2022, 14, 2470.
- Zhuang, W.; Cao, Y. Short-Term Traffic Flow Prediction Based on CNN-BILSTM with Multicomponent Information. Appl. Sci. 2022, 12, 8714.
- Sun, Z.; Pei, L.; Xu, L. Intelligent Detection and Restoration of Road Domain Environmental Perception Data Based on DS-LOF and GA-XGBoost. J. China Highw. Eng. 2023, 36, 15–26.
- Du, Q.; Yin, F.; Li, Z. Base station traffic prediction using XGBoost-LSTM with feature enhancement. IET Netw. 2020, 9, 0103.
- Sun, B.; Sun, T.; Jiao, P. Spatio-Temporal Segmented Traffic Flow Prediction with ANPRS Data Based on Improved XGBoost. J. Adv. Transp. 2021, 2021, 1–24.
- Tumash, L.; Canudas-de-Wit, C.; Delle Monache, M.L. Multi-directional continuous traffic model for large-scale urban networks. Transp. Res. Part B 2022, 158, 374–402.
- Hu, Y.; Ma, T.; Chen, J. Multi-anticipative bi-directional visual field traffic flow models in the connected vehicle environment. Phys. A Stat. Mech. Its Appl. 2021, 584, 126372.
- Pei, L.; Sun, Z.; Yu, T.; Li, W.; Hao, X.; Hu, Y.; Yang, C. Pavement aggregate shape classification based on extreme gradient boosting. Constr. Build. Mater. 2020, 256, 119356–119369.
- Yuan, C.; He, C.; Xu, J.; Liao, L.; Kong, Q. Bayesian Optimization for Selecting Efficient Machine Learning Regressors to Determine Bond-slip Model of FRP-to-concrete Interface. Structures 2022, 39, 351–364.
- Zhou, J.; Qiu, Y.; Zhu, S.; Armaghani, D.J.; Khandelwal, M.; Mohamad, E.T. Estimation of the TBM Advance Rate Under Hard Rock Conditions Using XGBoost and Bayesian Optimization. Undergr. Space 2020, 6, 506–515.
- Pei, L.; Sun, Z.; Hu, Y.; Li, W.; Gao, Y.; Hao, X. Neural Network Model for Road Aggregate Size Calculation Based on Multiple Features. J. South China Univ. Technol. 2020, 48, 77–86.
Literature | Method | Overall Evaluation of the Method
---|---|---
[9] | ST-3DGMR | Compared with the state of the art, ST-3DGMR achieved significantly lower RMSE on the BikeNYC, TaxiBJ, and TaxiCQ datasets.
[18] | GA-BPNN | Experimental results indicated that the GA-BPNN algorithm achieved a better prediction effect and provides a useful reference for short-term traffic-flow prediction.
[22] | RFCGASVR | The results indicated that the proposed RFCGASVR model performed better than other methods.
[23] | SVR | The experimental results indicated that the accuracy of this method is superior to the classic SARIMA, MLP NN, ERT, and AdaBoost methods.
[24] | GWO-attention-LSTM | The model performed better and can effectively support traffic management control and traffic flow theory research.
[26] | Bi-LSTM-Attention | Bidirectional long short-term memory (Bi-LSTM) attention models can effectively improve the prediction accuracy of traffic flow.
[30] | XGBoost-LSTM | The XGBoost-LSTM model was used to predict base-station traffic and performed better than competing algorithms.
[31] | XGBoost-I, XGBoost-S | Two improved models were proposed: independent XGBoost (XGBoost-I), with interval-specific parameter tuning, and static XGBoost (XGBoost-S), with overall parameter tuning.
Abbreviation | Full Name | Type of Information | Abbreviation | Full Name | Type of Information
---|---|---|---|---|---
ObName | Observatory Name | position information | PCTF | Passenger Car Traffic Flow | real statistical indicators
NO. | Start Stake Number | position information | TTF | Truck Traffic Flow | real statistical indicators
SMPTF | Small and Medium Passenger Traffic Flow | real statistical indicators | TF | Traffic Flow | real statistical indicators
LBTF | Large Bus Traffic Flow | real statistical indicators | PU | Passenger Unit | calculated indicators
STTF | Small Truck Traffic Flow | real statistical indicators | TU | Truck Unit | calculated indicators
MTTF | Medium Truck Traffic Flow | real statistical indicators | PCU | Passenger Car Unit | calculated indicators
LTTF | Large Truck Traffic Flow | real statistical indicators | Speed | Motor Vehicle Speed | real statistical indicators
ELCTF | Extra Large Cargo Traffic Flow | real statistical indicators | TVR | Traffic Volume Ratio | calculated indicators
CTF | Container Traffic Flow | real statistical indicators | / | / | /
Year | Month | ObName | NO. | SMPTF | LBTF | STTF | MTTF | LTTF | ELCTF | CTF | PCTF | TTF | TF | PU | TU | PCU * | Speed | TVR *
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2016 | 01 | FangJiaCun | 0 | 10,655 | 150 | 401 | 976 | 571 | 2125 | 130 | 10,805 | 4203 | 15,008 | 10,880 | 12,598 | 23,478 | 79.9 | 0.293
2016 | 01 | FangJiaCunLiJiao | 77.5 | 16,448 | 352 | 1061 | 786 | 911 | 2901 | 567 | 16,800 | 6226 | 23,026 | 16,976 | 18,845 | 35,821 | 83.7 | 0.448
2016 | 01 | HeChiZhaiLiJiao | 46.8 | 38,904 | 4871 | 4320 | 3242 | 876 | 335 | 3043 | 43,775 | 11,816 | 55,591 | 46,211 | 25,323 | 71,534 | 70.8 | 0.894
2016 | 01 | HeChiZhaiLiJiaoXi | 39.6 | 30,513 | 2375 | 3389 | 1581 | 459 | 342 | 1376 | 32,888 | 7147 | 40,035 | 34,076 | 14,010 | 48,085 | 64.9 | 0.601
2016 | 01 | LiuCunBaoLiJiao | 18.6 | 22,160 | 641 | 2185 | 946 | 1254 | 2252 | 427 | 22,801 | 7064 | 29,865 | 23,122 | 18,082 | 41,204 | 72.8 | 0.515
2016 | 01 | LvXiaoZhaiLiJiaoXi | 15.6 | 24,602 | 3351 | 4101 | 2888 | 6058 | 903 | 2083 | 27,953 | 16,033 | 43,986 | 29,629 | 38,551 | 68,180 | 71.7 | 0.852
2016 | 01 | MaoErLiuLiJiaoNan | 34 | 26,393 | 310 | 2030 | 975 | 778 | 1637 | 203 | 26,703 | 5623 | 32,326 | 26,858 | 13,187 | 40,045 | 77.8 | 0.501
2016 | 01 | QuJiangLiJiaoDong | 57.5 | 42,650 | 554 | 3261 | 1539 | 970 | 2156 | 515 | 43,204 | 8441 | 51,645 | 43,481 | 19,164 | 62,645 | 66.8 | 0.783
2016 | 01 | QuJiangLiJiaoXi | 63.1 | 37,747 | 423 | 1831 | 1274 | 1211 | 3003 | 751 | 38,170 | 8070 | 46,240 | 38,382 | 22,391 | 60,773 | 81 | 0.76
2016 | 01 | XiGaoXin | 52.7 | 24,659 | 430 | 1748 | 680 | 457 | 1155 | 191 | 25,089 | 4231 | 29,320 | 25,304 | 9523 | 34,827 | 61.7 | 0.435
2016 | 01 | XiangWangLiJiao | 73 | 33,517 | 1325 | 3721 | 5288 | 812 | 412 | 3728 | 34,842 | 13,961 | 48,803 | 35,505 | 30,649 | 66,154 | 74.4 | 0.827
2016 | 01 | XieWangLiJiao | 4.96 | 19,106 | 3326 | 4775 | 5410 | 1029 | 7871 | 3064 | 22,432 | 22,149 | 44,581 | 24,095 | 59,717 | 83,812 | 73.1 | 1.048
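PU, TU, and PCU in the table above are calculated indicators: each vehicle-class flow is weighted by a passenger-car-equivalent factor and summed. A minimal sketch, using factors reverse-engineered from the sample rows above (they reproduce the PU, TU, and PCU columns exactly) rather than taken from the paper's text:

```python
# Conversion factors inferred from the sample records above; treat them as a
# reverse-engineered assumption, not an authoritative statement of the method.
PASSENGER_FACTORS = {"SMPTF": 1.0, "LBTF": 1.5}
TRUCK_FACTORS = {"STTF": 1.0, "MTTF": 1.5, "LTTF": 3.0, "ELCTF": 4.0, "CTF": 4.0}

def units(row: dict) -> tuple:
    """Return (PU, TU, PCU) for one station-month record of class-wise flows."""
    pu = sum(row[c] * k for c, k in PASSENGER_FACTORS.items())
    tu = sum(row[c] * k for c, k in TRUCK_FACTORS.items())
    return pu, tu, pu + tu  # PCU is the sum of passenger and truck units

# Example: the FangJiaCun record above.
row = {"SMPTF": 10655, "LBTF": 150, "STTF": 401, "MTTF": 976,
       "LTTF": 571, "ELCTF": 2125, "CTF": 130}
print(units(row))  # (10880.0, 12598.0, 23478.0) — matches the PU/TU/PCU columns
```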
Model Parameters | Parameter Explanation | Optimal Values
---|---|---
max_depth | Maximum depth of a tree | 4
gamma | Penalty coefficient: the minimum loss reduction required to split a node | 0.3
min_child_weight | Minimum sum of instance weights in a leaf node | 1
subsample | Fraction of samples drawn when building each tree | 0.8
colsample_bytree | Fraction of features sampled when building each tree | 0.8
reg_alpha | L1 regularization parameter | 0.2
reg_lambda | L2 regularization parameter | 0.7
learning_rate | Learning rate, i.e., the model's learning step size | 0.035
n_estimators | Number of boosting iterations (trees) | 300
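As a usage illustration, the optimal values from the table above can be plugged directly into xgboost's scikit-learn interface; `X_train` and `y_train` stand for the monthly feature matrix and targets and are assumptions here:

```python
from xgboost import XGBRegressor

# Final model configured with the BO-selected values from the table above.
best_model = XGBRegressor(
    max_depth=4,
    gamma=0.3,
    min_child_weight=1,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.2,
    reg_lambda=0.7,
    learning_rate=0.035,
    n_estimators=300,
)
# best_model.fit(X_train, y_train)  # X_train/y_train: monthly feature matrix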
Items | Configuration
---|---
Software | Windows 11, Anaconda, Jupyter Notebook
Hardware | 12th Gen Intel(R) Core(TM) i5-12500 @ 3.00 GHz, 16 GB RAM
Language and Frameworks | Python and TensorFlow
Model | Parameter Settings
---|---
DNN | shuffle = True, epochs = 1000, batch_size = 16, verbose = 1
RF | n_estimators = 50, max_depth = 2, min_samples_split = 4, min_samples_leaf = 1, max_features = 7, oob_score = False
XGBoost | max_depth = 4, gamma = 0.5, min_child_weight = 1, subsample = 0.6, colsample_bytree = 0.8, reg_alpha = 0.2, reg_lambda = 0.6, learning_rate = 0.05, n_estimators = 300
GA-XGBoost | Pc = 0.6, Pm = 0.01, T = 100, M = 50
Bi-LSTM | epochs = 100, batch_size = 64, validation_split = 0.15, INPUT_DIMS = 1, TIME_STEPS = 12, lstm_units = 64
CatBoost | iterations = 400, learning_rate = 0.05, depth = 5, l2_leaf_reg = 1
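For the symmetric Bi-LSTM baseline, the table lists epochs, batch size, validation split, input dimensions, time steps, and LSTM units. A minimal Keras sketch wiring these settings together; the exact layer layout beyond the listed hyperparameters is an assumption:

```python
import tensorflow as tf

TIME_STEPS, INPUT_DIMS, LSTM_UNITS = 12, 1, 64  # settings from the table above

# Symmetric Bi-LSTM baseline: one bidirectional LSTM over 12 monthly steps.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(TIME_STEPS, INPUT_DIMS)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM_UNITS)),
    tf.keras.layers.Dense(1),  # next-month flow
])
model.compile(optimizer="adam", loss="mae")
# model.fit(X, y, epochs=100, batch_size=64, validation_split=0.15)
```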
Prediction results for TF:

Model | Training R² | Training MAE | Training MAPE% | Test R² | Test MAE | Test MAPE%
---|---|---|---|---|---|---
DNN | 0.99 | 13.38 | 0.0169 | 0.13 | 5289 | 5.98
RF | 0.87 | 2758.24 | 3.56 | 0.84 | 2967.42 | 3.83
CatBoost | 0.99 | 1.55 | 0.0001 | 0.22 | 5228.98 | 4.25
Bi-LSTM | 0.90 | 2254.21 | 2.91 | 0.88 | 2596.44 | 3.35
XGBoost | 0.91 | 2196.56 | 2.83 | 0.88 | 2238.15 | 2.89
GA-XGBoost | 0.91 | 2048.63 | 2.71 | 0.89 | 2213.57 | 2.87
BO-XGBoost | 0.92 | 1963.26 | 2.53 | 0.90 | 2231.65 | 2.88
Prediction results for PCU:

Model | Training R² | Training MAE | Training MAPE% | Test R² | Test MAE | Test MAPE%
---|---|---|---|---|---|---
DNN | 0.99 | 5.56 | 0.0089 | 0.12 | 6352 | 6.81
RF | 0.79 | 4448 | 4.02 | 0.78 | 4691.42 | 4.24
CatBoost | 0.99 | 0.248 | 0.0002 | 0.42 | 5353.74 | 4.84
Bi-LSTM | 0.85 | 3652.73 | 3.22 | 0.81 | 4356.12 | 3.93
XGBoost | 0.87 | 3885.32 | 3.51 | 0.86 | 3712.91 | 3.35
GA-XGBoost | 0.88 | 3562.62 | 3.12 | 0.86 | 3521.66 | 3.41
BO-XGBoost | 0.89 | 3125.51 | 2.82 | 0.87 | 3447.14 | 3.12
Impact Factors | Importance Weight of PCU | Importance Weight of TF | Average
---|---|---|---
PV | 0.2949 | 0.0041 | 0.1495
PCTF | 0.1670 | 0.0050 | 0.0860
LTTF | 0.1551 | 0.1845 | 0.1698
ELCTF | 0.1375 | 0.4450 | 0.29125
STTF | 0.0922 | 0.1865 | 0.13935
LBTF | 0.0746 | 0.0460 | 0.0603
Speed | 0.0387 | 0.0788 | 0.05875
SMPTF | 0.0284 | 0.0046 | 0.0165
CTF | 0.0116 | 0.0459 | 0.02875
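A feature-importance table like the one above can be read off trained XGBoost models via the scikit-learn interface. A minimal sketch, assuming two fitted regressors (one per target) whose feature columns match the impact-factor names:

```python
import pandas as pd

def importance_table(model_pcu, model_tf, feature_names) -> pd.DataFrame:
    """Combine per-feature importance weights from the PCU and TF models."""
    df = pd.DataFrame({
        "Importance Weight of PCU": model_pcu.feature_importances_,
        "Importance Weight of TF": model_tf.feature_importances_,
    }, index=feature_names)
    df["Average"] = df.mean(axis=1)  # reproduces the "Average" column above
    return df.sort_values("Importance Weight of PCU", ascending=False)
```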
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).