Bus Schedule Time Prediction Based on LSTM-SVR Model
Abstract
1. Introduction
1.1. Related Work
1.2. Our Contribution
- We propose a two-stage hybrid LSTM-SVR prediction framework. In the first stage, an LSTM, which is well suited to time series feature extraction, extracts a one-dimensional time series feature from the input variables. In the second stage, an SVR combines this feature with the actual driving time to build a nonlinear prediction model. The main advantage of this design is that it helps avoid overfitting, a common problem for LSTM models when training data are limited, because SVR handles nonlinear problems well and generalizes well even on small datasets. By combining the strengths of LSTM and SVR, the hybrid model improves prediction accuracy and is well suited to public transportation scheduling.
- A numerical experiment was conducted using real data from an urban bus scheduling system to validate the effectiveness and stability of the proposed model. Results show that this model surpasses LSTM, SVR, and BiLSTM-SOA models in prediction accuracy and stability, offering a scientific foundation for bus companies and managers to improve operational efficiency.
2. Materials and Models
2.1. Problem Description
2.2. Dataset
2.3. Models
2.3.1. LSTM
2.3.2. SVR
2.3.3. LSTM-SVR
3. Experiments
3.1. Experimental Setup
1. Missing data processing
2. Outlier data processing
3. Data normalization
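The three preprocessing steps are named here, but this excerpt does not state which method was used for each. As an illustration only, the sketch below assumes median imputation for missing values, an interquartile-range (IQR) fence for outliers, and min-max scaling for normalization. The function names and the sample driving times are hypothetical.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clamp values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical driving times in minutes: one missing entry, one implausible value.
driving_minutes = [47.0, 49.0, None, 56.0, 57.0, 61.0, 240.0, 56.0]
imputed = impute_missing(driving_minutes)
clipped = clip_outliers(imputed)
cleaned = min_max_normalize(clipped)
```

The steps are applied in the listed order so that scaling is computed only after missing and extreme values have been handled; otherwise a single outlier would compress every other value toward zero.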
- NumPy: A numerical computation library for efficient processing of multi-dimensional arrays and matrices, providing a rich set of mathematical functions suitable for large-scale data operations.
- Pandas: A powerful data analysis and processing library that provides a flexible data structure (e.g., DataFrame) for easy data loading, cleaning, and transformation.
- Matplotlib: A commonly used plotting library for creating various static, dynamic, and interactive charts and graphs, enabling data visualization.
- Scikit-learn: A machine learning library that provides a wide range of algorithms and tools for data preprocessing, model training, evaluation, and parameter optimization.
- TensorFlow and Keras: Frameworks used for building and training LSTM models to capture complex patterns in time series data.
- DEAP: An evolutionary computation library mainly used to implement optimization algorithms such as the genetic algorithm, facilitating the solution of complex optimization problems.
- Scikit-optimize (skopt): A hyperparameter tuning library based on Bayesian optimization, used to automatically find the optimal parameter combinations for models.
3.2. Evaluation Metrics
3.3. Experimental Design
3.4. Results Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Variable Type | Variable Name | Explanation of Variables (Units) |
---|---|---|
Input variable | Cycle times | Number of bus cycles (times) |
Input variable | Driving time of previous shift | Travel time of previous bus trip (hour:minute:second) |
Input variable | Rest time between two consecutive shifts of bus | Interval between two adjacent departures of same bus (minutes) |
Input variable | Interval time between current bus and previous bus | Interval between departure time of current bus and previous bus (minutes) |
Input variable | Actual departure time of current shift | Actual departure time of buses on current trip (hour:minute:second) |
Input variable | Actual driving time | Actual driving time of buses (hour:minute:second) |
Intermediate output variable | One-dimensional time series features | One-dimensional time series features extracted by LSTM |
Final output variable | Predicted schedule time | Predicted bus schedule time (hour:minute:second) output by SVR model |
Parameter | Explanation | Value |
---|---|---|
Network layer | Number of network layers in the model | 4 |
Number of neurons in the input layer | Number of input features | 5 |
Number of neurons in the output layer | Output value, i.e., the extracted features | 1 |
Number of neurons in the hidden layer | Related to the performance of the network | 108, 56 |
Activation function | Activation function of the output layer | ReLU |
Optimizer | Model optimizer | Adam |
Epochs | Number of training epochs | 50 |
Batch normalization | Improves model generalization capabilities | True |
Dropout ratio | Fraction of units randomly deactivated during training | 13% |
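Read together with the two-stage framework described in the contributions, the settings in the table above could be assembled roughly as follows. This is a sketch under stated assumptions, not the authors' implementation. The layer sizes, activation, optimizer, epoch count, batch normalization, and dropout rate come from the table. The window length, batch size, MSE loss, the choice to train the LSTM against the actual driving time, and the untuned RBF-kernel SVR are assumptions, and the function names are hypothetical.

```python
# Values taken from the LSTM parameter table; everything else is an assumption.
LSTM_CONFIG = {
    "n_features": 5,            # neurons in the input layer
    "hidden_units": (108, 56),  # neurons in the two hidden layers
    "output_units": 1,          # the one-dimensional extracted feature
    "output_activation": "relu",
    "optimizer": "adam",
    "epochs": 50,
    "dropout": 0.13,
}


def build_lstm(timesteps, cfg=LSTM_CONFIG):
    """Stage 1: map a (timesteps, 5) window to a single feature value."""
    from tensorflow import keras  # lazy import; requires TensorFlow
    from tensorflow.keras import layers

    first, second = cfg["hidden_units"]
    model = keras.Sequential([
        layers.Input(shape=(timesteps, cfg["n_features"])),
        layers.LSTM(first, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(cfg["dropout"]),
        layers.LSTM(second),
        layers.BatchNormalization(),
        layers.Dropout(cfg["dropout"]),
        layers.Dense(cfg["output_units"], activation=cfg["output_activation"]),
    ])
    model.compile(optimizer=cfg["optimizer"], loss="mse")  # loss is assumed
    return model


def fit_two_stage(X_train, y_train, X_test, batch_size=32):
    """Fit the LSTM, then an SVR on its output; return predictions for X_test.

    X_train and X_test are arrays of shape (samples, timesteps, 5);
    y_train holds the (normalized) actual driving times.
    """
    from sklearn.svm import SVR  # lazy import; requires scikit-learn

    lstm = build_lstm(timesteps=X_train.shape[1])
    lstm.fit(X_train, y_train, epochs=LSTM_CONFIG["epochs"],
             batch_size=batch_size, verbose=0)

    # Stage 2: the LSTM output becomes the single input feature of the SVR.
    feat_train = lstm.predict(X_train, verbose=0)  # shape (samples, 1)
    feat_test = lstm.predict(X_test, verbose=0)
    svr = SVR(kernel="rbf")  # default C and epsilon, untuned
    svr.fit(feat_train, y_train)
    return svr.predict(feat_test)
```

In practice the SVR hyperparameters would be tuned, for example with the grid search, Bayesian optimization, genetic algorithm, or random search variants that the results table compares.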
Model | MSE | RMSE | MAE | MAPE |
---|---|---|---|---|
LSTM-BOA | 0.3765 | 0.6136 | 0.4314 | 18.715 |
BiLSTM-SOA | 0.3604 | 0.6004 | 0.4193 | 18.4578 |
SVR-BOA | 0.3877 | 0.6227 | 0.4301 | 15.0201 |
LSTM-SVR-GS | 0.3088 | 0.5557 | 0.3967 | 6.8614 |
LSTM-SVR-BOA | 0.3059 | 0.5531 | 0.3913 | 6.7531 |
LSTM-SVR-GA | 0.3080 | 0.5550 | 0.3983 | 6.9460 |
LSTM-SVR-RSO | 0.3054 | 0.5526 | 0.3914 | 6.7926 |
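The four columns in the comparison table above are standard regression metrics. A minimal sketch of how they could be computed is given below. It assumes the usual textbook definitions and a MAPE expressed in percent, since the formulas themselves are not reproduced in this excerpt; the function name is hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, and MAPE (percent) for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape}


# Consistency check on two reported rows: RMSE should equal sqrt(MSE).
rmse_lstm_boa = round(math.sqrt(0.3765), 4)      # LSTM-BOA row
rmse_lstm_svr_boa = round(math.sqrt(0.3059), 4)  # LSTM-SVR-BOA row
```

Because RMSE is the square root of MSE, the two columns always rank the models identically; MAE and MAPE add information because they weight large errors less heavily and measure error relative to the actual value, respectively.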
Summer Weekdays: Actual (min) | Summer Weekdays: Predicted (min) | Summer Weekdays: Abs. Error (min) | Winter Weekdays: Actual (min) | Winter Weekdays: Predicted (min) | Winter Weekdays: Abs. Error (min) | Holiday: Actual (min) | Holiday: Predicted (min) | Holiday: Abs. Error (min) |
---|---|---|---|---|---|---|---|---|
47.00 | 44.30 | 2.70 | 49.00 | 54.42 | 5.42 | 45.00 | 46.49 | 1.49 |
49.00 | 47.47 | 1.53 | 55.00 | 61.24 | 6.24 | 44.00 | 46.73 | 2.73 |
51.00 | 51.31 | 0.31 | 62.00 | 59.59 | 2.41 | 42.00 | 49.85 | 7.85 |
56.00 | 56.87 | 0.87 | 61.00 | 59.93 | 1.07 | 46.00 | 50.57 | 4.57 |
57.00 | 59.45 | 2.45 | 63.00 | 58.78 | 4.22 | 48.00 | 50.63 | 2.63 |
61.00 | 58.33 | 2.67 | 54.00 | 52.93 | 1.07 | 49.00 | 50.01 | 1.01 |
59.00 | 55.80 | 3.20 | 56.00 | 59.19 | 3.19 | 51.00 | 50.62 | 0.38 |
56.00 | 59.51 | 3.51 | 50.00 | 53.94 | 3.94 | 55.00 | 47.78 | 7.22 |
60.00 | 56.72 | 3.28 | 57.00 | 55.37 | 1.63 | 52.00 | 55.38 | 3.38 |
55.00 | 56.53 | 1.53 | 49.00 | 55.55 | 6.55 | 55.00 | 52.94 | 2.06 |
63.00 | 57.43 | 5.57 | 59.00 | 58.71 | 0.29 | 50.00 | 52.44 | 2.44 |
67.00 | 60.55 | 6.45 | 62.00 | 67.97 | 5.97 | 51.00 | 54.44 | 3.44 |
72.00 | 64.84 | 7.16 | 65.00 | 74.72 | 9.72 | 59.00 | 53.43 | 5.57 |
70.00 | 65.43 | 4.57 | 71.00 | 61.51 | 9.49 | 67.00 | 56.78 | 10.22 |
58.00 | 54.92 | 3.08 | 76.00 | 68.85 | 7.15 | 75.00 | 55.20 | 19.80 |
64.00 | 51.31 | 12.69 | 66.00 | 67.35 | 1.35 | 60.00 | 62.64 | 2.64 |
58.00 | 49.09 | 8.91 | 65.00 | 55.98 | 9.02 | 47.00 | 62.66 | 15.66 |
45.00 | 43.73 | 1.27 | 62.00 | 57.38 | 4.62 | 65.00 | 60.71 | 4.29 |
46.00 | 40.79 | 5.21 | 57.00 | 53.30 | 3.70 | 55.00 | 48.28 | 6.72 |
39.00 | 38.84 | 0.16 | 51.00 | 50.16 | 0.84 | 41.00 | 51.68 | 10.68 |
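In the table above, the third value in each group is consistent with the absolute difference between the actual and predicted times. A short check on the first three summer-weekday rows, with values copied from the table and hypothetical variable names, could look like this:

```python
# (actual, predicted, reported third value) in minutes, first three summer rows.
rows = [
    (47.00, 44.30, 2.70),
    (49.00, 47.47, 1.53),
    (51.00, 51.31, 0.31),
]

# Recompute |actual - predicted| and round to the table's two decimals.
recomputed = [round(abs(actual - predicted), 2) for actual, predicted, _ in rows]
matches = all(r == reported for r, (_, _, reported) in zip(recomputed, rows))
```

Averaging this column over a period gives the mean absolute error for that period, which makes the three periods directly comparable.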
Summer Weekdays: Planned | Summer Weekdays: Predicted | Summer Weekdays: Actual | Winter Weekdays: Planned | Winter Weekdays: Predicted | Winter Weekdays: Actual | Holiday: Planned | Holiday: Predicted | Holiday: Actual |
---|---|---|---|---|---|---|---|---|
06:41:00 | 06:40:42 | 06:38:00 | 08:10:00 | 08:07:25 | 08:05:00 | 06:48:00 | 06:54:31 | 06:56:00 |
07:40:00 | 07:39:32 | 07:38:00 | 08:16:00 | 08:23:04 | 08:22:00 | 07:14:00 | 07:17:16 | 07:20:00 |
07:46:00 | 07:47:41 | 07:48:00 | 09:00:00 | 08:56:13 | 08:52:00 | 08:03:00 | 08:06:09 | 08:14:00 |
08:04:00 | 08:05:03 | 08:08:00 | 09:09:00 | 09:10:04 | 09:09:00 | 08:12:00 | 08:16:26 | 08:21:00 |
08:25:00 | 08:26:08 | 08:27:00 | 09:45:00 | 09:46:16 | 09:45:00 | 08:30:00 | 08:32:22 | 08:35:00 |
08:52:00 | 08:57:33 | 09:00:00 | 13:22:00 | 13:21:38 | 13:20:00 | 08:38:00 | 08:40:59 | 08:42:00 |
10:12:00 | 10:12:40 | 10:10:00 | 13:30:00 | 13:33:27 | 13:40:00 | 09:10:00 | 09:08:23 | 09:08:00 |
10:21:00 | 10:33:12 | 10:30:00 | 13:58:00 | 13:50:11 | 14:00:00 | 10:14:00 | 10:09:13 | 10:02:00 |
11:15:00 | 11:18:30 | 11:22:00 | 14:18:00 | 14:20:17 | 14:20:00 | 11:46:00 | 11:48:37 | 11:52:00 |
11:24:00 | 11:59:17 | 11:56:00 | 16:26:00 | 16:26:02 | 16:32:00 | 12:26:00 | 12:26:04 | 12:24:00 |
12:20:00 | 12:28:28 | 12:30:00 | 16:32:00 | 16:29:25 | 16:38:00 | 12:36:00 | 12:26:33 | 12:29:00 |
12:40:00 | 12:49:34 | 12:44:00 | 16:50:00 | 16:50:17 | 17:00:00 | 12:48:00 | 12:40:34 | 12:44:00 |
16:16:00 | 16:16:13 | 16:05:00 | 16:56:00 | 17:31:29 | 17:22:00 | 13:14:00 | 13:09:34 | 13:04:00 |
16:32:00 | 16:32:09 | 16:25:00 | 17:20:00 | 17:35:09 | 17:28:00 | 14:04:00 | 13:54:13 | 13:44:00 |
16:50:00 | 16:49:34 | 16:45:00 | 17:39:00 | 17:44:39 | 17:46:00 | 14:29:00 | 14:19:48 | 14:00:00 |
16:56:00 | 16:55:05 | 16:52:00 | 18:14:00 | 18:09:01 | 18:00:00 | 17:46:00 | 17:52:22 | 17:55:00 |
21:12:00 | 21:10:33 | 20:50:00 | 18:30:00 | 18:22:37 | 18:18:00 | 18:48:00 | 18:36:20 | 18:52:00 |
21:40:00 | 21:46:16 | 21:45:00 | 18:40:00 | 18:31:42 | 18:28:00 | 19:00:00 | 19:04:17 | 19:00:00 |
22:44:00 | 22:35:13 | 22:30:00 | 19:10:00 | 19:10:50 | 19:10:00 | 20:30:00 | 20:26:43 | 20:20:00 |
23:00:00 | 23:00:00 | 23:00:00 | 19:20:00 | 19:25:01 | 19:20:00 | 22:12:00 | 21:54:19 | 22:05:00 |
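To compare the planned and predicted times with the actual times in the table above, the clock strings can be converted to seconds and subtracted. The sketch below does this for the first summer-weekday row; the helper names are hypothetical, and the parser accepts hours with or without a leading zero.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' or 'hh:mm:ss' clock string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def gap_seconds(reference, actual):
    """Absolute difference in seconds between a reference time and the actual time."""
    return abs(to_seconds(reference) - to_seconds(actual))


# First summer-weekday row of the table.
planned, predicted, actual = "06:41:00", "06:40:42", "06:38:00"
planned_gap = gap_seconds(planned, actual)      # 180 seconds
predicted_gap = gap_seconds(predicted, actual)  # 162 seconds
```

For this row the predicted time is 18 seconds closer to the actual time than the planned time is. Repeating the calculation over all rows of a period shows how often, and by how much, the prediction improves on the plan.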
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ge, Z.; Yang, L.; Li, J.; Chen, Y.; Xu, Y. Bus Schedule Time Prediction Based on LSTM-SVR Model. Mathematics 2024, 12, 3589. https://doi.org/10.3390/math12223589