Trip Purpose Imputation Using GPS Trajectories with Machine Learning
Abstract
1. Introduction
2. Literature Review
2.1. Data Sources
2.2. Data Preparation
2.3. Classification Techniques
2.4. Model Performance Assessment
3. Materials and Methods
3.1. Materials
3.2. Methods
4. Results
4.1. Initial Analysis Using Random Forests
4.2. Ensemble Filter with Multiple Classification Algorithms
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Category | Example Activities | Count | Percent |
---|---|---|---|
Home | Any activities at home | 293,129 | 16.1 |
Work | Any activities at work place | 171,329 | 9.4 |
Leisure | Exercise, travel | 123,735 | 6.8 |
Shopping | Food, clothing | 64,071 | 3.5 |
Other | Transfer | 46,413 | 2.5 |
Errand | Travel for business | 40,119 | 2.2 |
Assistance | Pick up/drop off | 28,189 | 1.5 |
Education | University, school | 12,694 | 0.7 |
Unlabeled | - | 1,041,409 | 57.2 |
Total | - | 1,821,088 | 100 |
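The class distribution in the table above can be re-derived from the raw counts. The following is a minimal sketch, not the authors' code; the variable names are illustrative assumptions. It recomputes each category's share of all activities, its share among labeled activities only, and a simple imbalance ratio between the largest and smallest labeled class.

```python
# Counts per category, copied from the table above.
counts = {
    "Home": 293_129,
    "Work": 171_329,
    "Leisure": 123_735,
    "Shopping": 64_071,
    "Other": 46_413,
    "Errand": 40_119,
    "Assistance": 28_189,
    "Education": 12_694,
    "Unlabeled": 1_041_409,
}

total = sum(counts.values())

# Share of every category among all recorded activities (the "Percent" column).
share_of_all = {name: 100 * n / total for name, n in counts.items()}

# Share among labeled activities only, which is what a classifier is trained on.
labeled = {name: n for name, n in counts.items() if name != "Unlabeled"}
labeled_total = sum(labeled.values())
share_of_labeled = {name: 100 * n / labeled_total for name, n in labeled.items()}

# Ratio between the largest and the smallest labeled class (Home vs. Education).
imbalance_ratio = max(labeled.values()) / min(labeled.values())

for name in counts:
    line = f"{name:<11} {share_of_all[name]:5.1f}% of all"
    if name in share_of_labeled:
        line += f", {share_of_labeled[name]:5.1f}% of labeled"
    print(line)
print(f"Largest-to-smallest labeled class ratio: {imbalance_ratio:.1f}")
```

The ratio gives a quick sense of how strongly the labeled data are skewed towards Home activities, which is relevant when reading the per-class results.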
Personal-Based | Activity-Based | Cluster-Based |
---|---|---|
Household size | Duration | m(duration) |
Employment * | Start time | std(duration) |
Age | End time | m(start time) |
Annual income * | Day of week * | std(start time) |
If a worker * | Activities per day | m(end time) |
If a student * | - | std(end time) |
- | - | Percentage of weekdays |
- | - | Percentage of activities per cluster |
- | - | Daily occurrence |
- | - | Distance to most often visited cluster |
Labeled (rows) / Predicted (columns) | Home | Work | Leisure | Shopping | Errand | Other | Assistance | Education | Accuracy | Precision |
---|---|---|---|---|---|---|---|---|---|---|
Home | 286,000 | 1980 | 2980 | 937 | 720 | 558 | 384 | 20 | 97.4% | 95.8% |
Work | 3670 | 157,000 | 5400 | 1680 | 1490 | 1270 | 362 | 131 | 91.8% | 92.0% |
Leisure | 3430 | 3850 | 105,000 | 4670 | 2720 | 2930 | 870 | 190 | 84.9% | 74.0% |
Shopping | 1340 | 1960 | 7770 | 47,600 | 2690 | 2130 | 511 | 71 | 74.3% | 74.7% |
Errand | 2020 | 2630 | 7700 | 4400 | 27,000 | 1960 | 560 | 107 | 58.3% | 72.7% |
Other | 1090 | 1680 | 6870 | 2710 | 1540 | 25,600 | 429 | 192 | 63.9% | 71.7% |
Assistance | 913 | 1110 | 5430 | 1590 | 861 | 1030 | 17,200 | 50 | 61.1% | 84.5% |
Education | 64 | 448 | 737 | 149 | 145 | 267 | 39 | 10,800 | 85.4% | 93.4% |
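The "Accuracy" and "Precision" columns of the confusion matrix above can be reproduced from its cells. The sketch below is an illustration under stated assumptions, not the authors' implementation: it treats rows as labeled classes and columns as predicted classes, computes the per-class "Accuracy" as the diagonal cell divided by its row sum (per-class recall), and the precision as the diagonal cell divided by its column sum. Because the published counts appear to be rounded, the recomputed values can differ from the table by about a tenth of a percentage point.

```python
# Confusion matrix from the table above: rows are labeled classes,
# columns are predicted classes, both in the same order as `classes`.
classes = ["Home", "Work", "Leisure", "Shopping",
           "Errand", "Other", "Assistance", "Education"]
matrix = [
    [286_000, 1980, 2980, 937, 720, 558, 384, 20],
    [3670, 157_000, 5400, 1680, 1490, 1270, 362, 131],
    [3430, 3850, 105_000, 4670, 2720, 2930, 870, 190],
    [1340, 1960, 7770, 47_600, 2690, 2130, 511, 71],
    [2020, 2630, 7700, 4400, 27_000, 1960, 560, 107],
    [1090, 1680, 6870, 2710, 1540, 25_600, 429, 192],
    [913, 1110, 5430, 1590, 861, 1030, 17_200, 50],
    [64, 448, 737, 149, 145, 267, 39, 10_800],
]

n = len(classes)
row_sums = [sum(row) for row in matrix]
col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

# Per-class "Accuracy" (recall): correct predictions / all activities with that label.
recall = {classes[i]: matrix[i][i] / row_sums[i] for i in range(n)}

# Per-class precision: correct predictions / all activities predicted as that class.
precision = {classes[j]: matrix[j][j] / col_sums[j] for j in range(n)}

# Share of all activities in this matrix that lie on the diagonal.
overall = sum(matrix[i][i] for i in range(n)) / sum(row_sums)

for name in classes:
    print(f"{name:<11} recall {recall[name]:6.1%}  precision {precision[name]:6.1%}")
print(f"Diagonal share of this matrix: {overall:.1%}")
```

Reading both measures side by side shows where they diverge: Assistance, for example, is predicted with high precision but recovered at a much lower rate, while Leisure shows the opposite pattern.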
Data and evaluation setting | Random Forest | C5.0 | Naive Bayes | MARS |
---|---|---|---|---|
Original data | 85.8% | 84.7% | 57.2% | 66.7% |
Ensemble filtered (8.5%) data | 93.6% | 93.2% | 61.8% | 73.0% |
Original data, across participants imputation | 68.0% | 65.4% | 57.2% | 66.6% |
Ensemble filtered (8.5%) data, across participants imputation | 74.8% | 72.3% | 62.0% | 72.7% |
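The effect of the ensemble filter in the table above is easiest to read as a gain in percentage points per classifier. The short sketch below only does arithmetic on the reported accuracies; the dictionary keys and the grouping into "within" and "across" settings are naming assumptions made for this illustration.

```python
# Accuracies (in %) copied from the table above.
algorithms = ["Random Forest", "C5.0", "Naive Bayes", "MARS"]
accuracy = {
    ("original", "within"): [85.8, 84.7, 57.2, 66.7],
    ("filtered", "within"): [93.6, 93.2, 61.8, 73.0],
    ("original", "across"): [68.0, 65.4, 57.2, 66.6],
    ("filtered", "across"): [74.8, 72.3, 62.0, 72.7],
}

def gains(setting):
    """Percentage-point gain of filtered over original data for one setting."""
    before = accuracy[("original", setting)]
    after = accuracy[("filtered", setting)]
    return {name: round(a - b, 1) for name, b, a in zip(algorithms, before, after)}

within_gain = gains("within")   # rows 1 and 2 of the table
across_gain = gains("across")   # across-participants imputation rows

for name in algorithms:
    print(f"{name:<14} +{within_gain[name]:.1f} pp (within), "
          f"+{across_gain[name]:.1f} pp (across participants)")
```

Every classifier gains accuracy on the filtered data in both settings, while accuracy under across-participants imputation stays well below the first two rows for the tree-based methods.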
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, Q.; Molloy, J.; Axhausen, K.W. Trip Purpose Imputation Using GPS Trajectories with Machine Learning. ISPRS Int. J. Geo-Inf. 2021, 10, 775. https://doi.org/10.3390/ijgi10110775