Uncertainty-Driven Data Aggregation for Imitation Learning in Autonomous Vehicles
Abstract
1. Introduction
- We introduce a DAgger framework that combines Bayesian uncertainty estimation via mean-field variational inference (MFVI) with critical state identification to improve imitation learning in autonomous driving.
- We demonstrate that MFVI provides better-calibrated uncertainty estimates than MC-dropout, leading to more effective data collection and improved driving performance.
2. Background
2.1. Imitation Learning
2.2. DAgger
Algorithm 1 DAgger
Collect initial dataset D_0 using expert policy π*
Initialize dataset D ← D_0
Train initial policy π_0 on D
for i = 1 to N do
    Generate on-policy dataset using π_{i−1}
    Ask the expert for labels for the visited states to get D_i
    Aggregate datasets: D ← D ∪ D_i
    Train policy π_i on D
end for
return the trained policy
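The loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1-D lane-offset world, the linear `expert_policy`, and the least-squares learner `fit_linear_policy` are hypothetical stand-ins chosen so that the example runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)


def expert_policy(state):
    """Hypothetical expert: steer back toward the lane centre (state = lateral offset)."""
    return -0.5 * state


def rollout(policy, steps=50):
    """Run a policy in a toy 1-D lane-keeping world and return the visited states."""
    state, visited = rng.normal(0.0, 1.0), []
    for _ in range(steps):
        visited.append(state)
        state = state + policy(state) + rng.normal(0.0, 0.05)
    return np.array(visited)


def fit_linear_policy(states, actions):
    """Least-squares fit of action = w * state; returns the policy and its weight."""
    w = float(states @ actions / (states @ states))
    return (lambda s: w * s), w


def dagger(n_iters=5):
    # Initial dataset collected with the expert policy.
    states = rollout(expert_policy)
    actions = expert_policy(states)
    policy, w = fit_linear_policy(states, actions)
    for _ in range(n_iters):
        new_states = rollout(policy)             # states visited by the learner
        new_actions = expert_policy(new_states)  # expert labels for those states
        states = np.concatenate([states, new_states])     # aggregate datasets
        actions = np.concatenate([actions, new_actions])
        policy, w = fit_linear_policy(states, actions)    # retrain on everything
    return policy, w, len(states)


policy, weight, n_samples = dagger()
```

Because the hypothetical expert is itself linear, the fitted weight recovers the expert gain of -0.5. With a real driving policy, the aggregated dataset would instead be passed to a neural network trainer.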
2.3. Bayesian Neural Networks and Inference
3. Method
3.1. Imitation Learning for Autonomous Vehicles
3.2. Mean Field Variational Inference
3.3. Critical Scenes
3.4. DAgger with Uncertainty Estimates and Critical States
Algorithm 2 DAgger with Uncertainty Estimate and Critical States
Collect using expert policy
Initialize variational inference network
Initialize uncertainty threshold
Initialize replay buffer
Let
for to N do
    Generate on-policy datasets using
        sampling critical states from
        sampling unsafe states:
    Get of visited states by expert
    Combine datasets:
    while do
        Sample randomly from
    end while
    Train with policy initialized from
    Training.
    Update.
end for
return
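Algorithm 2 initializes a variational inference network and an uncertainty threshold. The following sketch shows one way an uncertainty estimate could be obtained from a mean-field (fully factorized) Gaussian over weights by Monte Carlo sampling, and then compared against a threshold. The linear model, the weight means and standard deviations, the threshold value of 0.1, and the function names are all assumptions made for illustration; the paper's network architecture and threshold update rule are not reproduced here.

```python
import numpy as np


def mc_uncertainty(states, w_mean, w_std, n_samples=500, seed=0):
    """Predictive mean and std under a factorized Gaussian over linear weights."""
    rng = np.random.default_rng(seed)
    # Reparameterization: w = mean + std * eps, with eps ~ N(0, 1) drawn
    # independently for every weight (the mean-field assumption).
    eps = rng.standard_normal((n_samples, w_mean.shape[0]))
    weights = w_mean + w_std * eps   # shape (n_samples, n_features)
    preds = states @ weights.T       # shape (n_states, n_samples)
    return preds.mean(axis=1), preds.std(axis=1)


def select_uncertain(states, uncertainty, threshold):
    """Keep the states whose predictive std exceeds the threshold."""
    mask = uncertainty > threshold
    return states[mask], mask


# Hypothetical two-feature states; the last one lies far from the others.
states = np.array([[0.1, 0.0], [1.0, 1.0], [5.0, -4.0]])
w_mean = np.array([0.3, -0.2])
w_std = np.array([0.05, 0.05])

pred_mean, unc = mc_uncertainty(states, w_mean, w_std)
queried, mask = select_uncertain(states, unc, threshold=0.1)
```

For this linear model the predictive standard deviation grows with the norm of the state, so only the third state crosses the assumed threshold and would be sent to the expert for labelling.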
4. Experiments and Evaluation
4.1. Environment
4.2. Uncertainty Estimation
4.2.1. Mean and Standard Deviation of Uncertainty Estimates
4.2.2. Monte Carlo Sample Size
4.2.3. Infraction Prediction
4.2.4. Expert Ratio
4.3. Driving Performance
4.3.1. Metrics
4.3.2. Baseline and Alternatives
4.3.3. Performance and Ablation Study
4.3.4. Comparison against DARB
4.3.5. Comparison against UAIL and UAIL+
4.3.6. Infraction Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Action | Conditions | Follow (μ, σ) | Left (μ, σ) | Right (μ, σ) | Straight (μ, σ) |
---|---|---|---|---|---|
Steer | Train | (0.597,0.169) | (0.782,0.197) | (0.742,0.187) | (0.627,0.156) |
Steer | NW | (0.624,0.169) | (0.774,0.212) | (0.792,0.241) | (0.674,0.201) |
Steer | NT | (0.627,0.187) | (0.775,0.272) | (0.772,0.236) | (0.672,0.244) |
Steer | NWT | (0.644,0.193) | (0.795,0.247) | (0.812,0.263) | (0.694,0.274) |
Throttle | Train | (0.352,0.262) | (0.447,0.214) | (0.478,0.274) | (0.382,0.198) |
Throttle | NW | (0.372,0.257) | (0.467,0.244) | (0.493,0.179) | (0.342,0.231) |
Throttle | NT | (0.379,0.264) | (0.455,0.264) | (0.497,0.146) | (0.368,0.245) |
Throttle | NWT | (0.424,0.287) | (0.461,0.234) | (0.498,0.174) | (0.391,0.227) |
Brake | Train | (0.742,0.145) | (0.421,0.124) | (0.447,0.126) | (0.712,0.110) |
Brake | NW | (0.765,0.121) | (0.415,0.121) | (0.490,0.132) | (0.722,0.152) |
Brake | NT | (0.810,0.114) | (0.427,0.112) | (0.493,0.146) | (0.730,0.154) |
Brake | NWT | (0.801,0.132) | (0.433,0.164) | (0.489,0.263) | (0.732,0.167) |

Action | Iter | Follow (μ, σ) | Left (μ, σ) | Right (μ, σ) | Straight (μ, σ) |
---|---|---|---|---|---|
Steer | Iter 0 | (0.597,0.169) | (0.782,0.197) | (0.742,0.187) | (0.627,0.156) |
Steer | Iter 1 | (0.554,0.154) | (0.724,0.223) | (0.702,0.217) | (0.594,0.201) |
Steer | Iter 2 | (0.490,0.156) | (0.695,0.190) | (0.684,0.245) | (0.523,0.188) |
Steer | Iter 3 | (0.467,0.142) | (0.674,0.247) | (0.652,0.193) | (0.497,0.174) |
Steer | Iter 4 | (0.468,0.141) | (0.664,0.247) | (0.654,0.189) | (0.497,0.187) |
Throttle | Iter 0 | (0.352,0.262) | (0.447,0.214) | (0.478,0.274) | (0.382,0.198) |
Throttle | Iter 1 | (0.334,0.257) | (0.425,0.241) | (0.452,0.187) | (0.337,0.214) |
Throttle | Iter 2 | (0.319,0.214) | (0.401,0.197) | (0.421,0.146) | (0.318,0.241) |
Throttle | Iter 3 | (0.302,0.155) | (0.394,0.210) | (0.402,0.148) | (0.304,0.187) |
Throttle | Iter 4 | (0.289,0.164) | (0.374,0.232) | (0.432,0.165) | (0.324,0.196) |
Brake | Iter 0 | (0.742,0.145) | (0.421,0.124) | (0.447,0.126) | (0.712,0.110) |
Brake | Iter 1 | (0.720,0.137) | (0.396,0.210) | (0.402,0.210) | (0.682,0.201) |
Brake | Iter 2 | (0.702,0.156) | (0.368,0.192) | (0.394,0.176) | (0.662,0.175) |
Brake | Iter 3 | (0.675,0.135) | (0.334,0.126) | (0.380,0.193) | (0.630,0.142) |
Brake | Iter 4 | (0.660,0.136) | (0.326,0.107) | (0.362,0.142) | (0.631,0.137) |

Method | DS | RC | IS | CO | CP | CV | RI | AB |
---|---|---|---|---|---|---|---|---|
Unit | %, ↑ | %, ↑ | %, ↑ | /km, ↓ | /km, ↓ | /km, ↓ | /km, ↓ | /km, ↓ |
DARB | 21.12 ± 2 | 27.8 ± 5 | 76 ± 3 | 1.26 ± 0.21 | 1.92 ± 0.32 | 0.96 ± 0.11 | 1.32 ± 0.04 | 3.26 ± 0.14 |
CILRS | 5.37 ± 2 | 14.4 ± 2 | 55 ± 1 | 2.35 ± 0.25 | 2.69 ± 0.25 | 1.48 ± 0.18 | 1.62 ± 0.08 | 4.28 ± 0.20 |
DA_UE | 27.9 ± 3 | 34.8 ± 3 | 79 ± 1 | 1.01 ± 0.26 | 1.62 ± 0.23 | 0.76 ± 0.12 | 1.56 ± 0.04 | 3.02 ± 0.18 |
DA_CS | 18.9 ± 2 | 26.4 ± 3 | 71 ± 2 | 1.36 ± 0.14 | 1.82 ± 0.34 | 1.20 ± 0.17 | 1.29 ± 0.08 | 3.21 ± 0.21 |
DA_UE+CS | 30.1 ± 2 | 36.7 ± 4 | 82 ± 4 | 0.96 ± 0.17 | 1.42 ± 0.19 | 0.72 ± 0.13 | 0.94 ± 0.05 | 2.84 ± 0.18 |
UAIL+ | 9.77 ± 2 | 17.4 ± 2 | 63 ± 1 | 2.11 ± 0.36 | 2.42 ± 0.21 | 1.35 ± 0.17 | 1.45 ± 0.14 | 3.27 ± 0.16 |
UAIL | 7.37 ± 2 | 16.4 ± 2 | 59 ± 1 | 2.26 ± 0.26 | 2.79 ± 0.22 | 1.43 ± 0.18 | 1.42 ± 0.08 | 4.01 ± 0.20 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, C.; Wang, Y. Uncertainty-Driven Data Aggregation for Imitation Learning in Autonomous Vehicles. Information 2024, 15, 336. https://doi.org/10.3390/info15060336