Bidirectional Planning for Autonomous Driving Framework with Large Language Model
Abstract
1. Introduction
- Proposing SafeMod, a modular framework for autonomous navigation powered by LLMs, integrating intent inference within the forward planner and including a backward planner for adaptive decision-making.
- Introducing a novel bidirectional planning approach that combines forward strategy generation (via the forward planner and intent analyzer) and backward safety evaluation (via the backward planner), ensuring optimal performance and strict safety compliance in dynamic environments.
- Demonstrating through extensive experiments on nuScenes and CARLA that SafeMod improves navigation success rates and safety performance, outperforming current state-of-the-art methods.
2. Related Work
2.1. LLMs for Autonomous Driving Decision-Making
2.2. Knowledge-Driven Approaches for Autonomous Driving
3. Problem Formulation
- Forward Planning: Given the current state and the inferred intent I, the forward planner generates a sequence of actions that forms the candidate trajectory.
- Backward Planning: After forward planning, the backward planner evaluates the safety and feasibility of the generated trajectory by minimizing a risk function. The plan is adjusted iteratively to reduce risk, so that the trajectory remains within the safe operating bounds (a hedged formal sketch follows this list).
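Because the original equations were not preserved in this extraction, the following is a minimal formal sketch of the two phases under assumed notation: $\pi_{\mathrm{fwd}}$, $R(\cdot)$, and $\mathcal{S}_{\mathrm{safe}}$ are illustrative placeholders rather than the paper's exact symbols.

```latex
% Forward phase: propose an action sequence from the current state s_t and inferred intent I.
\tau = (a_1, \dots, a_T) = \pi_{\mathrm{fwd}}(s_t, I)

% Backward phase: iteratively adjust the proposal to minimize a risk functional
% while keeping every planned state inside the safe operating set.
\tau^{\star} = \arg\min_{\tau} R(\tau \mid s_t)
\quad \text{s.t.} \quad s_k(\tau) \in \mathcal{S}_{\mathrm{safe}}, \; k = 1, \dots, T
```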
4. Methodology
4.1. Forward Planning
4.1.1. BEV-Planning Module
4.1.2. Video Sense Module
- CHANGELANELEFT: Move one lane to the left.
- CHANGELANERIGHT: Move one lane to the right.
- LANEFOLLOW: Continue in the current lane.
- LEFT: Turn left at the intersection.
- RIGHT: Turn right at the intersection.
- STRAIGHT: Keep straight at the intersection.
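Purely as an illustration (not code from the paper), the discrete command vocabulary above maps naturally onto an enumeration that the planner can consume; the class and member names below are hypothetical.

```python
from enum import Enum, auto

class HighLevelCommand(Enum):
    """Discrete navigation commands used by the planner (illustrative names)."""
    CHANGE_LANE_LEFT = auto()   # move one lane to the left
    CHANGE_LANE_RIGHT = auto()  # move one lane to the right
    LANE_FOLLOW = auto()        # continue in the current lane
    LEFT = auto()               # turn left at the intersection
    RIGHT = auto()              # turn right at the intersection
    STRAIGHT = auto()           # keep straight at the intersection
```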
4.2. Backward Planning
- Representation Model: This model encodes the latent state from past states, actions, and observations, i.e., $s_t \sim q_{\theta}(s_t \mid s_{t-1}, a_{t-1}, o_t)$.
- Transition Model: The transition model predicts future latent states through a Gaussian distribution, $\hat{s}_t \sim p_{\theta}(\hat{s}_t \mid s_{t-1}, a_{t-1})$, ensuring consistency between the predicted and actual dynamics.
- Reward Model: This model estimates the expected reward of each state, $\hat{r}_t \sim p_{\theta}(\hat{r}_t \mid s_t)$, which is used to optimize the agent's actions. A minimal code sketch of these three components follows this list.
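A minimal sketch of these components, assuming a Dreamer-style latent world model as in the cited latent-imagination work [44]; the PyTorch implementation, layer sizes, and class name below are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Representation, transition, and reward heads of a latent world model (sketch)."""

    def __init__(self, obs_dim=256, act_dim=2, latent_dim=32):
        super().__init__()
        # Representation model: q(s_t | s_{t-1}, a_{t-1}, o_t) -> Gaussian parameters
        self.repr_net = nn.Linear(latent_dim + act_dim + obs_dim, 2 * latent_dim)
        # Transition model: p(s_t | s_{t-1}, a_{t-1}) -> Gaussian parameters
        self.trans_net = nn.Linear(latent_dim + act_dim, 2 * latent_dim)
        # Reward model: expected reward of a latent state
        self.reward_net = nn.Linear(latent_dim, 1)

    @staticmethod
    def _sample(stats):
        mean, log_std = stats.chunk(2, dim=-1)
        return mean + log_std.exp() * torch.randn_like(mean)

    def step(self, prev_latent, prev_action, obs_feat):
        # Posterior latent that incorporates the new observation (representation model).
        post = self._sample(self.repr_net(torch.cat([prev_latent, prev_action, obs_feat], -1)))
        # Prior prediction made without the observation (transition model).
        prior = self._sample(self.trans_net(torch.cat([prev_latent, prev_action], -1)))
        # Expected reward of the posterior state (reward model).
        reward = self.reward_net(post)
        return post, prior, reward
```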
4.2.1. Q-Value Function
4.2.2. Policy Optimization
4.2.3. Safety Guarantee
4.2.4. Policy Optimization
5. Algorithm Overview
Algorithm 1: SafeMod Framework with Detailed Modules.
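The algorithm listing itself was not preserved here; the following Python-style sketch reconstructs only the overall loop implied by the problem formulation (forward proposal, backward risk evaluation, iterative refinement). All callables are injected placeholders, not the authors' module interfaces.

```python
def safemod_step(observation, goal, sense, infer_intent, forward_plan,
                 evaluate_risk, refine, to_controls, max_refinements=3):
    """One SafeMod decision cycle (hedged sketch; module callables are injected)."""
    # Forward planning: perceive the scene, infer intent, propose a candidate trajectory.
    scene = sense(observation)                    # BEV-planning / video sense modules
    intent = infer_intent(scene, goal)            # LLM-based intent analysis
    trajectory = forward_plan(scene, intent)      # forward planner proposal

    # Backward planning: evaluate risk and refine until the plan is within safe bounds.
    for _ in range(max_refinements):
        risk, is_safe = evaluate_risk(scene, trajectory)
        if is_safe:
            break
        trajectory = refine(trajectory, risk)     # adjust the plan to reduce risk

    return to_controls(trajectory)                # convert the safe trajectory to controls
```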
6. Experiments
6.1. Environmental Setup
6.1.1. Experimental Setup in Dataset
6.1.2. Experimental Setup in CARLA Simulator
Urban Driving Environments
- Town 5: Features a diverse mix of urban and suburban layouts with various road types, including highways, sharp turns, and multi-lane streets. This environment provides a balanced challenge between high-speed driving and precision navigation through complex intersections and roundabouts.
- Town05 Short:
  - Route Length: 100–500 m in total, ideal for testing quick decision-making in tight, complex environments.
  - Number of Intersections: Three intersections, allowing frequent navigation decisions.
  - Test Focus: Evaluates the model’s handling of lane changes in dense traffic and at intersections, crucial for short-term challenges in busy settings.
- Town05 Long:
  - Route Length: 1000–2000 m in total, testing endurance and reliability over extended distances.
  - Number of Intersections: Ten intersections, providing multiple decision points.
  - Test Focus: Assesses overall performance on long routes, focusing on route completion, safety, and consistency in dynamic traffic environments.
6.2. Evaluation Metrics
6.2.1. Open-Loop Metrics
- L2 Metric: The L2 metric, or Euclidean distance, plays a central role in evaluating trajectory precision for self-driving vehicles. In a two-dimensional plane, the L2 distance between points $(x_1, y_1)$ and $(x_2, y_2)$ is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$. In trajectory prediction, the L2 metric quantifies the deviation of predicted vehicle positions from the ground-truth positions, enabling an effective evaluation of model performance. To incorporate time information, the L2 error reported at the k-th second is the mean error from 0 to k seconds, $\mathrm{L2}_k = \frac{1}{k}\sum_{t=1}^{k} \lVert \hat{p}_t - p_t \rVert_2$, which averages the per-timestep error over the specified horizon and provides a comprehensive assessment of trajectory accuracy. The final average L2 error is computed by averaging across the three horizons (1 s, 2 s, and 3 s), effectively producing an average of averages.
- Collision Rate: The collision rate (Collision%) is a fundamental metric for evaluating the safety performance of autonomous driving systems. It is traditionally defined as the ratio of the number of collision events to the total distance traveled or time duration, $\mathrm{Collision\%} = N_{\mathrm{col}} / D_{\mathrm{total}}$. This simple approach provides an initial measure of safety by capturing how frequently the system collides relative to the distance covered. However, in open-loop evaluation, where the vehicle operates without real-time feedback, a more detailed formulation yields a more accurate and reliable measure. We therefore report the collision rate at step k as the average of the per-timestep collision counts, $\mathrm{Collision}_k = \frac{1}{k}\sum_{t=1}^{k} c_t$, where $c_t$ denotes the number of collision events observed at time step t. Summing the collision events over time steps and averaging the result gives a smoother and more consistent measure of the system’s performance: it reduces the impact of short-term fluctuations in collision frequency and ensures a robust evaluation of safety over longer periods of operation. A small numerical sketch of both open-loop metrics follows this list.
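For concreteness, a minimal sketch of how the time-averaged L2 error and collision rate described above could be computed; the array shapes, 2 Hz sampling assumption, and function name are illustrative rather than the authors' evaluation code.

```python
import numpy as np

def open_loop_metrics(pred_xy, gt_xy, collisions, horizons=(2, 4, 6)):
    """Average L2 error and collision rate at the 1 s / 2 s / 3 s horizons (sketch).

    pred_xy, gt_xy: (T, 2) predicted and ground-truth positions sampled at 2 Hz.
    collisions:     (T,) number of collision events detected at each timestep.
    horizons:       timestep counts corresponding to 1 s, 2 s, and 3 s at 2 Hz.
    """
    per_step_l2 = np.linalg.norm(pred_xy - gt_xy, axis=-1)       # (T,)
    l2_at_k = [per_step_l2[:k].mean() for k in horizons]          # mean error from 0 to k
    col_at_k = [collisions[:k].mean() for k in horizons]          # averaged collision count
    return {
        "L2@1s/2s/3s": l2_at_k,  "L2 Avg.": float(np.mean(l2_at_k)),    # average of averages
        "Col@1s/2s/3s": col_at_k, "Col Avg.": float(np.mean(col_at_k)),
    }
```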
6.2.2. Closed-Loop Metrics
- Route Completion (RC): This criterion quantifies the fraction of each route that the autonomous vehicle completes independently, reflecting the system’s capacity to follow the predetermined path. It is computed as $\mathrm{RC} = \frac{d_{\mathrm{completed}}}{d_{\mathrm{route}}} \times 100\%$, where $d_{\mathrm{completed}}$ is the distance driven along the route and $d_{\mathrm{route}}$ is the total route length.
- Driving Score (DS): This is the primary evaluation metric used on the leaderboard, combining route completion with an infraction penalty to assess both the accuracy and safety of the agent’s driving. It is defined as $\mathrm{DS} = \mathrm{RC} \times P_{\mathrm{infraction}}$, where $P_{\mathrm{infraction}} \in [0, 1]$ shrinks with every recorded infraction (a brief sketch follows this list).
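A brief sketch, assuming the standard CARLA leaderboard convention in which each infraction type multiplies a penalty coefficient into the score; the coefficient values and function name below are illustrative, not taken from the paper.

```python
def driving_score(route_completion_pct, infractions):
    """DS = RC x infraction penalty (sketch; penalty coefficients are illustrative)."""
    penalty_coeff = {"collision_pedestrian": 0.50,
                     "collision_vehicle": 0.60,
                     "red_light": 0.70}
    penalty = 1.0
    for kind, count in infractions.items():
        penalty *= penalty_coeff.get(kind, 1.0) ** count
    return route_completion_pct * penalty

# Example: 88.84% route completion with one red-light infraction -> DS of about 62.2.
print(driving_score(88.84, {"red_light": 1}))
```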
6.2.3. Real-Time Performance
- Baseline Comparison: We first measured the performance of the original SafeMod system without the video sense module as the baseline.
- SF, UH, and TF Integration: Next, we evaluated the system with the SF module added (for improved perception and object recognition) and then with the UH + TF modules added (for enhanced decision-making and contextual understanding).
- Metrics: The key metrics recorded were the following (a brief timing sketch follows this list):
  - Inference Latency: Time taken from input sensor data to output control actions (measured in milliseconds).
  - Frame Rate: The frequency of decision-making (measured in frames per second, FPS).
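As an illustration of how these two quantities can be measured (not the authors' instrumentation), a minimal timing loop; `model_predict` and the source of sensor frames are hypothetical placeholders.

```python
import time

def profile_realtime(model_predict, frames):
    """Return mean inference latency (ms) and effective frame rate (FPS). Sketch only."""
    latencies = []
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        _controls = model_predict(frame)                     # sensor input -> control actions
        latencies.append((time.perf_counter() - t0) * 1e3)   # per-frame latency in ms
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(latencies), len(frames) / elapsed
```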
6.3. Baseline Setup
- ST-P3 [36]: ST-P3 (Spatial-Temporal Perception-Prediction-Planning) presents a comprehensive vision-driven system for autonomous vehicles. It unifies perception, prediction, and planning through spatio-temporal feature extraction. By minimizing perceptual redundancies, this method enhances predictive precision and planning safety, resulting in superior collision avoidance capabilities in dynamic driving environments.
- VAD [22]: VAD (Vectorized Autonomous Driving) is a framework for efficient autonomous driving that utilizes vectorized scene representation. It processes complex driving environments by simplifying the perception, prediction, and planning tasks into manageable vectors. This vectorized approach enables faster decision-making and higher efficiency in dynamic environments without relying on traditional deep reinforcement learning methods.
- UniAD [46]: UniAD (Unified Autonomous Driving) is a unified framework for autonomous driving that integrates perception, prediction, and planning into a single network. UniAD prioritizes all tasks to directly contribute to planning, reducing errors and improving task coordination. By using unified query interfaces, it facilitates communication between tasks and provides complementary feature abstractions for agent interaction. Evaluated on the nuScenes benchmark, UniAD outperforms previous state-of-the-art methods across all metrics. Code and models are publicly available.
- CILRS [20]: CILRS (Conditional Imitation Learning for Autonomous Driving with Reinforcement and Supervision) is a framework which investigates behavior cloning in autonomous driving, demonstrating state-of-the-art results in unseen environments, while highlighting limitations such as dataset bias, generalization issues, and training instability.
- Transfuser [47]: Transfuser introduced a fusion technique based on self-attention for combining image and LiDAR data in autonomous driving systems. In contrast to fusion methods relying on geometry, which face challenges in crowded and changing environments, Transfuser employs transformer components for merging feature representations from both perspective and top-down viewpoints at various scales.
- GPT-Driver [48]: GPT-Driver proposed a novel approach that transforms OpenAI’s GPT-3.5 into a reliable motion planner for autonomous vehicles by reformulating motion planning as a language modeling problem. Using language tokens for input and output, the large language model generates driving trajectories through language descriptions of coordinate positions. Evaluated on the nuScenes dataset, this approach demonstrates strong generalization, effectiveness, and interpretability.
7. Results and Analysis
7.1. Open-Loop Evaluation
7.2. Closed-Loop Evaluation
7.3. Ablation Study
7.3.1. VLM Estimate Validation
- SF Only (first row): With only the Sense Function active, the system produces an average L2 error of 1.07 m and a collision rate of 0.65%. This result indicates that, without the forward-thinking and state-updating modules, the system faces challenges in maintaining accurate trajectories and avoiding collisions.
- SF + TF (second row): When the Thinking Forward module is added, the system’s performance improves significantly, reducing the L2 error to 0.77 m and cutting the collision rate to 0.26%. This demonstrates that forward-looking planning contributes greatly to both trajectory accuracy and safety.
- SF + UH + TF (third row): Activating all three modules results in the best overall performance, with an average L2 error of 0.71 m and a collision rate of just 0.22%. This shows that the integration of state updating, forward-thinking planning, and sensing leads to the most balanced and optimal results, improving both trajectory accuracy and safety.
7.3.2. Real-Time Performance Validation
- Inference Latency: As presented in Table 4, the inference latency increases from 50 ms in the BP configuration to 75 ms with the addition of the SF module, and further to 92 ms when both UH and TF modules are incorporated. This rise is attributed to the enhanced computational complexity introduced by these additional modules. Nevertheless, the system maintains inference times that are acceptable for real-time operations.
- Frame Rate: The frame rate remains consistent at 20 FPS when the SF module is added to the BP configuration and experiences a slight decrease to 19 FPS with the inclusion of UH and TF modules. This performance ensures that the system stays well above the critical threshold of 15 FPS required for safe, real-time autonomous driving.
7.3.3. Backward Performance Test
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, J.; Li, H.; Liu, J.; Zou, Z.; Ye, X.; Wang, F.; Wang, H. Exploring the Causality of End-to-End Autonomous Driving. arXiv 2024, arXiv:2407.06546. [Google Scholar]
- Ma, Y.; Cui, C.; Cao, X.; Ye, W.; Liu, P.; Lu, J.; Wang, Z. Lampilot: An open benchmark dataset for autonomous driving with language model programs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 15141–15151. [Google Scholar]
- Atakishiyev, S.; Salameh, M.; Yao, H.; Goebel, R. Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions. IEEE Access 2024, 12, 3431437. [Google Scholar] [CrossRef]
- Parekh, D.; Poddar, N.; Rajpurkar, A.; Chahal, M.; Kumar, N.; Joshi, G.P.; Cho, W. A review on autonomous vehicles: Progress, methods and challenges. Electronics 2022, 11, 2162. [Google Scholar] [CrossRef]
- Zhang, Z.; Li, J. A review of artificial intelligence in embedded systems. Micromachines 2023, 14, 897. [Google Scholar] [CrossRef]
- Zhang, Z.; Fisac, J.F. Safe Occlusion-aware Autonomous Driving via Game-Theoretic Active Perception. arXiv 2021, arXiv:2105.08169v2. [Google Scholar]
- Li, Y.; Wang, J.; Lu, X.; Shi, T.; Xu, Q.; Li, K. Pedestrian trajectory prediction at un-signalized intersection using probabilistic reasoning and sequence learning. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 1047–1053. [Google Scholar]
- Sana, F.; Azad, N.L.; Raahemifar, K. Autonomous vehicle decision-making and control in complex and unconventional scenarios—A review. Machines 2023, 11, 676. [Google Scholar] [CrossRef]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G. nuScenes: A multimodal dataset for autonomous driving. arXiv 2020, arXiv:1903.11027v5. [Google Scholar]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the Conference on Robot Learning (CoRL), Mountain View, CA, USA, 13–15 November 2017; Volume 78. [Google Scholar]
- Zhou, Z.; Zhang, J.; Zhang, J.; Wang, B.; Shi, T.; Khamis, A. In-context Learning for Automated Driving Scenarios. arXiv 2024, arXiv:2405.04135. [Google Scholar]
- Sadigh, D.; Sastry, S.S.; Seshia, S.A.; Dragan, A. Information gathering actions over human internal state. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 66–73. [Google Scholar]
- Sun, L.; Zhan, W.; Tomizuka, M.; Dragan, A.D. Courteous autonomous cars. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 663–670. [Google Scholar]
- Xu, Z.; Zhang, Y.; Xie, E.; Zhao, Z.; Guo, Y.; Wong, K.K.; Li, Z.; Zhao, H. DriveGPT4: Interpretable end-to-end autonomous driving via large language model. arXiv 2023, arXiv:2310.01412. [Google Scholar] [CrossRef]
- Sha, H.; Mu, Y.; Jiang, Y.; Chen, L.; Xu, C.; Luo, P.; Li, S.E.; Tomizuka, M.; Zhan, W.; Ding, M. LanguageMPC: Large language models as decision makers for autonomous driving. arXiv 2023, arXiv:2310.03026. [Google Scholar]
- Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793. [Google Scholar]
- Li, Z.; Wang, W.; Li, H.; Xie, E.; Sima, C.; Lu, T.; Qiao, Y.; Dai, J. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–18. [Google Scholar]
- Liu, Z.; Tang, H.; Amini, A.; Yang, X.; Mao, H.; Rus, D.L.; Han, S. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 2774–2781. [Google Scholar]
- Zhang, Z.; Liniger, A.; Dai, D.; Yu, F.; Van Gool, L. End-to-end urban driving by imitating a reinforcement learning coach. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15222–15232. [Google Scholar]
- Codevilla, F.; Santana, E.; Lopez, A.M.; Gaidon, A. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Chen, D.; Zhou, B.; Koltun, V.; Krahenbuhl, P. Learning by cheating. In Proceedings of the Conference on Robot Learning. PMLR, Virtual, 16–18 November 2020; pp. 66–75. [Google Scholar]
- Jiang, B.; Chen, S.; Xu, Q.; Liao, B.; Zhou, H.; Zhang, Q.; Liu, W.; Wang, X.; Huang, C. VAD: Vectorized Scene Representation for Efficient Autonomous Driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023. [Google Scholar]
- Peng, Y.; Han, J.; Zhang, Z.; Fan, L.; Liu, T.; Qi, S.; Feng, X.; Ma, Y.; Wang, Y.; Zhu, S.C. The TONG Test: Evaluating Artificial General Intelligence through Dynamic Embodied Physical and Social Interactions. Engineering 2023, 34, 12–22. [Google Scholar] [CrossRef]
- Liu, Z.; Jiang, H.; Tan, H.; Zhao, F. An overview of the latest progress and core challenge of autonomous vehicle technologies. MATEC Web Conf. 2020, 308, 06002. [Google Scholar] [CrossRef]
- Dou, F.; Ye, J.; Yuan, G.; Lu, Q.; Niu, W.; Sun, H.; Guan, L.; Lu, G.; Mai, G.; Liu, N.; et al. Towards artificial general intelligence (AGI) in the Internet of Things (IoT): Opportunities and challenges. arXiv 2023, arXiv:2309.07438. [Google Scholar]
- Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. The rise and potential of large language model based agents: A survey. arXiv 2023, arXiv:2309.07864. [Google Scholar]
- Li, X.; Bai, Y.; Cai, P.; Wen, L.; Fu, D.; Zhang, B.; Yang, X.; Cai, X.; Ma, T.; Guo, J.; et al. Towards knowledge-driven autonomous driving. arXiv 2023, arXiv:2312.04316. [Google Scholar]
- Fu, D.; Li, X.; Wen, L.; Dou, M.; Cai, P.; Shi, B.; Qiao, Y. Drive like a human: Rethinking autonomous driving with large language models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 910–919. [Google Scholar]
- Cui, C.; Ma, Y.; Cao, X.; Ye, W.; Zhou, Y.; Liang, K.; Chen, J.; Lu, J.; Yang, Z.; Liao, K.D.; et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 958–979. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Jiang, B.; Chen, S.; Wang, X.; Liao, B.; Cheng, T.; Chen, J.; Zhou, H.; Zhang, Q.; Liu, W.; Huang, C. Perceive, interact, predict: Learning dynamic and static clues for end-to-end motion prediction. arXiv 2022, arXiv:2212.02181. [Google Scholar]
- Ngiam, J.; Caine, B.; Vasudevan, V.; Zhang, Z.; Chiang, H.T.L.; Ling, J.; Roelofs, R.; Bewley, A.; Liu, C.; Venugopal, A.; et al. Scene transformer: A unified architecture for predicting multiple agent trajectories. arXiv 2021, arXiv:2106.08417. [Google Scholar]
- Liao, B.; Chen, S.; Wang, X.; Cheng, T.; Zhang, Q.; Liu, W.; Huang, C. MapTR: Structured modeling and learning for online vectorized HD map construction. arXiv 2022, arXiv:2208.14437. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Hu, S.; Chen, L.; Wu, P.; Li, H.; Yan, J.; Tao, D. ST-P3: End-to-End Vision-Based Autonomous Driving via Spatial-Temporal Feature Learning. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
- Hu, Y.; Yang, J.; Chen, L.; Li, K.; Sima, C.; Zhu, X.; Chai, S.; Du, S.; Lin, T.; Wang, W.; et al. Goal-oriented autonomous driving. arXiv 2022, arXiv:2212.10156. [Google Scholar]
- Zhu, B.; Lin, B.; Ning, M.; Yan, Y.; Cui, J.; Wang, H.; Pang, Y.; Jiang, W.; Zhang, J.; Li, Z.; et al. LanguageBind: Extending Video-Language Pretraining to N-Modality by Language-Based Semantic Alignment. arXiv 2023, arXiv:2310.01852. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Luo, H.; Ji, L.; Zhong, M.; Chen, Y.; Lei, W.; Duan, N.; Li, T. CLIP4Clip: An Empirical Study of CLIP for End-to-End Video Clip Retrieval. arXiv 2021, arXiv:2104.08860. [Google Scholar] [CrossRef]
- Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Kuttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–12 December 2020; Volume 33. [Google Scholar]
- Wang, J.; Yi, X.; Guo, R.; Jin, H.; Xu, P.; Li, S.; Wang, X.; Guo, X.; Li, C.; Xu, X.; et al. Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Xi’an, China, 20–25 June 2021. [Google Scholar]
- Hafner, D.; Lillicrap, T.; Ba, J.; Norouzi, M. Dream to control: Learning behaviors by latent imagination. arXiv 2019, arXiv:1912.01603. [Google Scholar]
- Mavrin, B.; Yao, H.; Kong, L.; Wu, K.; Yu, Y. Distributional reinforcement learning for efficient exploration. In Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 4424–4434. [Google Scholar]
- Hu, Y.; Yang, J.; Chen, L.; Li, K.; Sima, C.; Zhu, X.; Chai, S.; Du, S.; Lin, T.; Wang, W.; et al. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
- Chitta, K.; Prakash, A.; Jaeger, B.; Yu, Z.; Renz, K.; Geiger, A. TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving. arXiv 2022, arXiv:2205.15997v1. [Google Scholar] [CrossRef]
- Mao, J.; Qian, Y.; Ye, J.; Zhao, H.; Wang, Y. GPT-Driver: Learning to drive with GPT. arXiv 2023, arXiv:2310.01415v3. [Google Scholar]
- Hu, P.; Huang, A.; Dolan, J.; Held, D.; Ramanan, D. Safe local motion planning with self-supervised freespace forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Khurana, T.; Hu, P.; Dave, A.; Ziglar, J.; Held, D.; Ramanan, D. Differentiable raycasting for self-supervised occupancy forecasting. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
Method | L2 1 s (m) ↓ | L2 2 s (m) ↓ | L2 3 s (m) ↓ | L2 Avg. (m) ↓ | Collision 1 s (%) ↓ | Collision 2 s (%) ↓ | Collision 3 s (%) ↓ | Collision Avg. (%) ↓
---|---|---|---|---|---|---|---|---
ST-P3 [36] | 1.35 | 1.91 | 2.75 | 2.00 | 0.25 | 0.72 | 1.31 | 0.76
VAD [22] | 0.31 | 0.79 | 1.52 | 0.87 | 0.06 | 0.15 | 0.48 | 0.23
FF [49] | 0.56 | 1.21 | 2.56 | 1.44 | 0.09 | 0.21 | 1.09 | 0.46
EO [50] | 0.62 | 1.41 | 2.42 | 1.48 | 0.06 | 0.17 | 1.12 | 0.45
UniAD [46] | 0.51 | 0.98 | 1.71 | 1.07 | 0.07 | 0.13 | 0.74 | 0.31
GPT-Driver [48] | 0.28 | 0.81 | 1.56 | 0.88 | 0.09 | 0.17 | 1.12 | 0.46
SafeMod | 0.25 | 0.67 | 1.41 | 0.78 | 0.04 | 0.12 | 0.44 | 0.20
Method | Town05 Short DS ↑ | Town05 Short RC ↑ | Town05 Long DS ↑ | Town05 Long RC ↑
---|---|---|---|---
CILRS [20] | 7.43 | 13.47 | 3.71 | 7.21
Transfuser [47] | 55.55 | 80.03 | 32.17 | 57.41
VAD [22] | 65.32 | 88.14 | 31.01 | 74.94
ST-P3 [36] | 54.88 | 86.32 | 11.04 | 83.03
SafeMod | 65.45 | 88.84 | 32.02 | 78.66
SF | UH | TF | L2 1 s (m) ↓ | L2 2 s (m) ↓ | L2 3 s (m) ↓ | L2 Avg. (m) ↓ | Collision 1 s (%) ↓ | Collision 2 s (%) ↓ | Collision 3 s (%) ↓ | Collision Avg. (%) ↓
---|---|---|---|---|---|---|---|---|---|---
✓ | - | - | 0.68 | 0.91 | 1.62 | 1.07 | 0.44 | 0.62 | 0.90 | 0.65
✓ | - | ✓ | 0.44 | 0.75 | 1.46 | 0.77 | 0.12 | 0.23 | 0.44 | 0.26
✓ | ✓ | ✓ | 0.41 | 0.69 | 1.41 | 0.71 | 0.07 | 0.17 | 0.41 | 0.22
System Configuration | Inference Latency (ms) | Frame Rate (FPS)
---|---|---
BP | 50 | 20
BP + SF | 75 | 20
BP + SF + UH + TF | 92 | 19