From Virtual to Reality: A Deep Reinforcement Learning Solution to Implement Autonomous Driving with 3D-LiDAR
Abstract
1. Introduction
- This study demonstrates that DDPG reinforcement learning networks can be effectively trained and applied to autonomous driving technology, achieving unsupervised learning tasks with 3D-LiDAR sensors.
- In the point cloud processing stage, this study adopts a classical PointNet-based algorithm that processes the point cloud data directly instead of voxelizing it, combined with self-attention to optimize the features, which are then fed into the reinforcement learning network to cope with the challenge of 3D-LiDAR data in a realistic environment.
- After the implementation of unsupervised autonomous driving in a simulator, this study successfully built and deployed a realistic 1/10-scale experimental vehicle, named F1tenth, to complete autonomous driving tasks in real environments, advancing real-road condition applications and bridging the gap between simulation and reality.
2. Related Work
3. Methodology
3.1. Reinforcement Learning Network
3.2. Point Cloud Processing Algorithm
1. Point cloud input
2. Initial feature extraction
3. Local feature extraction (KNN and local attention)
4. Global feature extraction (multi-head attention over the full point cloud)
5. Multi-scale feature fusion
6. Global pooling
7. Feature dimension expansion
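The sketch below illustrates this seven-step pipeline in PyTorch. Only the 1024-dimensional global feature is fixed by the paper; the layer widths, the number of neighbors `k`, the attention head count, and all module names are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

def knn_indices(xyz, k):
    # Pairwise distances between points; xyz: (B, N, 3)
    dist = torch.cdist(xyz, xyz)                      # (B, N, N)
    return dist.topk(k, largest=False).indices        # (B, N, k) nearest neighbours

def gather_neighbors(feat, idx):
    # feat: (B, N, C), idx: (B, N, k) -> (B, N, k, C)
    B, N, _ = feat.shape
    k = idx.shape[-1]
    batch = torch.arange(B, device=feat.device).view(B, 1, 1).expand(B, N, k)
    return feat[batch, idx]

class PointFeatureExtractor(nn.Module):
    """1) point input -> 2) initial MLP -> 3) KNN + local attention ->
       4) global multi-head attention -> 5) multi-scale fusion ->
       6) global max pooling -> 7) expansion to a 1024-d state vector."""
    def __init__(self, k=16, d=128, heads=4):
        super().__init__()
        self.k = k
        self.init_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, d))
        self.local_q = nn.Linear(d, d)
        self.local_kv = nn.Linear(d, 2 * d)
        self.global_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(3 * d, 2 * d), nn.ReLU())  # multi-scale fusion
        self.expand = nn.Linear(2 * d, 1024)                           # feature expansion

    def forward(self, xyz):                            # xyz: (B, N, 3)
        f0 = self.init_mlp(xyz)                        # 2) per-point features (B, N, d)
        idx = knn_indices(xyz, self.k)                 # 3) local neighbourhoods
        neigh = gather_neighbors(f0, idx)              # (B, N, k, d)
        q = self.local_q(f0).unsqueeze(2)              # (B, N, 1, d)
        key, val = self.local_kv(neigh).chunk(2, dim=-1)
        attn = torch.softmax((q * key).sum(-1) / key.shape[-1] ** 0.5, dim=-1)
        f_local = (attn.unsqueeze(-1) * val).sum(2)    # local attention output (B, N, d)
        f_global, _ = self.global_attn(f_local, f_local, f_local)   # 4) global context
        fused = self.fuse(torch.cat([f0, f_local, f_global], -1))   # 5) fuse scales
        pooled = fused.max(dim=1).values               # 6) order-invariant global pooling
        return self.expand(pooled)                     # 7) 1024-d feature for the agent
```

For example, `PointFeatureExtractor()(torch.rand(1, 2048, 3))` returns a `(1, 1024)` state vector that can be passed to the reinforcement learning network.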
4. Experiments and Results
4.1. Implementation in CARLA Simulator
- Network Initialization: The first step is to define the structure of the Actor and Critic networks, including the input, hidden, and output layers. The learning rate of the actor is 0.001, the learning rate of the critic is 0.0001, the discount factor γ is 0.99, and the initial weights lie in the range [−0.003, 0.003]. The weights and biases of the networks are initialized in this step: the weight matrices, which perform the linear transformation that determines how the input data affect each neuron, are initialized with random values, while the bias vectors, which shift the neuron outputs and increase the model’s expressiveness, are initialized to zero. During subsequent training, the optimization algorithm minimizes the loss function by computing gradients and continually adjusting both. Target networks, used for training stability, are initialized with weights identical to those of the main networks. The replay buffer is initialized empty, with a capacity of 10,000 experiences; each experience is a state-transition tuple containing the current state, action, reward, next state, and a terminal-state Boolean. A minimal sketch of this initialization follows.
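The sketch uses the hyperparameters stated above (actor learning rate 0.001, critic learning rate 0.0001, γ = 0.99, output-layer weights in [−0.003, 0.003], buffer capacity 10,000) and the 64-neuron actor layers described in the actor-network step. The use of Adam and the critic's hidden sizes are assumptions, not details given by the paper.

```python
import copy
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 1024, 2          # 1024-d point cloud feature; actions (a1, a2)

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM), nn.Tanh())
        nn.init.uniform_(self.net[-2].weight, -0.003, 0.003)   # weight range from the paper
        nn.init.zeros_(self.net[-2].bias)                       # biases start at zero
    def forward(self, s):
        return self.net(s)                # both actions bounded to [-1, 1]

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
        nn.init.uniform_(self.net[-1].weight, -0.003, 0.003)
        nn.init.zeros_(self.net[-1].bias)
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))     # scalar Q(s, a)

actor, critic = Actor(), Critic()
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)   # identical weights
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)    # actor learning rate 0.001
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)  # critic learning rate 0.0001
gamma = 0.99                                                  # discount factor
replay_buffer = deque(maxlen=10_000)      # (state, action, reward, next_state, done) tuples
```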
- Agent Generation: To prevent the policy from falling into a fixed pattern and to enhance the agent’s generalization, the vehicle is spawned at a random map location in each episode; this also simulates the randomness of the real world and improves exploration efficiency.
- Environmental Information from the Sensor: The 3D-LiDAR provides point cloud data containing three-dimensional coordinates; a configuration sketch follows.
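Below is a hedged sketch of how such a sensor can be configured through the CARLA Python client API, using the values from the sensor parameter table at the end of this article. The host/port, mounting height, the assumption that an ego vehicle already exists, and the four-float point layout (CARLA 0.9.10 and later) are assumptions, not details stated by the paper.

```python
import numpy as np
import carla  # CARLA Python client library

# Connect to the simulator and fetch the LiDAR blueprint.
client = carla.Client("localhost", 2000)          # default host/port, assumed
world = client.get_world()
bp = world.get_blueprint_library().find("sensor.lidar.ray_cast")

# Attribute values taken from the sensor table reported in the paper.
for name, value in {"channels": "16", "rotation_frequency": "10.0",
                    "range": "50.0", "upper_fov": "15.0", "lower_fov": "-15.0",
                    "horizontal_fov": "360.0", "sensor_tick": "0.1"}.items():
    bp.set_attribute(name, value)

vehicle = world.get_actors().filter("vehicle.*")[0]           # assumes an ego vehicle exists
lidar = world.spawn_actor(bp, carla.Transform(carla.Location(z=1.8)), attach_to=vehicle)

def on_scan(measurement):
    # Recent CARLA versions pack (x, y, z, intensity) per point; keep the coordinates only.
    pts = np.frombuffer(measurement.raw_data, dtype=np.float32).reshape(-1, 4)
    xyz = pts[:, :3]                                          # fed to the point cloud network
    # ... push xyz into the feature extractor / DDPG pipeline ...

lidar.listen(on_scan)
```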
- Point Cloud Processing: Global features are extracted using the study’s point cloud processing algorithm.
- Actor-network Input: The 1024-dimensional features pass through two fully connected layers with 64 neurons each, activated by ReLU, outputting action vectors a1 and a2. a1 represents acceleration, ranging between −1 and +1, while a2 represents the steering angle, ranging between −15 and +15 degrees. Because DDPG operates in a continuous action space, the output allows smooth steering-angle and throttle control instead of the simple left-or-right decisions of the DQN method. For the current state, the action is selected from the policy’s output with Gaussian noise added for exploration. Following the task objective, the reward function is designed as follows: the agent obtains a reward of 0.01 if it chooses to go forward; when it collides with an obstacle, the reward is −1 and the episode is terminated. Since leaving the lane also results in a collision with an obstacle, no separate lane-departure penalty is needed. After executing an action, the agent receives the next state, the reward, and a termination signal from the environment, and the transition is stored in the replay buffer. A sketch of this step follows.
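The sketch below covers action selection with exploration noise and the reward scheme. The noise scale, the mapping of a2 from [−1, 1] to [−15°, +15°], and the zero reward when the vehicle is not moving forward are assumptions; the +0.01 forward reward and the −1 terminal collision penalty follow the text.

```python
import torch

MAX_STEER_DEG = 15.0    # a2 is mapped to a steering angle in [-15, +15] degrees

def select_action(actor, state, noise_std=0.1):
    """Query the deterministic policy, then add Gaussian exploration noise (std assumed)."""
    with torch.no_grad():
        a = actor(state.unsqueeze(0)).squeeze(0)        # (a1, a2) in [-1, 1]
    a = (a + noise_std * torch.randn_like(a)).clamp(-1.0, 1.0)
    throttle = a[0].item()                              # acceleration command a1
    steer_deg = a[1].item() * MAX_STEER_DEG             # steering angle a2, in degrees
    return throttle, steer_deg

def reward_and_done(collided: bool, moving_forward: bool):
    """Reward scheme from the paper: +0.01 per forward step, -1 and terminate on collision."""
    if collided:
        return -1.0, True       # collisions (including leaving the lane) end the episode
    return (0.01 if moving_forward else 0.0), False
```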
- Sample Data from the Replay Buffer: When the amount of data in the buffer exceeds the minimum batch size, a small random batch is sampled from the buffer.
- Critic-network Implementation: The critic takes the state and action as input and outputs the scalar expected return under the current policy. States are represented by the point cloud global feature vectors, and actions come from the actor-network output. Both pass through fully connected layers for feature extraction; the state and action features are then fused into a single vector, and the final fully connected layer outputs the Q-value. The critic network is updated with a mean-square-error loss, and its Q-value guides the actor network in improving the vehicle’s driving strategy. A sketch of this update follows.
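This sketch covers the sampling step and one critic update, regressing Q(s, a) toward the bootstrapped target r + γ(1 − done)Q′(s′, μ′(s′)) with the mean-square-error loss described above. The batch size and the assumption that transitions are stored as tensors (with the terminal flag stored as a float) are illustrative; the networks are those from the initialization sketch.

```python
import random
import torch
import torch.nn.functional as F

BATCH_SIZE = 64   # assumed; the paper only requires the buffer to exceed a minimum batch size

def critic_update(replay_buffer, target_actor, critic, target_critic, critic_opt, gamma=0.99):
    """Sample a random mini-batch and take one critic step on the MSE Bellman loss."""
    if len(replay_buffer) <= BATCH_SIZE:
        return None                                     # not enough experience yet
    batch = random.sample(list(replay_buffer), BATCH_SIZE)
    # Each transition is assumed to be a tuple of tensors (s, a, r, s', done).
    state, action, reward, next_state, done = (torch.stack(x) for x in zip(*batch))
    with torch.no_grad():                               # target from the target networks
        next_q = target_critic(next_state, target_actor(next_state)).squeeze(-1)
        target_q = reward + gamma * (1.0 - done) * next_q
    q = critic(state, action).squeeze(-1)               # Q(s, a) under the current critic
    loss = F.mse_loss(q, target_q)                      # mean-square-error loss
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
    return loss.item()
```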
- Actor-network Implementation: Following the critic network’s performance evaluation, this step calculates actor network gradients and updates the parameters.
- Target Network Updates: After the main networks are updated, the target networks are soft-updated, θ′ ← τθ + (1 − τ)θ′, to maintain training stability, as sketched below.
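The final sketch shows the actor update driven by the critic’s evaluation, followed by the soft target update θ′ ← τθ + (1 − τ)θ′. The value of τ is an assumption; the paper does not state it in this section. The networks and optimizers are those from the initialization sketch.

```python
def actor_and_target_update(batch_states, actor, critic, actor_opt,
                            target_actor, target_critic, tau=0.005):
    # Actor step: ascend the critic's value of the actor's own actions.
    actor_loss = -critic(batch_states, actor(batch_states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    # Soft target updates: theta' <- tau * theta + (1 - tau) * theta'  (tau assumed)
    for main, target in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(main.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
    return actor_loss.item()
```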
4.2. Implementation in a Real Environment
- Chassis: The chassis used in this experiment is from Traxxas; it is a simplified structure that stays as close as possible to that of a real car, and the experimental car is one-tenth the size of a road vehicle. It uses the Ackermann steering mechanism, which is commonly employed in automotive design to minimize tire slip and ensure that each wheel follows its optimum path during turns. This setup is crucial for replicating the real dynamics of vehicle steering and control.
- Battery: A lithium-polymer battery with a capacity of 5000 mAh was used in this experiment.
- Motor: A high-performance brushless motor that provides precise control over the car’s speed and maneuverability.
- NVIDIA Jetson TX2: The primary computing unit that runs all the deep learning models, including the DDPG algorithm and the point cloud processing module. The Jetson TX2 was selected for its computational power while maintaining a small size, which is critical for real-time processing.
- Communication: The vehicle communicates with the station through an antenna for remote input of commands, monitoring, data logging, and algorithm tuning.
- VESC (Vedder Electronic Speed Controller): A component responsible for controlling the speed and steering of the vehicle. The VESC is directly interfaced with the NVIDIA Jetson TX2, enabling precise control based on the actions determined by the DDPG model.
- Powerboard: Distributes power to all components, ensuring stable operation and proper voltage levels.
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Igliński, H.; Babiak, M. Analysis of the potential of autonomous vehicles in reducing the emissions of greenhouse gases in road transport. Procedia Eng. 2017, 192, 353–358.
- Uzzaman, A.; Muhammad, W. A Comprehensive Review of Environmental and Economic Impacts of Autonomous Vehicles. Control Syst. Optim. Lett. 2024, 2, 303–309.
- Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. IEEE Access 2020, 8, 58443–58469.
- Chen, R.C.; Saravanarajan, V.S.; Chen, L.S.; Yu, H. Road Segmentation and Environment Labeling for Autonomous Vehicles. Appl. Sci. 2022, 12, 7191.
- Li, Y. Deep Reinforcement Learning: An Overview. arXiv 2017, arXiv:1701.07274.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Hassabis, D. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
- Zhou, J. A Review of LiDAR Sensor Technologies for Perception in Automated Driving. Acad. J. Sci. Technol. 2022, 3, 255–261.
- Liu, Y.; Diao, S. An Automatic Driving Trajectory Planning Approach in Complex Traffic Scenarios Based on Integrated Driver Style Inference and Deep Reinforcement Learning. PLoS ONE 2024, 19, e0297192.
- Yuan, M.; Shan, J.; Mi, K. From Naturalistic Traffic Data to Learning-Based Driving Policy: A Sim-to-Real Study. IEEE Trans. Veh. Technol. 2024, 73, 245–257.
- Wang, C.; Cui, X.; Zhao, S.; Zhou, X.; Song, Y.; Wang, Y.; Guo, K. A Deep Reinforcement Learning-Based Active Suspension Control Algorithm Considering Deterministic Experience Tracing for Autonomous Vehicles. Appl. Soft Comput. 2024, 153, 111259.
- Bosello, M.; Tse, R.; Pau, G. Train in Austria, Race in Montecarlo: Generalized RL for Cross-Track F1 Tenth LiDAR-Based Races. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; pp. 290–298.
- Li, Z.; Yuan, S.; Yin, X.; Li, X.; Tang, S. Research into Autonomous Vehicles Following and Obstacle Avoidance Based on Deep Reinforcement Learning Method under Map Constraints. Sensors 2023, 23, 844.
- Li, W.; Zhang, Y.; Shi, X.; Qiu, F. A Decision-Making Strategy for Car Following Based on Naturalist Driving Data via Deep Reinforcement Learning. Sensors 2022, 22, 8055.
- Yang, K.; Tang, X.; Qiu, S.; Jin, S.; Wei, Z.; Wang, H. Towards Robust Decision-Making for Autonomous Driving on Highway. IEEE Trans. Veh. Technol. 2023, 72, 11251–11263.
- Bellotti, F.; Lazzaroni, L.; Capello, A.; Cossu, M.; De Gloria, A.; Berta, R. Explaining a Deep Reinforcement Learning (DRL)-Based Automated Driving Agent in Highway Simulations. IEEE Access 2023, 11, 28522–28550.
- Al-Sharman, M.; Dempster, R.; Daoud, M.A.; Nasr, M.; Rayside, D.; Melek, W. Self-Learned Autonomous Driving at Unsignalized Intersections: A Hierarchical Reinforced Learning Approach for Feasible Decision-Making. IEEE Trans. Intell. Transp. Syst. 2023, 24, 12345–12356.
- Wang, J.; Sun, H.; Zhu, C. Vision-Based Autonomous Driving: A Hierarchical Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2023, 72, 11213–11226.
- Mao, Z.; Liu, Y.; Qu, X. Integrating Big Data Analytics in Autonomous Driving: An Unsupervised Hierarchical Reinforcement Learning Approach. Transp. Res. Part C Emerg. Technol. 2024, 162, 104606.
- Pérez-Gil, Ó.; Barea, R.; López-Guillén, E.; Bergasa, L.M.; Gómez-Huélamo, C.; Gutiérrez, R.; Díaz-Díaz, A. Deep Reinforcement Learning Based Control for Autonomous Vehicles in CARLA. Multimed. Tools Appl. 2022, 81, 3553–3576.
- Petryshyn, B.; Postupaiev, S.; Ben Bari, S.; Ostreika, A. Deep Reinforcement Learning for Autonomous Driving in Amazon Web Services DeepRacer. Information 2024, 15, 113.
- Wang, W.; Luo, H.; Zheng, Q.; Wang, C.; Guo, W. A Deep Reinforcement Learning Framework for Vehicle Detection and Pose Estimation in 3D Point Clouds. In Proceedings of the 6th International Conference on Artificial Intelligence and Security (ICAIS 2020), Hohhot, China, 17–20 July 2020; pp. 405–416.
- Lee, Y.; Park, S. A Deep Learning-Based Perception Algorithm Using 3D LiDAR for Autonomous Driving: Simultaneous Segmentation and Detection Network (SSADNet). Appl. Sci. 2020, 10, 4486.
- Chen, Y.; Tse, R.; Bosello, M.; Aguiari, D.; Tang, S.K.; Pau, G. Enabling Deep Reinforcement Learning Autonomous Driving by 3D-LiDAR Point Clouds. In Proceedings of the Fourteenth International Conference on Digital Image Processing (ICDIP 2022), Wuhan, China, 20–23 May 2022; Volume 12342, pp. 362–371.
- Zhai, Y.; Liu, Z.; Miao, Y.; Wang, H. Efficient Reinforcement Learning for 3D LiDAR Navigation of Mobile Robot. In Proceedings of the 41st Chinese Control Conference (CCC 2022), Hefei, China, 25–27 July 2022; pp. 3755–3760.
- Sánchez, M.; Morales, J.; Martínez, J.L. Reinforcement and Curriculum Learning for Off-Road Navigation of a UGV with a 3D LiDAR. Sensors 2023, 23, 3239.
- Bellman, R.; Kalaba, R.E. Dynamic Programming and Modern Control Theory; Academic Press: New York, NY, USA, 1965; Volume 81.
- Sutton, R.S.; Barto, A.G. The reinforcement learning problem. In Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; pp. 51–85.
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the International Conference on Machine Learning (ICML 2014), Beijing, China, 21–26 June 2014; pp. 387–395.
- Mnih, V. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Vaswani, A. Attention Is All You Need. In Advances in Neural Information Processing Systems (NeurIPS 2017); Curran Associates, Inc.: Long Beach, CA, USA, 2017.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12.
- Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.J.; Martin, R.R.; Hu, S.M. PCT: Point Cloud Transformer. Comput. Vis. Media 2021, 7, 187–199.
- Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268.
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the Conference on Robot Learning (CoRL 2017), Mountain View, CA, USA, 13–15 November 2017; pp. 1–16.
- O’Kelly, M.; Sukhil, V.; Abbas, H.; Harkins, J.; Kao, C.; Pant, Y.V.; Bertogna, M. F1/10: An Open-Source Autonomous Cyber-Physical Platform. arXiv 2019, arXiv:1901.08567.
Table: 3D-LiDAR sensor configuration in the CARLA simulator.

| Parameter | Value |
| --- | --- |
| Channels | 16 |
| rotation_frequency (r/s) | 10.0 |
| Range (m) | 50.0 |
| upper_fov (°) | 15.0 |
| lower_fov (°) | −15.0 |
| horizontal_fov (°) | 360.0 |
| sensor_tick (s) | 0.1 |