Article

From Virtual to Reality: A Deep Reinforcement Learning Solution to Implement Autonomous Driving with 3D-LiDAR

1 Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
2 Department of Computer Science and Engineering—DISI, University of Bologna, 47521 Cesena, Italy
3 Autonomous Robotics Research Center, Technology Innovation Institute (TII), Abu Dhabi P.O. Box 9639, United Arab Emirates
4 Department of Computer Science, University of California, Los Angeles, CA 90095, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1423; https://doi.org/10.3390/app15031423
Submission received: 16 December 2024 / Revised: 21 January 2025 / Accepted: 28 January 2025 / Published: 30 January 2025
(This article belongs to the Special Issue Applications of Artificial Intelligence in Transportation Engineering)

Abstract

Autonomous driving technology faces significant challenges in processing complex environmental data and making real-time decisions. Traditional supervised learning approaches heavily rely on extensive data labeling, which incurs substantial costs. This study presents a complete implementation framework combining Deep Deterministic Policy Gradient (DDPG) reinforcement learning with 3D-LiDAR perception techniques for practical application in autonomous driving. DDPG meets the continuous action space requirements of driving, and the point cloud processing module combines a classical PointNet-based algorithm with attention mechanisms to provide strong awareness of the environment. The solution is first validated in a simulation environment and then successfully migrated to a real environment based on a 1/10-scale F1tenth experimental vehicle. The experimental results show that the method proposed in this study is able to complete the autonomous driving task in the real environment, providing a feasible technical path for the engineering application of advanced sensor technology combined with complex learning algorithms in the field of autonomous driving.

1. Introduction

Autonomous driving technology represents a transformative advancement in transportation, significantly improving traffic safety by reducing human driving errors while enhancing overall transportation efficiency. As a cutting-edge technology at the forefront of artificial intelligence applications, it continues to have profound impacts on society’s economy, environmental sustainability, and quality of life. Studies have shown that autonomous vehicles could potentially reduce carbon emissions through optimized routing and more efficient driving patterns [1] while also contributing to economic benefits by reducing traffic congestion and accidents [2]. Self-driving vehicles need to demonstrate remarkable capabilities in making real-time decisions within dynamic and complex environments, operating safely and independently without human intervention in various weather conditions and traffic scenarios [3].
Supervised learning, traditionally serving as a fundamental autonomous driving solution, trains vehicles to make decisions through carefully curated datasets and corresponding labels. However, this conventional approach often reveals its limitations when facing diverse and unpredictable road scenarios, struggling to handle uncertainties that were not present in the training data. Moreover, supervised learning’s dependency on extensive labeled data to improve model performance demands substantial human resources and time [4]. Deep reinforcement learning addresses these challenges, drawing inspiration from human learning patterns. Agents learn and make decisions through environmental interaction and feedback without explicit human supervision for data labeling [5]. This unsupervised learning approach can accomplish autonomous driving tasks while reducing technology costs. This study implements autonomous driving tasks using reinforcement learning with 3D-LiDAR sensors in both simulated and real environments.
While Deep Q-Network (DQN) serves as a foundational reinforcement learning algorithm with proven success in many applications, it exhibits inherent limitations when handling continuous action spaces, which are crucial for fine vehicle control in real-world scenarios [6]. Vehicles typically require precise, continuous adjustments in acceleration, braking, and steering to ensure both safety and passenger comfort. DQN is designed for discrete action selection, which makes it difficult to perform well in such situations. To address these issues, this study applies the Deep Deterministic Policy Gradient (DDPG) algorithm, which directly maps sensory inputs to continuous action outputs, making it particularly well suited for continuous control problems [7]. DDPG effectively combines deep reinforcement learning benefits with an actor–critic architecture, learning optimal driving policies through systematic trial and error. The actor network determines optimal actions based on current states, while the critic network evaluates action quality and provides detailed feedback to refine the actor’s behavior. This combination enhances learning efficiency, ensuring vehicles can adapt to various driving scenarios while maintaining consistent performance.
Compared to traditional RGB camera sensors that rely primarily on vision, 3D-LiDAR technology provides significantly more precise depth measurements, clearly detecting object contours and positions regardless of challenging lighting conditions, including nighttime operations, strong direct sunlight, or heavily shadowed areas. It constructs detailed 3D environmental maps by emitting laser pulses and measuring their return times, generating high-precision distance information with millimeter-level accuracy. This advanced capability helps systems accurately identify and locate surrounding vehicles, pedestrians, and obstacles in real time [8]. Three-dimensional LiDAR also demonstrates exceptional performance in the detection range, typically identifying objects hundreds of meters away. It is a crucial capability for high-speed vehicles operating on highways or in complex urban environments. In autonomous driving scenarios, the early detection of distant obstacles and traffic conditions helps vehicles better adjust speed and route, significantly improving overall safety margins. Therefore, this study develops an autonomous driving solution adapted for 3D-LiDAR sensors. Three-dimensional LiDAR outputs point cloud data, a three-dimensional format fundamentally different from conventional visual images. In this study, an algorithm based on PointNet combined with an attention mechanism is used in the processing stage of the point cloud data to perceive the environment and feed refined environmental data into the reinforcement learning framework.
The scientific novelty and contributions of this research in autonomous driving technology manifest in several significant aspects:
  • This study demonstrates that DDPG reinforcement learning networks can be effectively trained and applied to autonomous driving technology, achieving unsupervised learning tasks with 3D-LiDAR sensors.
  • In the point cloud processing stage, this study adopts a classical PointNet-based algorithm to process the point cloud data directly instead of voxelizing it, combined with self-attention to refine the features that are input into the reinforcement learning network, in order to cope with the challenges of 3D-LiDAR data in realistic environments.
  • After the implementation of unsupervised autonomous driving in a simulator, this study successfully built and deployed a realistic 1/10-scale experimental vehicle, named F1tenth, to complete autonomous driving tasks in real environments, advancing real-road condition applications and bridging the gap between simulation and reality.
The remainder of this article is structured as follows. Section 2 provides a review of related research on applying reinforcement learning for autonomous driving and the implementation of 3D-LiDAR in autonomous driving. Section 3 presents the detailed methodology of the DDPG reinforcement learning network implementation and point cloud data processing solutions, including architectural decisions and optimization strategies. Section 4 states experimental processes and results in both simulated and real environments. Section 5 summarizes the research and proposes future directions for related work, including potential improvements and extensions. This study aims to advance the practical application of unsupervised learning solutions in autonomous driving technology, contributing to the goal of safer and more efficient transportation systems.

2. Related Work

Reinforcement learning has many applications in the field of autonomous driving, and these applications focus on different aspects. In terms of training models, the study by Liu and Diao in 2024 dealt with complex traffic road conditions such as intersections by extracting drivers’ style features [9]. However, the study covers only a limited set of road conditions and does not generalize well. The Deep Recurrent Q-learning from Demonstration (DRQfD) algorithm proposed by Yuan et al. also trains efficiently by learning to predict the future states of surrounding vehicles [10]. This study was demonstrated in a simulated environment and did not discuss in depth the effectiveness of implementation in a real environment. Wang et al. proposed deterministic experience tracing (DET), which strengthens the experience replay buffer to provide more stable control performance [11]. However, the sample of scenarios in this study was small and not diverse enough. Bosello et al. focused on holistic reinforcement learning strategies in their study in 2022, where they implemented an effective application of DQN on racing cars [12]. That study used 2D-LiDAR instead of more advanced high-dimensional LiDAR sensors. Li, Z. et al. in 2023 improved vehicle following and obstacle avoidance by using reinforcement learning to identify external obstacles [13]. Li, W. et al. also contributed to car following [14]; the difference is that they used DDPG as the reinforcement learning algorithm to propose a car-following decision-making strategy. However, these two studies rely on specific scenarios and may not perform well in untrained scenarios. Yang et al. in 2023 proposed a DDPG-based reinforcement learning algorithm to improve decision-making on the highway [15]. Bellotti et al. also focused on the highway environment and increased the predictive ability of the agent by introducing an attention mechanism [16]. The road environment of these two studies is limited to highways and lacks generalization. There are also many studies on hierarchical reinforcement learning approaches. Al-Sharman et al.’s study focuses on left-turn strategies at unsignalized intersections [17]. The hierarchical framework proposed by Wang et al. improves learning efficiency and decision-making in complex environments [18]. The novel hierarchical reinforcement learning method proposed by Mao et al. can train sub-networks without designing artificial rewards [19]. These studies focus on the algorithmic side of the agents, while the handling of the agents’ sensor data receives comparatively little attention. Pérez-Gil et al. in their 2022 research used a camera as the sensor and utilized deep reinforcement learning in the simulator to complete autonomous driving navigation tasks [20]. The environmental information was input in RGB data format without employing a higher-precision LiDAR sensor, and the work primarily focused on using reinforcement learning to complete navigation tasks between two given points. In 2024, Petryshyn et al. used two different sensor configurations, a single camera combined with 1D-LiDAR and a stereo camera, to achieve multiple autonomous driving schemes on a track with different deep reinforcement learning algorithms [21]. Previous studies thus had different focuses in applying deep reinforcement learning to autonomous driving.
This study fills the gap by implementing the autonomous driving task using the DDPG algorithm through a single 3D-LiDAR as the high-precision sensor and effectively applies the system in both simulated and real-world environments.
Autonomous driving algorithms that can work with 3D-LiDAR are more advantageous at the practical application level due to the high performance of 3D-LiDAR. Wang et al. in 2020 used a reinforcement learning scheme to improve the performance of 3D-LiDAR in vehicle detection and attitude estimation [22]. Lee and Park’s research in 2020 enhanced the capabilities of perception modules in autonomous driving tasks by segmenting and detecting drivable areas and vehicles through point cloud data [23]. These two studies were on specific modules rather than overall decision-making capability. Chen et al.’s study in 2022 utilized DQN in conjunction with 3D-LiDAR for an autonomous driving task [24]. However, the application was only at the simulator stage and not in the real-world environment. Zhai et al. used 3D-LiDAR as the sensor to achieve end-to-end autonomous navigation [25]. Sánchez et al.’s study in 2023 also improved the performance of autonomous navigation using 3D-LiDAR [26]. Both studies demonstrate the benefits of 3D-LiDAR for environmental awareness.

3. Methodology

This section describes the technical approach that this article uses to enable a deep reinforcement learning solution to implement autonomous driving with 3D-LiDAR in both virtual and real environments. The overall framework used in this article to implement the autonomous driving task is the DDPG network. Section 3.1 will explain in detail how the reinforcement learning network of this research is built using the DDPG algorithm. A 3D-LiDAR sensor passes the environmental information into the overall network architecture of this research in point cloud format. Section 3.2 will elaborate on how the input sensor data from the 3D-LiDAR are processed in the point cloud processing stage before being passed into the DDPG network.

3.1. Reinforcement Learning Network

This study uses reinforcement learning as the overall framework for implementing autonomous driving. It is a machine learning scheme that learns through interaction, similar to the way humans acquire knowledge. The agent continuously learns from the rewards or penalties it receives while interacting with the environment, exploring on its own the correct behaviors adapted to that environment. In this research, the agent is the experimental vehicle, which explores unfamiliar environments through trial and error based on the established reward–penalty mechanism. Through continuous feedback, it self-corrects its behavior and ultimately accomplishes the intended autonomous driving tasks in the environment. Reinforcement learning is based on the Markov Decision Process (MDP) [27]. This process mainly contains the following elements: S, A, P, R, γ. “S” is the state of the vehicle in the environment. “A” represents all the actions of the vehicle. “P” represents the probability of the vehicle transitioning from state s at time t to state s′ at time t + 1 after executing action a. Each state corresponds to an action or a probability distribution over actions, and once an action is taken, the next state is determined. This means that each state can be described by a definite value, from which it can be determined whether a state is good or bad. For example, if moving to the left causes the self-driving car to hit an obstacle, then the state to the left is a bad state. Thus, the goodness of a state is equivalent to the expectation of the future return, and “R” represents the return associated with the state at a given time. “R” (reward) is a real value that represents reward or punishment: a positive number is returned when the vehicle performs the expected action, while a negative number is returned when the vehicle performs a wrong action. “γ” is the discount factor, which determines the importance of future rewards relative to immediate rewards.
The purpose of reinforcement learning is to maximize long-term future rewards, which can also be expressed as finding the largest return U. “U” represents the sum of rewards accumulated over all states after executing a sequence of actions, but directly adding rewards over an infinite time sequence would result in an unbounded sum. Therefore, the discount factor γ, with a value range of zero to one, is introduced into this function, and the reward returned from each subsequent state is multiplied by the discount coefficient. This means that the current reward is more important than rewards from future feedback. The definition is given in Equation (1) [28]:
U(s_0, s_1, s_2, \ldots) = \sum_{t=0}^{\infty} \gamma^t R(s_t) \le \sum_{t=0}^{\infty} \gamma^t R_{\max} = \frac{R_{\max}}{1 - \gamma}, \quad 0 \le \gamma < 1 \quad (1)
The MDP is a cyclic process in which the agent takes actions to change its state, obtain rewards, and interact with the environment. The policy of the MDP depends entirely on the current state rather than on the full history, which is a reflection of its Markovian nature.
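As a concrete illustration of Equation (1), the short Python snippet below computes the discounted return of a finite reward sequence and the geometric upper bound R_max/(1 − γ); the reward values are hypothetical and the snippet is illustrative rather than part of the original implementation.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^t * r_t over a reward sequence, as in Equation (1)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Hypothetical episode: small forward rewards followed by a collision penalty.
rewards = [0.01] * 50 + [-1.0]
print(discounted_return(rewards))   # finite discounted return of this episode
print(1.0 / (1.0 - 0.99))           # upper bound R_max / (1 - gamma) with R_max = 1
```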
Based on the above MDP, this research implements a DDPG network to achieve autonomous driving tasks. This choice is motivated by the continuous nature of steering and throttle actions during vehicle operation. DDPG combines the advantages of deep learning and Q-learning, making it particularly suitable for handling continuous action spaces. DDPG, proposed by Lillicrap et al. in 2015, is an actor–critic algorithm that operates through the collaboration of two networks [7]. The actor network determines the optimal policy, while the critic network evaluates the value of the current policy. Its key characteristics include off-policy training and the use of deep neural networks to handle complex, high-dimensional state and action spaces.
The construction of the actor network is addressed first; it is theoretically grounded in the Deterministic Policy Gradient (DPG) algorithm proposed by Silver et al. in 2014 [29]. A stochastic policy is denoted as a ∼ π_θ(s), whereas a deterministic policy can be denoted as a = μ_θ(s).
The Deterministic Policy Gradient Theorem is:
\nabla_{\theta^\mu} J = \mathbb{E}_{s_t \sim \rho^\beta}\left[ \nabla_a Q(s, a \mid \theta^Q)\big|_{s = s_t,\, a = \mu(s_t)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s = s_t} \right] \quad (2)
It has the same structure as the stochastic policy gradient theorem. However, the deterministic policy gradient theorem assumes that the value function for each state and action is already available. For a given state in the continuous action space, this research aims to construct an actor network that outputs the action maximizing the value function, following the Deterministic Policy Gradient:
\mu(s) = \arg\max_a Q(s, a) \quad (3)
The actor network is the policy μ. Next, the research creates the critic network, which is the value function Q, to satisfy the generalized actor–critic framework.
The Bellman optimality equation is [27]
Q^*(s, a) = \mathbb{E}_{s' \sim P}\left[ r(s, a) + \gamma \max_{a'} Q^*(s', a') \right] \quad (4)
It states that the value of each state under the optimal policy must be equal to the expected return of the optimal action in this state. The Mean Square Bellman Error (MSBE) was used to measure the difference between the value of Q_φ fitted by the neural network and the value calculated by the Bellman equation:
L(\phi, \mathcal{D}) = \mathbb{E}_{(s, a, r, s', d) \sim \mathcal{D}}\left[ \left( Q_\phi(s, a) - \left( r + \gamma (1 - d) \max_{a'} Q_\phi(s', a') \right) \right)^2 \right] \quad (5)
When s′ is a terminal state, d = 1; otherwise, d = 0. The loss function shown in Equation (5) is derived from the Bellman equation and implemented in Deep Q-Learning, as introduced by Mnih et al. in their work on Atari games [6,30].
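To make Equation (5) concrete, the following minimal sketch (assuming PyTorch; the module and batch-key names are illustrative, not the authors’ code) computes the critic loss for a sampled mini-batch. In DDPG the max over a′ is realized by evaluating the target critic at the target actor’s action, as described in the following paragraphs.

```python
import torch
import torch.nn.functional as F

def critic_loss(critic, target_critic, target_actor, batch, gamma=0.99):
    """Mean Square Bellman Error (Equation (5)) over a sampled mini-batch.

    In DDPG the max over a' is approximated by evaluating the target critic
    at the target actor's action mu'(s').
    """
    s, a, r, s_next, d = (batch[k] for k in ("s", "a", "r", "s_next", "d"))
    with torch.no_grad():
        a_next = target_actor(s_next)                                   # mu'(s')
        q_target = r + gamma * (1.0 - d) * target_critic(s_next, a_next)
    return F.mse_loss(critic(s, a), q_target)
```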
The actor network takes the environmental state s as input and outputs the action a that maximizes the value function Q for that state, thereby forming the deterministic policy μ. This network performs gradient ascent directly on the value function Q, aiming to find the action a that yields the highest Q-value. The Q-value used here is obtained from the critic network’s output in the previous iteration. The critic network takes both the environmental state s and the action a as inputs, which are generated by the actor network, and outputs the estimated Q-value. This network is trained using target values computed through the Bellman optimality equation.
To stabilize training, DDPG introduces target networks that mirror the structure of the main actor and critic networks. These target networks update more slowly than the main networks to prevent training instability caused by frequent updates. The target network parameters are initially copied from the main networks. This research employs soft updates, where the target network parameters gradually move toward the main network parameters with a small step size, allowing incremental updates during each episode. The target actor network generates the next actions, while the target critic network estimates the target Q-values used to update the main networks.
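The soft update described above can be written in a few lines; this is a generic sketch assuming PyTorch modules rather than the authors’ exact implementation (the study uses τ = 0.001, as stated in Section 4.1).

```python
def soft_update(target_net, main_net, tau=0.001):
    """Polyak averaging: theta_target <- tau * theta_main + (1 - tau) * theta_target."""
    for t_param, m_param in zip(target_net.parameters(), main_net.parameters()):
        t_param.data.copy_(tau * m_param.data + (1.0 - tau) * t_param.data)
```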
In reinforcement learning, the replay buffer is used to store the experience data (states, actions, rewards, etc.) of the agent’s interaction with the environment. It is another important component of the DDPG algorithm. Training stability and efficiency are improved by randomly sampling mini-batches from this buffer, breaking temporal correlations in the data. In this research, priority is given to samples with higher temporal difference (TD) errors, focusing on more valuable learning experiences. The assumption of independent and identically distributed samples no longer holds when samples are generated by successive explorations of the environment. Therefore, the replay buffer stores the trajectories, keeping the (s, a, r, s′, d) tuple derived from each sampling of the environment.
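A replay buffer with TD-error-based prioritization can be sketched as follows; the proportional weighting shown here is one common choice and an assumption about the exact scheme, not a statement of the authors’ implementation.

```python
import numpy as np
from collections import deque

class PrioritizedReplayBuffer:
    """Stores (s, a, r, s', d) transitions and samples them with probability
    proportional to their TD error, so informative experiences are revisited more often."""

    def __init__(self, capacity=10000, eps=1e-6):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.eps = eps  # keeps every transition sampleable

    def add(self, transition, td_error=1.0):
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities, dtype=np.float64)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```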
Unlike the basic DQN algorithm, DDPG incorporates noise to encourage broader action space exploration during training and improve learning effectiveness. Since DDPG is a deterministic policy algorithm in which the actor network outputs specific actions, exploration would be limited without noise, as the agent would consistently output identical actions. To address this, DDPG adds noise to the actor’s action output, thereby increasing exploration. This research implements the noise component by adding Gaussian noise directly to the actor’s action output. As training proceeds, the noise is gradually reduced through a decay mechanism so that the agent can progressively focus on exploiting the learned strategy. The decay mechanism is implemented by exponentially decaying the standard deviation of the noise.
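A minimal sketch of the decaying Gaussian exploration noise is shown below; the initial standard deviation, decay rate, and floor value are assumed hyperparameters, not values reported in the paper.

```python
import numpy as np

class DecayingGaussianNoise:
    """Gaussian exploration noise whose standard deviation decays exponentially,
    shifting the agent from exploration toward exploitation of the learned policy."""

    def __init__(self, action_dim=2, sigma=0.2, decay=0.999, sigma_min=0.01):
        self.action_dim = action_dim
        self.sigma = sigma            # initial standard deviation (assumed)
        self.decay = decay            # per-episode decay factor (assumed)
        self.sigma_min = sigma_min    # floor to keep a little exploration

    def __call__(self, action):
        noisy = action + np.random.normal(0.0, self.sigma, size=self.action_dim)
        return np.clip(noisy, -1.0, 1.0)   # keep actions in the normalized range

    def step(self):
        self.sigma = max(self.sigma * self.decay, self.sigma_min)
```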
Based on the DDPG network constructed above, Figure 1 illustrates the workflow of this research implementation.

3.2. Point Cloud Processing Algorithm

As the first deep learning architecture to directly process 3D point cloud data, PointNet has a simple and efficient architecture, an end-to-end feature learning capability, and adaptability to irregular point cloud data [31]. However, while it performs well in global feature extraction, it has limitations in capturing local and global dependencies within the point cloud. PointNet processes each point independently and then obtains global features through max pooling. This design is unable to adequately account for the relationships between points when faced with complex 3D structures, especially in capturing long-range dependencies and local details.
In future autonomous driving tasks, agents may encounter highly complex environments and diverse obstacles; the proposed technological approach should enhance scalability and generalizability to address these challenges. The system not only needs to understand the overall structure of the surrounding environment but also needs to capture the intricate details and relationships between objects at different distances to ensure that the task can be performed safely and effectively. Therefore, data preprocessing algorithms for sensor inputs must be capable of addressing this challenge. The attention mechanism provides a flexible and effective way to improve this aspect [32]. By strengthening local feature capture between point clouds and improving fine-grained information modeling, the approach enhances the model’s understanding of global structures and complex dependencies within point clouds. This improvement draws inspiration from the successful application of Transformer architectures in natural language processing. To boost point cloud data processing capabilities, the research introduces self-attention mechanisms to enhance the model’s local feature capture, rendering it more accurate and robust when processing irregular and complex three-dimensional point cloud scenarios.
The workflow of the point cloud processing stage used in this study is shown in Figure 2 and summarized below.
1. Point cloud input
Three-dimensional LiDAR acquires the environment information and inputs the point cloud. The point cloud data used in this study only take the 3D coordinates of each point without other information such as color. The data format is N × 3, i.e., 3D coordinates of N points (x, y, z).
2. Initial feature extraction
This module applies a Multi-Layer Perceptron (MLP) to each point independently, preserving the original design of PointNet. The 3D point coordinates are embedded into a high-dimensional feature space to provide a richer representation for subsequent feature extraction. The difference is that the PointNet output dimension is 64, whereas this study extends it to 128, providing more information for subsequent local and global feature modeling. The main tasks of the original PointNet are point cloud classification and simple semantic segmentation, for which 64-dimensional features are sufficient to capture local features, and its design focuses on global max pooling, which integrates point cloud features at the global level. However, this study introduces multi-head self-attention mechanisms in the subsequent steps, and these require relatively high input feature dimensions to compute the matrix multiplications of query, key, and value. A higher initial feature dimension allows the attention modules to establish point-to-point relationships more efficiently; if the initial feature dimensions are too low, the subsequent modules will not have enough information to compute the attention. The output dimension of the initial MLP is also consistent with the local and global attention modules, avoiding additional dimension expansion or contraction operations.
3. Local Feature Extraction (KNN and Local Attention)
In 2019, the DGCNN study proposed dynamically constructing a local neighborhood graph using the k-nearest neighbor (KNN) algorithm and extracting the local geometric features of points on the graph structure [33]. First, for each point, the k nearest neighbors are found based on the Euclidean distances within the point cloud to construct the KNN neighborhood, and the obtained neighborhood point features are fed into the next local feature processing step.
Point Cloud Transformer (PCT) proposes a transformer-based local attention mechanism for local feature modeling of point cloud data [34]. In this study, the single-head attention mechanism is used in the local feature extraction module, while the multi-head attention mechanism is used only in the global feature extraction. This design reduces the computational complexity. Local multi-head attention requires computing multiple query–key pairs for the neighborhood of each point, which is more computationally intensive than single-head attention. For large-scale point clouds or embedded scenarios, it is more efficient to use single-head attention. The local module quickly extracts geometric features through the single-head self-attention mechanism. For the local neighborhood of each point, the input features are the feature matrices of the neighborhood points. A linear projection is used to generate the query, key, and value. The similarity of query and key is calculated by dot product and then normalized by softmax to obtain the attention weights of the local neighborhood:
A_{\mathrm{local}} = \mathrm{softmax}\!\left( \frac{Q_{\mathrm{local}} K_{\mathrm{local}}^{T}}{\sqrt{d_k}} \right) \quad (6)
A_local denotes the local attention weights. Q_local denotes the local query vectors, a set of query vectors extracted from a local region. K_local^T is the transpose of the local key vectors. d_k is the key dimension, and dividing by √d_k prevents the dot-product values from becoming too large. The dot product of the query and key vectors computes a similarity score between each query vector and each key vector within the local region; a larger value indicates that the two vectors are more related. softmax is the normalization function, which transforms the raw attention scores into a valid probability distribution. The result is a weight matrix that represents the importance of each point in the neighborhood to the other points. Using the attention weights A_local to weight the neighborhood point features V, the aggregated local features are obtained.
4. Global feature extraction (multi-head attention at full point cloud scope)
Point-to-point relationships at the global scale are more complex, and single-head attention is not sufficient to express the dependencies between all points. Global feature extraction needs to model a richer context, and multi-head attention better serves this global task and provides the final global features, so this module uses the multi-head attention mechanism. The input to this stage comes from the local features extracted in the previous stage. Three matrices, Q, K, and V, projected from the local features, are used in the multi-head self-attention mechanism. The dot product of Q and K is computed and normalized to obtain attention weights over all point pairs. Finally, the features are weighted and summed to output the global features.
5. Multi-scale feature fusion
The Point Transformer proposes a weighted fusion strategy of local and global features, which is used to enhance the representation of point cloud features [35]. According to the strategy, this study weighted the fusion of local and global features in this step to generate the final point cloud feature representation.
6. Global pooling
Global max pooling is the core module of PointNet, used to aggregate per-point features into the global feature of the point cloud. The previous module outputs a feature vector for each point, reflecting the semantic representation of that point in its local and global context, but these features are specific to individual points rather than the whole point cloud. After pooling, they are aggregated into a single global feature: a high-level semantic representation of the entire point cloud that describes its overall shape and structure and is no longer a point-by-point fine-grained feature. Point cloud inputs are unordered, and global pooling makes the model insensitive to the order in which the points are arranged. With the max pooling operation, the generated global feature is always the same regardless of the input order of the points, eliminating the effect of point ordering. Max pooling takes the maximum over all points for each feature dimension, which ensures robustness to noise and sparse data while effectively capturing the most significant geometric features in the point cloud.
7. Feature Dimension Expansion
This step extends the feature dimension from 128 to 1024 using a linear layer, which improves the global feature representation. Higher dimensionality allows better representation of the complex geometric and semantic information of the point cloud: in complex tasks, point clouds may contain intricate shapes and structures, and higher feature dimensions give the model more capacity to express these structures and capture more global patterns and semantics. The original PointNet algorithm uses a fixed global feature dimension of 1024, a design that has been experimentally validated and performs well. Furthermore, this study uses the multi-head self-attention mechanism and multi-scale feature fusion, which increase the richness of the features; if the final global feature dimension were too low, it might limit the effectiveness of these modules. Extending to 1024 dimensions allows their full potential to be utilized (a code sketch of steps 2–7 follows this list).
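To make the pipeline above concrete, the sketch below strings steps 2–7 together. It assumes PyTorch; the layer sizes follow the dimensions stated in the text (3 → 128 → 1024), while the neighborhood size k, the number of attention heads, and the scalar fusion weight are illustrative choices rather than the authors’ exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointCloudEncoder(nn.Module):
    """Steps 2-7: per-point MLP (3 -> 128), KNN single-head local attention,
    multi-head global attention, weighted fusion, max pooling, expansion to 1024."""

    def __init__(self, dim=128, k=16, num_heads=4, out_dim=1024):
        super().__init__()
        self.k = k
        # Step 2: shared per-point MLP embedding (PointNet-style), widened to 128 dims.
        self.embed = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                   nn.Linear(64, dim), nn.ReLU())
        # Step 3: projections for single-head attention over each point's KNN neighborhood.
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Step 4: multi-head self-attention over the full point set.
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Step 5: learnable scalar weight for fusing local and global features.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        # Step 7: expand the pooled 128-dim feature to a 1024-dim global descriptor.
        self.expand = nn.Linear(dim, out_dim)

    def local_attention(self, xyz, feats):
        # Build KNN neighborhoods from Euclidean distances and apply Equation (6).
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices        # (B, N, k)
        neigh = torch.gather(
            feats.unsqueeze(1).expand(-1, feats.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, feats.size(-1)))              # (B, N, k, C)
        q = self.q_proj(feats).unsqueeze(2)                                    # (B, N, 1, C)
        k, v = self.k_proj(neigh), self.v_proj(neigh)                          # (B, N, k, C)
        attn = F.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)          # (B, N, k)
        return (attn.unsqueeze(-1) * v).sum(dim=2)                             # (B, N, C)

    def forward(self, points):                            # points: (B, N, 3)
        feats = self.embed(points)                        # step 2
        local = self.local_attention(points, feats)       # step 3
        glob, _ = self.global_attn(local, local, local)   # step 4
        fused = self.alpha * local + (1.0 - self.alpha) * glob   # step 5
        pooled = fused.max(dim=1).values                  # step 6: permutation-invariant
        return self.expand(pooled)                        # step 7: (B, 1024) RL state feature
```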
The core idea of PointNet is to treat each point as an independent input, perform feature extraction on the point cloud using a multilayer perceptron, and then apply max pooling to obtain global features. PointNet does not explicitly capture local relationships or spatial dependencies between points but rather obtains an overall feature representation through global pooling. The local feature extraction module introduced in this study is a direct improvement over PointNet: whereas PointNet treats each point independently, this module captures the local geometric features of the point cloud through neighborhood construction and embedding. Such neighborhood aggregation introduces local structural information into the feature extraction process.
In PointNet, global features are extracted from the features of all points by max pooling. In this model, the attention-based network mechanism enhances the ability of the model to capture global features. Through the self-attention mechanism, the model can dynamically focus on the relationships between points and assign higher weights to important features, thus effectively creating long-range dependencies in the point cloud data.
In this study, after the extraction of global features, the final output features are obtained through global pooling. This step maintains the design concept of PointNet, which is to use global features for decision-making output.
These improvements bring benefits to the model. First, by adding KNN and attention-based networks, the model is able to better understand the relationships between points, which is especially useful when passing through complex scenarios such as crowded intersections. In addition, the attention mechanism allows the model to dynamically adjust its focus to highlight the most relevant parts of the point cloud. This facilitates autonomous driving, where the importance of features changes with the environment. Even though the tasks and scenarios performed in this research are not complex, the improved point cloud processing algorithm is used to increase the extensibility of the techniques developed here, supporting the effective implementation of the architecture in more complex road situations in the future.

4. Experiments and Results

This study explores how 3D-LiDAR can be used as the sensor to implement an autonomous driving task with reinforcement learning in both virtual and real environments. Section 4.1 is the implementation of an autonomous driving task in the simulator. Section 4.2 describes the completion of the task in a real environment.

4.1. Implementation in Carla Simulator

For an iterative strategy algorithm, the research needs to easily measure the performance of the algorithm. Training cars in real environments can be damaging and costly. It is necessary to ensure the algorithm can operate safely and correctly in common scenarios before deploying it in real-world environments in the future. Additionally, in practical applications, it is not feasible to frequently update algorithms in real environments, so this process needs to be first deployed and tested in simulators. The debugging and performance evaluation of autonomous driving algorithms should be conducted in highly realistic simulation environments before real environments. It is important to have an autonomous driving simulator that can reliably reflect real scenarios. In this study, Carla is chosen as the simulation environment for autonomous driving development and testing [36].
Carla has a 3D city scene that supports perception, planning, and control. Figure 3 shows the top view of the city map in the Carla simulator used in this experiment. It runs on Unreal Engine 4 with a server–client architecture. In addition, it is open source and can be installed on Linux and Windows systems. It allows the creation of environments, agents, and required sensors via Python API, which meets the requirements of this study. In terms of experimental configuration, this study uses a Tesla V100 as the GPU and runs in a Linux environment. The system code is mainly based on the Python language.
Carla can simulate common scenarios in the environment such as lane conditions, road conditions, obstacle distribution, and weather. Three-dimensional LiDAR can also be simulated in Carla, as shown in Figure 4. To maintain consistency with sensor configurations employed in subsequent real-world applications, in this study, the 3D-LiDAR parameters were set in the simulator as shown in Table 1.
The 3D-LiDAR is configured with 16 channels, which is consistent with the model in the real environment of this research. The set rotation frequency is 10 revolutions per second, which can be adjusted to a higher frequency in the future when capturing more rapidly changing environments, such as high-speed scenes. The number of points per second can be calculated by multiplying the number of channels, the points per revolution, and the rotation frequency; it determines the resolution and density of the point cloud. To reduce the computational burden, the detection distance is reduced to 50 m, even though the default maximum detection distance is 100 m. The upper FOV and lower FOV together define the vertical field of view range, and their difference must be consistent with the channel configuration to ensure accurate simulation. The horizontal and vertical FOV ranges define the scanning range. The point cloud data are generated every 0.1 s. The simulated environment works in parallel with the RL environment, and Carla independently updates and transmits data into the RL network at this fixed interval. As a result, the agent obtains the latest environment data in real time and outputs actions, while the simulator executes the latest actions output by the agent.
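For reference, a minimal sketch of configuring such a LiDAR through the CARLA Python API is shown below. The channel count, rotation frequency, range, and sensor tick follow the values discussed above; the vertical FOV bounds, mounting height, and choice of vehicle blueprint are placeholders, and the point parsing assumes the four-floats-per-point format of recent CARLA versions.

```python
import numpy as np
import carla

# Connect to a running CARLA server and spawn an ego vehicle (blueprint is a placeholder).
client = carla.Client("localhost", 2000)
world = client.get_world()
bp_lib = world.get_blueprint_library()
vehicle = world.spawn_actor(bp_lib.filter("vehicle.tesla.model3")[0],
                            world.get_map().get_spawn_points()[0])

lidar_bp = bp_lib.find("sensor.lidar.ray_cast")
lidar_bp.set_attribute("channels", "16")             # matches the real 16-channel LiDAR
lidar_bp.set_attribute("rotation_frequency", "10")   # 10 revolutions per second
lidar_bp.set_attribute("range", "50")                # detection distance reduced to 50 m
lidar_bp.set_attribute("upper_fov", "15")            # vertical FOV bounds (placeholder values)
lidar_bp.set_attribute("lower_fov", "-15")
lidar_bp.set_attribute("sensor_tick", "0.1")         # a point cloud every 0.1 s

lidar = world.spawn_actor(lidar_bp, carla.Transform(carla.Location(z=1.8)),
                          attach_to=vehicle)

def on_lidar(measurement):
    # Keep only the 3D coordinates (N x 3) for the point cloud processing stage.
    pts = np.frombuffer(measurement.raw_data, dtype=np.float32).reshape(-1, 4)[:, :3]
    # ...feed pts into the point cloud processing module...

lidar.listen(on_lidar)
```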
The experiment establishes a fundamental task, where the vehicle drives on urban roads while avoiding collisions. The objective is to maximize cumulative rewards by maintaining street driving for as long as possible. The experimental implementation follows these steps:
  • Network Initialization: The first step is to define the structure of the actor and critic networks, including the input, hidden, and output layers. The learning rate of the actor is 0.001, the learning rate of the critic is 0.0001, the discount factor γ is 0.99, and the initial weight range is [−0.003, 0.003]. The weights and biases of the networks are initialized in this step. The weight matrices are initialized with small random values and act as the linear transformations that control the effect of the input data on each neuron, while the bias vectors are initialized to zero and adjust the neuron outputs to enhance model expressiveness. During subsequent training, the optimization algorithm minimizes the loss function by computing gradients and continually adjusting both. The target networks, used for training stability, are initialized with weights identical to the main networks. The replay buffer is initialized empty with a capacity of 10,000 transitions. Each experience corresponds to a state transition tuple containing the current state, action, reward, next state, and terminal-state Boolean.
  • Agent Generation: To prevent the policy from falling into a fixed pattern and to enhance the agent’s generalization ability, the vehicle is spawned at a random map location in each episode. This also simulates the randomness of the real world and improves exploration efficiency.
  • Environmental Information by sensor: The 3D-LiDAR inputs point cloud data containing three-dimensional coordinates.
  • Point Cloud Processing: Global features are extracted using the study’s point cloud processing algorithm.
  • Actor-network Input: The 1024-dimensional features pass through two fully connected layers with 64 neurons each, activated by ReLU, outputting action values a1 and a2. a1 represents acceleration, ranging between −1 and +1, while a2 represents the steering angle, ranging between −15 and +15 degrees. Due to the nature of DDPG, the action output allows smooth steering angles and throttle control instead of the simple left or right choices of the DQN method. The action is selected by adding Gaussian noise to the output of the current policy for the given state. Following the task objective, the reward function is designed as follows: the agent obtains a reward of 0.01 if it keeps moving forward; when it collides with an obstacle, the reward is −1 and the episode is terminated. Since leaving the lane also results in a collision with an obstacle, no separate penalty is needed. After executing an action, the agent receives the next state, the reward, and the termination signal from the environment, and the transition is stored in the replay buffer (a minimal sketch of this actor head and reward logic follows this list).
  • Sample Data from the Replay Buffer: When the amount of data in the buffer exceeds the minimum batch size, a small random batch is sampled from the buffer.
  • Critic-network Implementation: The critic takes the state and action as input and outputs a scalar estimate of the expected return under the current policy. States are represented by the global point cloud feature vectors, and actions come from the actor-network output. Both pass through fully connected layers for feature extraction, after which the state and action features are fused into a vector that outputs the Q-value through the final fully connected layer. The critic network uses the mean square error loss to update its weights. The output Q-value is used to guide the actor network in improving the vehicle’s driving strategy.
  • Actor-network Implementation: Following the critic network’s performance evaluation, this step calculates actor network gradients and updates the parameters.
  • Target Network Updates: Soft updates maintain stability after the main network updates with τ = 0.001 .
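A minimal sketch of the actor output head and the reward logic described in the steps above is given below; it assumes PyTorch, the layer sizes follow the text, and the scaling of the raw tanh outputs to the stated action ranges is an implementation assumption.

```python
import torch.nn as nn

class ActorHead(nn.Module):
    """Maps the 1024-dim point cloud feature to (acceleration, steering angle)."""

    def __init__(self, state_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh(),       # raw outputs in [-1, 1]
        )

    def forward(self, state):
        a = self.net(state)
        accel = a[..., 0]                      # a1: acceleration in [-1, +1]
        steer = a[..., 1] * 15.0               # a2: steering angle in [-15, +15] degrees
        return accel, steer

def reward(collided):
    """Reward of the driving task: small bonus for moving forward, -1 on collision."""
    return -1.0 if collided else 0.01
```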
The training process repeats the above steps, and the results after 100 episodes are as follows. From visual observation, in the early stage the agent quickly collides with obstacles and the next episode begins. With more training, the agent gradually keeps driving normally for long periods without collisions. Figure 5 shows the evaluation average reward curve generated after training. The light orange line is the raw evaluation average reward, which fluctuates significantly due to the stochastic nature of the environment and exploration. The dark red line is a smoothed version of the reward, computed using a moving average, to highlight the overall trend of improvement. It reflects the average cumulative reward of the agent. In an episode, the cumulative reward is the total reward obtained by the agent from the initial state to the terminal state; the evaluation average reward is the average of the cumulative rewards over multiple episodes. As shown in the figure, the rewards gradually increase as training progresses, indicating a successful implementation of reinforcement learning.
Figure 6 shows the loss curves of the experimental vehicle after DDPG training. The green curve represents the critic loss, and the red curve represents the actor loss. These two main loss curves are commonly monitored during DDPG training. The loss curves of the critic network and actor network reflect the optimization states of different networks. The critic network is used to evaluate the value of an action under the current strategy. The critic loss uses the mean square error (MSE) to calculate the difference between the current estimated Q-value and the target Q-value. In the early stages of training, the critic loss tends to be higher because the agent has limited understanding of the environment. As training progresses, the critic loss should gradually decrease and stabilize. The actor network loss is calculated differently, using feedback from the critic network to optimize the strategy so that it selects actions that maximize the critic’s estimated Q-value. The actor loss should be analyzed in conjunction with critic loss. If the actor loss does not converge, the target value of the critic may always be biased. In this experiment, both the critic loss and the actor loss curves demonstrate clear convergence. This proves that the training is effective, and the algorithm can successfully implement the target task in the simulator.
In addition, Figure 7 shows the noise curve generated during DDPG training in this experiment. The light orange line represents the raw value, and the dark red line is a smoothed version of the curve, showing the overall trend of decreasing noise. The downward trend of the curve indicates that the noise gradually decays throughout the training process, reflecting the agent’s transition from exploration-focused behavior to exploitation of the learned optimal policy.

4.2. Implement in Real Environment

After the successful implementation in the simulated environment, this study was extended to implement the system in a real-world setting. The experimental vehicle used in this study is the F1tenth car, which is a widely recognized platform for research in autonomous driving due to its affordability, flexibility, and scalability [37]. The F1tenth car used in this research consists of two main layers that are essential for achieving autonomous functionality.
The lower level serves as the foundation of the vehicle, providing the structural support for various components. This layer is equipped with the following.
  • Chassis: The chassis used in this experiment is from Traxxas, a simplified structure that is as close as possible to a real car; the experimental car is one-tenth the size of a road vehicle. It uses the Ackermann steering mechanism, which is commonly employed in automotive design to minimize tire slip and ensure that each wheel follows its optimum path during turns. This setup is crucial for replicating the real dynamics of vehicle steering and control.
  • Battery: A lithium-polymer battery with a capacity of 5000 mAh was used in this experiment.
  • Motor: A high-performance brushless motor that provides precise control over the car’s speed and maneuverability.
The upper level is mounted on a laser-cut acrylic panel. It contains all the components responsible for autonomous decision-making and control. This layer includes the following.
  • NVIDIA Jetson TX2: The primary computing unit that runs all the deep learning models, including the DDPG algorithm and the point cloud processing module. The Jetson TX2 was selected for its computational power while maintaining a small size, which is critical for real-time processing.
  • Communication: The vehicle communicates with the station through an antenna for remote input of commands, monitoring, data logging, and algorithm tuning.
  • VESC (Vedder Electronic Speed Controller): A component responsible for controlling the speed and steering of the vehicle. The VESC is directly interfaced with the NVIDIA Jetson TX2, enabling precise control based on the actions determined by the DDPG model.
  • Powerboard: Distributes power to all components, ensuring stable operation and proper voltage levels.
To enable the vehicle to perceive its surroundings, this study used a Velodyne VLP-16 3D-LiDAR sensor, which is well suited for capturing detailed spatial information and meets the research requirements. The VLP-16 provides a 360-degree horizontal field of view and a vertical field of view of approximately 30 degrees. It has 16 laser channels that allow for high-density point cloud generation, producing up to 300,000 points per second, with a range of up to 100 m. Its lightweight and compact design also makes it suitable for integration into smaller autonomous platforms like the F1tenth car. To integrate the LiDAR with the F1tenth car, a custom load-bearing platform was designed using 3D printing technology. This custom platform was optimized to reduce the overall weight while ensuring that the LiDAR remained stable during motion, thus enhancing the reliability of the sensor data. The use of 3D printing also allowed for rapid prototyping and customization, which were particularly useful for adapting the sensor setup to meet specific research needs. The car’s system operates using ROS Melodic, a popular version of the Robot Operating System, which provides a flexible framework for communication between different sensors and computational units. ROS Melodic was chosen due to its stability and compatibility with the Jetson TX2 and other components of the F1tenth platform. Figure 8 shows the agent used in this research.
The system architecture of the real-world experiment is illustrated in Figure 9. Initially, the 3D-LiDAR acquires environmental information and inputs the data in point cloud format into the NVIDIA Jetson TX2 processor. The processor performs the point cloud processing stage and subsequently transfers the processed information to the deep reinforcement learning algorithm. The computed action commands are then transmitted via VESC to control the motor and execute agent movements. Throughout this process, communication between the agent and station is maintained through a WiFi antenna. The station transmits commands (such as initiation or termination of experiments) to the agent, while the agent returns experimental data and results back to the station.
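A simplified sketch of how this loop could be wired together as a ROS node is shown below. The topic names (/velodyne_points for the LiDAR driver, /drive for the Ackermann command), the message types, and the mapping from actor outputs to a speed setpoint are assumptions for illustration, not the exact interfaces of the authors’ system.

```python
#!/usr/bin/env python
import rospy
import numpy as np
from sensor_msgs.msg import PointCloud2
from sensor_msgs import point_cloud2
from ackermann_msgs.msg import AckermannDriveStamped

class DriveNode(object):
    """Subscribes to LiDAR point clouds, runs the trained policy, publishes drive commands."""

    def __init__(self, policy):
        self.policy = policy  # point cloud encoder + DDPG actor (placeholder here)
        self.pub = rospy.Publisher("/drive", AckermannDriveStamped, queue_size=1)
        rospy.Subscriber("/velodyne_points", PointCloud2, self.on_cloud, queue_size=1)

    def on_cloud(self, msg):
        pts = np.array(list(point_cloud2.read_points(
            msg, field_names=("x", "y", "z"), skip_nans=True)), dtype=np.float32)
        accel, steer_deg = self.policy(pts)              # actions from the DDPG actor
        cmd = AckermannDriveStamped()
        cmd.header.stamp = rospy.Time.now()
        cmd.drive.speed = float(accel)                   # interpreted by the VESC layer
        cmd.drive.steering_angle = float(np.deg2rad(steer_deg))
        self.pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("rl_drive_node")
    DriveNode(policy=lambda pts: (0.0, 0.0))             # placeholder policy for illustration
    rospy.spin()
```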
The real-world experiments were conducted in a controlled laboratory environment. The algorithms and models used in the experiments were kept the same as in the simulator. Since this study focuses on testing the effectiveness of the algorithms in real-world environments, and the hardware is limited by its battery capacity, the task objective of the experiment was simplified to basic obstacle avoidance and driving behavior selection in a defined environment. The effective implementation of this basic task is the basis for future in-depth research on difficult tasks in complex environments. The map used for the experiment is shown in Figure 10. The closest obstacle is in front of and to the right of the experimental vehicle, and the open area is on the left side. The experimental vehicle freely explores the map and selects action outputs to form its own driving strategy.
The experimental process consisted of different stages, where the system’s ability to self-learn and adapt was observed through multiple trials. In the initial stage, the agent failed to avoid the first obstacle and collided with a chair (Figure 11a). This highlighted the need for additional training to refine its decision-making capabilities. In the subsequent trials, after more than 200 steps, the vehicle was able to bypass the first obstacle but encountered difficulties with the second chair, colliding with it (Figure 11b). After additional training iterations, around 600 steps, the vehicle successfully avoided both obstacles. Instead of continuing toward the obstacle on the right, the vehicle adjusted its trajectory to move toward the open space on the left (Figure 11c). This showed the system’s ability to learn from previous experiences and improve its strategy through reinforcement learning. Figure 12 shows the reward curve of the experimental vehicle following DDPG training. It illustrates the reward dynamics of the agent during the training process. The reward obtained by the agent increases as the training progresses. During the initial training phase, the reward values remained relatively low. When the number of steps reached about 300, the reward value increased significantly, but it was not yet stable and fluctuated considerably. Beyond 600 steps, the reward values stabilized and consistently maintained a high level. This trend demonstrates the effectiveness of the DDPG training approach for the agent.
This research has successfully transitioned from implementing autonomous driving tasks through reinforcement learning in virtual environments to effectively executing algorithms and completing tasks in real-world environments. This provides a foundation and starting point for applying reinforcement learning to achieve complex autonomous driving in real-world scenarios. This experiment is intended to demonstrate the practical feasibility of the algorithm, so basic experimental tasks were set and implemented in a simple environment. Future applications and implementations could enhance the capacity of the battery to perform more complex tasks and set up more sophisticated reward mechanisms based on diverse road conditions and environmental scenarios. By establishing more complex reward mechanisms and parameter adjustments, the agent can potentially achieve superior performance and more intelligent behavior in real-world driving scenarios. Such refinements would make it closer to real-world road conditions and even fully applicable to real traffic scenarios.

5. Conclusions and Future Work

This study demonstrates a deep reinforcement learning solution for autonomous driving tasks using 3D-LiDAR sensors, achieving both simulation and real-world implementations. By integrating the DDPG algorithm with a PointNet-based point cloud processing approach enhanced by self-attention mechanisms, the research provides a scalable framework for unsupervised autonomous driving systems. The study addresses several key challenges, including the limitations of discrete action spaces in traditional reinforcement learning algorithms and the complexities of processing irregular and dense 3D point cloud data in dynamic environments. Moreover, by eliminating the need for labeled datasets, the approach reduces the overall cost and labor compared to supervised learning methods for autonomous driving. The system demonstrated consistent training performance through simulation experiments conducted in the CARLA environment. Real-world tests using the F1tenth car further showed that basic autonomous driving tasks can be completed in a physical environment. The effectiveness of reinforcement learning in real-world applications was thus demonstrated, bridging the gap between virtual simulation and actual implementation.
There remains considerable room for improvement in this research. Future enhancements could explore more complex autonomous driving tasks and scenarios, particularly in adapting to highly dynamic or previously unseen environments. The reliance on predefined reward functions may also limit generalization to more diverse driving tasks. Applications in complex road environments require well-developed reward mechanisms as well as longer exploratory training processes; they also place higher demands on hardware such as battery capacity. In terms of sensors, the point cloud processing framework proposed in this study allows for a higher density of information input in order to cope with complex road environments. Future research can flexibly improve the feature extraction capability by adding different feature processing modules. More comprehensive environmental information from multi-sensor fusion can also be considered.
In conclusion, this study lays a foundation for advancing unsupervised learning techniques in autonomous driving, promoting safer, more efficient, and cost-effective solutions for future smart transportation systems.

Author Contributions

Conceptualization, Y.C., C.T.L. and G.P.; methodology, Y.C. and G.P.; formal analysis, W.K.; investigation, C.T.L.; writing—original draft preparation, Y.C.; writing—review and editing, W.K. and G.P.; supervision, C.T.L. and G.P.; project administration, W.K.; funding acquisition, W.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Macao Polytechnic University (project code: RP/FCA-14/2022).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Igliński, H.; Babiak, M. Analysis of the Potential of Autonomous Vehicles in Reducing the Emissions of Greenhouse Gases in Road Transport. Procedia Eng. 2017, 192, 353–358.
  2. Uzzaman, A.; Muhammad, W. A Comprehensive Review of Environmental and Economic Impacts of Autonomous Vehicles. Control Syst. Optim. Lett. 2024, 2, 303–309.
  3. Yurtsever, E.; Lambert, J.; Carballo, A.; Takeda, K. A Survey of Autonomous Driving: Common Practices and Emerging Technologies. IEEE Access 2020, 8, 58443–58469.
  4. Chen, R.C.; Saravanarajan, V.S.; Chen, L.S.; Yu, H. Road Segmentation and Environment Labeling for Autonomous Vehicles. Appl. Sci. 2022, 12, 7191.
  5. Li, Y. Deep Reinforcement Learning: An Overview. arXiv 2017, arXiv:1701.07274.
  6. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Hassabis, D. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
  7. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
  8. Zhou, J. A Review of LiDAR Sensor Technologies for Perception in Automated Driving. Acad. J. Sci. Technol. 2022, 3, 255–261.
  9. Liu, Y.; Diao, S. An Automatic Driving Trajectory Planning Approach in Complex Traffic Scenarios Based on Integrated Driver Style Inference and Deep Reinforcement Learning. PLoS ONE 2024, 19, e0297192.
  10. Yuan, M.; Shan, J.; Mi, K. From Naturalistic Traffic Data to Learning-Based Driving Policy: A Sim-to-Real Study. IEEE Trans. Veh. Technol. 2024, 73, 245–257.
  11. Wang, C.; Cui, X.; Zhao, S.; Zhou, X.; Song, Y.; Wang, Y.; Guo, K. A Deep Reinforcement Learning-Based Active Suspension Control Algorithm Considering Deterministic Experience Tracing for Autonomous Vehicles. Appl. Soft Comput. 2024, 153, 111259.
  12. Bosello, M.; Tse, R.; Pau, G. Train in Austria, Race in Montecarlo: Generalized RL for Cross-Track F1 Tenth LiDAR-Based Races. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; pp. 290–298.
  13. Li, Z.; Yuan, S.; Yin, X.; Li, X.; Tang, S. Research into Autonomous Vehicles Following and Obstacle Avoidance Based on Deep Reinforcement Learning Method under Map Constraints. Sensors 2023, 23, 844.
  14. Li, W.; Zhang, Y.; Shi, X.; Qiu, F. A Decision-Making Strategy for Car Following Based on Naturalist Driving Data via Deep Reinforcement Learning. Sensors 2022, 22, 8055.
  15. Yang, K.; Tang, X.; Qiu, S.; Jin, S.; Wei, Z.; Wang, H. Towards Robust Decision-Making for Autonomous Driving on Highway. IEEE Trans. Veh. Technol. 2023, 72, 11251–11263.
  16. Bellotti, F.; Lazzaroni, L.; Capello, A.; Cossu, M.; De Gloria, A.; Berta, R. Explaining a Deep Reinforcement Learning (DRL)-Based Automated Driving Agent in Highway Simulations. IEEE Access 2023, 11, 28522–28550.
  17. Al-Sharman, M.; Dempster, R.; Daoud, M.A.; Nasr, M.; Rayside, D.; Melek, W. Self-Learned Autonomous Driving at Unsignalized Intersections: A Hierarchical Reinforced Learning Approach for Feasible Decision-Making. IEEE Trans. Intell. Transp. Syst. 2023, 24, 12345–12356.
  18. Wang, J.; Sun, H.; Zhu, C. Vision-Based Autonomous Driving: A Hierarchical Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2023, 72, 11213–11226.
  19. Mao, Z.; Liu, Y.; Qu, X. Integrating Big Data Analytics in Autonomous Driving: An Unsupervised Hierarchical Reinforcement Learning Approach. Transp. Res. Part C Emerg. Technol. 2024, 162, 104606.
  20. Pérez-Gil, Ó.; Barea, R.; López-Guillén, E.; Bergasa, L.M.; Gómez-Huélamo, C.; Gutiérrez, R.; Díaz-Díaz, A. Deep Reinforcement Learning Based Control for Autonomous Vehicles in CARLA. Multimed. Tools Appl. 2022, 81, 3553–3576.
  21. Petryshyn, B.; Postupaiev, S.; Ben Bari, S.; Ostreika, A. Deep Reinforcement Learning for Autonomous Driving in Amazon Web Services DeepRacer. Information 2024, 15, 113.
  22. Wang, W.; Luo, H.; Zheng, Q.; Wang, C.; Guo, W. A Deep Reinforcement Learning Framework for Vehicle Detection and Pose Estimation in 3D Point Clouds. In Proceedings of the 6th International Conference on Artificial Intelligence and Security (ICAIS 2020), Hohhot, China, 17–20 July 2020; pp. 405–416.
  23. Lee, Y.; Park, S. A Deep Learning-Based Perception Algorithm Using 3D LiDAR for Autonomous Driving: Simultaneous Segmentation and Detection Network (SSADNet). Appl. Sci. 2020, 10, 4486.
  24. Chen, Y.; Tse, R.; Bosello, M.; Aguiari, D.; Tang, S.K.; Pau, G. Enabling Deep Reinforcement Learning Autonomous Driving by 3D-LiDAR Point Clouds. In Proceedings of the Fourteenth International Conference on Digital Image Processing (ICDIP 2022), Wuhan, China, 20–23 May 2022; Volume 12342, pp. 362–371.
  25. Zhai, Y.; Liu, Z.; Miao, Y.; Wang, H. Efficient Reinforcement Learning for 3D LiDAR Navigation of Mobile Robot. In Proceedings of the 41st Chinese Control Conference (CCC 2022), Hefei, China, 25–27 July 2022; pp. 3755–3760.
  26. Sánchez, M.; Morales, J.; Martínez, J.L. Reinforcement and Curriculum Learning for Off-Road Navigation of a UGV with a 3D LiDAR. Sensors 2023, 23, 3239.
  27. Bellman, R.; Kalaba, R.E. Dynamic Programming and Modern Control Theory; Academic Press: New York, NY, USA, 1965; Volume 81.
  28. Sutton, R.S.; Barto, A.G. The Reinforcement Learning Problem. In Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; pp. 51–85.
  29. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the International Conference on Machine Learning (ICML 2014), Beijing, China, 21–26 June 2014; pp. 387–395.
  30. Mnih, V. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602.
  31. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
  32. Vaswani, A. Attention Is All You Need. In Advances in Neural Information Processing Systems (NeurIPS 2017); Curran Associates, Inc.: Long Beach, CA, USA, 2017.
  33. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12.
  34. Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.J.; Martin, R.R.; Hu, S.M. PCT: Point Cloud Transformer. Comput. Vis. Media 2021, 7, 187–199.
  35. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268.
  36. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the Conference on Robot Learning (CoRL 2017), Mountain View, CA, USA, 13–15 November 2017; pp. 1–16.
  37. O’Kelly, M.; Sukhil, V.; Abbas, H.; Harkins, J.; Kao, C.; Pant, Y.V.; Bertogna, M. F1/10: An Open-Source Autonomous Cyber-Physical Platform. arXiv 2019, arXiv:1901.08567.
Figure 1. DDPG workflow.
Figure 2. The point cloud processing workflow.
Figure 3. Top view of the experiment map.
Figure 4. Three-dimensional LiDAR view in CARLA.
Figure 5. Evaluation average reward diagram of the experiment.
Figure 6. Loss diagram of the experiment.
Figure 7. Noise diagram of the experiment.
Figure 8. The experimental agent.
Figure 9. The system architecture of the real-world experiment.
Figure 10. Map of the experiment.
Figure 11. Results of the experiment. (a) Initial stage; (b) trials of more than 200 steps; (c) trials of around 600 steps.
Figure 12. Reward diagram of the experiment.
Table 1. The parameter settings of 3D-LiDAR.
Channels: 16
rotation_frequency (r/s): 10.0
Range (m): 50.0
upper_fov (°): 15.0
lower_fov (°): −15.0
horizontal_fov (°): 360.0
sensor_tick (s): 0.1
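The settings in Table 1 correspond to attributes of CARLA's ray-cast LiDAR blueprint. The following sketch shows how such a sensor might be configured and attached to a vehicle through the CARLA Python API; the mounting transform and the surrounding `world` and `vehicle` objects are assumptions, and client setup is omitted.

    # Sketch of configuring a 3D-LiDAR in CARLA's Python API with the values
    # from Table 1. Assumes an existing `world` and a spawned `vehicle`;
    # connection and actor spawning are omitted for brevity.
    import carla

    def attach_lidar(world, vehicle):
        blueprint = world.get_blueprint_library().find('sensor.lidar.ray_cast')
        blueprint.set_attribute('channels', '16')
        blueprint.set_attribute('rotation_frequency', '10.0')
        blueprint.set_attribute('range', '50.0')
        blueprint.set_attribute('upper_fov', '15.0')
        blueprint.set_attribute('lower_fov', '-15.0')
        blueprint.set_attribute('horizontal_fov', '360.0')
        blueprint.set_attribute('sensor_tick', '0.1')
        transform = carla.Transform(carla.Location(x=0.0, z=2.0))  # assumed roof-level mount
        return world.spawn_actor(blueprint, transform, attach_to=vehicle)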
