1. Introduction
In recent years, there has been a growing recognition of the convenience and practicality of unmanned aerial vehicles (UAVs), leading to their widespread applications in various fields such as earth science, traffic management, pollution monitoring, and product delivery [
1]. This increasing commercialization of drones, however, does not imply a decline in civilian drone usage. Statistics indicate a steady rise in the frequency and quantity of public drones, predominantly employed for aerial photography and recreational purposes. Looking ahead, civilian drones are anticipated to replace conventional means of transport in sectors like express delivery and logistics. Yet, the extensive use of drones will inevitably present significant challenges in terms of urban drone traffic management, posing potential security threats to cities. Consequently, effectively regulating the flight behavior of urban UAVs in the future, and mitigating in-flight conflicts and collisions with urban infrastructure, has emerged as a pressing issue that requires immediate attention.
In response to these challenges, several scholars have presented their perspectives on potential solutions. Ali [
2], for instance, defines the functions of an urban UAV traffic management system and conducts exploratory research to discern the disparities between manned and unmanned driving, thereby identifying the current challenges. Based on the research findings, Ali proposes an architectural framework for urban UAV traffic management. Mohammed Faisal Bin Mohammed Salleh [
3], on the other hand, constructs three networks: the AirMatrix route network, the route network above buildings, and the route network on roads. Through an analysis of these networks in terms of capacity and throughput, Salleh evaluates their performance. Additionally, Timothy McCarthy [
4] examines the approaches taken by the United States and Europe in unmanned transportation. He introduces a comprehensive method of airspace modeling and a traffic management platform developed by his research team, offering unique insights for urban UAV traffic management.
To address these challenges, this paper proposes a fundamental concept of establishing an urban low-altitude UAV channel network, designs an air traffic network based on the existing ground road network, formulates the flight methodology for the air channels, and provides a detailed analysis of the utilization of the air channel network. The second chapter of the paper presents the specific details of these approaches. Furthermore, to effectively implement this design concept, this paper focuses on designing a single UAV air channel guided by the aforementioned ideas. This single air channel serves as a foundational element for the subsequent formation of an air channel network comprised of multiple interconnected air channels.
When establishing a single air channel, it is crucial to consider the impact of surrounding high-rise buildings in the city. Entering such areas not only jeopardizes the safety of UAV flights but also poses significant risks to urban structures. Classical methods employed in air channel planning under similar circumstances encompass the genetic algorithm [
5], artificial potential field algorithm [
6], A* algorithm [
7], antcolony algorithm [
8], and particle swarm algorithm [
9]. In recent years, scholars have made noteworthy advancements in this field. For instance, Seyedali Mirjalili introduced the whale algorithm, which mimics the behavior of whales, offering fresh insights for optimization problems [
10]. Abdel-Basset Mohamed analyzed the principles and applications of the flower pollination algorithm, compared its efficacy with other algorithms, and evaluated its performance [
11]. Wenguan Luo introduced diverse enhancement strategies to the Cuckoo Search (CS) algorithm and proposed the Novel Enhanced CS Algorithm (NECSA) to improve convergence speed and accuracy [
12]. Yinggao Yue provided a comprehensive analysis of the sparrow search algorithm, highlighting its strengths, weaknesses, and influencing factors, while proposing relevant improvement strategies [
13].
When delineating a single air channel, path planning alone may not suffice to meet the requirements of different stages within the UAV air channel network. Classical path planning algorithms have some effectiveness in addressing these challenges. However, the Deep Q-learning (DQN) algorithm with a reward mechanism can offer an alternative perspective to tackle such problems. By enabling the UAV to continuously interact with its environment, the DQN algorithm helps the UAV discover optimal behaviors [
14]. To overcome the limitation of the DQN algorithm’s generalization ability, Chunling Liu [
15] proposed a multi-controller model combined with fuzzy control. This approach provided ample positive samples for the model at the initial stage of the experiment, which enhance training efficiency. Yunwan Gu [
16] resolved the M-DQN algorithm into a value function and an advantage function based on its network structure, leading to accelerated convergence and improved generalization performance. While these approaches improved pathfinding, convergence of the final reward and experimental results varied. Siyu Guo [
17] optimized the traditional reward function by incorporating a potential energy reward for the ship at the target point. This approach facilitated the ship’s movement towards the target point, significantly reducing calculation time. However, the final test and simulation processes employed relatively simplistic environmental designs, making it challenging to verify the experimental efficacy under more complex environments and constraints. Lei Luo [
18] introduced the A* algorithm into the deep reinforcement learning framework, creating the AG-DQN algorithm, which achieved promising results in solving resource management and flight scheduling problems. Therefore, to address the complexities of path planning in continuous action environments, this paper integrates the artificial potential field algorithm into the reward mechanism and action space of deep reinforcement learning and presents a viable approach to achieve the desired outcomes.
2. Design Principle of the Air Channel Network
Numerous studies have been conducted on the establishment of unmanned aerial vehicle (UAV) channels in urban airspace. For instance, Qingyu Tan et al. [
19] proposed a city air route planning system consisting of three components: traffic network construction, route planning, and take-off time scheduling. When a flight request is received, route planning is performed, and takeoff time scheduling is conducted to minimize conflicts, thus presenting a novel approach to urban UAV airspace management.
In this paper, we present the construction of a new aerial corridor structure and flight regulations to address flight conflicts. We engage in discussions aimed at alleviating these conflicts.
When UAVs fly in urban areas, they must adhere to pre-programmed rules based on the established air channel network, as illustrated in
Figure 1. The air channel network comprises three components: Class I channel, Class II channel, and Class III channel. The delineation of these channels is organized in accordance with the requirements of UAV air control. This paper assumes the following requirements for UAV air control:
Adherence to Air Channel Network: The UAV must strictly follow the designated air channel network. It should not deviate from the pre-defined routes and paths outlined in the network.
Diversion Areas: The air channel network should include diversion areas to accommodate the flight requirements of UAVs heading to different destinations.
Varying Heights: The air channel network has different height restrictions in different regions. The single air channel planning considers only the two-dimensional plane situation, without accounting for vertical dimensions.
Ground Identification Devices: UAVs need to be equipped with the capability to respond to ground identification devices. These devices ensure proper communication and identification between the UAV and ground control systems.
Directional Flight: When operating within the air channel network, UAVs must strictly adhere to the prescribed direction of the channel. They should not change their flight direction at will while flying within the designated channels.
Figure 1.
Diagram of air channel.
Figure 1.
Diagram of air channel.
According to the above requirements, the air channel principle is defined as follows.
The air channel network consists of three main components: Class I channel, Class II channel, and Class III channel, shown in
Table 1. Class I Channel is comparable to the arterial roads in a ground road network. It serves as the primary component of the air channel network. UAVs can fly at relatively high speeds within this channel while maintaining a safe distance from other drones. The Class II channel connects the take-off or landing areas to the Class I channel. It functions similar to the connecting roads that link destinations to the arterial roads in a ground road network. In this channel, there is a requirement for information identification of drones to prevent destination errors caused by system errors. Additionally, to enhance safety in the take-off or landing area, drones should operate at lower speeds within this channel. The Class III channel primarily serves as a diversion area within the air channel network. It plays a role similar to intersections and overpasses in a ground road network. UAVs navigate through these areas and adjust their flight direction based on their destination. Class III channels are usually found at intersections and only a few channels connect these intersections. The height changes depicted in the
Figure 1 are not significant in practical terms.
To accommodate the needs of UAV direction changes and prevent collisions at intersections, the air channel network is designed with specific height and width parameters. Additionally, regulations regarding UAV steering are specified to ensure safe and efficient operations. This paper focuses on the example of right flight.
To minimize interactions between different aircraft within the same channel, an air channel is designed to take inspiration from ground highways. The configuration of the air channel is depicted in
Figure 2.
In
Figure 2, the air channel is divided horizontally into three parts: the one-way channels on both sides and the anti-collision channel in the middle. The anti-collision channel is specifically designed to minimize interactions between drones flying in opposite directions. It ensures a sufficient horizontal flight interval and enhances safety.
The air channel network differs from ground roads in that drones cannot wait in the air, making it impossible to install traffic lights at intersections to coordinate steering demands. To meet the requirements of drone flight, this paper proposes the configuration shown in
Figure 3.
In the intersection, the flight channel is positioned in the middle, with up and down steering channels on either side. The flight channels for north-south and east-west directions are set at different heights to eliminate potential conflicts. However, conflicts between left and right directions cannot be resolved through this method. Therefore, the following provisions are proposed:
The right turn needs to be made in the flight channel layer;
The east-west drone needs to turn left in the upper side turn channel, and the north-south drone needs to turn left in the lower side turn channel.
As shown in
Figure 3, when a UAV follows route 1 into the intersection, it lifts height and enters the green steering channel prior to reaching the intersection. It then proceeds into the upper steering channel along the left designated area and merges into the north-south air channel. The same procedure applies for a north-south UAVs entering the lower steering channel.
By designing the aforementioned concept of a UAV air traffic network, flight conflicts can be effectively resolved, and the flight behavior of urban UAVs can be regulated. To design the air traffic channel network, this paper analyzes a single air channel, considering its consistency between the final result and theoretical principles to verify the scientific basis of the theory.
4. Aerial Channel Planning Based on APF-DQN
To effectively avoid tall buildings during flight and reach the destination efficiently, a combination of the artificial potential field (APF) algorithm and deep Q-learning (DQN) algorithm is proposed. The APF algorithm is known for its effectiveness in obstacle avoidance and attraction towards the destination. In this approach, the advantages of APF are integrated into the action space and reward mechanism of the DQN algorithm. Designing a reward that attracts the UAV to the destination continuously aligns the resultant force direction with the direction of the destination. Furthermore, by designing rewards for aircraft undergoing continuous actions and considering the selection of different actions at each step, we address the limitations of DQN when dealing with discrete spaces. Additionally, the repulsion generated by other building elements within the APF has a significant impact on action choice. Thus, the UAV learns to automatically avoid building areas.
In the course of flight return, it is assumed that the safe distance between the aircraft and the airspace is
, the navigation and positioning error is
, the minimum response distance of the pilot is
, and the normal flight between the aircraft and the airspace is
. When the distance between the aircraft and the airspace is within
, this constitutes the safe zone (SZ). When the distance between the aircraft and the airspace is
, the aircraft is diverted to ensure safety, defining the diverted area. When the distance between the aircraft and the airspace is
, compulsory actions will be taken to avoid the airspace and alter the flight area, as depicted in the
Figure 5.
4.1. Improve Action Space
The original DQN algorithm excels in handling discrete spaces within a raster map environment, but it falls short when it comes to continuous action spaces [
28]. Consequently, the action selection mechanism of the DQN algorithm is modified to utilize a unit vector that represents the desired flight direction of the aircraft. The resultant force
F of APF is put into the activation function
, and the resultant force
F is mapped to the value
between
is multiplied by the basic action step
to obtain the final action step, namely the distance of each action. The activation functions
and
are as follows:
When the aircraft is in the safe area, it is only affected by the target gravity, and its motion direction can be obtained by adding the output direction
of DQN and the direction
of gravity. Thus, the motion of the aircraft can be written as follows:
When the aircraft approaches the airspace and is in the rerouting area, the movement of the aircraft can be expressed as follows:
As evident from the equation above, the closer the aircraft is to the airspace, the longer the step length of the action becomes, which is more advantageous for quick avoidance. When the aircraft approaches the airspace and is in the mandatory course change area, assuming that
is the direction of avoiding collision, the action of the aircraft is as follows:
4.2. Improve the Reward Mechanism
In the traditional DQN algorithm, only when the agent contacts the obstacle or reaches the end point can there be a reward, and other actions do not have any effective feedback. Therefore, to solve this problem, a new reward mechanism is proposed, as follows:
In order to make the aircraft reach the end point faster, the reward mechanism is set according to the distance and Angle between the end point and the aircraft. Set the distance from the starting point to the end point as
, the distance between the current position and the end point as
, the direction Angle of gravity in the artificial potential field as
, and the direction Angle of the current aircraft as
. In accordance with the premise of the question, the closer the aircraft is to the end point, the higher the reward value. Thus, the reward calculation is expressed as follows:
When the aircraft is located in the rerouted area, the reward value should decrease as the distance between the aircraft and the airspace decreases. In this region, the aircraft receives the combined action of the gravitational and repulsive forces in the artificial potential field, the resultant force is
F, and the resultant direction Angle is
. Thus, the reward function is as follows:
5. Simulation Verification
During the flight of UAVs, the buildings significantly affect the airspace, requiring the avoidance of such areas when designing the layout of air channels. According to potential situations during return flights, higher building areas should be added into the environment as constraining factors. This enables the creation of an urban environment model, as shown in
Figure 6.
In
Figure 6, areas 1, 2, 3, 4 and 5 represent the higher buildings in the city. In order to ensure flight safety and not affect the normal operation of other tasks, the UAV should pay attention to avoid these areas when returning.
The deep neural network diagram in this method is illustrated in
Figure 7. The input information comprises the UAV’s position, the distance between the endpoint and the UAV, angle and the perceived obstacle information surrounding the UAV. The network processes this information to derive the UAV’s current status, reward, and ultimately output action.
To ensure the UAV can locate the destination and minimize the path length, the method set specific rewards and penalties. Returning to the endpoint is rewarded with a value of 200, while mistakenly entering another building area carries a penalty of −200. Moreover, each action undertaken by the UAV incurs a reduction in the final reward value. The parameters for the method are determined in
Table 2.
5.1. DQN Training
Based on the given parameters and environment, the simulation results depict the number of steps taken in each episode, as depicted in
Figure 8. Initially, the first 200 flights fail to reach the destination stabilized, resulting in big fluctuations in the number of steps. Throughout this learning process, the aircraft continually learns from the experience repository and explores different available air channels. After undergoing over 400 training iterations, the number of steps gradually stabilizes, eventually converging within a consistent range. This indicates that the aircraft has achieved stable navigation towards the destination, with relatively optimal channel selection.
As the
Figure 9 shows, in the first image, which represents the initial stage, the aircraft struggles to arrive at the endpoint accurately. Consequently, it mistakenly enters the building area, resulting in a penalty. However, as the training progresses, the aircraft gradually improves its ability to arrive at the endpoint and begins to receive rewards for successfully reaching the destination, as depicted in the second image. While the reward value changes based on the length of the channel, the optimal channel is yet to be discovered at this stage. The training process focuses on learning and exploring various routes, gradually refining the aircraft’s navigation strategy.
Figure 10 presents the evolution of reward values as training progresses. Analysis of the figure reveals several observations: In the initial 100 training episodes, the reward experiences big fluctuations. This is attributed to the aircraft frequently straying into other building areas and failing to locate the optimal air channel. At this stage, the aircraft is still exploring different areas and struggling to find the most efficient route. Between 100 and 500 training iterations, the aircraft gradually improves its ability to arrive at the endpoint and receives rewards accordingly. However, there are still instances where the aircraft enters the building areas. These results indicate that the process still needs to be optimized. After 500 training iterations, the aircraft successfully identifies the optimal solution, resulting in a stable reward.
5.2. Comparative Analysis
The APF-DQN algorithm offers several advantages compared to simple APF, DQN, and Q-Learning algorithms. It effectively addresses the limitations of traditional DQN and Q-Learning algorithms in continuous space problems, while also overcoming issues related to weak generalizability and slow solving speed. The three algorithms can be compared based on their convergence speed and total displacement degree:
Figure 11 demonstrates the significant progress made by the improved algorithm in terms of the total distance traveled. It effectively reduces the redundant steering observed in the APF algorithm’s path selection, while also avoiding the issue of traditional machine learning algorithms taking longer routes. Consequently, the final number of steps in the improved algorithm is considerably lower than the other two algorithms, indicating improved efficiency.
Figure 12 provides further insights into the improved algorithm’s performance. The line color in the figure is the same as in
Figure 11. The first image illustrates that the improved algorithm reaches the final goal at a much faster speed compared to the other two algorithms. Its convergence speed is also notably faster. The second image showcases the superior performance of the improved algorithm, as it achieves the least number of steps and the smallest total distance traveled.
Figure 11 showcases that the aircraft has successfully navigated through the urban environment while avoiding buildings, and has made a shorter overall air channel length. This aligns with the analysis presented in Chapter 2, which explores the various components of the air channel network, and it also can be observed and analyzed in
Figure 11. For example, in
Figure 11 during the turning process, if there are multiple endpoints in the environment, forming a network of air channels composed of a single channel, the turning segment in the figure belongs to an intersection, which can be classified into the Class III channel. The channel used for progressing towards the endpoint after completing the turn belongs to the Class I channel. The channel near the endpoint belongs to the Class II channel. The design demonstrates reasonable characteristics and exhibits strong adaptability in guiding the aircraft through the urban environment.