Unmanned Aerial Vehicle-Enabled Aerial Radio Environment Map Construction: A Multi-Stage Approach to Data Sampling and Path Planning

Lin, Junyi; Wang, Hongjun; Wu, Tao; Shen, Zhexian; Jiang, Ruhao; Fan, Xiaochen

doi:10.3390/drones9020081


by Junyi Lin ¹, Hongjun Wang ¹,*, Tao Wu ¹,*, Zhexian Shen ¹, Ruhao Jiang ¹ and Xiaochen Fan ²,³

¹ College of Electronic Engineering, National University of Defense Technology, Hefei 230031, China
² Institute for Electronics and Information Technology in Tianjin, Tsinghua University, Tianjin 300467, China
³ Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
* Authors to whom correspondence should be addressed.

Drones 2025, 9(2), 81; https://doi.org/10.3390/drones9020081

Submission received: 19 November 2024 / Revised: 17 January 2025 / Accepted: 20 January 2025 / Published: 21 January 2025

(This article belongs to the Special Issue Drone Communication, Networking, and Trajectory Control in Urban Environments)


Abstract:

An aerial Radio Environment Map (REM) characterizes the spatial distribution of Received Signal Strength (RSS) across a geographic space of interest, which is crucial for optimizing wireless communication in the air. Aerial REM construction can rely on Unmanned Aerial Vehicles (UAVs) to autonomously select interesting positions for sampling RSS data, enhancing the quality of construction. However, due to the lack of prior information about the environment, it is challenging for UAVs to determine suitable sampling positions online. Additionally, achieving efficient exploration of the target area through collaboration among multiple UAVs is difficult. To address this issue, this paper proposes a multi-stage approach to data sampling and path planning with multiple UAVs. Specifically, the UAVs’ data sampling task over the target area is divided into multiple stages. By selecting an appropriate stage position, we use the RSS values at that position to determine whether additional data need to be sampled in a specific local area. At each stage, the area is divided into Voronoi diagrams based on the current position of each UAV, assigning each UAV its own region to explore. In our sampling strategy, the probability distribution for sampling is obtained by estimating the RSS and uncertainty of unsampled positions and then taking the weighted sum of these two values. To obtain the shortest flight path for selected sampling positions, we employ a network structure based on self-attention as the policy network, which is trained through the actor–critic framework to obtain an improvement heuristic strategy, replacing traditional manually designed strategies. Experimental results across three different scenarios indicate that the approach improves the quality of aerial REM construction while efficiently planning the shortest paths for UAVs between sampling positions.

Keywords:

radio environment map; unmanned aerial vehicle; sampling strategy; deep reinforcement learning

1. Introduction

A Radio Environment Map (REM) characterizes the spatial distribution of the Received Signal Strength (RSS) across a geographic space of interest [1]. It has been widely used for communication network optimization, such as the deployment of base stations [2], reducing communication interference [3], and user localization [4]. With the continuous expansion of human activities towards the sky and the usage of Unmanned Aerial Vehicles (UAVs) for various applications [5], constructing an aerial REM helps improve the communication service quality in the sky. Some research works have been conducted on how to use an aerial REM to optimize communication between aerial devices and ground-based stations [5,6,7].

A typical REM is constructed by ground-based sensors with measured RSS data, which are not suitable for aerial REM construction. Fortunately, UAVs equipped with Spectrum Monitoring Devices (SMDs) can be used to sample the RSS data in air space, enabling the construction of a high-quality aerial REM. UAVs can dynamically plan flight paths, selecting the most informative measurement positions for data sampling. In the task of UAV-assisted RSS data sampling, most methods assume the partial prior information of the environment is known. For instance, in the work [8], it is assumed that the locations of the radiation sources are known, and then a data-driven deep learning algorithm is utilized to measure the uncertainty at positions within the target area, indicating the informative measurement at each position for UAV selection. The authors in [9] predetermine some areas as regions of interest, assigning higher informative value to the positions within these regions. When there is a lack of prior information, most work involves sampling by randomly selecting positions [10] or following a fixed sampling trajectory [11].

However, the radiation source is unknown in most applications [12], and UAVs may struggle to determine the most informative measurement positions in an online way. Additionally, some studies do not propose a path planning method to efficiently sample data after optimizing the sampling positions. Moreover, the above works were completed by a single UAV for data sampling, without considering the scenario of multi-UAV collaboration.

To address the above challenges, in this paper, we propose a multi-stage approach to data sampling and path planning with multiple UAVs. For effective collaboration among UAVs, we employ a Voronoi partitioning method. To cope with the lack of prior environmental information, we propose a granular sampling strategy for selecting sampling positions, which uses the RSS data sampled during flight to estimate the RSS value and uncertainty at unsampled positions. To plan the shortest flight path through the selected sampling positions, we use a self-attention-based network structure as a policy network to learn a heuristic strategy for path planning. An earlier version of this paper was accepted as a conference paper at ICCT 2024 [13]. This version is a major extension that focuses on the generalization of the algorithm across different datasets.

The contributions of this work are as follows:

A Voronoi partitioning method is proposed to efficiently coordinate multiple UAVs for data sampling in the target area. Each UAV uses the Voronoi partitioning to focus on different local regions to avoid overlapping exploration areas.
A granular sampling strategy is proposed. It utilizes RSS data sampled during flight to estimate the RSS and uncertainty at unsampled locations. The weighted sum of both is used to obtain the probability distribution for sampling, ensuring that sampling positions are distributed as widely as possible within the areas of interest.
A network architecture based on a self-attention mechanism is proposed as a policy network, which is trained offline using the actor–critic framework to obtain an improved heuristic strategy, replacing traditional manually designed strategies.
Extensive simulations show that the proposed approach facilitates cooperation among multiple UAVs, enhances the construction accuracy of aerial REMs, and enables the planning of shorter flight paths for UAVs.

2. Related Works

2.1. REM Construction

The methods for constructing REMs are primarily divided into two categories: model-driven and data-driven approaches. Model-driven methods are based on physics simulations, calculating wireless channels using Maxwell’s equations. By inputting the 3D model of the environment along with the positions of transmitters and receivers, software can generate the REM of the input environment [14]. However, model-driven methods have high computational complexity and require a significant amount of prior environmental information. On the contrary, data-driven methods do not require prior environmental information; they construct the REM directly from sampling data. In recent years, geological interpolation algorithms, such as the inverse distance weighting (IDW) algorithm [15], Gaussian Process Regression (GPR) [16], and the kriging algorithm (KGA) [17], have been widely used in the construction of REMs. With the advancement of artificial intelligence, there have been some recent works utilizing data-driven methods based on machine learning to construct REMs, such as the generative adversarial network (GAN) [18], the graph neural network (GNN) [19], the autoencoder [20], and the convolutional neural network (CNN) [21]. However, machine learning-based methods rely heavily on the quality of the dataset, and their robustness can be compromised when the environment changes.

In addition to various REM construction methods, some studies have focused on optimizing the distribution of sampling positions to improve the accuracy of REM construction. Some methods optimize the sampling positions by utilizing partial prior information of the environment [8,9,22,23,24]. For example, by assuming the position and power of the radiation source are known, a Bayesian estimation-based deep learning algorithm is used in [8] to estimate the uncertainty of each sampling position. The authors in [9] predefine certain areas as regions of interest and distribute the sampling positions within these regions as much as possible. In [22,23], the authors assume that some RSS data are collected in advance in the target area. In the work [24], the REM constructed from the previous time step is used as prior information. However, it is difficult to obtain the prior information in practice. In an unknown environment, some works sample at random positions or along a fixed flight trajectory [1,10,11,25], while others select sampling positions based on certain assumed conditions [26,27,28,29]. In [26], the authors assume an exponential decay model with respect to distance to represent the characteristics of the propagation channel, thereby determining suitable sampling positions. The authors in [27,28,29] model the distribution of the data to be sampled within the target area as a Gaussian process. However, due to interference between multiple radiation sources and the complexity of the propagation environment, these assumptions often do not hold.

2.2. UAV Trajectory Design

When using the sampling strategy proposed in this paper to determine the sampling points for UAVs, the trajectory planning problem of the UAV can be transformed into a path planning problem, similar to the classic Traveling Salesman Problem (TSP). Due to the NP-hard nature of the path planning problem [30], solving it presents a significant challenge. Classic approaches can be categorized into approximation methods and heuristic methods. Although approximation methods can quickly find solutions, the obtained solutions may deviate significantly from the optimal solution, especially for certain problems where the approximation quality can be poor [31]. Heuristic methods find local optimal solutions within a reasonable computation time by selecting a heuristic strategy [32]. However, the effectiveness of traditional heuristic algorithms depends on the chosen heuristic strategy and initial solution. Recently, there has been a growing trend toward applying deep learning to automatically discover heuristic algorithms for solving routing problems [33]. Through the learning capabilities of deep networks, Deep Reinforcement Learning (DRL) algorithms can be generated to solve problems using heuristic strategies instead of those designed by humans, such as the recurrent neural network (RNN) [34] and the attention mechanism [35].

2.3. Motivation

In summary, the use of UAVs to assist in REM construction presents various challenges. In addition to the selection of sampling positions and the design of UAV trajectory, issues such as multi-UAV communication synchronization, environmental interference, and adaptability to complex terrain must also be addressed. Many existing works have already explored solutions to these latter issues. Regarding multi-UAV communication synchronization, Jin et al. [36] proposed a scheme for assisting time synchronization in multi-UAV networks by utilizing frequency offset information. In terms of environmental interference, Li et al. [37] developed an alternating optimization approach to ensure reliable communication for UAVs in complex environments with low signal-to-noise ratios. The authors in [38] also investigated the impact of adverse weather conditions on UAV communication channels. Concerning adaptability to complex terrain, Zhang et al. [39] proposed an approach that combines reinforcement learning and artificial potential field methods, enabling UAVs to achieve safe flight in dynamic and unknown environments. Unlike these studies, the focus of our work is on addressing the challenges of coordination among multiple UAVs, selecting appropriate sampling positions, and designing suitable trajectories for the UAVs. In existing related works, the sampling positions of UAVs are either random or fixed, or optimized based on prior information or assumptions, without considering the possibility of dynamically optimizing the sampling positions using RSS data sampled by the UAVs during flight. Moreover, when the sampling locations are determined, many studies rely on manually designed heuristic strategies to plan flight trajectories. Not only do these approaches result in long computation times, but the solution quality also depends on the initial solution and the designed strategy.

To the best of our knowledge, no existing study has considered scenarios involving multiple UAVs assisting in REM construction while simultaneously addressing both the sampling position optimization and the path planning problem. To fill these gaps, this paper proposes a multi-stage approach to data sampling and path planning. It achieves multi-UAV coordination through Voronoi partitioning, optimizes sampling positions based on uncertainty, and designs a policy network based on self-attention. Using the actor–critic training method, it learns a heuristic strategy offline to quickly plan the shortest path between UAV sampling positions.

3. System Model

We consider the urban scenario with multiple-UAV-enabled REM construction, as depicted in Figure 1. There are a total of U UAVs and S radio sources, located within a square region with a side length of Q. Let $\mathcal{U} = \{1, \dots, U\}$ and $\mathcal{S} = \{1, \dots, S\}$ denote the sets of UAVs and radiation sources, respectively. We divide the target region into small square grids with side length l, resulting in a total of $D \times D$ grids within the region, where $D = Q / l$. In order to construct the aerial REM, UAVs equipped with SMD collect the RSS data at the centers of small square grids. Table 1 lists the notations in this paper.

3.1. REM Construction Model

The grid of discrete positions in the region can be represented as $G_{D} = \{G_{d}(i, j) \in \mathbb{R}^{2 \times 1} : i, j \in \mathcal{D}\}$, where $\mathcal{D} = \{1, \dots, D\}$. Here, $G_{d}(i, j)$ represents the position of the (i, j)-th grid and is expressed by the following equation:

$$G_{d}(i, j) = [i, j]^{T} l, \quad i, j \in \mathcal{D}. \tag{1}$$

As for UAV u, it moves horizontally at a fixed altitude H. Denote $\Omega^{u}$ as the set of sampling positions for UAV u, and the path length of UAV u is denoted as $\sum_{k = 1}^{|\Omega^{u}|} \|\Omega_{k}^{u} - \Omega_{k - 1}^{u}\|$, where $\Omega_{k}^{u}$ represents the k-th sampling position of UAV u. Furthermore, denote $a^{u} = \{a_{ij}^{u} : i, j \in \mathcal{D}\}$ as the vector of the sampling selection of UAV u, where $a_{ij}^{u} = 1$ denotes that UAV u samples the RSS at $G_{d}(i, j)$, which means that $G_{d}(i, j) \in \Omega^{u}$, and $a_{ij}^{u} = 0$ otherwise. The RSS value sampled by the UAV at $G_{d}(i, j)$ is defined as $P_{ij}$.

The principle of the method for constructing an REM is to find an optimal function $f(\cdot)$, using the set $X$ of measurement data points, where $X = \{[il, jl, P_{ij}]^{T} \mid a_{ij}^{u} = 1, \exists u \in \mathcal{U}\}$, to construct the REM $\tilde{p} \in \mathbb{R}^{D \times D}$. The overall process of REM construction can be expressed as $\min \|\tilde{p} - p\|_{F}$, where $p \in \mathbb{R}^{D \times D}$ is the real REM and $\|\cdot\|_{F}$ means the Frobenius norm for matrices.

3.2. Problem Formulation

Based on the established system model, multiple-UAV-enabled REM construction mainly focuses on two aspects, i.e., how to find appropriate sampling positions for UAVs to improve the quality of aerial REM construction and how to find the shortest path through the sampling positions once they are determined. These goals are achieved by optimizing the positions and sequence of UAV sampling points. The problem can be expressed as:

$$\min_{a^{u}, \Omega^{u}} \; \sigma \sum_{u = 1}^{U} \sum_{k = 1}^{|\Omega^{u}|} \|\Omega_{k}^{u} - \Omega_{k - 1}^{u}\| + \tau \|\tilde{p} - p\|_{F}, \tag{2}$$

$$\text{s.t.} \quad \tilde{p} = f(X), \tag{3}$$

$$\text{s.t.} \quad \sum_{u = 1}^{U} \sum_{i = 1}^{D} \sum_{j = 1}^{D} a_{ij}^{u} \leq \varepsilon D^{2}. \tag{4}$$

In Equation (2), the first term aims to minimize the UAV flight path while optimizing the sampling positions. The second term focuses on minimizing the discrepancy between the constructed REM and the real REM by optimizing the selection vector $a^{u}$. Constraint (3) stipulates that the construction of the REM must utilize data collected from all UAVs, as processed by the function $f(\cdot)$. Constraint (4) states that the total number of positions sampled by all UAVs must not exceed the fraction $\varepsilon$ (the sampling rate) of the $D^{2}$ grid positions, reflecting the practical limitation that not every position in the target area can be sampled.
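As an illustration, the cost in Equation (2) and the budget in Constraint (4) can be evaluated for a given set of UAV paths and a given REM estimate. The sketch below is only a minimal example under stated assumptions: the function names are hypothetical, the REMs are plain nested lists, and the construction function $f(\cdot)$ is assumed to have been applied elsewhere.

```python
import math


def path_length(positions):
    """Sum of Euclidean distances between consecutive positions of one UAV."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def frobenius_error(p_est, p_true):
    """Frobenius norm of the difference between two equally sized 2-D lists."""
    return math.sqrt(sum((e - t) ** 2
                         for row_e, row_t in zip(p_est, p_true)
                         for e, t in zip(row_e, row_t)))


def objective(paths, p_est, p_true, sigma, tau):
    """Weighted cost of Equation (2): total path length plus REM error."""
    total_path = sum(path_length(p) for p in paths)
    return sigma * total_path + tau * frobenius_error(p_est, p_true)


def within_budget(num_samples, eps, D):
    """Constraint (4): the number of samples must not exceed eps * D^2."""
    return num_samples <= eps * D * D


# Two UAV paths (lengths 5 and 4) and a 2 x 2 REM with a single error of 3.
paths = [[(0, 0), (3, 4)], [(0, 0), (0, 2), (2, 2)]]
cost = objective(paths, [[1, 2], [3, 4]], [[1, 2], [3, 1]], sigma=1.0, tau=2.0)
```

In this toy case the cost is the total path length (9) plus twice the Frobenius error (3).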

In fact, Equation (2) represents a multi-objective optimization challenge. It is particularly difficult to determine optimal sampling positions without knowing the locations of radiation sources. Moreover, even after identifying these sampling positions, planning the shortest route between them is an NP-hard problem. To tackle these issues, we propose a multi-stage approach to data sampling and path planning that utilizes only the RSS measurements and the locations of UAVs to identify suitable sampling positions. Once these positions are established, we employ a DRL approach to design the most efficient flight path.

4. Multi-Stage Approach to Data Sampling and Path Planning

In the task of aerial REM construction in an unknown environment, data-driven methods often emphasize the spatial characteristics of the data [40], which usually implies that the RSS values exhibit a certain degree of spatial continuity on the map. However, various factors during signal propagation can disrupt the continuity, such as interference between multiple radiation sources and obstruction by obstacles [41].

To improve the construction quality of the REM, we can adopt a granular sampling strategy in areas where RSS values change dramatically, meaning sampling more data in these areas. This can be effectively achieved by dividing the data sampling task with UAVs for the target area into multiple stages. Multi-stage refers to the process of selecting several appropriate stage positions for each UAV. In this way, the entire target area is divided into multiple local areas. Each UAV can determine whether a granular sampling strategy is needed for the local area corresponding to the current stage based on the change in RSS values between the two consecutive stage positions. The steps in each stage are as follows:

At the beginning of each stage s, based on the current positions $O_{s}$ of each UAV, Voronoi partitioning is applied to assign a region of responsibility to each UAV, restricting it to select the starting position $O_{s + 1}$ for the next stage within its partitioned region.
When UAV u flies from $O_{s}$ to $O_{s + 1}$ , it will decide whether to adopt a granular sampling strategy based on the magnitude of the change in RSS values between its current position $O_{s}$ and its previous position $O_{s - 1}$ . The granular sampling strategy uses the already-sampled RSS data to construct the RSS values and uncertainties at unsampled positions. The two values are weighted and summed to obtain a probability distribution for sampling. Then, UAV u selects appropriate sampling positions based on the probability distribution.
After determining the next stage position $O_{s + 1}$ and sampling positions for UAV u at stage s, a DRL method is proposed for planning the shortest path at stage s. We first randomly connect these positions into a path and then train our strategy using the actor–critic algorithm. The trained strategy is used to perform 2-opt optimization on the randomly generated path, ultimately planning the shortest path for the UAV u to fly from $O_{s}$ to $O_{s + 1}$ and complete the sampling task at stage s.

4.1. Stage Position Selection Based on Voronoi Partitioning

In selecting stage positions, we want the UAV to choose an appropriate position to determine whether the local area in that stage requires additional sampling. However, if multiple UAVs are present, they might move in the same direction, exploring overlapping areas, which can lead to inefficiency and longer task completion times. To avoid this, we use a Voronoi partitioning [42] approach. For any given UAV u, a Voronoi partition is generated based on its own position, and it only considers boundary positions within its own Voronoi partition for the next move. This will effectively prevent the UAVs from exploring the same areas repeatedly.

For UAV u, each stage s focuses on a circular area centered at $O_{s}$ with radius r, where r depends on the length Q of the mission region. UAV u will select the starting position $O_{s + 1}$ for the next stage along the edge of the circle. We partition the area into multiple sections using Voronoi partitioning at the beginning of each stage s. Each UAV is restricted to selecting its stage position $O_{s + 1}$ from within its own partitioned area. Additionally, to prevent the UAVs from revisiting previously explored regions, we designate these areas as explored in subsequent stages. The partitioning of the UAVs is depicted in Figure 2.

By discretizing the reachable positions of the entire area for the UAV, UAV u can obtain a coordinate set ($X_{ti}^{u}$, $Y_{ti}^{u}$) for the selectable stage positions at each stage s. To select the most suitable stage position from the set, we define the following utility function $F_{t}^{u}$:

$$F_{t}^{u} = \alpha d_{\mathrm{int}} + (1 - \alpha) d_{n}, \tag{5}$$

where $d_{\mathrm{int}}$ denotes the distance between the candidate target position and the initial position of UAV u (i.e., the position of the first stage) and $d_{n}$ denotes the distance between the candidate target position and the nearest neighboring UAV position. Note that when $\alpha = 1$, the UAV performs a depth-first exploration, allowing it to explore deeper into the area. When $\alpha = 0$, the UAV aims to maximize its distance from other UAVs, which helps prevent repeated exploration of the same area. Regarding the task environment, we set an appropriate value for $\alpha$, and the UAV iteratively searches for the position with the minimum $F_{t}^{u}$ as the next stage position $O_{s + 1}$. When the entire area is marked as explored, it indicates that the task is completed, and UAVs conclude the cooperative data sampling mission. The stage position selection based on Voronoi partitioning is written in Algorithm 1.

Algorithm 1 Stage Position Selection Based on Voronoi Partitioning

Input: Current positions $O_{s}$ of each UAV.
Output: Next stage positions $O_{s + 1}$ of each UAV.
while the entire area is not marked as explored do
    Partition via Voronoi diagrams based on the current positions $O_{s}$ of each UAV u.
    for each UAV $u \in \{1, \dots, U\}$ do
        Under the constraint of radius r, obtain the set of selectable next stage positions ($X_{ti}^{u}$, $Y_{ti}^{u}$) for UAV u within its own partition and the unexplored area;
        Calculate $F_{t}^{u}$ of each coordinate in the set ($X_{ti}^{u}$, $Y_{ti}^{u}$) through (5);
        Select the position with the minimum $F_{t}^{u}$ as the position $O_{s + 1}$ for UAV u.
    end for
end while
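One iteration of the inner loop of Algorithm 1 can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes integer grid coordinates, approximates the Voronoi partition by assigning each cell to its nearest UAV, and treats a cell as lying on the circle edge when its distance to $O_{s}$ is within a hypothetical tolerance `tol` of r. The function and parameter names are illustrative, and the selection rule (minimum $F_{t}^{u}$) follows Algorithm 1 as written.

```python
import math


def voronoi_owner(cell, uav_positions):
    """Index of the UAV nearest to `cell`, i.e. the owner of its Voronoi region."""
    return min(range(len(uav_positions)),
               key=lambda k: math.dist(cell, uav_positions[k]))


def next_stage_position(u, uav_positions, initial_positions, explored,
                        D, r, alpha, tol=0.5):
    """Pick O_{s+1} for UAV u among unexplored cells about r away in its partition."""
    current = uav_positions[u]
    others = [p for k, p in enumerate(uav_positions) if k != u]
    best, best_f = None, math.inf
    for i in range(D):
        for j in range(D):
            cell = (i, j)
            if cell in explored:
                continue
            if abs(math.dist(cell, current) - r) > tol:  # not on the circle edge
                continue
            if voronoi_owner(cell, uav_positions) != u:  # outside own partition
                continue
            d_int = math.dist(cell, initial_positions[u])
            d_n = min(math.dist(cell, p) for p in others) if others else 0.0
            f = alpha * d_int + (1 - alpha) * d_n        # Equation (5)
            if f < best_f:
                best, best_f = cell, f
    return best  # None when no candidate remains


# Two UAVs on an 11 x 11 grid; their initial positions equal their current ones.
uavs = [(2, 5), (8, 5)]
first = next_stage_position(0, uavs, uavs, set(), D=11, r=2, alpha=0.5)
second = next_stage_position(0, uavs, uavs, {first}, D=11, r=2, alpha=0.5)
```

Marking the first choice as explored forces the second call to fall back to the next-best candidate, which mirrors how explored areas are excluded in later stages.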

4.2. The Granular Sampling Strategy Based on Uncertainty

The decision of whether to adopt a granular sampling strategy in stage s is based on the magnitude of the change in RSS values between the current position $O_{s}$ of UAV u and its previous position $O_{s - 1}$, which can be defined as:

$$\begin{cases} \text{Directly fly to } O_{s + 1}, & \text{if } |R_{s}^{u} - R_{s - 1}^{u}| < R_{th}, \\ \text{Adopt a granular sampling strategy}, & \text{if } |R_{s}^{u} - R_{s - 1}^{u}| \geq R_{th}, \end{cases} \tag{6}$$

where $R_{s}^{u}$ and $R_{s - 1}^{u}$ are the RSS values of UAV u at positions $O_{s}$ and $O_{s - 1}$ and $R_{th}$ is the set threshold. When the magnitude of the change in RSS values is at least $R_{th}$, UAV u will sample $n_{set}$ points in the current local area before moving to the next stage position $O_{s + 1}$, where $n_{set}$ depends on the sampling rate $\varepsilon$. The local area is a square region with the stage position $O_{s}$ as the center and side length $2r$.

In our multi-UAV system, we assume that each UAV is capable of broadcasting its stored point information and current position, as well as receiving broadcasts from other UAVs. Based on the above assumptions, there may be $n_{s}^{u}$ already-sampled points within the local area of UAV u at stage s. To choose suitable sampling positions, we use the sampled points to construct the REM of the local area with the KGA, which not only constructs the REM but also represents the uncertainty of each position through variance values. The KGA can be expressed as:

$$\hat{P}_{0} = \sum_{n = 1}^{n_{s}^{u}} \lambda_{n} P_{n}, \tag{7}$$

where $\hat{P}_{0}$ is the estimated RSS value at the unsampled position $(x_{0}, y_{0})$ and $\lambda_{n}$ is the weight coefficient of the sampled position $(x_{n}, y_{n})$. Because the probability of establishing a line-of-sight (LoS) link between the UAV and ground radiation sources is greater at high altitude, we assume that the RSS at any position in space has the same expected value c and variance $\sigma^{2}$. Then, we have:

$$P_{n} = c + R(n), \tag{8}$$

where $R(n)$ is the random deviation at point $(x_{n}, y_{n})$, satisfying $\mathrm{Var}[R(n)] = \sigma^{2}$. It is important to note that in real-world environments, LoS conditions may not hold in dense urban areas with high interference. When LoS conditions fail, additional losses due to shadowing and gains from small-scale fading and unmodeled effects must be considered, alongside $R(n)$ in Equation (8). In the KGA implementation, these various losses are modeled as noise. As a result, the KGA performs worse in dense urban environments with high interference than in environments with lower interference. To overcome this limitation, it is necessary to combine model-based methods with more precise interference modeling of the environment or to rely on ray tracing methods to obtain more accurate numerical results.

Based on the assumptions stated above, the optimal weight coefficient should satisfy:

$$\begin{bmatrix} r_{11} & \dots & r_{1 n_{s}^{u}} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ r_{n_{s}^{u} 1} & \dots & r_{n_{s}^{u} n_{s}^{u}} & 1 \\ 1 & \dots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_{1} \\ \vdots \\ \lambda_{n_{s}^{u}} \\ -\phi \end{bmatrix} = \begin{bmatrix} r_{10} \\ \vdots \\ r_{n_{s}^{u} 0} \\ 1 \end{bmatrix}, \tag{9}$$

where $\phi$ is the Lagrange multiplier and $r_{ij}$ is the semivariance function value between points $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$, defined as:

$$r_{ij} = \frac{1}{2} E[(P_{i} - P_{j})^{2}]. \tag{10}$$

Since the RSS value at $(x_{0}, y_{0})$ is unknown, the key to computing the weight $\lambda_{n}$ is to solve for $[r_{10}, \dots, r_{n_{s}^{u} 0}]^{T}$. In general, the semivariance between the RSS values at two positions in space is a function of the distance d between them, which requires finding an optimal fitting curve for the relationship between d and r. Similar to [17], we assume that the shadow loss at each position is spatially correlated. When the shadow values at two positions are $W_{i}$ and $W_{j}$, respectively, the correlation coefficient $\rho_{ij}$ between them can be expressed as:

$$\rho_{ij} = \frac{E[W_{i} W_{j}]}{\sigma^{2}} = \exp\left(-\frac{d_{ij}}{\delta}\right), \tag{11}$$

where $d_{ij}$ is the distance between the two positions, $\sigma^{2}$ is a constant, and $\delta$ is the distance at which the correlation decays to $1/e$. Because of the assumption of correlated shadowing expressed in (11), an exponential semivariogram model is used for fitting:

$$r_{ij} = c_{0} + c_{1} \left(1 - \exp\left(-\frac{d_{ij}}{a}\right)\right), \tag{12}$$

where $c_{0}$, $c_{1}$, and a are the parameters that need to be fitted.

When we derive the weights $[\lambda_{1}, \dots, \lambda_{n_{s}^{u}}]^{T}$ and the semivariance function r, apart from using (7) to estimate the RSS value at position $(x_{0}, y_{0})$, we can also estimate the uncertainty $S_{0}$, which is as follows:

$$S_{0} = \sum_{n = 1}^{n_{s}^{u}} \lambda_{n} r(d_{n0}), \tag{13}$$

where $d_{n0}$ is the distance between the sampled position $(x_{n}, y_{n})$ and the unsampled position $(x_{0}, y_{0})$. As shown in Figure 3, Figure 3a is the actual REM of the current area for UAV u at the stage s, Figure 3b represents the sampled points $n_{s}^{u}$ collected by other UAVs, Figure 3c shows the constructed REM based on the sampled points $n_{s}^{u}$, and Figure 3d depicts the uncertainty of each position.
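To make Equations (7), (9), (12), and (13) concrete, the following sketch estimates the RSS and the uncertainty at a single unsampled position. It is an illustrative example only: the semivariogram parameters are fixed by hand rather than fitted, the linear system is solved with a small hand-written Gaussian elimination, the semivariance at zero distance is set to zero in line with Equation (10), and all function names are hypothetical.

```python
import math


def semivariogram(d, c0, c1, a):
    """Exponential model of Equation (12); zero at d = 0, as Equation (10) implies."""
    return 0.0 if d == 0 else c0 + c1 * (1.0 - math.exp(-d / a))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def krige(samples, target, c0=0.0, c1=1.0, a=1.0):
    """Return (estimate, uncertainty) at `target` from samples [(x, y, P), ...]."""
    n = len(samples)
    pts = [(x, y) for x, y, _ in samples]

    def gamma(p, q):
        return semivariogram(math.dist(p, q), c0, c1, a)

    # Left-hand matrix of Equation (9): semivariances bordered by ones.
    A = [[gamma(pts[i], pts[j]) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in pts] + [1.0]
    lam = solve(A, b)[:n]                      # last unknown is -phi
    estimate = sum(l * s[2] for l, s in zip(lam, samples))   # Equation (7)
    uncertainty = sum(l * r0 for l, r0 in zip(lam, b[:n]))   # Equation (13)
    return estimate, uncertainty


samples = [(0.0, 0.0, -60.0), (2.0, 0.0, -80.0)]   # (x, y, RSS in dBm)
est_mid, unc_mid = krige(samples, (1.0, 0.0))      # midway between the samples
est_at, unc_at = krige(samples, (0.0, 0.0))        # exactly on a sample
```

Midway between two samples the weights are equal by symmetry, so the estimate is their mean. At a sampled position the estimate reproduces the measurement and the uncertainty vanishes, which is why unexplored regions stand out in the uncertainty map.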

In order to select appropriate sampling positions for UAV u at stage s, we normalize the RSS and uncertainty, respectively, and combine them as a weighted sum with weight $\eta$ to generate a probability map of the sampling point distribution. The map gives the probability that UAV u will choose each position for sampling, as shown in Figure 4. Since randomly located sampling points are suitable for parameter estimation [43], when UAV u adopts a granular sampling strategy at stage s, it will randomly select $(n_{set} - n_{s}^{u})$ sampling positions based on the probability map. If $n_{s}^{u}$ is greater than $n_{set}$, UAV u does not perform sampling but instead proceeds directly to the next stage position $O_{s + 1}$.
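The decision rule of Equation (6) and the probability map can be sketched together. The example below rests on several assumptions that the text does not fix: min-max normalization, the weight $\eta$ applied to the RSS term and $1 - \eta$ to the uncertainty term, and sampling without replacement. All function names are hypothetical.

```python
import random


def needs_granular_sampling(r_s, r_prev, r_th):
    """Equation (6): sample locally when the RSS change reaches the threshold."""
    return abs(r_s - r_prev) >= r_th


def normalize(values):
    """Min-max normalise to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def sampling_probabilities(rss, uncertainty, eta):
    """Weighted sum of normalised RSS and uncertainty, rescaled to sum to one."""
    score = [eta * r + (1 - eta) * s
             for r, s in zip(normalize(rss), normalize(uncertainty))]
    total = sum(score)
    if total == 0:  # degenerate case: fall back to a uniform map
        return [1.0 / len(score)] * len(score)
    return [v / total for v in score]


def choose_positions(candidates, probs, n_set, n_sampled, rng=random):
    """Draw (n_set - n_sampled) distinct candidates according to `probs`."""
    k = n_set - n_sampled
    if k <= 0:
        return []  # enough points already sampled: fly on to O_{s+1}
    pool, weights, chosen = list(candidates), list(probs), []
    while pool and len(chosen) < k and sum(weights) > 0:
        idx = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)
    return chosen


probs = sampling_probabilities([-90.0, -70.0, -50.0], [0.1, 0.5, 0.3], eta=0.5)
picked = choose_positions(["a", "b", "c"], probs, n_set=3, n_sampled=1,
                          rng=random.Random(0))
```

Position "a" has both the lowest estimated RSS and the lowest uncertainty, so its probability is zero and it is never drawn.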

4.3. The Path Planning Method Based on DRL

After determining the starting position $O_{s}$, the next stage position $O_{s + 1}$, and sampling positions for UAV u at stage s, a DRL method is proposed for planning the shortest path at stage s. We formulate the Markov decision process as follows:

State: The state $s_{t}$ represents a solution to the path planning at time step t, i.e., $s_{t} = (O_{s}, n_{t}^{1}, \dots, n_{t}^{i}, \dots, O_{s + 1})$, where $n_{t}^{i}$ denotes the i-th sampling position for UAV u at stage s.
Action: The action $a_{t}$ is represented by a point pair ($n_{t}^{i}$, $n_{t}^{j}$). We update the state $s_{t}$ using the 2-opt method [44].
Reward: To optimize the initial solution as much as possible within T time steps, we set the reward as follows:

$$r_{t} = r(s_{t}, a_{t}, s_{t + 1}) = D(s_{t}^{*}) - \min\{D(s_{t}^{*}), D(s_{t + 1})\}, \tag{14}$$

where D(s) represents the sum of Euclidean distances between consecutive points in sequence s, and $s_{t}^{*}$ is the best solution found before step t, which is updated only when $D(s_{t + 1}) < D(s_{t}^{*})$. The cumulative reward to be maximized over T time steps is expressed as $G_{T} = \sum_{t = 0}^{T - 1} \gamma^{t} r_{t}$, where $\gamma$ is the discount factor.
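A single transition of this Markov decision process can be sketched as below. The example assumes that a 2-opt action reverses the path segment between the two selected indices and that both indices refer to interior points, so that $O_{s}$ and $O_{s + 1}$ stay at the ends of the path. The function names are illustrative.

```python
import math


def tour_length(seq):
    """D(s): sum of Euclidean distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(seq, seq[1:]))


def two_opt(seq, i, j):
    """Reverse seq[i..j] (inclusive); use interior indices to keep the endpoints."""
    i, j = min(i, j), max(i, j)
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]


def step(state, best, action):
    """Apply one 2-opt action and return (next_state, new_best, reward)."""
    nxt = two_opt(state, *action)
    d_best, d_next = tour_length(best), tour_length(nxt)
    reward = d_best - min(d_best, d_next)   # Equation (14)
    if d_next < d_best:                     # keep the best solution seen so far
        best = nxt
    return nxt, best, reward


# A path with two interior points visited in the wrong order.
s0 = [(0, 0), (2, 0), (1, 0), (3, 0)]
s1, best1, r1 = step(s0, s0, (1, 2))        # improving move
s2, best2, r2 = step(s1, best1, (1, 2))     # undoing it earns no reward
```

Because the reward is measured against the best solution found so far, a move that worsens the path yields zero reward rather than a penalty, which lets the policy pass through worse intermediate solutions.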

To train a policy $π_{θ}$ that consistently generates reasonable actions $a_{t}$ at each time step, we employ the actor–critic algorithm. Inspired by [33], our actor network design is shown in Figure 5. Since positions $O_{s}$ and $O_{s+1}$ are fixed, the input to the actor network is the position $f(n_{i})$ of each sampling position in the current state $s_{t}$. Since a linear transformation alone cannot capture where each node sits in the visiting sequence, we apply sinusoidal positional encoding to add node position embedding information. Then, we use the Transformer [45] architecture to encode the input features, obtaining the embedding $h^{1}(n_{i})$ with a dimensionality of 128. To integrate global information into each node, we aggregate $\{h^{1}(n_{i})\}_{i=1}^{I}$ by max-pooling to obtain $h_{g}$ and transform $h^{1}(n_{i})$ into $h^{c}(n_{i})$ through $h^{c}(n_{i}) = W^{1} h^{1}(n_{i}) + W^{g} h_{g}$, where $W^{1}, W^{g} \in R^{128 \times 128}$. Given the node embeddings $H^{c} = [h^{c}(n_{1}), \dots, h^{c}(n_{I})]$, we obtain a compatibility matrix $Y \in R^{I \times I}$ through a compatibility layer [45], which holds the scores for selecting each pair of nodes. We transform Y into selection probabilities $P \in R^{I \times I}$ for each node pair using a softmax layer. Since selecting the same node twice is meaningless, we mask the diagonal elements. The element $p_{ij}$ of matrix P represents the probability of selecting the pair $(n_{i}, n_{j})$ for the local operation.
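The final masking and softmax step can be illustrated in isolation. The Python sketch below assumes that the softmax is taken jointly over all off-diagonal entries of the compatibility matrix, so that the resulting matrix sums to one over node pairs; this joint normalization and the function name `pair_probabilities` are assumptions for illustration.

```python
import math

def pair_probabilities(Y):
    """Softmax over all off-diagonal entries of the I x I score matrix Y.

    Diagonal entries are masked, so selecting the same node twice has probability 0.
    Requires I >= 2.
    """
    n = len(Y)
    off_diag = [Y[i][j] for i in range(n) for j in range(n) if i != j]
    m = max(off_diag)  # subtract the maximum for numerical stability
    exp = [[0.0 if i == j else math.exp(Y[i][j] - m) for j in range(n)]
           for i in range(n)]
    total = sum(sum(row) for row in exp)
    return [[e / total for e in row] for row in exp]
```

An action can then be drawn by sampling an index pair (i, j) in proportion to the entries of the returned matrix.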

The design of the critic network is similar to that of the actor network, with the following differences: (1) $h^{c}(n_{i})$ is obtained through an average pooling layer, and (2) the score $v_{ϕ}$ is output through a fully connected layer. The complete algorithm is given in Algorithm 2.

Algorithm 2 Actor–Critic Algorithm-Based Path Planning

1: Input: Initial actor network parameters $θ$, critic network parameters $ϕ$, number of epochs E, batches B, points M, step limit T, update step N.
2: Output: Trained policy $π_{θ}$.
3: for each epoch $e = 1, \dots, E$ do
4:   randomly generate B batches of problem instances for path planning with M positions;
5:   for $b = 1, \dots, B$ do
6:     select the problem instances of the b-th batch and randomly generate the initial state $s_{0}$;
7:     $t \leftarrow 0$;
8:     while $t < T$ do
9:       reset the gradients $dθ \leftarrow 0$ and $dϕ \leftarrow 0$, set $t_{s} \leftarrow t$, and obtain state $s_{t}$;
10:       while $t - t_{s} < N$ and $t \neq T$ do
11:         sample $a_{t}$ based on $π_{θ}(a_{t} | s_{t})$;
12:         obtain reward $r_{t}$ and the next state $s_{t+1}$;
13:         $t \leftarrow t + 1$;
14:       end while
15:       $R \leftarrow v_{ϕ}(s_{t})$;
16:       for $i \in \{t - 1, \dots, t_{s}\}$ do
17:         $R \leftarrow r_{i} + γ R$; $δ \leftarrow R - v_{ϕ}(s_{i})$;
18:         $dθ \leftarrow dθ + \sum_{S_{b}} δ \nabla \log π_{θ}(a_{i} | s_{i})$;
19:         $dϕ \leftarrow dϕ + \sum_{S_{b}} δ \nabla v_{ϕ}(s_{i})$;
20:       end for
21:       update $θ$ and $ϕ$;
22:     end while
23:   end for
24: end for
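The backward accumulation of returns in Algorithm 2 can be checked numerically without any network. The Python sketch below computes the bootstrapped n-step returns R and the advantages δ for one update window; the function name `n_step_targets` and the plain-list inputs are illustrative assumptions, with `values` standing in for the critic outputs at the visited states and `bootstrap_value` for the critic output at the state reached after the window.

```python
def n_step_targets(rewards, values, bootstrap_value, gamma):
    """Return (returns, deltas) for one window of N steps.

    rewards[i] and values[i] belong to step t_s + i; the recursion runs
    backwards from the bootstrap value, as R <- r_i + gamma * R.
    """
    R = bootstrap_value
    returns, deltas = [], []
    for r, v in zip(reversed(rewards), reversed(values)):
        R = r + gamma * R
        returns.append(R)
        deltas.append(R - v)  # advantage used to weight both gradients
    returns.reverse()
    deltas.reverse()
    return returns, deltas
```

For example, with rewards (1, 0, 2), critic values of 0.5 at each step, a bootstrap value of 1, and γ = 0.5, the returns are 1.625, 1.25, and 2.5.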

In the actual deployment of the algorithm, since we directly use the trained policy to output the UAV’s flight path, the computational overhead consists mainly of the forward propagation. First, during the node embedding phase, we perform a linear transformation on the features of each node, which has a complexity of $O(N \cdot d \cdot h_{0})$, where N is the number of sampling positions for UAV u at stage s, d is the feature dimension, and $h_{0}$ is the embedding dimension. Additionally, the complexity of adding position encoding is $O(N \cdot h_{0})$. Next, in the self-attention mechanism, we need to compute the relationships between nodes, resulting in a complexity of $O(N^{2} \cdot h_{0})$, as we generate an $N \times N$ attention weight matrix and perform weighted summation. Subsequently, the complexity of action selection is $O(1)$. Aggregating these terms, we conclude that the overall computational complexity of the policy network is $O(N^{2} \cdot h_{0})$. Because this cost grows quadratically with the number of sampling positions for UAV u at stage s, the self-attention mechanism dominates the running time when many positions are sampled. Therefore, the goal of our method is to construct a high-quality REM with a limited amount of RSS data. The subsequent experiments also focus on the quality of the REM constructed under low sampling rates.

5. Simulation Results

In this section, the performance of the proposed multi-stage approach to data sampling and path planning with multiple UAVs is evaluated. We compare the performance of our approach with other shortest path planning algorithms while also comparing the quality of the constructed REMs with other sampling strategies.

5.1. Simulation Setup and Parameter Settings

Simulation analysis and validation are conducted in an urban environment, where the RSS distributions are generated by the COST 231 extended Walfisch–Ikegami empirical propagation model. The radiation source operates at a frequency of 2.4 GHz with a transmission power of 43 dBm. The target area is a square region of 1000 m × 1000 m, where the grid side length l is set to 5 m, dividing the area into 200 × 200 small grids. Given the extensive area, three radiation sources are positioned in it, each at an elevation of 30 m. The UAV is fixed at a height of H = 100 m to sample RSS data. When the sampling rate is too low, it is difficult to ensure the accuracy of REM construction, while a high sampling rate leads to excessively long sampling times. To demonstrate that our method constructs high-quality REMs with minimal data, and in line with other works on building REMs from UAV-sampled data, such as [24,40], the sampling rate $ε$ in our experiments ranges from 0.25% to 2%, and $n_{s}^{u}$ is set to 12.5% of $ε D^{2}$. The algorithm parameters are shown in Table 2. In addition to experimenting with the REM generated through simulation, we also test the algorithm’s performance on two other datasets. The first dataset is the DeepREM dataset [46], which contains several urban scenario REMs. The other dataset is the Gudmundson dataset [8], which generates the RSS values using a path loss model and a shadowing model.
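As a quick check on the sampling budget implied by these settings, the short Python sketch below evaluates $ε D^{2}$ and its 12.5% share. It assumes that D denotes the number of grids per side (D = 1000 m / 5 m = 200), which is an interpretation of the notation rather than something stated in this section; the function name `sampling_budget` is illustrative.

```python
def sampling_budget(region_side_m, grid_side_m, eps):
    """Return (D, eps * D^2, 12.5% of eps * D^2) for a square region."""
    D = region_side_m // grid_side_m  # assumed meaning of D: grids per side
    total = eps * D * D
    return D, total, 0.125 * total

for eps in (0.0025, 0.005, 0.01, 0.02):
    D, total, share = sampling_budget(1000, 5, eps)
    print(f"eps = {eps:.2%}: D = {D}, eps*D^2 = {total:.0f}, 12.5% share = {share:.1f}")
```

Under this assumption, the lowest sampling rate of 0.25% corresponds to 100 sampled grids out of 40,000, and the highest rate of 2% corresponds to 800.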

We used PyTorch 2.1.0 to implement our proposed algorithm, and all code was run on Microsoft Windows 11 with a 13th Gen Intel(R) Core(TM) i9-13900KF CPU @ 3.00 GHz and an NVIDIA GeForce RTX 4090 GPU. We trained the policy $π$ for 100 epochs on shortest-path instances with 50 points, with an initial learning rate of $10^{-4}$, a step limit T of 1000, and a batch size of 5120. After training the policy, we conducted testing with the policy’s step limit T set to 100.

5.2. Illustrative UAV Trajectories

The goal of path planning in this paper is to determine the shortest path for UAVs to traverse specified positions for accomplishing data sampling tasks. In Figure 6, we demonstrate the flight trajectories of the UAVs when U = 3 and U = 4, revealing a clear coordination and exploration among the UAVs. Each UAV departs from the starting position and is responsible for different parts of the target area during the data sampling mission. When the entire area is marked as explored, the UAVs will return. In Figure 6a, when U = 3, each UAV independently completes the data sampling of the region with dramatic RSS changes around the radiation source. However, in Figure 6b, when U = 4, the UAVs collaboratively perform the primary data sampling tasks in areas with dramatic RSS changes. These effects are a result of stage position selection based on Voronoi partitioning and a granular sampling strategy for choosing sampling positions.
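The Voronoi partitioning that underlies this division of labor amounts to assigning every grid cell to its nearest UAV. The following Python sketch shows that assignment under simple assumptions: distances are Euclidean between grid coordinates, ties go to the UAV with the lowest index, and the function name `voronoi_assign` is hypothetical.

```python
import math

def voronoi_assign(cells, uav_positions):
    """Map each UAV index to the list of cells that are closest to that UAV."""
    regions = {u: [] for u in range(len(uav_positions))}
    for cell in cells:
        nearest = min(range(len(uav_positions)),
                      key=lambda u: math.dist(cell, uav_positions[u]))
        regions[nearest].append(cell)
    return regions
```

Because the regions are recomputed from the current UAV positions, calling the function again at each stage yields a partition that follows the UAVs as they move.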

5.3. Sampling Strategy Performance

Three construction methods, which differ in their sampling positions and recovery methods, were evaluated at different sampling rates. The reconstruction method of this paper is denoted as Proposed-KGA: the sampling positions are first optimized using the sampling strategy proposed in this paper, and the KGA method is then employed to reconstruct the REM. The IDW algorithm [15] and the GPR algorithm [16] are also included as representative data-driven methods.

In order to compare the REM construction performance of the data sampled by our method, we compared it with the following sampling strategies:

Random sampling strategy [47]: Data are sampled at random positions, modeled by a homogeneous Poisson point process.

Grid sampling strategy [47]: Data are sampled at grid positions separated by a certain distance.
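The two baseline strategies can be sketched on a D × D grid as follows. This Python snippet is a simplified illustration: the random strategy draws a fixed number of distinct cells uniformly at random, which approximates a homogeneous Poisson point process conditioned on the number of points, and the grid strategy places samples on a regular lattice. The function names and the half-spacing offset are assumptions.

```python
import random

def random_sampling(D, n, rng):
    """n distinct grid cells chosen uniformly at random from a D x D grid."""
    picks = rng.sample(range(D * D), n)
    return [(k // D, k % D) for k in picks]

def grid_sampling(D, spacing):
    """Cells on a regular lattice with the given spacing (in grid cells)."""
    offset = spacing // 2  # centre the lattice inside the region
    return [(i, j)
            for i in range(offset, D, spacing)
            for j in range(offset, D, spacing)]
```

For a 200 × 200 grid, a spacing of 20 cells gives a 10 × 10 lattice, which matches the number of samples of the random strategy at a sampling rate of 0.25%.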

In the simulation environment, due to the sparse presence of buildings, the overall path loss is predominantly governed by free-space loss. Regions where signal strength varies significantly are primarily located near the radiation sources. Thus, the distribution of sampling points in Figure 7b is mainly concentrated around the three radiation sources. In the Gudmundson dataset, the overall path loss adds spatially correlated shadow fading on top of free-space loss. Although the sampling points in Figure 7g are also distributed around the radiation sources, they are less dense than in the simulation environment and more scattered, so as to capture the shadow fading effects. In the DeepREM dataset, which features a more complex urban environment with numerous buildings, significant variations in signal strength often result from building obstructions. Consequently, the distribution of sampling points in Figure 7l is scattered throughout the entire environment.

From Figure 7, it can be seen that the REM construction using the sampling strategy proposed in this paper exhibits better construction performance compared to other sampling strategies. This is because the proposed sampling strategy achieves a coordinated exploration of the entire area and increases the distribution of sampling positions in regions with dramatic RSS changes. The sampling strategy not only effectively restores the variations in RSS near the radiation source but also captures sudden changes in signal strength caused by obstacles, thereby enhancing the overall construction effectiveness.

To evaluate the impact of sampling strategies on the construction quality of the REM, this paper uses the following two metrics for assessment:

Structural Similarity (SSIM): SSIM focuses on the structure and visual quality of the constructed REM, specifically whether it visually resembles the actual REM.
Root Mean Square Error (RMSE): RMSE is the square root of the mean squared difference between the actual REM values and the constructed REM values, given by

$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Y_{i} - \hat{Y}_{i})^{2}},$

(15)

where $Y_{i}$ and $\hat{Y}_{i}$ are the true and the estimated RSS values in dBm at the i-th grid, respectively, and N is the number of grids.
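Equation (15) translates directly into code. The Python sketch below computes the RMSE between two REMs stored as nested lists of RSS values in dBm; the function name `rmse` and the nested-list representation are illustrative choices.

```python
import math

def rmse(true_map, est_map):
    """Root mean square error between two equally sized 2-D maps."""
    flat_true = [v for row in true_map for v in row]
    flat_est = [v for row in est_map for v in row]
    if not flat_true or len(flat_true) != len(flat_est):
        raise ValueError("maps must be non-empty and of equal size")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(flat_true, flat_est))
    return math.sqrt(squared / len(flat_true))
```

For instance, errors of 3 dB and 4 dB at two of four grids, with the other two grids exact, give an RMSE of 2.5 dB.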

As shown in Figure 8, under the condition of a low sampling rate, the SSIM of the REM constructed by the method in this paper is higher than that of reconstruction methods based on other sampling strategies. This is attributed to the granular sampling strategy employed in regions with significant variations in RSS values, effectively capturing the large fluctuations in RSS caused by obstructions or proximity to the radiation source.

Intuitively, the larger the spatial sample density is, the more likely it is to find data points that are close to, and hence highly correlated with, the channel at the target location [47]. When the number of sampling points is fixed, the grid sampling strategy provides a higher spatial sampling density than the other sampling strategies, resulting in the lowest RMSE when constructing the REM using interpolation methods. The sampling strategy proposed in this paper increases the spatial sampling density in regions where the RSS values change significantly due to obstruction by obstacles or proximity to radiation sources, making the RMSE of the REM constructed with this strategy close to that of the REM constructed with the grid sampling strategy. The experimental results are shown in Figure 9.

5.4. Path Planning Performance

The goal of path planning in this paper is to determine the shortest path for UAVs to traverse specified positions for accomplishing data sampling tasks. Regarding the DRL action $a_{t}$, we tried both 2-opt and node swap (i.e., directly swapping the positions of two nodes); the rewards during the training process are shown in Figure 10. It can be observed that the final reward value obtained by 2-opt is higher.

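We analyze the influence of each component of our actor network on performance through ablation studies. The results are shown in Table 3. Removing any single component lengthens the planned path and lowers the reward: the path length rises from 6.137 for the full network to 8.869 without the self-attention layer, to 9.293 without the compatibility layer, and to 26.59 without positional encoding.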

To compare algorithm performance, we also implemented the following algorithms:

Greedy algorithm: The greedy algorithm selects the nearest unsampled point from the current position of the UAV greedily, until all positions have been collected.
Genetic algorithm (GA): It initializes several paths, with each path representing a chromosome. For each generation, it implements a crossover and mutation process. After several generations, the path with the shortest distance is considered the final solution.
Simulated annealing (SA) algorithm: It sets an initial temperature for starting annealing and initializes a path as the initial solution. During each cooling process, it perturbs the current path to obtain a new path and updates the shortest path as the optimal solution. When the temperature decreases to the termination temperature level, it outputs the optimal solution.
Recurrent neural network (RNN) [34]: The algorithm is based on a DRL framework in which a recurrent neural network learns a construction heuristic policy; each output is a position, so the path is constructed step by step.
Attention model (AM) [35]: The algorithm is also used to learn a construction heuristic policy based on a DRL framework, and the network is an attention model.
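Of these baselines, the greedy algorithm is the simplest to state precisely. The Python sketch below builds a nearest-neighbor path between a fixed start and a fixed end position, mirroring the stage positions used in this paper; the function name `greedy_path` and the tuple representation of points are illustrative.

```python
import math

def greedy_path(start, points, end):
    """Visit every point by always moving to the nearest unvisited one."""
    path, remaining = [start], list(points)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(path[-1], p))
        remaining.remove(nxt)
        path.append(nxt)
    path.append(end)  # the next stage position closes the path
    return path
```

The greedy rule never revisits an earlier decision, which is why improvement heuristics such as 2-opt can often shorten the paths it produces.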

For a fair comparison, multiple experiments are conducted to find the most suitable hyperparameters for the comparative algorithms. For the GA, the population size is set to 100, and we run for 50 generations. As for the SA, the initial temperature is set to $10^{6}$, with 500 iterations per cooling step and a cooling rate of 0.98. Due to randomness, we independently conduct five rounds of experiments for each algorithm and average the output results. For the two DRL algorithms, RNN and AM, we start with the hyperparameters suggested in the original papers and continuously adjust them to train the highest-quality policy.
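For reference, an SA baseline with the stated settings could look like the following Python sketch. The initial temperature, cooling rate, and iterations per cooling step follow the values above; the termination temperature, the segment-reversal perturbation, and the Metropolis acceptance rule are assumptions, since they are not specified here, and the function names are illustrative.

```python
import math
import random

def path_length(path):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def simulated_annealing(path, rng, t0=1e6, t_end=1e-3, cooling=0.98, iters=500):
    """Anneal the interior of `path`; the first and last points stay fixed."""
    if len(path) < 4:  # fewer than two interior points: nothing to reorder
        return list(path)
    cur, cur_len = list(path), path_length(path)
    best, best_len = cur, cur_len
    t = t0
    while t > t_end:  # t_end is an assumed termination temperature
        for _ in range(iters):
            i, j = sorted(rng.sample(range(1, len(cur) - 1), 2))
            cand = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
            cand_len = path_length(cand)
            # Metropolis rule: always accept improvements, sometimes accept worse paths
            if cand_len < cur_len or rng.random() < math.exp((cur_len - cand_len) / t):
                cur, cur_len = cand, cand_len
                if cur_len < best_len:
                    best, best_len = cur, cur_len
        t *= cooling
    return best
```

With the default arguments the loop runs for roughly a thousand cooling steps of 500 iterations each, which is consistent with the longer running time of SA relative to a single forward pass of a trained policy.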

From Figure 11, it can be observed that as the sampling rate increases, our algorithm finds shorter flight paths than the other methods. Figure 12 illustrates that the running time of the DRL algorithms is much shorter than that of the traditional GA and SA. This is because DRL algorithms use pre-trained policies during deployment, and the computation only involves forward propagation through the network. Together, these results indicate that our algorithm finds a shorter flight path at a lower time cost than the other algorithms.

To demonstrate the scalability of our method, Figure 13 shows the trend of the average flight path length per UAV as the number of UAVs increases, with a fixed sampling rate of 1%. Due to the multi-UAV collaboration implemented by the method in this paper, as the number of UAVs increases, the number of positions each UAV needs to sample decreases, which in turn reduces the corresponding flight path length.

5.5. Impact of Parameter Settings

The parameters of our method mainly include the parameter $α$ in (3), the threshold $R_{th}$, and the weight $η$ for generating the probability map.

Different choices of $α$ mainly affect the selection of stage positions. However, regardless of the value of $α$, the entire area will always be explored, so its primary impact is on the length of the UAV’s flight path. The value of $α$ also needs to be chosen according to the environment. Figure 14 shows the effect of $α$ on the UAV’s flight path length in two environments: our simulation environment with fewer obstacles and the complex urban environment of the DeepREM dataset. In our simulation environment, as the value of $α$ increases, the flight path length of the UAV decreases. This is because the area of signal strength variation is mainly near the radiation source, requiring the UAV to explore deeper into that region. However, in the complex urban environment, due to the impact of obstacles on signal strength, the UAV needs to strike a balance between deep exploration and collaborative exploration to avoid redundant exploration of the same areas.

To select an appropriate value for $R_{th}$, the SSIM values of the REMs constructed in two different environments are shown in Figure 15. In our simulation environment with fewer obstacles, a relatively larger $R_{th}$ value results in the highest SSIM. This is because the signal strength variations are primarily concentrated near the radiation source, and concentrating sampling points in these regions helps improve the accuracy of the REM construction. In the complex urban environment of the DeepREM dataset, by contrast, a smaller $R_{th}$ value allows the UAV to focus on areas with significant signal variation caused by obstacles. A value that is too small, however, leads the UAV to perform additional sampling in every region, which leaves no sampling points available in later stages and thus degrades the overall reconstruction accuracy.

The purpose of the weight $η$ is to strike a balance between focusing on regions with stronger signal intensity and reducing uncertainty across the entire area. Figure 16 and Figure 17 illustrate the impact of different $η$ values on the accuracy of REM construction.

As shown in Figure 16, the SSIM value gradually increases with the weight $η$. This is because larger $η$ values make the sampling strategy more focused on reducing uncertainty across the entire region, thereby improving the accuracy of REM construction. Figure 17 indicates that the RMSE reaches its minimum value when $η$ = 0.3. Sampling more points closer to the radiation source can effectively reduce the RMSE of REM construction [48]. Intuitively, regions with stronger signal intensity are usually closer to the radiation source. Therefore, appropriately increasing the number of sampling points in these areas helps reduce the RMSE of REM construction, thereby improving reconstruction accuracy.

6. Conclusions

In this paper, we propose a multi-stage approach to data sampling and path planning for multiple-UAV-enabled aerial REM construction. The collaboration among multiple UAVs is achieved using the Voronoi partitioning method. To improve the quality of the constructed aerial REM, a granular sampling strategy is proposed, where the selection of sampling positions is based on the construction results and uncertainties. A self-attention-based policy network is designed and trained through the actor–critic framework to obtain a heuristic strategy for solving the shortest path problem for selected sampling positions. The simulation results illustrate that the algorithm achieves collaboration among multiple UAVs, reducing the length of UAV flight paths while improving the SSIM of the aerial REM and reducing the RMSE.

Author Contributions

Conceptualization, J.L. and T.W.; methodology, J.L. and T.W.; software, J.L.; validation, J.L.; investigation, J.L. and Z.S.; resources, H.W.; writing—original draft preparation, J.L.; writing—review and editing, J.L., T.W., Z.S., R.J. and X.F.; supervision, H.W. and T.W.; project administration, T.W.; funding acquisition, H.W. and T.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NSFC with No. 62372456, in part by the Hong Kong Scholars Program with No. 2021-101, and in part by Hefei Comprehensive National Science Center.

Data Availability Statement

The simulation data presented in the study are openly available at https://github.com/iiiwan/simulation-environment, accessed on 15 January 2025. The data of DeepREM dataset presented in this study are available at https://zenodo.org/records/7839447, accessed on 23 May 2024, reference number [46]. The data of Gudmundson dataset presented in this study are available at https://github.com/uiano/spectrum_surveying_with_UAVs, accessed on 10 April 2024, reference number [8].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hu, T.; Huang, Y.; Chen, J.; Wu, Q.; Gong, Z. 3D radio map reconstruction based on generative adversarial networks under constrained aircraft trajectories. IEEE Trans. Veh. Technol. 2023, 72, 8250–8255.
2. Romero, D.; Viet, P.Q.; Shrestha, R. Aerial base station placement via propagation radio maps. IEEE Trans. Commun. 2024, 72, 5349–5364.
3. Han, Z.; Yang, Y.; Wang, W.; Zhou, L.; Gadekallu, T.R.; Alazab, M.; Gope, P.; Su, C. RSSI map-based trajectory design for UGV against malicious radio source: A reinforcement learning approach. IEEE Trans. Intell. Transp. Syst. 2022, 24, 4641–4650.
4. Yapar, Ç.; Levie, R.; Kutyniok, G.; Caire, G. Real-time outdoor localization using radio maps: A deep learning approach. IEEE Trans. Wirel. Commun. 2023, 22, 9703–9717.
5. Zeng, Y.; Xu, X.; Jin, S.; Zhang, R. Simultaneous navigation and radio mapping for cellular-connected UAV with deep reinforcement learning. IEEE Trans. Wirel. Commun. 2021, 20, 4205–4220.
6. Chen, Y.-J.; Huang, D.-Y. Joint trajectory design and BS association for cellular-connected UAV: An imitation-augmented deep reinforcement learning approach. IEEE Internet Things J. 2021, 9, 2843–2858.
7. Chen, Y.; Yang, D.; Xiao, L.; Wu, F.; Xu, Y. Optimal Trajectory Design for Unmanned Aerial Vehicle Cargo Pickup and Delivery System based on Radio Map. IEEE Trans. Veh. Technol. 2024, 73, 11706–11718.
8. Shrestha, R.; Romero, D.; Chepuri, S.P. Spectrum surveying: Active radio map estimation with autonomous UAVs. IEEE Trans. Wirel. Commun. 2022, 22, 627–641.
9. Wu, Q.; Shen, F.; Wang, Z.; Ding, G. 3D spectrum mapping based on ROI-driven UAV deployment. IEEE Netw. 2020, 34, 24–31.
10. Shen, F.; Ding, G.; Wu, Q. Efficient remote compressed spectrum mapping in 3-d spectrum-heterogeneous environment with inaccessible areas. IEEE Wirel. Commun. Lett. 2022, 11, 1488–1492.
11. Mao, K.; Zhu, Q.; Ye, X.; Huang, Y.; Li, H.; Li, H.; Liu, X.; Lin, Z.; Wu, Q.; Song, M. Demo Abstract: A UAV-Based Real-Time Channel Knowledge Mapping System. In Proceedings of the IEEE INFOCOM 2024–IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 20 May 2024; pp. 1–2.
12. Ruan, T.; Huang, Y.; Zhu, Q.; Hao, C.; Wu, Q. Multi-stage RF emitter search and geolocation with UAV: A cognitive learning-based method. IEEE Trans. Veh. Technol. 2023, 72, 6349–6362.
13. Lin, J.; Wang, H.; Wu, T.; Zhu, Q.; Shen, Z.; Jiang, R. Multi-Stage Data Collection and Path Planning for Multiple UAV-Enabled Aerial REM Construction. In Proceedings of the 2024 IEEE 24th International Conference on Communication Technology (ICCT), Chengdu, China, 18–20 October 2024.
14. Zhang, F.; Zhou, C.; Brennan, C.; Wang, R.; Li, Y.; Xia, G.; Zhao, Z.; Xiao, Y. A Radio Wave Propagation Modeling Method Based on High Precision 3D Mapping in Urban Scenarios. IEEE Trans. Antennas Propag. 2024, 72, 2712–2722.
15. Liu, K.; Lin, Z.; Liu, Y.; Zhu, Q.; Wu, Q.; Cai, X. A New Spectrum Map Fusing Method Based on Difference Group Sparsity. In Proceedings of the 2023 IEEE/CIC International Conference on Communications in China (ICCC), Dalian, China, 10–12 August 2023; pp. 1–6.
16. Zhang, Y.; Wang, S. K-nearest neighbors gaussian process regression for urban radio map reconstruction. IEEE Commun. Lett. 2022, 26, 3049–3053.
17. Sato, K.; Fujii, T. Kriging-based interference power constraint: Integrated design of the radio environment map and transmission power. IEEE Trans. Cogn. Commun. Netw. 2017, 3, 13–25.
18. Wang, H.; Lin, D.; Shen, Z.; Jia, M. Two highly accurate electromagnetic map reconstruction methods. IEEE Trans. Veh. Technol. 2022, 71, 12419–12424.
19. Chen, G.; Liu, Y.; Zhang, T.; Zhang, J.; Guo, X.; Yang, J. A graph neural network based radio map construction method for urban environment. IEEE Commun. Lett. 2023, 27, 1327–1331.
20. Teganya, Y.; Romero, D. Deep completion autoencoders for radio map estimation. IEEE Trans. Wirel. Commun. 2021, 21, 1710–1724.
21. Levie, R.; Yapar, Ç.; Kutyniok, G.; Caire, G. RadioUNet: Fast radio map estimation with convolutional neural networks. IEEE Trans. Wirel. Commun. 2021, 20, 4001–4015.
22. Wei, Y.; Zheng, R. A reinforcement learning framework for efficient informative sensing. IEEE Trans. Mob. Comput. 2020, 21, 2306–2317.
23. Wei, Y.; Zheng, R. Multi-robot path planning for mobile sensing through deep reinforcement learning. In Proceedings of the IEEE INFOCOM 2021–IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
24. Liu, C.; Zhu, K.; Tao, C.; Chen, B.; Zhao, Y. UAV-Assisted Active Sparse Crowdsensing for Ground Signal Map Construction Based on 3-D Spatial-Temporal Correlation. IEEE Internet Things J. 2024, 11, 27260–27274.
25. Qiu, Y.; Chen, X.; Mao, K.; Ye, X.; Li, H.; Ali, F.; Huang, Y.; Zhu, Q. Channel Knowledge Map Construction Based on a UAV-Assisted Channel Measurement System. Drones 2024, 8, 191.
26. Wang, J.; Zhu, Q.; Lin, Z.; Wu, Q.; Huang, Y.; Cai, X.; Zhong, W.; Zhao, Y. Sparse bayesian learning-based 3D radio environment map construction—Sampling optimization, scenario-dependent dictionary construction and sparse recovery. IEEE Trans. Cogn. Commun. Netw. 2023, 10, 80–93.
27. Rückin, J.; Jin, L.; Popović, M. Adaptive informative path planning using deep reinforcement learning for uav-based active sensing. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 4473–4479.
28. Westheider, J.; Rückin, J.; Popović, M. Multi-UAV adaptive path planning using deep reinforcement learning. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 649–656.
29. Cao, Y.; Wang, Y.; Vashisth, A.; Fan, H.; Sartoretti, G.A. CAtNIPP: Context-aware attention-based network for informative path planning. In Proceedings of the Conference on Robot Learning, Atlanta, GA, USA, 6–9 November 2023; pp. 1928–1937.
30. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman & Co.: New York, NY, USA, 1979.
31. Bansal, N.; Blum, A.; Chawla, S.; Meyerson, A. Approximation algorithms for deadline-TSP and vehicle routing with time-windows. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, 13–15 June 2004.
32. Liu, B.; Wang, L.; Jin, Y.H.; Huang, D.X. An effective PSO-based memetic algorithm for TSP. In Intelligent Computing in Signal Processing and Pattern Recognition; Lecture Notes in Control Information Sciences; Springer: Berlin/Heidelberg, Germany, 2006.
33. Wu, Y.; Song, W.; Cao, Z.; Zhang, J.; Lim, A. Learning Improvement Heuristics for Solving Routing Problems. IEEE Trans. Neural Networks Learn. Syst. 2021, 33, 5057–5069.
34. Nazari, M.; Oroojlooy, A.; Snyder, L.V.; Takáč, M. Reinforcement Learning for Solving the Vehicle Routing Problem. arXiv 2018, arXiv:1802.04240.
35. Kool, W.; Van Hoof, H.; Welling, M. Attention, learn to solve routing problems! arXiv 2018, arXiv:1803.08475.
36. Jin, X.; An, J.; Du, C.; Pan, G.; Wang, S.; Niyato, D. Frequency-offset information aided self time synchronization scheme for high-dynamic multi-UAV networks. IEEE Trans. Wirel. Commun. 2023, 23, 607–620.
37. Li, R.; Zhang, Q.; Ma, D.; Yu, K.; Huang, Y. Joint Target Assignment and Resource Allocation for Multi-Base Station Cooperative ISAC in UAV Detection. IEEE Trans. Veh. Technol. 2025, 1–15.
38. Ibrahim, R.W.; Rodrigues, T.K.; Kato, N. Impact of UAV Failure and Severe Weather Conditions in mmWave and Terahertz Signals for AeriaL Edge Computing. In Proceedings of the 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall), Hong Kong, China, 10–13 October 2023; pp. 1–7.
39. Zhang, X.; Zong, H.; Wu, W. Cooperative obstacle avoidance of unmanned system swarm via reinforcement learning under unknown environments. IEEE Trans. Instrum. Meas. 2024, 74, 7500615.
40. Wang, J.; Zhu, Q.; Lin, Z.; Chen, J.; Ding, G.; Wu, Q.; Gu, G.; Gao, Q. Sparse bayesian learning-based hierarchical construction for 3D radio environment maps incorporating channel shadowing. IEEE Trans. Wirel. Commun. 2024, 23, 14560–14574.
41. Liu, W.; Chen, J. UAV-aided radio map construction exploiting environment semantics. IEEE Trans. Wirel. Commun. 2023, 22, 6341–6355.
42. Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F. Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 14413–14423.
43. Xu, Y.-Q.; Zhang, B.; Zhao, B.; Guo, D. Radio environment map construction with spatially distributed sensors. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), Nanjing, China, 29 March 2021; pp. 1–7.
44. Watson, J.; Ross, C.; Eisele, V.; Denton, J.; Howe, A. The Traveling Salesrep Problem, Edge Assembly Crossover, and 2-opt. In Parallel Problem Solving from Nature—PPSN V; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1998.
45. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
46. Chaves-Villota, A.; Viteri-Mera, C.A. DeepREM: Deep-Learning-Based Radio Environment Map Estimation from Sparse Measurements. IEEE Access 2023, 11, 48697–48714.
47. Xu, X.; Zeng, Y. How Much Data is Needed for Channel Knowledge Map Construction? IEEE Trans. Wirel. Commun. 2024, 23, 13011–13021.
48. Romero, D.; Ha, T.N.; Shrestha, R.; Franceschetti, M. Theoretical analysis of the radio map estimation problem. IEEE Trans. Wirel. Commun. 2024, 23, 13722–13737.

Figure 1. The scenario of data sampling in urban environments assisted by multiple UAVs.

Figure 2. An example of three UAVs selecting the next target position based on Voronoi partition at stage s.

Figure 3. An example of constructing the REM of the local area of UAV u at stage s.

Figure 4. The probability map of sampling point allocation in the current area for UAV u at stage s.

Figure 5. The architecture of the actor network.

Figure 6. Illustrative trajectories of UAVs under different quantities.

Figure 7. (a–e) are the experimental results on the simulation map set in the paper, with (f–j) being the experimental results from the Gudmundson dataset and (k–o) being the experimental results from the DeepREM dataset.

Figure 8. SSIM at different sampling rates.

Figure 9. RMSE at different sampling rates.

Figure 10. Reward comparison between 2-opt and node swap.

Figure 11. The average UAV path length.

Figure 12. Algorithm execution time.

Figure 13. The average UAV path length at different numbers of UAVs.

Figure 14. The average UAV path length at different $α$.

Figure 15. SSIM at different $R_{th}$.

Figure 16. SSIM at different $η$ values.

Figure 17. RMSE at different $η$ values.

Table 1. List of notation.

Symbol	Definition	Symbol	Definition
$U, S, G_{D}$	Sets of UAVs, radio sources, and grids, respectively.	$Q, l$	Side lengths of the region and of each small grid, respectively.
$G_{d}(i, j)$	Position of the $(i, j)$-th grid.	$U, H$	Number and altitude of UAVs, respectively.
$Ω^{u}$	Set of collecting positions for UAV $u$.	$P_{ij}$	RSS value at the $(i_{s}, j_{s})$-th position in the grid.
$a^{u}$	Collecting-selection vector of UAV $u$.	$X$	Set of data points measured by all UAVs.

Table 2. Algorithm parameters.

Parameter	Value	Parameter	Value
$R_{th}$	5 dB	$α$	0.8
$r$	125 m	$η$	0.3

Table 3. Ablation studies.

Self-Attention Layer	Compatibility Layer	Positional Encoding	Path Length	Reward
✓	✓	✓	6.137 ± 0.042	20.78
—	✓	✓	8.869 ± 0.086	18.51
✓	—	✓	9.293 ± 0.037	18.87
✓	✓	—	26.59 ± 0.005	9.16

Note: The ✓ indicates the inclusion of that specific component in the Actor Network.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
