1. Introduction
Pure inertial navigation systems develop errors that increase with time. For long-duration flights, position updates from external sources (e.g., global navigation satellite system (GNSS) aiding, star tracker celestial localization systems, aerial video tracking of landmarks) are necessary to bound the inertial errors [1].
Navigation of an unmanned aerial vehicle (UAV) using georeferencing methods other than GNSS is rather challenging [2]. Available techniques include simultaneous localization and mapping (SLAM) and video odometry [3,4,5], also known as video tracking [6,7]. These techniques extract the vehicle's relative location from the geometric scene, sometimes fusing it with the outputs of other coherent onboard sensors (e.g., data from a lidar or ultrawideband device) to build a live map incrementally via Bayesian filtering [8]. However, an external reference is required if navigation is based on a global coordinate system. The idea behind this work is to implement an online image processing system that automatically provides UAV steering controls driving the UAV into a scene of known location by comparing onboard aerial video frames with a georeferenced image. A georeferenced image may be acquired, for example, from an aerial landmark image or from Google Maps, both of which are prerecorded with the help of external reference sources. In the literature, navigation methods of this type are referred to as vision-based navigation [9,10], indirect georeferencing [10,11], or landmark georeferencing [4,12]. Such methods aim to find ground georeference points using onboard aerial image processing and, therefore, enable the position of a vehicle on a global map to be determined in a GNSS-denied environment.
One of the key challenges in building a landmark-based UAV self-localization system is the need for an efficient image registration algorithm. In addition to robustness and accuracy under affine transformations, the algorithm should be operationally autonomous and computationally efficient, both of which are crucial for applications involving real-time processing. In the proposed system, handling complex warped scenes is considered less important; the priority is that the registration algorithm be efficient and automated to satisfy the needs of real-time control and signal processing.
While area-based image registration techniques are generally more accurate and robust, their computational complexity is also higher than that of feature-point-based techniques [13]. Some computationally efficient algorithms, such as the enhanced correlation coefficient (ECC) maximization algorithm [14] and discrete-Fourier-transform-based subpixel image registration [15], are also available in the literature. A more recent work [16] describes an elastic registration method for dealing with images of different modalities and nonuniform illumination. These approaches offer advanced features, such as invariance to photometric distortions in contrast and brightness modeled as a nonlinear function of parameters. Nevertheless, their overall accuracy and efficiency are highly dependent on the application, and in our experimental tests their robustness over mapping conditions was fragile compared with feature-point-based approaches. Among feature-point-based algorithms, the scale-invariant feature transform (SIFT) algorithm [17] may be a good candidate for this work, but we found that its efficiency needed to be improved; for example, its rotational invariance was not always reliable. The speeded up robust features (SURF) algorithm proposed in [18] greatly rectifies these problems and its efficiency is remarkable.
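As an illustration of the kind of SURF feature matching on which the proposed registration stage builds, the following minimal sketch uses OpenCV's SURF implementation (available in the opencv-contrib package) to match an aerial frame against a georeferenced image. The file names, Hessian threshold, and ratio-test value are placeholders, not values from this paper.
```python
import cv2
import numpy as np

# Load the aerial frame and the georeferenced image (placeholder file names).
aerial = cv2.imread("aerial_frame.png", cv2.IMREAD_GRAYSCALE)
georef = cv2.imread("georeferenced.png", cv2.IMREAD_GRAYSCALE)

# SURF lives in opencv-contrib; the Hessian threshold controls keypoint density.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_a, des_a = surf.detectAndCompute(aerial, None)
kp_g, des_g = surf.detectAndCompute(georef, None)

# Match descriptors and apply a ratio test to discard ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des_a, des_g, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

# Keep the matched point coordinates for transformation estimation.
pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
pts_g = np.float32([kp_g[m.trainIdx].pt for m in good])
print(f"{len(good)} putative correspondences")
```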
In this paper, we propose a UAV self-localization system that acquires position information through an image-registration-driven control process that recursively steers the UAV to align with the scene of a georeferenced image. As illustrated in Figure 1, at each sampling interval, an aerial image is taken by the aerial view camera onboard the UAV and compared with a georeferenced image for similarity by an online image registration algorithm. When a common scene between the two images is found via online image registration, we estimate the associated transformation matrix, which contains information on how the UAV should be steered so that the dissimilarity between the image pair is minimized. The UAV control signal generator translates the estimated transformation matrix into a series of UAV steering controls, which, in turn, drive the UAV toward the scene of the georeferenced image. The process can be seen as a first-order feedback control loop and is terminated once a desired registration quality has stabilized. The UAV of the proposed system has two flying modes. While in “search mode”, the UAV is navigated by an onboard inertial navigation system (INS). Once a common scene is detected, the system releases the UAV steering control to the UAV control signal generator and enters the “locked-up self-localization mode”.
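To make the transformation-to-control step concrete, the sketch below estimates a similarity (partial affine) transform from matched point pairs, such as those produced in the previous sketch, and decomposes it into rotation, scale, and translation. Reading these as proxies for turn, altitude, and lateral-move commands is only an illustrative assumption; the paper's actual control law is derived in Section 3.
```python
import cv2
import numpy as np

def transform_to_controls(pts_aerial, pts_georef):
    """Estimate a similarity transform and decompose it into steering cues.

    Returns (yaw_deg, scale, tx, ty) under the assumption that rotation maps
    to a turn, scale to an altitude change, and translation to a lateral or
    forward move; this decomposition is illustrative, not the paper's law.
    """
    M, inliers = cv2.estimateAffinePartial2D(
        pts_aerial, pts_georef, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    if M is None:
        return None  # no consistent transform found; remain in search mode

    # M = [[s*cos(t), -s*sin(t), tx], [s*sin(t), s*cos(t), ty]]
    scale = float(np.hypot(M[0, 0], M[1, 0]))
    yaw_deg = float(np.degrees(np.arctan2(M[1, 0], M[0, 0])))
    tx, ty = float(M[0, 2]), float(M[1, 2])
    return yaw_deg, scale, tx, ty
```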
In a more generic scenario, the proposed UAV self-localization system may work on a set of georeferenced images taken along a predefined flight path and use the localization outcomes to recalibrate the onboard inertial navigation system. This is particularly suited to applications where the georeferenced images contain many distinguishable features, such as 3D localization/tracking inside a complex building or in a low-altitude urban environment in the absence of GNSS.
The proposed system development makes three major contributions. First, an autonomous, fast SURF-based image registration algorithm is designed for the underlying application. Second, UAV controls for the localization process are derived from the estimated transformation matrix under linear mapping. Third, we implement a virtual aerial camera and build a near-online testing platform using Google 3D Maps, enabling Monte-Carlo-based simulation to examine the performance of the proposed UAV self-localization system in terms of effectiveness, robustness, and localization accuracy. The proposed method differs from existing vision-based approaches in that it finds both the vehicle position and attitude using the entire scene of a georeferenced image.
Following the introduction, the automated SURF-based image registration algorithm design is described in Section 2. The UAV control signal generation is derived from the outcomes of image registration in Section 3. Registration error analysis and localization experiment results are presented in Section 4, followed by conclusions in Section 5.
4. Experiment and Results
In this work, we focus on evaluating the system through Monte Carlo simulations, where the UAV camera is simulated and aerial images are taken from Google Earth Pro© in a virtual test environment. In this experiment, we aim to:
provide a comprehensive understanding of the proposed system's functionality, including how it functions best and what additional control is needed;
check the robustness of the proposed algorithm as well as potential localization strategies.
4.1. Simulation Scenario
A flying UAV is guided by an inertial navigation system (INS) that calculates the UAV dead reckoning position and attitude from the observations of an onboard inertial measurement unit (IMU). Without an external aiding source, the INS suffers from growing bias and drift over time, propagated from IMU errors. Typically, the INS integrates external location references, such as GNSS signals, via a Kalman filter [1]. In the proposed system, as shown in Figure 6, we use aerial video to regularly obtain accurate location information as an alternative aiding source in a GNSS-denied environment. We assume that aerial images taken by the aerial camera onboard the UAV have the same optical properties (e.g., view angle, image size, etc.) as the georeferenced images. Otherwise, a proper image processing technique may be applied to recover the difference in optical properties so that a consistent image registration can be performed. In each localization process, the UAV moves toward the area of the georeferenced image guided by the onboard INS before the system switches to localization mode. The UAV motion is then controlled by the image registration output until the scene of the onboard aerial camera matches that of the georeferenced image.
Figure 7 shows the entire scene of the UAV flying zone used in the simulation. It is a high-resolution image whose pixel size corresponds to approximately 0.87 m per pixel. In this image, we mark the UAV takeoff location with a red square and the georeferenced scene with a green circle, both taken from a single realization of the 100 Monte Carlo simulations. The corresponding aerial image at the UAV start location and the georeferenced image are shown in Figure 8a,b, respectively.
In the Monte Carlo simulations, a virtual camera with a fixed resolution (in pixels) is simulated to take aerial images from the flying UAV. We assume that the aerial and georeferenced image sizes are identical. In each run, the locations of both the UAV starting point and the scene of the georeferenced image are uniformly drawn from the full scene area, and the UAV approaches the scene of the georeferenced image navigated by the onboard INS. Biases and drifts of the inertial sensors yield the INS navigation error. Here, this navigation error is defined as the distance between the center of the scene of the georeferenced image and the actual position that the UAV has reached, and it is modelled as a uniformly distributed random variable with the bounds specified in (8). The setup of (8) ensures that, when the UAV switches from INS navigation into localization mode, a partially overlapped scene can be found in both the aerial and georeferenced images, which is a reasonable assumption in practice. With the parameter values used in our experiment, the UAV therefore enters localization mode at each run at a “pixel distance” to the center of the scene of the georeferenced image drawn from this uniform distribution in both the x and y directions.
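The per-run draw of the starting point, the georeferenced scene centre, and the INS entry error can be emulated as in the minimal sketch below. The scene size and the offset bounds from Equation (8) are placeholders, not the paper's values.
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placeholders; the paper's scene size and the bounds in (8)
# are not reproduced here.
SCENE_W, SCENE_H = 8000, 6000   # full-scene size in pixels (assumed)
DX_MAX, DY_MAX = 200, 200       # max INS entry offset in pixels (assumed)

def draw_run():
    """Draw one Monte Carlo run: start point, georeferenced scene centre,
    and the INS entry offset modelled as a uniform random variable."""
    start = rng.uniform([0, 0], [SCENE_W, SCENE_H])
    georef_centre = rng.uniform([0, 0], [SCENE_W, SCENE_H])
    ins_offset = rng.uniform([-DX_MAX, -DY_MAX], [DX_MAX, DY_MAX])
    entry_point = georef_centre + ins_offset  # where localization mode starts
    return start, georef_centre, entry_point
```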
In localization mode, the aerial images taken from the UAV camera are compared with the georeferenced image, and the cost is calculated from the image registration output. In this experiment, when the cost falls below its threshold, it is found that at least three consecutive transformation matrices can be estimated consistently by image registration, and thus we let the UAV lock into the localization iteration, where UAV controls are generated from every output of the image registration. The UAV is then steered by these controls and progressively approaches the scene of the georeferenced image. The localization process is completed when the stopping criterion described in Section 3.3 is met, that is, when the matching error (see Figure 5) is no longer decreasing. The proposed localization procedure is a trial-and-error process. The first attempt at UAV self-localization starts from the INS-computed location (see Figure 6). If the cost has not fallen below 0.8 within a predefined short period, say, 10 sampling periods, another attempt at UAV self-localization is triggered and the UAV is steered to a new starting location, with the offset drawn from a zero-mean Gaussian distribution whose standard deviation is given by the nominal position error of the INS. The system repeats this process until the UAV enters the (locked-up) localization iteration and completes the localization process.
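The trial-and-error procedure above can be summarized in the following hedged sketch. The callables capture, register, generate_controls, steer, and fly_to stand in for the onboard camera, the registration algorithm, the control signal generator, and the flight controller, and ins_sigma is a placeholder for the nominal INS position error; only the 0.8 cost threshold and the 10-period wait come from the text.
```python
import numpy as np

COST_LOCK = 0.8  # cost threshold for entering the locked-up iteration
MAX_WAIT = 10    # sampling periods allowed before triggering a new attempt

def localize(capture, register, generate_controls, steer, fly_to,
             start, ins_sigma, rng=np.random.default_rng()):
    """Trial-and-error localization loop (illustrative sketch only)."""
    attempt_start = np.asarray(start, dtype=float)
    while True:
        fly_to(attempt_start)
        for _ in range(MAX_WAIT):
            cost, transform = register(capture())
            if cost < COST_LOCK:
                # Locked-up iteration: steer from every registration output
                # until the matching error stops decreasing.
                prev_cost = np.inf
                while cost < prev_cost:
                    prev_cost = cost
                    steer(generate_controls(transform))
                    cost, transform = register(capture())
                return cost  # localization complete
        # No lock within MAX_WAIT periods: restart from a perturbed location
        # drawn from a zero-mean Gaussian with the nominal INS position error.
        attempt_start = attempt_start + rng.normal(0.0, ins_sigma, size=2)
```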
Figure 9 shows a snapshot of the experiment from a single run. In this example, the georeferenced image was taken from the point marked with a green circle and was rotated clockwise. It was also enlarged by a scale factor of 1.46, as if it had been taken from a lower altitude than that at which the UAV was actually flying. Furthermore, white noise was added to the intensity of the georeferenced image. All of these added further challenges to the UAV self-localization process.
4.2. Localization Error Analysis
Apart from randomly drawing the locations of the UAV starting point, the georeferenced scene, and the point at which the UAV enters localization mode, we also warp the georeferenced image at each run to cover potential image variations caused by environmental changes, as follows (a hedged sketch of this warping is given after the list):
Assume a random angle between the UAV heading and the direction of the georeferenced image scene. The angle is zero-mean Gaussian distributed with a fixed standard deviation.
Assume that the height at which the georeferenced image was taken is random, such that the corresponding scale factor between the UAV aerial image and the georeferenced image follows a uniform distribution.
White noise is added to the intensities of the georeferenced image.
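The per-run warping can be emulated as below; the rotation-angle standard deviation, the scale range, and the noise level are assumed placeholders, since the paper's exact values are not reproduced here.
```python
import cv2
import numpy as np

rng = np.random.default_rng()

# Placeholder warp parameters; the paper's exact values are not reproduced.
ANGLE_STD_DEG = 10.0               # std of Gaussian rotation angle (assumed)
SCALE_LOW, SCALE_HIGH = 0.7, 1.5   # uniform scale-factor range (assumed)
NOISE_STD = 5.0                    # std of additive intensity noise (assumed)

def warp_georeferenced(image):
    """Apply a random rotation, scale, and additive white noise to emulate
    environment-induced variations of the georeferenced image."""
    h, w = image.shape[:2]
    angle = rng.normal(0.0, ANGLE_STD_DEG)
    scale = rng.uniform(SCALE_LOW, SCALE_HIGH)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    warped = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    noisy = warped.astype(np.float32) + rng.normal(0.0, NOISE_STD, warped.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```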
As mentioned earlier, the full scene of the UAV flying area shown in Figure 7 is a high-resolution image with a scale of 0.87 m per pixel. In the controller design, we allow a bounded final scale accuracy between the scale estimated in (5) and the true scale; the actual distance resolution, in meters per pixel, therefore follows from this bound and the 0.87 m per pixel image scale.
From the matching error in (6), and assuming that the errors in the x and y directions are of the same scale, we may roughly estimate the matching error bound in terms of distance. For the current experiment setup, this bound (in meters) is obtained from the average cost at the final localization iteration.
Table 3 lists the statistical results obtained from 100 Monte Carlo runs of the proposed UAV self-localization process. In the experiment, the average UAV speed is 10 m/s in the INS navigation stage and 3 m/s in the locked-up localization (iteration) stage.
As mentioned earlier, the localization iteration is a trial-and-error process, and a single trial of the localization process is deemed an attempt. In 81 out of 100 runs, the UAV locked into the localization iteration from INS navigation mode on the first attempt and completed the localization process successfully. The average time from UAV takeoff to completion of the localization process is 72 s at a frame rate of 5 frames per second. About 14% of the runs completed with two attempts, 4% with three attempts, and 1% with four attempts.
It is observed that the deformation level of the reference image has a direct impact on the localization accuracy. The more attempts the UAV needs to enter the (locked-up) localization mode and complete the localization process, the larger the average localization error. In those simulations in which more than one attempt was made, the warp level of the georeferenced image was often more severe than average.
The UAV localization locked-up control process is demonstrated in Figure 10, which was recorded from a single run. The left column shows the aerial image sequence (along the timeline from top to bottom) taken by the UAV during the process, and the right column displays the sequence of registered images on top of the georeferenced image, where we also indicate the operating status of the UAV controller (e.g., turn right, fly down, moving direction). Although the entire locked-up control process spans about 50 frames, only six frames are shown here to illustrate the process. In this particular example, the UAV motion is driven by controls generated by the registration algorithm involving right-turn, fly-down, move-left, and forward operations during the locked-up localization process.
We carried out 100 further Monte Carlo runs in which the georeferenced images and UAV aerial images are taken at different times of day. In each run, the georeferenced image is drawn from the full scene image taken at 4:30 p.m., whereas the UAV localization is performed at 10 a.m. As shown in Figure 11, this produces an image pair with different lighting conditions caused by different sun illumination directions.
Similar to Table 3, we present the statistical results of the 100 Monte Carlo runs for image pairs with different lighting conditions in Table 4. Although the localization process completed successfully in all runs, the average time taken is longer than in the runs without lighting differences, and the mean accuracy and localization error are slightly larger as well.
This reflects the fact that the difference in sun direction has a considerable negative impact on the image registration algorithm. Overall, there was no localization failure in our experiments. The proposed UAV localization system is therefore resilient against potential failure, showing strong robustness and effectiveness, particularly in the latter test.