1. Introduction
Landing is a critical phase of unmanned aerial vehicle (UAV) flight. Currently, three navigation methods are mainly used during UAV landings: the instrument landing system (ILS), the microwave landing system (MLS), and the Global Positioning System (GPS). However, these methods rely heavily on external equipment and suffer from drawbacks such as high equipment cost, poor maneuverability, difficult installation, susceptibility to signal interference, and vulnerability to spoofing. Therefore, developing a fully autonomous, reliable, and stable landing navigation system has become an urgent need.
With the continuous development of visual perception and navigation technologies, the application of visual navigation in the autonomous landing process of unmanned aerial vehicles (UAVs) has gained widespread attention. Visual navigation offers several advantages: (1) It does not require establishing an information link with the outside world, making it a completely autonomous navigation system that is immune to interference. (2) There is no need to set up expensive communications equipment on the ground, which is less costly. (3) It requires minimal prior information about the landing airport, allowing UAVs to land in relatively unfamiliar or temporary airfields.
Image processing is particularly important in UAV vision-guided landing. Current research methods can be divided into three categories: traditional methods that detect runway lines, traditional methods that detect cooperative markers, and deep learning methods that detect the runway. Traditional runway line methods are fast and easy to deploy but sensitive to the environment; traditional cooperative-marker methods are more accurate but require specific marker conditions; the deep learning methods used in this paper are robust to the environment and require fewer external conditions. These three approaches have their own advantages and disadvantages and suit different application scenarios.
The detection of runway lines using traditional methods generally involves five steps: image preprocessing, feature extraction, feature selection, runway line fitting, and runway line detection and classification. In [
1], the authors employ multi-sensor image fusion to obtain image data of the runway. They use support vector machine (SVM) for runway recognition and extract runway edges and ground lines using edge detection and the Hough transform. Finally, they obtain the aircraft’s attitude data. In [
2], the authors extract the horizon and runway edges using the Hough transform. They estimate the aircraft’s pose separately using the horizon and runway edges and track the runway using template matching. In [
3], the authors detect runway lines using the Canny edge detector and Hough transform.
The detection of cooperative markers using traditional methods is similar to the process of detecting runway lines. However, adjustments need to be made based on the characteristics and requirements of the cooperative markers. Generally, it involves five steps: image preprocessing, feature extraction, feature selection, marker detection and localization, and marker classification and recognition. In [
4], a monocular single-frame vision measurement algorithm for autonomous landing of unmanned helicopters is derived. By detecting square cooperative targets, the position and attitude of the unmanned helicopter are calculated, totaling six parameters. In [
5], the landing guidance process for unmanned aerial vehicles (UAVs) is divided into two stages. In the initial stage of landing, runway corner features are used as guidance markers, and in the later stage of landing, Apriltag labels are recognized to guide the UAV’s landing. In [
6], a fixed-wing UAV autonomous landing method based on binocular vision is proposed.
Using deep learning for runway line detection involves several steps, including data preparation and preprocessing, building the deep learning model, model training, test evaluation, model optimization, and model inference. Deep learning methods enable end-to-end training using a large amount of image data, allowing for the learning of more complex feature representations and improving the performance and robustness of runway line detection. Additionally, deep learning methods can adapt to different lighting conditions, complex backgrounds, and exhibit good generalization capability. In [
7], a deep learning approach is used for UAV detection and tracking. By performing triangulation and filtering calculations on the detected objects in a binocular vision system, the spatial position of the UAV is estimated. In the positioning stage, a Kalman filter is used to smooth the spatial trajectory, approximating the area where the target is likely to appear in the current frame. This improves the accuracy of estimation while reducing the difficulty of tracking. In [
8], an onboard-YOLO algorithm suitable for lightweight and efficient usage on UAV onboard systems is proposed. It utilizes separable convolutions instead of conventional convolutional kernels, effectively improving the detection speed.
However, visual navigation alone may not satisfy the precision and reliability requirements of autonomous UAV landing. Its limitations must be complemented by measurement information from other sensors. Traditional combined navigation schemes include GPS/INS, INS/visual, and GPS/INS/visual combined navigation, among others. As the number of sensors increases, the accuracy and robustness of the navigation system improve, enhancing the performance of autonomous UAV landing. In [
9], a visual–inertial navigation fusion algorithm is proposed, where position and attitude alignment are achieved using Kalman filtering. The position alignment estimates velocity errors and accelerometer biases, while the attitude alignment estimates attitude errors and gyroscope drift. The estimated alignment errors and the attitude information output by the visual navigation system are used to correct the inertial navigation attitude. In [
10], YOLOv3 is used to detect the runway region of interest (ROI), and an RDLines algorithm is employed to extract the left and right runway lines from the ROI. A visual/inertial combined navigation model is then designed within the framework of square-root unscented Kalman filtering.
In a visual navigation system, accurate detection of the runway and runway lines is crucial for the system’s performance [
11]. Traditional methods for runway line detection, such as those based on the Hough transform and LSD line detection, offer good real-time performance. However, they generalize poorly to different scenarios and rely heavily on manually designed features and parameters. Although researchers have proposed more comprehensive feature descriptors (e.g., SIFT, ORB), their robustness and accuracy still cannot fully meet the demands of practical applications. Therefore, traditional line detection methods are not well suited to visual navigation systems. With the parallel computing capability brought by hardware such as GPUs, deep neural networks have greatly improved detection accuracy and robustness, ushering object detection into a new stage and producing a series of deeper, faster-to-train, and more accurate networks. Utilizing deep learning for runway line detection is therefore a promising choice. Furthermore, considering the limitations of purely visual navigation, integrating IMU (inertial measurement unit) information with visual localization results is a good solution: the combination provides complementary information and enhances the overall performance of the navigation system.
This paper proposes a deep-learning-based UAV localization method to address the navigation problem in autonomous UAV landing. The simulation and experimental results demonstrate that the proposed algorithm exhibits good robustness, accuracy, and real-time performance. These findings suggest that the algorithm can be used effectively for autonomous UAV landing.
The main contributions of this paper are as follows:
(1) A runway line detection and visual positioning system during visual guidance landing is constructed. The system is divided into four parts: runway ROI selection, runway line detection, visual positioning and combined navigation, thereby providing an end-to-end navigation solution for UAV visual guidance landing.
(2) In view of the requirements of navigation accuracy and real-time performance in this application scenario, the image processing end algorithm is optimized and designed, including optimizing the loss function, optimizing the feature extraction network and feature fusion network, adding an attention mechanism, and optimizing the network structure.
(3) In order to further improve the visual positioning accuracy, the Kalman filter algorithm is used to fuse the IMU information and the visual positioning information. The simulation results show that the combined navigation algorithm can effectively improve the positioning accuracy.
The rest of this paper is organized as follows:
Section 2 presents the framework of the vision-guided landing localization algorithm;
Section 3 focuses on the runway ROI selection algorithm;
Section 4 explores the runway line detection algorithm for airport runways;
Section 5 discusses the visual localization and combined navigation algorithm; finally,
Section 6 describes the deployment of the algorithm on edge computing devices and presents the results of the simulation experiments.
2. Vision-Guided Landing Positioning Algorithm Framework
This paper focuses on visual-guided landing for small fixed-wing unmanned aerial vehicles (UAVs). The main research objectives are image processing algorithms and visual localization algorithms for UAV landing. The specific research content includes runway ROI selection networks, runway line detection networks, UAV position estimation algorithms, and combined navigation algorithms. The effectiveness of these algorithms is then validated in a constructed simulation system. The overall algorithm framework is illustrated in
Figure 1.
The algorithm system designed in this paper consists of two main components: the image processing side and the pose estimation and combined navigation side. On the image processing side, the images captured by the camera are processed. The YOLOX runway ROI selection network performs rough positioning of the runway, which helps exclude interfering objects and ensures that the runway is evenly distributed in the image. The detected bounding boxes are then input into the runway line detection network (RLDNet) for line detection. In this stage, instead of image segmentation, a specific row (or column) classification method is employed, reducing computational complexity and improving real-time inference. The line detection outputs information such as the slope and intercept of the runway lines. The pose estimation and combined navigation side consists of two algorithms: visual localization and combined navigation. The visual localization algorithm utilizes prior information about the runway, the camera intrinsic parameters, the UAV attitude information, and the detected runway lines to estimate the UAV's position, based on the concept of homogeneous transformation. The position information derived from visual localization is fused with the position information obtained from the IMU using Kalman filtering, resulting in accurate and reliable localization results.
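As a minimal illustration of the fusion step on the pose estimation side, the sketch below shows a one-axis, constant-velocity Kalman filter that propagates the state with IMU accelerations and corrects it with the visual position fix; the state model, sample period, and noise values are illustrative assumptions, not the paper's actual filter design.

```python
import numpy as np

class VisualImuFuser:
    """Minimal 1-axis constant-velocity Kalman filter sketch.

    State x = [position, velocity]; IMU acceleration drives the prediction and
    the visual localization result is used as a position measurement.
    All noise values are illustrative placeholders, not the paper's parameters.
    """

    def __init__(self, dt=0.02):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
        self.B = np.array([[0.5 * dt**2], [dt]])     # control input (acceleration)
        self.H = np.array([[1.0, 0.0]])              # measure position only
        self.Q = np.diag([1e-4, 1e-3])               # process noise (placeholder)
        self.R = np.array([[0.25]])                  # visual noise (placeholder)
        self.x = np.zeros((2, 1))                    # initial state
        self.P = np.eye(2)                           # initial covariance

    def predict(self, accel):
        """Propagate the state with an IMU acceleration sample."""
        self.x = self.F @ self.x + self.B * accel
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, vis_pos):
        """Correct the prediction with a visual localization fix."""
        y = np.array([[vis_pos]]) - self.H @ self.x          # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
```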
The algorithm designed in this study is applicable to UAV visual landing guidance scenarios. However, most currently available datasets that include airport runways consist of aerial images; for example, the runway images in NWPU-RESISC45 are all remote sensing images. Such datasets cannot meet the design requirements of the algorithm proposed in this study. Therefore, this study combines runway images from the landing perspective in the virtual simulation system Vega Prime, real airport runway images, affine-transformed NWPU-RESISC45 runway images, and simplified runway images. After manual processing and annotation, these images form the dataset used in this study, which contains a total of 2500 landing-perspective runway images. Additionally, for simulating and validating the visual positioning algorithm and the combined navigation algorithm, the data collected from the virtual simulation system Vega Prime include the UAV's true poses and the IMU data corresponding to each image. The four types of airport runway images included in this dataset are shown in
Figure 2.
Furthermore, in order to save annotation time, this study initially annotates the left and right runway lines, as well as the starting runway line. Then, by calculating the position of the rough runway localization box based on the annotated pixel coordinates of the runway line endpoints, the airport runway rough localization dataset is automatically generated. After cropping the original image using the rough localization box, the resolution is adjusted to generate the airport runway line dataset. This allows for the simultaneous generation of datasets for both the airport runway rough localization and the airport runway line detection tasks through a single annotation process.
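For illustration, the sketch below (with a hypothetical annotation format, margin ratio, and output resolution) shows how a rough localization box can be derived from annotated runway line endpoints and then used to crop and resize images for the runway line dataset.

```python
import cv2
import numpy as np

def rough_box_from_lines(endpoints, margin=0.05):
    """Compute a rough runway localization box from annotated line endpoints.

    endpoints: (N, 2) array of pixel coordinates of the endpoints of the left,
    right, and starting runway lines. The margin ratio is an illustrative
    choice, not the value used in the paper.
    """
    pts = np.asarray(endpoints, dtype=np.float32)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    return int(x0 - mx), int(y0 - my), int(x1 + mx), int(y1 + my)

def crop_for_line_dataset(image, endpoints, out_size=(800, 288)):
    """Crop the image with the rough box and resize it for line detection."""
    x0, y0, x1, y1 = rough_box_from_lines(endpoints)
    h, w = image.shape[:2]
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, w), min(y1, h)
    crop = image[y0:y1, x0:x1]
    return cv2.resize(crop, out_size)
```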
In order to effectively utilize the dataset and validate the algorithm’s performance, this study divided the constructed runway rough localization dataset and runway line detection dataset into training and testing sets using a ratio of 4:1. The training set is utilized to train the runway rough localization network and runway line detection network. The testing set is then used to evaluate the prediction performance of the rough localization network and runway line detection network, as well as to calculate their performance metrics. This division helps with the optimization and adjustment of the networks.
3. Airport Runway Rough Localization Algorithm
As a general-purpose object detection framework, YOLOX has high detection accuracy and speed for most object detection applications. However, due to the higher requirements for image processing accuracy and real-time performance in the scenario of this study, further optimization and design of YOLOX are needed.
3.1. Design of Probability Prediction Loss Function
The probability prediction loss of YOLOX is calculated using the binary cross-entropy loss [12]. For a given sample, the binary cross-entropy loss is computed as Equation (1):

$L_{BCE} = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$ (1)

where $y$ represents the ground truth and $\hat{y}$ represents the predicted value. For all samples, the binary cross-entropy loss value is the average of the per-sample losses over all positive and negative samples, as in Equation (2):

$L = \frac{1}{N} \sum_{i=1}^{N} -\left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$ (2)

where $N$ represents the total number of positive and negative samples.
Focal Loss is a solution for addressing the issue of sample imbalance [13]. Its calculation method is as Equation (3):

$FL = -(1 - \hat{y})^{\gamma} \, y \log \hat{y} - \hat{y}^{\gamma} (1 - y) \log (1 - \hat{y})$ (3)

Defining $p_t$ as in Equation (4),

$p_t = \begin{cases} \hat{y}, & y = 1 \\ 1 - \hat{y}, & y = 0 \end{cases}$ (4)

the expression for Focal Loss can be uniformly represented as Equation (5):

$FL(p_t) = -(1 - p_t)^{\gamma} \log (p_t)$ (5)

where $p_t$ reflects the degree of proximity between the predicted value and the ground truth. The larger the value of $p_t$, the closer the predicted value is to the ground truth, indicating a more accurate classification, and $\gamma$ is an adjustable factor. Similarly, the expression for the binary cross-entropy loss function can be uniformly represented as Equation (6):

$L_{BCE}(p_t) = -\log (p_t)$ (6)

Compared to the binary cross-entropy loss function, Focal Loss leaves the loss of inaccurately classified samples essentially unchanged while reducing the weight of accurately classified samples, which ultimately increases the weight of inaccurately classified samples in the overall loss.
The calculation method of the Focal Loss used in the training process is as Equation (7):

$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log (p_t)$ (7)

That is, a balancing coefficient $\alpha_t$ is introduced into the traditional Focal Loss. With this modification, the model accuracy improves slightly. In all subsequent experiments in this study, the binary cross-entropy loss is replaced with Focal Loss by default.
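For illustration, a minimal binary Focal Loss sketch is given below; the alpha and gamma values are the common defaults from the Focal Loss paper, not the coefficients tuned in this work.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(pred_logits, target, alpha=0.25, gamma=2.0):
    """Binary Focal Loss sketch.

    alpha and gamma are the usual defaults from the Focal Loss paper; the
    coefficients actually used in this work are not reproduced here.
    """
    prob = torch.sigmoid(pred_logits)
    ce = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    p_t = prob * target + (1.0 - prob) * (1.0 - target)    # closeness to ground truth
    alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
    loss = alpha_t * (1.0 - p_t) ** gamma * ce             # down-weight easy samples
    return loss.mean()
```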
3.2. Design of Regression Loss Function
YOLOX calculates the position regression loss between the predicted bounding boxes and the ground truth boxes using the IoU loss. When the predicted box and the ground truth box do not intersect, the IoU loss cannot reflect the distance between them; the loss function is then non-differentiable and cannot optimize the case where the two boxes do not intersect. Therefore, this paper replaces the position regression loss with the EIoU loss. The EIoU loss reduces the contribution of the many anchor boxes that overlap little with the target box to the regression of the predicted box, so that the regression focuses more on high-quality anchor boxes. The EIoU is calculated as Equation (8):

$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$ (8)

where $\rho^2(b, b^{gt})/c^2$ denotes the centroid loss, $\rho^2(w, w^{gt})/C_w^2$ is the width loss, $\rho^2(h, h^{gt})/C_h^2$ is the height loss, and $C_w$ and $C_h$ are the width and height of the smallest outer bounding box containing the prediction box and the target box.
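A minimal sketch of the EIoU computation in Equation (8) is given below, written for axis-aligned boxes in (x1, y1, x2, y2) form; it follows the standard EIoU formulation and is not necessarily identical to the paper's exact implementation.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss sketch for (N, 4) tensors of boxes given as (x1, y1, x2, y2)."""
    # Intersection and union
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Center distance term
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Width and height difference terms
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    dw = (wp - wt) ** 2 / (cw ** 2 + eps)
    dh = (hp - ht) ** 2 / (ch ** 2 + eps)

    return (1 - iou + rho2 / c2 + dw + dh).mean()
```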
3.3. Design of Feature Extraction Network
The feature extraction network in YOLOX is a multi-branch residual structure called CSPDarknet53. Since the algorithm in this paper needs to be deployed on edge devices, in order to further compress the model parameter size while improving model accuracy, the feature extraction network of YOLOX is replaced with EfficientRep [
14], GhostNet [
15], MobileNetV3-Large, and MobileNetV3-Small [
16] in separate experiments. The performance of different feature extraction networks is tested, and the experimental results are shown in
Table 1.
From the experimental results, it can be seen that, for the dataset used in this paper, introducing EfficientRep does not significantly improve the model's performance; on the contrary, it leads to a significant increase in model size and computational complexity. When MobileNetV3-Large is used as the feature extraction network, the model's performance improves significantly, but at the cost of a substantial increase in both model size and computational complexity. As a comparison, this paper also uses MobileNetV3-Small as the feature extraction network, which significantly reduces the model size and computational complexity; its accuracy is slightly higher than that of EfficientRep but much lower than that of MobileNetV3-Large. When GhostNet is used as the feature extraction network, accuracy, recall, and precision all improve significantly. In terms of parameter size, the model is comparable to using CSPDarknet53 as the feature extraction network, but the computational complexity is substantially reduced. Given the requirements of this application scenario regarding model performance and computational complexity, the subsequent experiments in this paper default to using GhostNet as the feature extraction network.
In addition, the feature extraction network in YOLOX uses the SiLU activation function. As the network deepens, models that use the SiLU activation function tend to experience a noticeable decrease in classification accuracy. In this paper, on the basis of the YOLOX network structure, the SiLU activation function is replaced with the Mish activation function, which maintains higher classification accuracy as the network deepens. Equation (9) is the expression of the Mish activation function [17]:

$Mish(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right)$ (9)
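For reference, a minimal PyTorch definition of the Mish activation is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))
```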
3.4. Feature Fusion Networks and Channel Attention Mechanisms
In this paper, the ordinary convolutions in the YOLOX feature fusion network are replaced with group shuffle convolution (GSConv). GSConv reduces the model’s parameter count while preserving the connections between channels in the feature layers, ensuring that the model’s accuracy is not compromised [
18,
19]. In GSConv, the feature layer first passes through an ordinary convolution, a depth-wise separable convolution is then applied to its output, and the result is concatenated along the channel dimension with the feature layer from before the depth-wise separable convolution. Finally, a shuffle structure fuses the feature layers from the ordinary and depth-wise separable convolutions. Additionally, if GSConv were used throughout the entire model, the model would become deeper, which may affect real-time performance. Therefore, in this paper, only the ordinary convolutions in the YOLOX feature fusion network are replaced with GSConv, specifically by replacing the BottleNeck in CSPLayer with GSBottleNeck.
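For illustration, a minimal GSConv module following the description above is sketched below; the kernel sizes and normalization/activation choices are assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Group shuffle convolution sketch: ordinary convolution, depth-wise
    convolution on its output, channel-wise concatenation, and a channel
    shuffle that mixes the two halves. Hyperparameters are illustrative.
    """
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.conv(x)                       # ordinary convolution
        x2 = self.dwconv(x1)                    # depth-wise convolution
        y = torch.cat((x1, x2), dim=1)          # concatenate along channels
        b, c, h, w = y.shape
        # channel shuffle: interleave the two halves
        y = y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
        return y
```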
Attention mechanism can allocate computational resources to more important tasks without increasing computational complexity significantly, especially when resources are limited.
The efficient channel attention (ECA) mechanism builds upon the SE channel attention mechanism [20] by replacing the fully connected layers with a one-dimensional convolutional layer. This allows the weight information between channels to be learned without reducing the channel dimension, and it also helps reduce the number of parameters [21]. The ECA mechanism first applies global average pooling to the input feature layer to obtain a $1 \times 1 \times C$ feature map. Then, through a one-dimensional convolution of kernel size $k$, it learns the importance of the different channels.
The size of the convolutional kernel affects the receptive field, and larger kernels are needed for feature layers with more channels. Therefore, the kernel size is adjusted dynamically as a function of the channel count, as in Equation (10):

$k = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}$ (10)

In this context, $k$ represents the size of the convolutional kernel (the number of neighboring channels it spans), $C$ represents the number of channels in the input feature layer, $\gamma$ and $b$ are hyperparameters, and $|\cdot|_{odd}$ indicates that the kernel size is rounded to the nearest odd number, since the size of the convolutional kernel must be odd.
In this paper, we added the channel attention mechanism ECA in the middle of YOLOX’s feature extraction network and PAFPN.
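A minimal implementation sketch of the ECA module with the adaptive kernel size of Equation (10) is shown below; the values of gamma and b follow the common ECA defaults and are assumptions rather than this paper's settings.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention sketch: global average pooling followed by a
    1D convolution whose kernel size is chosen adaptively from the channel
    count, as in Equation (10). gamma and b are the commonly used defaults.
    """
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 else k + 1               # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        y = self.pool(x)                                   # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # 1D conv over channels
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                       # reweight channels
```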
4. Airport Runway Line Detection Algorithm
4.1. Detection Principle
In this paper, the idea behind the runway line detection algorithm is to use global features to select the correct positions of the left and right runway lines within predefined row anchor boxes and the position of the starting runway line within predefined column anchor boxes. Therefore, the first step is to partition the input image into row anchor boxes and column anchor boxes. Each row and column anchor box is then further divided into grid cells. In this way, runway line detection can be defined as selecting specific cells within the predefined row/column anchor boxes to represent the positions of the left and right runway lines and the starting runway line.
Assume the maximum number of runway lines is $C$, the number of row/column anchor boxes is $h$, and the number of grid cells in each row/column anchor box is $w$, and let $X$ denote the global features of the image. Let $f^{ij}$ represent the classifier for the runway line position on the $j$th row/column anchor box of the $i$th runway line. Then, the prediction of the runway line can be expressed as Equation (11):

$P_{i,j,:} = f^{ij}(X), \quad i \in [1, C], \ j \in [1, h]$ (11)

where $P_{i,j,:}$ is a $(w+1)$-dimensional vector that represents the probability of each of the $(w+1)$ grid cells on the $j$th row/column anchor box of the $i$th runway line, and $X$ denotes the global features of the image. For each grid cell in every row/column anchor box, the network predicts the corresponding probability, and the grid cell with the highest probability gives the predicted position of the runway line. If no runway line is present on a particular row/column anchor box, the probability of the last grid cell in that anchor box is set to 1.
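As a concrete illustration of this row/column-anchor classification scheme, the sketch below (with hypothetical tensor shapes and the assumption that the final cell of each anchor encodes "no line present", mirroring the description above) decodes the network output into per-anchor grid indices.

```python
import torch

def decode_runway_lines(logits, no_line_index=-1):
    """Decode row/column anchor classification output into grid indices.

    logits: tensor of shape (num_lines, num_anchors, num_cells), where the
    last cell of each anchor denotes "no line present" (an assumption that
    mirrors the description above). Returns a (num_lines, num_anchors)
    tensor of selected cell indices, with -1 marking anchors without a line.
    """
    probs = torch.softmax(logits, dim=-1)
    cells = probs.argmax(dim=-1)                       # most probable cell
    no_line = cells == (logits.shape[-1] - 1)          # last cell = line absent
    cells[no_line] = no_line_index
    return cells
```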
4.2. Network Structure
The network architecture consists of three parts: feature extraction, classification prediction, and segmentation. The feature extraction part is responsible for extracting the features of the runway lines from the image. The classification prediction part is used to classify these features, while the segmentation part helps to fuse multi-scale features, improving the detection accuracy. To improve the network’s inference speed, the segmentation part is only used during training and not utilized during the inference prediction stage [
22]. The network structure is illustrated in
Figure 3.
The role of the feature extraction part is to extract the features of the runway lines and provide them to the classification prediction part. Common feature extraction networks, such as ResNet, VGG, MobileNet, ShuffleNet, have been proven to exhibit strong feature extraction capabilities for classification tasks. In this algorithm, ResNet is used as the feature extraction network. ResNet is a type of residual network that addresses the problem of increased loss with increasing network depth [
23]. Considering the need for extracting a relatively limited set of features and the requirement for real-time processing on board computers, the algorithm utilizes the lightest variant of ResNet, which is ResNet18 (18 represents the number of layers that require parameter updating through training).
In the classification prediction part, the last feature layer of ResNet18 is first passed through a convolution that reduces the number of channels. The resulting feature layer is then flattened into a column vector. Next, this vector is transformed by a fully connected layer with the ReLU activation function, and the output of the fully connected layer is finally reshaped to $(w+1) \times h \times 3$. In this context, $(w+1)$ represents the number of grid cells for each row/column anchor box, $h$ represents the number of row/column anchor boxes, and 3 corresponds to the total number of runway lines. The output dimension of the fully connected layer must therefore satisfy the condition in Equation (12), i.e., it must equal the product of these three factors.
Applying softmax over each row/column anchor box for the three runway lines yields the grid cell with the highest probability within each anchor box. This cell is used as the predicted position of the runway line and is utilized for calculating the classification loss, structural loss, and association loss.
In the segmentation network, the last three feature layers of ResNet18 are first subjected to convolution and upsampling operations. These three feature layers are then concatenated along the channel dimension. Subsequently, a convolution reduces the number of channels in the feature layer to four; these feature maps are used for calculating the segmentation loss.
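For illustration, a minimal PyTorch sketch of such a classification head is given below; the reduced channel count, hidden layer width, and other sizes are assumptions for illustration rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

class RunwayLineHead(nn.Module):
    """Classification head sketch: 1x1 convolution to shrink channels, flatten,
    a fully connected layer with ReLU, and a reshape to (cells, anchors, 3
    runway lines). Intermediate sizes are illustrative assumptions.
    """
    def __init__(self, in_channels, feat_h, feat_w, cells, anchors, lines=3):
        super().__init__()
        self.cells, self.anchors, self.lines = cells, anchors, lines
        self.reduce = nn.Conv2d(in_channels, 8, kernel_size=1)    # fewer channels
        self.fc = nn.Sequential(
            nn.Linear(8 * feat_h * feat_w, 2048), nn.ReLU(),
            nn.Linear(2048, cells * anchors * lines))

    def forward(self, x):
        x = self.reduce(x)
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x.view(-1, self.cells, self.anchors, self.lines)
```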
4.3. Loss Function
The classification loss during the network training process can be represented as Equation (13):

$L_{cls} = \sum_{i=1}^{C} \sum_{j=1}^{h} L_{CE}\left( P_{i,j,:}, T_{i,j,:} \right)$ (13)

Here, $L_{CE}$ represents the cross-entropy loss, and $T_{i,j,:}$ represents the ground truth (one-hot label) of the runway line position on the $j$th row/column anchor box of the $i$th runway line.
In addition to the classification loss, several other loss functions based on the structural prior information of the runway lines are used in the algorithm. These loss functions represent the positional relationships of the runway lines, allowing the neural network to learn their structural information. Since each runway line must be continuous, the predicted points of the runway lines in adjacent row/column anchor boxes should be as close as possible. Therefore, the continuity of the predicted runway lines can be enforced by constraining the distribution of the classification vectors on adjacent anchor boxes. The loss function can be represented as Equation (14):

$L_{sim} = \sum_{i=1}^{C} \sum_{j=1}^{h-1} \left\| P_{i,j,:} - P_{i,j+1,:} \right\|_1$ (14)

Here, $P_{i,j,:}$ represents the predicted classification vector on the $j$th row/column anchor box of the $i$th runway line, and $P_{i,j+1,:}$ represents the corresponding vector on the $(j+1)$th row/column anchor box. In the loss function, the distance between the predictions on adjacent anchor boxes is constrained through the $L_1$ norm.
Additionally, based on the prior information that each runway line is a straight line, the predicted runway line points can be constrained using second-order differences. The formula for the second-order difference can be represented as Equation (15):

$L_{shp} = \sum_{i=1}^{C} \sum_{j=1}^{h-2} \left\| \left( Loc_{i,j} - Loc_{i,j+1} \right) - \left( Loc_{i,j+1} - Loc_{i,j+2} \right) \right\|_1$ (15)

Here, $Loc_{i,j}$ represents the predicted point on the $j$th row/column anchor box of the $i$th runway line, and its calculation method is given as Equation (16):

$Loc_{i,j} = \sum_{k=1}^{w} k \cdot Prob_{i,j,k}$ (16)

Here, $Prob_{i,j,k}$ represents the probability of the $i$th runway line falling in the $k$th grid cell of the $j$th row/column anchor box, and its calculation method is given as Equation (17):

$Prob_{i,j,:} = softmax\left( P_{i,j,1:w} \right)$ (17)

Based on the above, the overall structural loss of the network can be represented as Equation (18):

$L_{str} = L_{sim} + \lambda L_{shp}$ (18)
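A minimal sketch of the structural loss described by Equations (14)-(18) is shown below, assuming logits of shape (lines, anchors, cells) with a final "no line" cell and an illustrative weighting between the two terms.

```python
import torch
import torch.nn.functional as F

def structural_loss(logits, lam=1.0):
    """Structural loss sketch for logits of shape (lines, anchors, cells).

    Combines an L1 similarity term between adjacent anchors (Equation (14))
    with a second-order difference term on the expected cell positions
    (Equations (15)-(17)). The weighting lam is an illustrative placeholder.
    """
    # Similarity loss: adjacent anchors should have similar distributions.
    sim = (logits[:, :-1, :] - logits[:, 1:, :]).abs().mean()

    # Expected position per line and anchor (soft argmax over cells,
    # excluding the final "no line" cell).
    probs = F.softmax(logits[:, :, :-1], dim=-1)
    idx = torch.arange(1, probs.shape[-1] + 1, dtype=probs.dtype)
    loc = (probs * idx).sum(dim=-1)                      # (lines, anchors)

    # Second-order difference: consecutive position increments should match.
    shp = ((loc[:, :-2] - loc[:, 1:-1]) - (loc[:, 1:-1] - loc[:, 2:])).abs().mean()
    return sim + lam * shp
```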
In addition to the classification loss and structural loss, this paper incorporates an auxiliary segmentation task that utilizes multi-scale features for local feature modeling during training. The auxiliary segmentation loss is calculated using the cross-entropy function. To preserve the inference speed of the algorithm, this segmentation branch is removed during the testing phase.
Real runway lines are parallel to each other, but due to perspective, the left and right runway lines in the image draw closer together toward the top of the image. Based on this prior, this paper designs an association loss for the runway lines with the following logic: if the left and right runway lines higher in the image are farther apart than the left and right runway lines below them, a loss is generated; otherwise, no loss is generated. The calculation process of the association loss is shown in Algorithm 1, where the tolerable error threshold is defined in terms of vertical grid cells.
Algorithm 1: Correlation (association) loss calculation process.
Input: the predicted left and right runway line points on each row anchor box.
Output: the value of the association loss.
The loop iterates over adjacent row anchor boxes, computes the spacing between the left and right runway lines on each, and accumulates a penalty whenever an upper spacing exceeds the spacing below it by more than the tolerable error threshold.
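Based on the logic described above, a hedged sketch of the association loss is shown below; the anchor ordering (top of the image first) and the tolerance value are assumptions for illustration.

```python
import torch

def association_loss(loc_left, loc_right, tol=1.0):
    """Association loss sketch for the left/right runway lines.

    loc_left, loc_right: predicted horizontal grid positions of the left and
    right runway lines on each row anchor, ordered from the top of the image
    to the bottom. Because of perspective, the gap between the two lines
    should not grow when moving upward; a penalty is applied when an upper
    gap exceeds the gap below it by more than `tol` grid cells (an
    illustrative tolerance).
    """
    gap = loc_right - loc_left                 # spacing between lines per anchor
    # gap[j] lies above gap[j+1] in the image; penalize gap[j] > gap[j+1] + tol
    excess = gap[:-1] - gap[1:] - tol
    return torch.relu(excess).mean()
```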
In summary, the overall loss of the algorithm can be represented as Equation (19):

$L_{total} = \alpha L_{cls} + \beta L_{str} + \gamma L_{seg} + \delta L_{corr}$ (19)

Here, $L_{cls}$ represents the classification loss, $L_{str}$ the structural loss, $L_{seg}$ the segmentation loss, and $L_{corr}$ the association loss; $\alpha$, $\beta$, $\gamma$, and $\delta$ represent the weights assigned to the classification loss, structural loss, segmentation loss, and association loss, respectively.
4.4. Evaluation
In order to ensure the stability of the distance calculation and accurately reflect the differences between the ground truth and predicted points, this paper first applies least squares fitting to the predicted points on the runway lines to obtain the slope of the fitted line ($k$). The slope is then used to calculate the distance threshold between the predicted and ground truth points, as in Equation (20).
where RealDistance represents the pixel distance between the predicted points and the ground truth points in the horizontal or vertical direction. The actual angle between the left/right runway lines and the $x$-axis of the pixel coordinate system is close to 90 degrees, while their angle with the $y$-axis is smaller; the starting runway line is close to 90 degrees with respect to the $y$-axis, while its angle with the $x$-axis is smaller. Therefore, when calculating the slope $k$ of a line, the left/right runway lines adopt the line equation '$x = ky + b$', and the starting runway line adopts the line equation '$y = kx + b$'. This means that the angle with the $y$-axis of the pixel coordinate system is used when calculating the threshold for the left/right runway lines, and the angle with the $x$-axis is used when calculating the threshold for the starting runway line.
Since the runway lines predicted by the neural network are obtained by least squares fitting of the grid points on the predicted line, the fitting reduces the impact of individual prediction errors to a certain extent. Therefore, to evaluate the accuracy of the runway line predictions, the similarity between the predicted and ground truth runway lines must be calculated quantitatively. The evaluation metrics include accuracy, miss rate, and over-detection rate. Accuracy represents the similarity in slope between the predicted and ground truth runway lines, and its calculation method is as Equation (21).
Additionally, the miss rate represents the proportion of the dataset that has ground truth runway lines but no corresponding predicted results. The over-detection rate represents the proportion of the dataset that has predicted runway lines but no corresponding ground truth.
7. Conclusions
In recent years, UAVs have been deployed on a large scale in various fields. Landing is an important stage of flight, and realizing autonomous landing is of great significance for UAV intelligence. Taking the visual navigation algorithm for autonomous UAV landing as its research objective, this paper constructs an end-to-end vision-guided landing navigation system, optimizes the detection algorithms at the image processing end, and fuses IMU information with visual localization data at the localization end through a combined navigation algorithm, in response to the accuracy and real-time requirements of the application scenario. The innovations of this paper are as follows:
(1) To meet the requirements of UAV visual-guided landing, a deep-learning-based system for runway ROI detection, runway line detection, visual localization, and combined navigation is constructed.
(2) The paper optimizes the runway ROI detection algorithm and runway line detection algorithm to meet the navigation accuracy and real-time performance requirements in the application scenario.
(3) To further improve visual localization accuracy, the paper utilizes the Kalman filtering algorithm to fuse IMU information and visual localization results.
Simulation and experimental results demonstrate the significant advantages of the proposed algorithms in terms of detection accuracy, real-time performance, and generalization ability. The paper provides a reliable solution for the visual navigation problem in UAV landing.