1. Introduction
The advent of Unmanned Aerial Vehicles (UAVs), commonly known as drones, has had a profound impact on multiple industries, including weather monitoring, package delivery, and search and rescue. These high-potential applications have made drone technology an active area of research in recent years [1,2,3,4]. Notably, the operation of mini-drones in indoor environments brings unique challenges in terms of navigation and localization due to the absence of Global Positioning System (GPS) [5], GLONASS [6], and Galileo [7] signals, leading to a dependence on alternative localization methods such as Visual Simultaneous Localization and Mapping (V-SLAM) technologies [8].
In the constantly evolving field of V-SLAM, a comprehensive understanding of the most current algorithms is critical for both technical implementation and historical context. In a detailed examination of this arena, Kazerouni et al. [9] encapsulated the state of the art in V-SLAM, offering a robust comparison of recent algorithms and their performance [10,11].
Their research underlines the inherent challenges faced by V-SLAM algorithms, chief among them the intensive data processing demands and the reliance on camera-based inputs. These factors collectively contribute to slower computational speeds, which can hinder real-time drone localization and navigation. Hence, they highlight the pressing need for a fast and efficient V-SLAM algorithm, one capable of short computation times while maintaining high accuracy. For example, in [12], Grubesic discusses the challenges faced by V-SLAM algorithms, particularly in the context of urban spatial analysis using UAVs, emphasizing the same two obstacles: intensive data processing demands and reliance on camera-based inputs. The cited work examines these computational challenges, surveys existing V-SLAM algorithms, and underscores the need for faster and more efficient algorithms that ensure real-time performance without sacrificing accuracy. In [13], Hassanalian provides an overview of various aspects of drones, including classifications, applications, and design challenges; with respect to V-SLAM, the work discusses the computational demands and camera-based inputs, stresses the need for algorithms that deliver fast computation times without compromising accuracy, and covers design considerations such as hardware optimization, algorithmic improvements, and sensor fusion techniques. In [14], Shi focuses on the development of an anti-drone system that incorporates multiple surveillance technologies; while the paper does not directly address V-SLAM algorithms, it discusses the challenges posed by drones, including their reliance on camera-based inputs and the computational demands of V-SLAM for accurate localization, highlights the importance of efficient and fast V-SLAM algorithms for anti-drone systems, and explores the implementation challenges of integrating multiple surveillance technologies to counter drones effectively.
Subsequent studies have taken up this call to action, leading to the introduction of several fast V-SLAM algorithms. For example, in [15], Zou provides a survey of collaborative Visual SLAM techniques for multiple agents. It explores the challenges and solutions related to the simultaneous mapping and localization of multiple agents in a shared environment, discusses various approaches and algorithms that enable multiple agents to collaboratively build a consistent map and estimate their individual poses in real time, and covers both centralized and decentralized methods along with the advantages and limitations of each. In [16], Vidal introduces a novel approach to Visual SLAM that combines events, images, and Inertial Measurement Unit (IMU) data to achieve robust and accurate localization and mapping in challenging scenarios. The work explores the use of event cameras, which capture changes in brightness over time, in conjunction with traditional cameras and IMU sensors; this combination improves performance in High Dynamic Range (HDR) and high-speed scenarios, where traditional cameras struggle due to motion blur or limited dynamic range. In [17], Yang focuses on the development of a monocular vision-based SLAM system for UAVs to achieve autonomous landing in emergency situations and unknown environments, proposing an approach that uses a single onboard camera to estimate the UAV's pose and simultaneously build a map of the landing area; by leveraging SLAM techniques, the system enables UAVs to land safely and autonomously even in challenging or unfamiliar environments where traditional landing systems are not feasible. Together, these papers contribute to the field of SLAM by addressing specific challenges and proposing innovative approaches for navigation, mapping, and localization in various scenarios, including indoor environments, multi-agent systems, challenging lighting conditions, high-speed scenarios, and emergency situations. The resulting fast V-SLAM algorithms have been broadly categorized into two primary methods: feature-based approaches and direct methods.
Feature-based approaches identify and use specific features in the environment to estimate the position and orientation of the drone. This technique can provide accurate position estimates but often faces challenges in environments with few distinctive features, such as bare walls or open spaces. For example, in [18], Wang focuses on the development of a visual odometry technique for UAVs operating in indoor environments, proposing a multi-feature-based approach that identifies and utilizes multiple distinctive features in the environment to estimate the position and orientation of the drone. By leveraging multiple features, the method aims to improve the accuracy and robustness of the position estimation process; the paper discusses the algorithmic details, feature selection, and experimental evaluation of the proposed approach. Furthermore, in [19], Al-Kaff provides a survey of computer vision algorithms and applications specifically tailored for UAVs, exploring various computer vision techniques and algorithms commonly used in UAV applications, including feature-based approaches for position and orientation estimation, and presenting an overview of different feature-based algorithms, their advantages, limitations, and potential applications in UAV scenarios. In [20], Trujillo focuses on the development of a monocular visual SLAM system based on a cooperative UAV–Target system, proposing a cooperative framework in which a UAV and a target object work together to improve the accuracy and robustness of visual SLAM. The system utilizes a monocular camera onboard the UAV to estimate the relative position and orientation of the target object in real time; the paper describes the system architecture and the algorithmic details and presents experimental results evaluating the performance of the proposed approach.
On the other hand, direct methods make use of pixel intensity values directly from the images captured by the drone’s camera, avoiding the feature extraction step. This allows for more efficient computation, but the performance can be affected by changes in lighting conditions or rapid motion.
Among these approaches, two V-SLAM algorithms, namely Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM2) and Semi-Direct Monocular Visual Odometry (SVO), have gained significant attention due to their unique capabilities and limitations in indoor drone localization. The subsequent sections of this paper will further elaborate on these two algorithms and propose a novel method for combining them to achieve improved localization accuracy.
1.1. Related Work
In their comprehensive study, Kazerouni et al. [9] meticulously analyzed the latest V-SLAM algorithms, considering both their historical development and technical nuances, thereby encapsulating the current cutting-edge technology in this field. Their research underlined a fundamental drawback of these V-SLAM methods: the heavy reliance on camera inputs and demanding data processing tasks tends to slow down the algorithms' operation. As such, they postulated that a more expedient V-SLAM algorithm, characterized by reduced computational timeframes, could offer significant advantages. Recently released fast V-SLAM algorithms were compared and categorized in [21,22]; they fall into two main categories, namely feature-based approaches and direct methods, described below.
1.1.1. Feature-Based Methods
Feature-based methods [23], such as ORB-SLAM, extract salient details, such as blobs and corners, from each image frame. The mapping and localization are then accomplished using the positions of each feature in the current and previous frames. Mur-Artal et al. provide one of the fastest feature-based algorithms (Figure 1) [24].
1.1.2. Direct Methods
Direct methods [25], such as Large-Scale Direct SLAM (LSD-SLAM), employ the entire image rather than just the features, giving them superior robustness and accuracy compared to feature-based methods, but requiring a higher computational effort than ORB-SLAM (Figure 1) [26].
Figure 1. Feature-based methods abstract the visual input to highlight key observations and exclude extraneous data; in contrast, direct methods map and track image intensities directly [27].
1.1.3. Recent Works
In the scholarly work of Guanci et al. [28], ORB-SLAM2 is introduced as a variant that stands out for its lower computational cost compared to its predecessor, ORB-SLAM. Moreover, it exhibits commendable localization precision and can function in real time without necessitating GPU processing. Concurrently, the SVO algorithm, brought forth by Forster et al. [29], ingeniously amalgamates the strengths of both feature-based and direct methods, thereby enhancing its operational efficiency. Additionally, researchers demonstrate in [30] that SVO is up to ten times faster than LSD-SLAM and ORB-SLAM. It cannot produce maps, however, because only the last 5–10 frames are accessible, a consequence of its fast processing and reduced memory footprint [30,31].
In their work, Loo et al. [32] explore strategies to mitigate SVO's shortcomings when detecting features during rapid movements, implementing a concept known as preceding motion, which leverages the features from previous images to predict the current motion; however, SVO continues to face challenges in feature-poor environments. Meanwhile, ORB-SLAM incorporates an in-built error correction mechanism based on features intrinsic to the ORB algorithm [33], which affords it robust map generation capabilities. However, ORB-SLAM's ability to identify features in dynamic settings remains suboptimal. Nonetheless, the unique strengths of ORB-SLAM2 and SVO make them suitable for a diverse array of scenarios and conditions.
Thus, the proposed V-SLAM method presented in this research is a novel approach that aims to address the limitations of existing methods by combining two popular techniques: Semi-Direct Visual Odometry (SVO) [29] and ORB-SLAM2 [24]. By integrating these two methodologies, the method takes advantage of their individual strengths to improve the accuracy of position estimation for drones operating in indoor environments.
SVO is a visual odometry algorithm that utilizes direct image alignment to estimate camera motion. It excels in fast motion estimation and can handle challenging lighting conditions. However, it struggles with accurate initialization and robustness in feature-poor environments. On the other hand, ORB-SLAM2 is a feature-based visual SLAM system that relies on ORB features for localization and mapping. ORB-SLAM2 provides robustness in feature-rich environments and accurate initialization, but it has slower performance and decreased accuracy in fast-motion scenarios.
The proposed method combines SVO and ORB-SLAM2 to leverage their complementary strengths and overcome their individual limitations. This fusion is achieved through the integration of an Adaptive Complementary Filter (ACF) [34], which intelligently merges the data generated by both algorithms. The ACF runs SVO and ORB-SLAM2 in parallel and fuses their data through a weighted average. The weights are determined based on error estimation, corresponding to the quantity of features detected in each frame. By synchronizing the data from both algorithms according to their timestamps, the proposed method ensures accurate and consistent merging through the ACF.
This integration of SVO and ORB-SLAM2 through the ACF results in an enhanced V-SLAM system that adaptively adjusts the influence of each algorithm based on their performance in different environmental conditions. The method dynamically assigns appropriate weights to the position estimates from SVO and ORB-SLAM2, allowing for improved accuracy in estimating the position of drones in indoor environments.
In comparison to the state-of-the-art V-SLAM methods, the proposed solution offers several distinct advantages. It combines the strengths of SVO and ORB-SLAM2, providing accurate motion estimation and robust feature extraction. The ACF facilitates adaptive data fusion, optimizing the influence of each algorithm based on its performance and the number of features detected. This leads to enhanced accuracy and reliability in position estimation.
Regarding the differences from the referenced papers, the proposed solution focuses specifically on indoor mini-drone localization. It introduces the integration of SVO and ORB-SLAM2 using the ACF, a unique approach not explicitly mentioned in the references. Additionally, the custom-designed mini-drone used in the Gazebo simulator software demonstrates the effectiveness of the proposed solution in realistic scenarios.
Overall, the proposed solution presents a novel and effective approach to address the challenges of drone localization in indoor environments. By combining SVO and ORB-SLAM2, leveraging the ACF for data fusion, and utilizing a custom-designed mini-drone (see Figure 2), the method offers enhanced accuracy and adaptability, improving the positioning performance and reliability of mini-drones in various indoor scenarios.
The rest of the paper is organized as follows: Section 1 discusses the literature, Section 2 contains the materials and methods, Section 3 describes the results, Section 4 contains a discussion, and Section 5 presents the conclusions.
2. Materials and Methods
In the current study, we propose a system for mini-drone navigation, which is bifurcated into two main components: V-SLAM and a controller (refer to Figure 3). Within the V-SLAM segment, two parallel threads are employed that utilize the SVO and ORB-SLAM2 algorithms, respectively (see Figure 4). The SVO algorithm is primarily responsible for estimating the mini-drone's position, while ORB-SLAM2 caters to both localization and mapping tasks [35]. Subsequently, data extracted from both ORB-SLAM2 and SVO are fused via a weighted average approach. The weighting factor assigned to each data set depends inversely on the error magnitude, such that the higher the error related to a specific data set, say from SVO, the smaller its corresponding weighting factor used in the Adaptive Complementary Filter (ACF) [36,37]. For the trajectory tracking within the controller component, we employ a PID-based controller [38].
2.1. ORB-SLAM
Oriented FAST and Rotated BRIEF (ORB) SLAM, particularly its second iteration, ORB-SLAM2, has emerged as a notable method in the domain of visual SLAM due to its comprehensive and robust performance. This technique employs ORB features, which are computationally efficient and invariant to scale and rotation changes. ORB-SLAM2 is characterized by its three-threaded architecture comprising tracking, local mapping, and loop closing threads (Figure 4). This approach ensures the real-time operation of the system, making it appealing for UAV applications. The tracking thread utilizes a motion model and performs frame-to-frame tracking, the local mapping thread optimizes newly observed features and creates a consistent local map, and the loop closing thread is responsible for detecting and correcting large-scale drift by identifying full loop closures. While ORB-SLAM2 is highly effective in a variety of conditions, it can encounter challenges in low-texture or low-light environments where feature detection becomes difficult. This shortcoming can lead to tracking losses, resulting in substantial localization errors [31].
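As a rough structural sketch of this three-threaded layout (not the actual ORB-SLAM2 implementation, which is written in C++; all helpers and stubs below are assumptions), the pipeline can be pictured as a tracking producer feeding keyframes to a local-mapping worker, which in turn feeds loop closing:

```python
# Structural sketch only: queue names and stub bodies are assumptions, not
# ORB-SLAM2 API calls; they mirror the thread hand-offs described above.
import threading, queue, time

keyframes = queue.Queue()    # tracking -> local mapping
candidates = queue.Queue()   # local mapping -> loop closing

def tracking(frames):
    for i, frame in enumerate(frames):
        pose = i                          # stub: frame-to-frame pose from the motion model
        if i % 5 == 0:                    # stub keyframe decision (every 5th frame)
            keyframes.put((frame, pose))

def local_mapping():
    while True:
        frame, pose = keyframes.get()
        # stub: triangulate new map points and cull redundant keyframes here
        candidates.put((frame, pose))

def loop_closing():
    while True:
        frame, pose = candidates.get()
        # stub: bag-of-words place recognition and pose-graph correction on loops

threading.Thread(target=local_mapping, daemon=True).start()
threading.Thread(target=loop_closing, daemon=True).start()
tracking(range(100))
time.sleep(0.1)   # let the daemon threads drain the queues before the script exits
```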
2.1.1. Tracking
The tracking component serves a dual purpose: it locates the camera for each frame and discerns when to incorporate a new keyframe into the system. Keyframes are integral images archived within the system, serving as repositories of valuable informational cues critical for accurate localization and effective tracking [33]. In scenarios where tracking is compromised (for instance, due to obstructions or abrupt movements), the system re-establishes matches with local map points via reprojection, a process involving the calculation of the distance between a detected pattern keypoint in a calibration image and its corresponding world point projected onto the same image [39]. This facilitates the optimization of the camera pose with all the matches. Ultimately, it is the tracking thread's responsibility to determine when a new keyframe is warranted for insertion.
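To make the reprojection step concrete, the following minimal sketch (the intrinsics, pose, and point values are illustrative assumptions, not values used by ORB-SLAM2) projects a world point through a pinhole model and measures its pixel distance to the detected keypoint:

```python
# Reprojection-error check: project a known world point into the image and
# compare it against the detected keypoint location.
import numpy as np

def reprojection_error(K, R, t, world_point, detected_px):
    """Pixel distance between a detected keypoint and the projected world point."""
    p_cam = R @ world_point + t          # world -> camera frame
    uvw = K @ p_cam                      # pinhole projection
    projected_px = uvw[:2] / uvw[2]      # normalize by depth
    return np.linalg.norm(projected_px - detected_px)

K = np.array([[525.0, 0.0, 320.0],       # assumed camera intrinsics
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
err = reprojection_error(K, np.eye(3), np.zeros(3),
                         np.array([0.2, -0.1, 2.0]), np.array([372.0, 214.0]))
print(f"reprojection error: {err:.2f} px")
```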
2.1.2. Local Mapping
Local mapping constitutes a critical component within the ORB-SLAM algorithm. This process is primarily responsible for processing new keyframes, which are images stored within the system that contain informative cues for localization and tracking. The objective is to optimize the 3D reconstruction in the vicinity of the camera pose. To achieve this, local mapping searches for new correspondences among unmatched ORB features in the new keyframe, using connected keyframes in the covisibility graph as references, and subsequently triangulates new map points. Furthermore, the local mapping mechanism applies an exacting culling policy post-creation. This policy is informed by the data procured during the tracking process and is designed to preserve only high-quality map points. This culling not only applies to map points but also to the keyframes themselves, enabling the local mapping module to identify and discard redundant keyframes. By doing so, ORB-SLAM ensures an efficient and consistent local map that contributes to robust and reliable localization and mapping [32,33].
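A minimal sketch of how a new map point can be triangulated from two keyframes, using the standard linear (DLT) method under assumed camera poses and intrinsics (the numbers are illustrative and not taken from ORB-SLAM's implementation):

```python
# Linear (DLT) triangulation of one 3D point from two keyframe observations.
import numpy as np

def triangulate(P1, P2, px1, px2):
    """Triangulate a point from two 3x4 projection matrices and pixel observations."""
    A = np.stack([
        px1[0] * P1[2] - P1[0],
        px1[1] * P1[2] - P1[1],
        px2[0] * P2[2] - P2[0],
        px2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                  # dehomogenize

K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # first keyframe at the origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0], [0]])])   # second keyframe, 0.1 m baseline
point = triangulate(P1, P2, np.array([372.5, 213.75]), np.array([346.25, 213.75]))
print(point)   # close to the true point [0.2, -0.1, 2.0]
```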
2.1.3. Loop Closing
Loop closing represents another vital component within the ORB-SLAM algorithm, tasked with the detection and correction of long-term drift in the estimated trajectory and the accumulated map. In essence, this component identifies when the camera returns to a previously visited location, a scenario commonly known as a "loop". This detection is primarily achieved through a bag of visual words method to recognize similar scenes, supplemented by pose graph optimization techniques to ensure consistent trajectory and map correction. If a loop is detected, ORB-SLAM proceeds to correct the entire map and adjusts the estimated camera trajectory. This process involves the identification and correction of any false positive loop closures, ensuring that only robust and reliable corrections are made. The implementation of loop closing not only improves the accuracy of the map and the trajectory but also aids in maintaining the scalability of the system, making it a critical element of the overall ORB-SLAM pipeline [31].
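The place-recognition idea can be illustrated with a toy bag-of-visual-words comparison (the vocabulary size, word IDs, and threshold below are assumptions for illustration; the actual system uses a dedicated visual vocabulary and a richer scoring scheme):

```python
# Toy illustration: keyframes reduced to normalized histograms of visual word
# IDs and compared with cosine similarity to flag loop candidates.
import numpy as np

def bow_histogram(word_ids, vocabulary_size):
    hist = np.bincount(word_ids, minlength=vocabulary_size).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def is_loop_candidate(hist_a, hist_b, threshold=0.8):
    return float(hist_a @ hist_b) >= threshold

vocab = 1000
rng = np.random.default_rng(0)
words_now = rng.integers(0, vocab, 500)                        # words in the current keyframe
words_revisit = np.concatenate([words_now[:450],               # mostly the same scene...
                                rng.integers(0, vocab, 50)])   # ...plus some new observations
print(is_loop_candidate(bow_histogram(words_now, vocab),
                        bow_histogram(words_revisit, vocab)))
```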
2.2. Semi-Direct Visual Odometry
SVO presents an innovative approach in the realm of visual odometry, aiming to bridge the gap between feature-based and direct methods. Contrary to traditional methods, which typically rely on tracking distinct features in an image, SVO combines the advantages of feature-based methods with the direct extraction of depth information from pixels, thus optimizing computational resources [29].
Initially, the SVO algorithm selects a small subset of corner features to initialize its 3D map and estimate camera movement from frame to frame. Then, instead of extracting features from the entire image, SVO directly aligns pixel intensities from these selected parts to a depth map, which allows for more efficient and faster pose estimation. In this way, the SVO algorithm leverages the information contained within the pixel intensity and the depth map to generate accurate estimations of camera motion. Consequently, the algorithm exhibits superior performance in scenarios characterized by rapid motion or limited features [30].
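The pixel-intensity alignment can be sketched as follows (the variable names, intrinsics, and nearest-neighbour sampling are simplifying assumptions; SVO itself minimizes such residuals over small patches with sub-pixel interpolation):

```python
# Sparse photometric residual: back-project a few reference pixels with known
# depth, re-project them under a candidate relative pose, and compare raw
# intensities. A pose estimator would minimize the sum of squared residuals.
import numpy as np

def photometric_residuals(I_ref, I_cur, K, pixels, depths, R, t):
    K_inv = np.linalg.inv(K)
    residuals = []
    for (u, v), d in zip(pixels, depths):
        p_ref = d * (K_inv @ np.array([u, v, 1.0]))   # back-project with known depth
        p_cur = R @ p_ref + t                         # apply candidate relative pose
        uvw = K @ p_cur
        uc, vc = uvw[:2] / uvw[2]
        if 0 <= int(vc) < I_cur.shape[0] and 0 <= int(uc) < I_cur.shape[1]:
            residuals.append(float(I_cur[int(vc), int(uc)]) - float(I_ref[v, u]))
    return np.array(residuals)

K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1.0]])
I_ref = np.random.randint(0, 255, (480, 640)).astype(float)
I_cur = I_ref.copy()   # identical frames: residuals are zero under the identity pose
print(photometric_residuals(I_ref, I_cur, K, [(300, 200), (400, 260)],
                            [2.0, 2.5], np.eye(3), np.zeros(3)))
```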
However, similar to other V-SLAM methodologies, SVO is not exempt from limitations, with performance potentially degrading in environments with sparse features or under poor lighting conditions. Therefore, the question arises as to how we can harness the strengths of both SVO and ORB-SLAM2 to achieve an even more robust and efficient V-SLAM system [40,41].
2.3. SVO-ORB-SLAM
To address the limitations of both SVO and ORB-SLAM2 and to leverage their unique strengths, we propose combining these two methodologies using an Adaptive Complementary Filter (ACF). The idea is to run SVO and ORB-SLAM2 in parallel and then fuse the data they generate through a weighted average, as depicted in Figure 4 and Algorithm 1. The first thread is dedicated to weighting the data from each of the algorithms, ORB-SLAM2 and SVO. This weighting is handled by the ACF and is founded on an error estimation corresponding to the quantity of features detected in each frame. The feature detection is conducted using the Features from Accelerated Segment Test (FAST) corner-detection algorithm in SVO and the ORB feature-detection algorithm in ORB-SLAM2.
However, before these estimations are undertaken, it is crucial that the data from each of these processes are synchronized according to their timestamps. This ensures the data from the two sources can be accurately and consistently merged via the ACF. By adhering to this procedure, the proposed solution adeptly combines the strengths of both SVO and ORB-SLAM2, using the ACF to dynamically adjust the influence of each based on their performance in different environmental conditions. This results in an enhanced V-SLAM system capable of achieving superior localization accuracy for mini-drones under varying conditions.
Algorithm 1 SVO-ORB-SLAM algorithm.
function SVO-ORB-SLAM()
    Step 1: Capture the image from the cameras.
    Step 2: Estimate P_SVO and P_ORB.
    Step 3: Calculate e_SVO and e_ORB based on the number of features:
        e_SVO = 1 / n_SVO
        e_ORB = 1 / n_ORB
    Step 4: Calculate α based on the errors:
        α = e_ORB / (e_SVO + e_ORB)
    for each image frame i do
        Update the positions P_SVO and P_ORB
        P_fused = α · P_SVO + (1 − α) · P_ORB
    end for
    return P_fused
end function
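A compact Python sketch of Algorithm 1 might look as follows (the data structures and function names, such as PoseSample and fuse_poses, are illustrative assumptions rather than part of the SVO or ORB-SLAM2 code bases, and synchronization is reduced to nearest-timestamp matching):

```python
# Hedged sketch of Algorithm 1: synchronize the two position streams by
# timestamp, derive per-frame errors from the feature counts, and fuse the
# estimates with the ACF weights.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class PoseSample:
    t: float            # timestamp in seconds
    p: np.ndarray       # position estimate (x, y, z)
    n_features: int     # number of features detected in that frame

def nearest_by_time(samples: List[PoseSample], t: float) -> PoseSample:
    """Pick the sample whose timestamp is closest to t (simple synchronization)."""
    return min(samples, key=lambda s: abs(s.t - t))

def fuse_poses(svo: List[PoseSample], orb: List[PoseSample]) -> List[np.ndarray]:
    """Fuse SVO and ORB-SLAM2 position streams with feature-count-based weights."""
    fused = []
    for s in svo:
        o = nearest_by_time(orb, s.t)        # synchronize by timestamp
        e_svo = 1.0 / max(s.n_features, 1)   # error taken as inverse feature count
        e_orb = 1.0 / max(o.n_features, 1)
        alpha = e_orb / (e_svo + e_orb)      # weight of SVO; (1 - alpha) weights ORB
        fused.append(alpha * s.p + (1.0 - alpha) * o.p)
    return fused

svo_stream = [PoseSample(0.10, np.array([0.0, 0.0, 1.0]), 150)]
orb_stream = [PoseSample(0.11, np.array([0.1, 0.0, 1.0]), 50)]
print(fuse_poses(svo_stream, orb_stream))    # -> [array([0.025, 0., 1.])]
```

In a ROS-based implementation, the two estimators publish at different rates, so nearest-timestamp matching (or interpolation) is what keeps the weighted average consistent.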
2.4. ACF
This paper builds upon the ACF because it involves less computation for the data fusion of SVO and ORB-SLAM2 compared to similar methods based on the Kalman filter [42].
The complementary filter is a simple yet effective technique for fusing two or more sensor data sources to achieve a more accurate and reliable output. It is commonly used to combine data from low-pass and high-pass filters to extract the best features from each data source. The general formula for a complementary filter with two data sources is output = α · (source 1) + (1 − α) · (source 2), where α (alpha) is a constant between 0 and 1 that determines the relative contribution of each data source to the final output, and source 1 and source 2 are the data sources being fused.
Here are the formulas and definitions used in the complementary filter for fusing SVO and ORB-SLAM position estimates:
1. Error calculation:
e_SVO = 1 / n_SVO,  e_ORB = 1 / n_ORB
Here, n_SVO and n_ORB represent the number of features detected by the SVO and ORB-SLAM algorithms, respectively. The error for each algorithm is taken as the inverse of its feature count, on the assumption that more features result in a lower error and fewer features result in a higher error.
2. Alpha (α) calculation:
α = e_ORB / (e_SVO + e_ORB)
Alpha is the coefficient that determines the relative weighting of the SVO and ORB-SLAM estimates. It is calculated by dividing the error of ORB-SLAM by the sum of the errors of both algorithms and therefore ranges from 0 to 1. The weight of SVO is α and the weight of ORB-SLAM is 1 − α, so the two weights always sum to 1.
3. Fused position calculation:
P_fused = α · P_SVO + (1 − α) · P_ORB
Here, P_fused is the fused position, P_SVO is the position from the SVO algorithm (x, y, z), and P_ORB is the position from the ORB-SLAM algorithm (x, y, z); see Figure 5. The fused position is calculated as a weighted sum of the position estimates from the SVO and ORB-SLAM algorithms, with the weights (the weight of SVO and the weight of ORB) determined by the errors of the algorithms; see Figure 6.
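As an illustrative numerical example (the feature counts are hypothetical, not measured values): if SVO detects n_SVO = 150 features in the current frame and ORB-SLAM detects n_ORB = 50, then e_SVO = 1/150 ≈ 0.0067 and e_ORB = 1/50 = 0.02, giving α = 0.02 / (0.0067 + 0.02) ≈ 0.75. The fused estimate is therefore P_fused = 0.75 · P_SVO + 0.25 · P_ORB, so the algorithm that currently sees three times as many features contributes three times the weight.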
2.5. Controller
V-SLAM algorithms have some critical aspects that need to be carefully addressed before they can be effectively deployed on UAVs. One is the initial localization, which requires the camera to move about one meter before the image features can be detected. Because a one-meter error is not very safe and can cause a collision or crash, the robot is controlled semi-automatically during this initial phase. Thus, in this controller, an initial trajectory command is given as an input to a PD controller [43]; that is, the initial setpoint is set for the PD controller. In the next step, the PD controller output becomes the PI controller input [44]. After that, the mini-drone begins flying autonomously. Moreover, the mini-drone requires initial localization; for this, the PI controller uses optical flow [45] with a bottom camera and velocity-based data for position estimation. These are the steps of the first control loop for the mini-drone, after which it can fly autonomously.
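A minimal sketch of this cascade, assuming illustrative gains and a single-axis scalar form (not the tuned controller actually used on the mini-drone): the outer PD loop turns the position error into a velocity setpoint, and the inner PI loop turns the velocity error measured from optical flow into the command sent to the vehicle.

```python
# Cascaded control sketch: PD on position (outer loop) feeding PI on
# optical-flow velocity (inner loop). Gains and names are assumptions.
class PD:
    def __init__(self, kp, kd):
        self.kp, self.kd, self.prev_err = kp, kd, None
    def step(self, err, dt):
        d = 0.0 if self.prev_err is None else (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.kd * d

class PI:
    def __init__(self, kp, ki):
        self.kp, self.ki, self.integral = kp, ki, 0.0
    def step(self, err, dt):
        self.integral += err * dt
        return self.kp * err + self.ki * self.integral

pos_loop = PD(kp=1.2, kd=0.4)     # outer loop: tracks the position setpoint
vel_loop = PI(kp=0.8, ki=0.1)     # inner loop: tracks the velocity setpoint

dt, setpoint, position, flow_velocity = 0.02, 1.0, 0.0, 0.0
vel_setpoint = pos_loop.step(setpoint - position, dt)       # PD output is the PI input
command = vel_loop.step(vel_setpoint - flow_velocity, dt)   # command sent to the mini-drone
print(vel_setpoint, command)
```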
3. Results
The real-time execution of all operations—encompassing image processing, localization, and mapping—was imperative in evaluating the performance of the integrated SVO-ORB-SLAM system. Performance simulations were conducted utilizing Gazebo 11 and Rviz software platforms, implemented within the framework of the Robot Operating System (ROS) Melodic Morenia on an Ubuntu 18.04 (Bionic) operating environment. The hardware underpinning these simulations comprised an Intel Core i7-4702MQ processor, supplemented by 12 GB of RAM, and a 2 GB VGA GeForce GT 740M graphics card.
Figure 7 shows, on the right, the viewing angle of the drone's front camera (provided by Gazebo), where the green spots are the features identified by the ORB-SLAM2 method; on the left, the white points represent the 3D point cloud map generated by the ORB-SLAM2 algorithm.
In Figure 8, the right-hand window illustrates the field of view of the mini-drone's downward-facing camera within the Gazebo environment. The green markers within this frame are indicative of the image features detected by the SVO algorithm. Conversely, the left-hand window of Figure 8 presents the Rviz environment. Here, the red pathway visualizes the estimated trajectory of the mini-drone as determined by the SVO algorithm.
Figure 9 showcases the flight of the mini-drone within the Gazebo (right window) and Rviz (left window) environments. The depicted trajectories include the actual flight trajectory denoted by the yellow line, the trajectory estimate provided by the SVO algorithm represented by the red line, the trajectory estimate derived from ORB-SLAM2 denoted by the green line, and finally, the trajectory estimate derived from our proposed technique indicated by the blue line.
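For reference, the kind of trajectory comparison shown here can be reduced to a simple mean position error; the arrays below are hypothetical time-aligned samples, not the data from this experiment.

```python
# Mean Euclidean position error between a time-aligned estimated trajectory
# and the ground truth (hypothetical numbers for illustration only).
import numpy as np

def mean_position_error(estimate, ground_truth):
    return float(np.mean(np.linalg.norm(estimate - ground_truth, axis=1)))

truth = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 1.0], [1.0, 0.0, 1.0]])
fused = truth + np.array([[0.02, 0.0, 0.01], [0.03, -0.01, 0.0], [0.01, 0.02, 0.0]])
print(f"mean error: {mean_position_error(fused, truth):.3f} m")
```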
3.1. Simulations
We evaluated the mini-drone's performance in two challenging scenarios: (1) low light with low surface texture and (2) high-speed flying. Furthermore, we demonstrated that our suggested solution runs rapidly and smoothly, even on modest hardware.
3.1.1. Low Light with Low Surface Texture
We tested the drone in challenging conditions. For this test, the drone initially flew in a well-lit room with many textures (see Figure 10). After a few moments, we reduced the ambient light by 50% and removed the carpet from under the mini-drone. As shown in the left part of Figure 11, SVO lost track of the mini-drone's position. However, our proposed method uses the ORB-SLAM data, which allows it to continue tracking the robot's position in this situation.
3.1.2. High Speed Flying
As part of our more complex testing, we placed the robot in an environment with various objects, as shown in Figure 12, which depicts the feature detection of the new object. In this test, we increased the flying speed of the mini-drone to twice the speed of the previous tests. As shown in Figure 13, all of the algorithms were able to estimate the position of the robot. However, the results showed that SVO had a 50% smaller error than ORB-SLAM.
3.1.3. Mapping
In this segment, the robot flies through a corridor while utilizing both SVO and ORB data to generate a comprehensive 3D point cloud map of the environment, as depicted in Figure 14. The SVO and ORB data allow for precise tracking of the robot's movement and enable the construction of a highly accurate map. This map can be used for a variety of applications, such as path planning or obstacle avoidance, and is especially important in areas where visibility may be limited or potential hazards require careful navigation.
4. Discussion
The results presented in this study demonstrate the effectiveness and potential of the proposed ORB-SVO SLAM approach for mini-drone localization in GPS-denied environments. The combination of SVO and ORB-SLAM2, integrated using an Adaptive Complementary Filter (ACF), yielded superior performance compared to the individual algorithms. This section will delve into the implications and significance of these findings, address potential limitations, and explore further avenues for research and improvement.
The superior accuracy achieved by the ORB-SVO SLAM approach, as evidenced by the lower overall error rate compared to SVO and ORB-SLAM2, indicates the benefit of combining the strengths of both methodologies. By fusing their respective outputs using the ACF, the approach leverages the advantages of SVO's computational efficiency and motion handling capabilities alongside ORB-SLAM2's robust mapping and localization abilities. This fusion technique effectively compensates for the limitations of each individual algorithm, resulting in enhanced position estimation accuracy and robustness. We compared our suggested technique with SVO and ORB-SLAM2 in terms of average position error in a dynamic environment with low light and low texture. In Figure 15, it is evident that initially, all the methods exhibited significant errors. However, as the drone started to move and estimate its position, the errors gradually decreased to less than 15%. This trend continued until we encountered a change in the situation. For instance, in the black box area in Figure 15, we either turned off the lights or altered the map, which led to an increase in errors once again. Notably, our method demonstrated fewer abrupt jumps and lower overall errors compared to the other methods.
One of the notable advantages of the proposed approach is its applicability in challenging environments with low light conditions, low surface texture, and high flying speeds. These scenarios are often encountered in indoor settings or other GPS-denied areas where mini-drones are required to navigate with precision and reliability. The ORB-SVO SLAM approach demonstrated its capability to overcome these challenges and provide accurate localization estimates under such conditions.
While the results are promising, there are several potential limitations and areas for further improvement. Firstly, the proposed approach relies heavily on visual information, making it susceptible to challenges in environments with limited or ambiguous visual features. Future research could explore the integration of other sensor modalities, such as depth sensors or Inertial Measurement Units (IMUs), to enhance the robustness of the system.
Furthermore, the scalability of the ORB-SVO SLAM approach to larger environments warrants investigation. This study focused on indoor environments; therefore, assessing its performance in outdoor settings with more-complex scenes and larger-scale maps would provide valuable insights into its generalizability.
Additionally, while the proposed approach demonstrates improvements in mini-drone localization accuracy, there is still room for further optimization. Fine-tuning the parameters and exploring different fusion techniques within the ACF could potentially lead to even better results.
The proposed ORB-SVO SLAM approach represents a significant step forward in mini-drone localization under GPS-denied conditions. The combination of SVO and ORB-SLAM2, integrated using ACF fusion, offers improved accuracy and robustness. This work opens up new possibilities for mini-drone navigation applications, with potential extensions to other domains such as autonomous robotics and augmented reality. Further research and development in these areas will contribute to advancing the capabilities of mini-drones in complex environments and propel the field of visual SLAM forward.
Processing
V-SLAM algorithms are well-known for their demanding computational requirements to ensure effective operation. However, this poses a challenge for embedded systems with limited computational capabilities, such as small drones, which often have restricted processing power. In this study, we leverage two state-of-the-art methods, ORB-SLAM2 and SVO, known for their efficiency and speed. These methods have been extensively analyzed and their detailed steps are described in earlier sections.
To address the computational limitations of embedded systems, our approach carefully balances processing requirements to maintain an optimal level of efficiency. As illustrated in Figure 16, our recommended solution utilizes only 23.3% of the total available RAM, ensuring efficient memory utilization. Moreover, the processing executed on the CPU cores is exceptionally smooth, demonstrating the effectiveness of our approach in achieving real-time performance.
By utilizing ORB-SLAM2 and SVO, we capitalize on their efficiency and speed without compromising the overall performance of the system. The utilization of these methods in conjunction with our carefully optimized processing strategy enables mini-drones with limited computational resources to perform reliable and accurate localization and mapping tasks.
The effective utilization of computational resources is crucial in enabling embedded systems to operate in resource-constrained environments. Our solution not only achieves efficient memory usage but also ensures smooth execution on the available CPU cores. These aspects are vital for small drones operating in real-world scenarios, where computational efficiency and responsiveness are critical for successful navigation and mapping tasks.
It is worth noting that the specific percentage of RAM utilization and the smoothness of CPU processing exhibited in our results demonstrate the viability of our approach in achieving an appropriate level of computational efficiency. These findings reinforce the feasibility of implementing our solution on small drones with limited resources, opening up possibilities for their deployment in various applications, including surveillance, inspection, and exploration tasks.
Our approach effectively addresses the computational challenges of V-SLAM algorithms on embedded systems, particularly small drones. By carefully optimizing resource utilization, including memory consumption and CPU processing, our solution demonstrates high efficiency and smooth execution. These findings contribute to the practical implementation of V-SLAM algorithms in resource-constrained environments, empowering mini-drones to perform complex navigation and mapping tasks with limited computational resources.
Our approach has an error rate of 14.2% on average across all situations, as shown in Table 1. Comparatively, the ORB-SLAM2 method has an error rate of 32.7%, while the SVO method has an error rate of 40.1%.
5. Conclusions
In this paper, we have presented a novel approach tailored specifically for mini-drones operating in GPS-denied environments, such as indoor settings. Our proposed strategy, named ORB-SVO SLAM, harnesses the power of Visual Simultaneous Localization and Mapping (V-SLAM) and combines the advantages of both Semi-Direct Visual Odometry (SVO) and Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM2) methodologies. By integrating these techniques using an Adaptive Complementary Filter (ACF), we achieve enhanced performance compared to standalone SVO and ORB-SLAM2 approaches, while maintaining a lower overall error rate.
Our evaluation reveals that the error rate of our ORB-SVO SLAM approach is 25.9 percentage points lower than that of SVO and 18.5 percentage points lower than that of ORB-SLAM2. This improvement demonstrates the effectiveness of our fusion method in providing more accurate position estimates, especially in challenging scenarios characterized by low-light conditions and low-surface-texture environments, as well as high flying speeds. The findings of this study highlight the practical applicability of our strategy for mini-drones operating in a range of real-world environments.
By combining the robustness of ORB-SLAM2 in mapping and localization with the computational efficiency and motion handling capabilities of SVO, our proposed ORB-SVO SLAM strategy offers a comprehensive solution for accurate and reliable mini-drone localization in GPS-denied environments. The fusion of data using the ACF further enhances the overall performance, yielding improved results compared to individual algorithms. These findings contribute to the advancement of mini-drone navigation systems, particularly in scenarios where GPS signals are unavailable or unreliable.
Further research avenues may involve investigating the scalability and generalizability of the ORB-SVO SLAM approach to larger and more complex environments, as well as exploring the integration of additional sensor modalities to further enhance the accuracy and robustness of the system. Additionally, the adaptation of our strategy for real-world applications beyond mini-drones, such as autonomous robots or augmented reality systems, could be an exciting area for future exploration.
Our proposed ORB-SVO SLAM approach, leveraging the strengths of SVO and ORB-SLAM2 through the ACF fusion, demonstrates significant improvements in mini-drone localization accuracy and performance. This work contributes to the ongoing development of visual SLAM techniques and provides valuable insights for advancing the capabilities of mini-drones in GPS-denied environments.