The approach described in this paper uses a visual inertial odometry (VIO) tracking camera to estimate the 6-DOF state of the drone relative to a fixed coordinate system attached to the ground. VIO is required in indoor missions because the usual source of position estimates for the drone's flight control unit (FCU), the onboard GPS receiver, is unavailable indoors. Using the VIO sensor in combination with SVGS, it is possible to achieve accurate estimation of the position of the landing target in a coordinate system fixed to the ground. This enables the use of updated position setpoints during the landing mission, which works even if the SVGS device loses line of sight with the target during the landing maneuver. The performance of SVGS as a landing sensor is compared with that obtained using AR markers for automated landing [20,21]. ArUco, which defines a square fiducial marker detected with OpenCV, was used as a special case of AR landing target.
2.1. Smartphone Video Guidance Sensor
The Smartphone Video Guidance Sensor (SVGS) is a low-cost, low-mass, vision-based sensor developed by NASA Marshall Space Flight Center [18,19]. It computes the six-state position and orientation vector of a target relative to a coordinate system attached to the camera of an Android-based smartphone by capturing images and applying photogrammetric techniques.
Figure 1 outlines the operational concept of a drone landing mission using SVGS. SVGS operation starts with image capture of the illuminated target; the image is converted to a binary image so that the four bright spots on the target can be identified as blobs. The six-degree-of-freedom position and attitude state vector is then estimated using a photogrammetric algorithm [18]. The smartphone's CPU performs both the image processing and the state estimation, alleviating the computational load on the flight or companion computer [18].
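A minimal OpenCV sketch of this front end (grayscale conversion, binary thresholding, and blob centroid extraction) is given below. It only illustrates the concept; the threshold value and minimum blob area are assumed, and it is not the SVGS Android implementation.

```python
# Sketch of the image-processing front end described above
# (grayscale -> binary threshold -> blob centroids). Illustrative only;
# the threshold and minimum blob area are assumed values.
import cv2

def extract_beacon_centroids(frame_bgr, threshold=200, min_area=5):
    """Return pixel centroids of bright blobs in a camera frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Binarize so that only the illuminated LED targets remain
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    # Connected-component (blob) extraction with size/shape statistics
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    blobs = []
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            blobs.append(tuple(centroids[i]))
    return blobs
```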
Figure 1. Drone landing mission using the Smartphone Video Guidance Sensor (SVGS). The mission sequence numbers correspond to the landing logic shown in Figure 2. SVGS can be deployed in many other applications as a position/attitude sensor by means of a similar 4-point illuminated beacon [22].
SVGS Theory of Operation
The SVGS algorithm, as described in [18], is based on an adaptation of the collinearity equations developed by Rakoczy [23]. In Figure 3, consider L as the perspective center of the host platform camera (thin lens) and A as a point on the target. The image of A on the image plane is denoted "a"; the image plane is located a distance f from the perspective center, where f is the focal length of the lens. Two coordinate frames are defined: the target (object) frame <X, Y, Z> and the image frame <x, y, z>. The vector from the perspective center to point "A", expressed in the object frame, is $\mathbf{V}_A$; the vector from the perspective center to point "a", expressed in the image frame, is $\mathbf{v}_A$. The two vectors are related by $\mathbf{v}_A = k\,M\,\mathbf{V}_A$, where k is a scaling factor and M is a rotation matrix representing an x, y, z rotation sequence that transforms the object frame into the image frame. Dropping the "a" and "A" subscripts, solving for the image-frame coordinates x, y, and z of point "a", and dividing by z to eliminate the scaling factor k yields the following two equations for $F_x$ and $F_y$, where the $m_{ij}$ values are the elements of the direction cosine matrix M:

$$F_x = x = -f\,\frac{m_{11}X + m_{12}Y + m_{13}Z}{m_{31}X + m_{32}Y + m_{33}Z} \qquad (2)$$

$$F_y = y = -f\,\frac{m_{21}X + m_{22}Y + m_{23}Z}{m_{31}X + m_{32}Y + m_{33}Z} \qquad (3)$$
The relative 6-DOF state vector to be solved for, V, is defined below, where $\varphi$, $\theta$, and $\psi$ represent the x, y, and z rotation angles, respectively:

$$V = \begin{bmatrix} X & Y & Z & \varphi & \theta & \psi \end{bmatrix}^{T}$$
Linearizing $F_x$ and $F_y$ by a Taylor series truncated after the second term yields:

$$F_x \approx F_x(V_0) + \left.\frac{\partial F_x}{\partial V}\right|_{V_0}\Delta V + \varepsilon_x, \qquad F_y \approx F_y(V_0) + \left.\frac{\partial F_y}{\partial V}\right|_{V_0}\Delta V + \varepsilon_y$$

where $V_0$ is an initial guess for the state vector, $\Delta V$ is the difference between this guess and the actual state vector V, and $\varepsilon_x$ and $\varepsilon_y$ are the x and y errors from the Taylor-series truncation, respectively. Each of the four targets in the SVGS target pattern has a corresponding pair of these equations; the resulting eight equations can be represented in matrix form as

$$\mathbf{y} = A\,\Delta V + \boldsymbol{\varepsilon}$$

where $\mathbf{y}$ stacks the differences between the measured image coordinates and those predicted from $V_0$, A is the 8 × 6 matrix of partial derivatives, and $\boldsymbol{\varepsilon}$ is the vector of residuals.
This equation is solved for the $\Delta V$ that minimizes the square of the residuals, $\boldsymbol{\varepsilon}$. This correction is then added to the initial estimate of V to obtain an updated state vector. The process is iterated until the residuals are sufficiently small, yielding the final estimate of the 6-DOF state vector V.
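The iterative least-squares update can be sketched as follows, assuming a generic residual_and_jacobian() function that evaluates the collinearity model of Equations (2) and (3) at the four beacons; this is an illustration of the procedure, not the SVGS flight code.

```python
# Sketch of the iterative least-squares (Gauss-Newton) update described
# above. residual_and_jacobian() is a placeholder for the collinearity
# model evaluated at all four beacons.
import numpy as np

def solve_state(residual_and_jacobian, V0, tol=1e-6, max_iter=50):
    """Iteratively refine the 6-DOF state vector V = [X Y Z phi theta psi]."""
    V = np.asarray(V0, dtype=float)
    for _ in range(max_iter):
        y, A = residual_and_jacobian(V)   # y: (8,) residuals, A: (8, 6) Jacobian
        # Least-squares solution of y = A @ dV for the correction dV
        dV, *_ = np.linalg.lstsq(A, y, rcond=None)
        V = V + dV
        if np.linalg.norm(dV) < tol:      # stop when the correction is small
            break
    return V
```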
In SVGS, the general form of the collinearity equations described above is narrowed to reflect the state vector formulation used by AVGS [19]. AVGS sensor measurements use angle pairs, azimuth and elevation, measured in the image frame to define the location of each target in the image; azimuth and elevation are defined by the vector from the perspective center to each target location in the captured image. Azimuth, Az, and elevation, El (Equation (7)), replace Equations (2) and (3) as the measurement equations in the least-squares formulation.
The SVGS calculation begins with the capture of the illuminated pattern on the target. The image is then converted to a binary image using a specified brightness threshold value. Blob extraction is performed on the binary image to find all bright-spot locations, and the location, size, and shape characteristics of the blobs are captured. If other objects in the field of view generate bright background-noise spots, the number of blobs may exceed the number of targets. To account for noise and properly identify which target is which, a subset of four blobs is selected from among all that are identified, and geometric alignment checks derived from the known orientation of the targets are applied. The process is iterated until the four targets have been identified and properly labeled. The target centroids are then fed into the state determination algorithms. Using the collinearity formulation (Equation (7)), the relative state is determined with a least-squares procedure. The SVGS algorithm flow is shown in Figure 4 [18,19].
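The blob-selection step can be illustrated with the sketch below. The regularity score used to rank candidate four-blob subsets is an assumption made purely for illustration and does not reproduce the actual SVGS geometric alignment checks.

```python
# Illustrative sketch only: among all detected blobs, pick the subset of
# four whose geometry looks most like a regular beacon pattern. The
# scoring rule here is an assumed stand-in for the SVGS alignment checks.
from itertools import combinations
import numpy as np

def pick_beacon_blobs(blobs, tol=0.2):
    """blobs: list of (x, y) centroids; returns the best set of four or None."""
    best, best_err = None, float("inf")
    for subset in combinations(blobs, 4):
        pts = np.array(subset)
        c = pts.mean(axis=0)                 # centroid of the candidate set
        d = np.linalg.norm(pts - c, axis=1)  # spread of each blob about it
        err = np.std(d) / np.mean(d)         # how "regular" the pattern is
        if err < best_err:
            best, best_err = subset, err
    return best if best_err < tol else None
```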
2.2. Description of the Flight Platform
The landing experiments used a quadcopter based on the Holybro X500 frame (Holybro Ltd., Shenzhen, China) (Figure 5); Figure 6 shows the corresponding interconnection diagram.
The drone system included: (i) a Pixhawk-4 flight controller (FCU) running PX4 firmware; (ii) a LattePanda Alpha 864 companion computer (64-bit Intel m3-8100Y CPU with 8 GB LPDDR3 RAM) running Ubuntu; (iii) a Samsung Galaxy S8 smartphone, used as the SVGS device; (iv) an Intel T265 tracking camera, used as the visual inertial odometry (VIO) sensor; (v) an IOIO board, for serial communication between the SVGS smartphone and the companion computer; and (vi) a four-cell lithium-ion polymer battery to power the system. The quadcopter included four DC brushless motors and four electronic speed controllers (ESCs) powered from a power distribution board connected to the battery. The ESCs receive their signals from a PWM splitter connected to the FCU, which communicates with a ground station through a telemetry radio link and with the companion computer through a second serial telemetry port using MAVLink (the FCU communication protocol).
The companion computer receives timestamped odometry data from the Intel T265 VIO sensor through a USB 3.0 connection, and serial SVGS data via a USB-to-UART converter from the IOIO board, which connects to the SVGS device (Samsung Galaxy S8, Samsung, Suwon-si, Korea) via Bluetooth. In the experiments using ArUco, the SVGS device and the IOIO board were removed and replaced with a webcam connected to the companion computer via USB. The landing targets for SVGS and ArUco are shown in Figure 7. For SVGS, an illuminated LED target was used; for the AR landing, an ArUco marker was used in combination with a USB webcam.
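For reference, a hedged sketch of ArUco-based pose estimation with a calibrated webcam, in the spirit of the aruco_state_estimator node described in Section 2.4, is shown below. It assumes the legacy cv2.aruco API of opencv-contrib-python (version 4.6 or earlier); the marker dictionary and size are placeholders.

```python
# Sketch of ArUco target pose estimation with a calibrated webcam.
# Assumes the legacy cv2.aruco API; dictionary, marker length, and
# calibration inputs are placeholders, not the values used in the paper.
import cv2

def estimate_marker_pose(frame_bgr, camera_matrix, dist_coeffs, marker_len_m=0.20):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return None
    # Pose of each detected marker relative to the camera frame
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_len_m, camera_matrix, dist_coeffs)
    return rvecs[0], tvecs[0]  # rotation (Rodrigues) and translation of first marker
```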
2.3. Overview of Landing Mission
Landing missions start with a preflight safety check, as shown in Figure 2. Take-off and approach to the landing target are controlled manually; the pilot can switch to autonomous mode at any time using a manual switch integrated with the joystick controller. After a hovering period to stabilize the drone trajectory, the drone acquires the landing target and estimates its position and attitude in the coordinate frame fixed to the ground using SVGS readings and the VIO sensor estimate. The automated landing portion of the mission that follows target acquisition is shown in block 4.1.2. The software for automated landing runs on the companion computer and was developed in ROS/Python, as described in Section 2.4.
The autonomous landing logic is shown in the flowchart of Figure 8. The target position is saved in memory and updated every time a reading is captured; if the target is lost from the camera view during landing, the latest valid target position estimate saved in memory is used. After target lock in SVGS is achieved, (X, Y) position setpoints based on the current SVGS reading are sent to the FCU, directing the drone toward the estimated landing target while maintaining a constant altitude. Once the drone is within an acceptable radius of the target location (in the horizontal plane), a timer counts up to a specified number of seconds; if the UAV leaves the acceptable radius, the timer restarts. This ensures that a stable position above the target is achieved before descent. When the timer reaches the specified value (i.e., the vehicle has remained within tolerance above the landing target for sufficient time), the drone descends at a constant vertical speed.
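The dwell-timer gate that triggers descent can be sketched as follows; the acceptance radius and hold time are illustrative values, not the mission parameters.

```python
# Sketch of the dwell-timer logic described above: descent is allowed only
# after the vehicle has stayed within the acceptance radius of the target
# for a continuous hold time. Radius and hold time are assumed values.
import math
import time

class LandingGate:
    def __init__(self, radius_m=0.15, hold_s=3.0):
        self.radius_m, self.hold_s = radius_m, hold_s
        self._enter_time = None

    def update(self, drone_xy, target_xy):
        """Return True once descent may start."""
        dist = math.hypot(drone_xy[0] - target_xy[0], drone_xy[1] - target_xy[1])
        if dist > self.radius_m:
            self._enter_time = None          # left the radius: reset the timer
            return False
        if self._enter_time is None:
            self._enter_time = time.monotonic()
        return time.monotonic() - self._enter_time >= self.hold_s
```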
2.4. Software Development and Filtering
Flight software on the companion computer was developed using the Robot Operating System (ROS). ROS consists of a set of communication and scheduling interfaces that use a publisher/subscriber architecture and are widely used in robotics applications. It allows the required software functionality to be broken into modular programs called nodes, which are easily modified and reused to support different tasks. The PX4 flight stack has extensive integration with ROS, particularly through the MAVROS package. This enables the straightforward integration of several sensors and the development of complex systems.
ROS is organized into packages; a package is a set of programs called nodes. Nodes exchange data by publishing and subscribing to named channels called topics. ROS Kinetic with Ubuntu 16.04 was used on the companion computer. Two open-source ROS packages were used: MAVROS, to support communication with the FCU over a UART-over-USB connection using MAVLink (the FCU communication protocol and serial interpreter), and px4_realsense_bridge, to relay the Intel VIO sensor estimate to the FCU for fusion with the vehicle's IMU readings. To communicate SVGS data to the FCU, a new ROS package called "DR1" was created using Python nodes that communicate with the MAVROS and px4_realsense_bridge packages.
Figure 9 illustrates how ROS nodes communicate.
The nodes created for DR1 are:
"svgs_deserializer": converts serial data packets from SVGS and publishes the output on the topic "target" as a stamped twist message (a ROS data structure that includes timestamped translational and angular coordinates).
“aruco_state_estimator”: performs the same task as svgs_deserializer, except it uses a Logitech C920 webcam and the ArUco target to generate the estimates.
"landing_commander": checks that the current drone position relative to the target and the magnitude of the drone velocity are within their acceptable threshold values. If they are, the node starts the landing counter. Once the counter reaches a threshold value, the node sets the landing flag to true, which commands the drone to start its descent.
"motion_control": this node depends on the landing flag from the "landing_commander" node. While the landing flag is false, the node updates the setpoint to the position of the target (from the "svgs_deserializer" node) while maintaining the current altitude. Once the landing flag is true (the landing criteria are met), it updates the setpoint to the position of the landing target itself, initiating the descent.
"px4_control": this node subscribes to the "motion_control" node, obtains the setpoint data, and forwards it to the FCU at 30 updates/s.
“data_recording”: subscribes to variables and records them in a CSV file at 25 Hz.
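As an illustration of the node structure, a minimal rospy sketch in the spirit of the "px4_control" node is shown below; it republishes the latest setpoint to the standard MAVROS position-setpoint topic at 30 Hz. The input topic name is an assumption made for illustration.

```python
#!/usr/bin/env python
# Sketch of a DR1-style node (ROS Kinetic, rospy): subscribe to a setpoint
# topic and republish it to MAVROS at a fixed rate. The input topic name
# is assumed; "mavros/setpoint_position/local" is the standard MAVROS topic.
import rospy
from geometry_msgs.msg import PoseStamped

class SetpointRelay(object):
    def __init__(self):
        self.setpoint = PoseStamped()
        rospy.Subscriber("motion_control/setpoint", PoseStamped, self.store)
        self.pub = rospy.Publisher("mavros/setpoint_position/local",
                                   PoseStamped, queue_size=1)

    def store(self, msg):
        self.setpoint = msg

    def run(self):
        rate = rospy.Rate(30)                 # 30 setpoint updates per second
        while not rospy.is_shutdown():
            self.setpoint.header.stamp = rospy.Time.now()
            self.pub.publish(self.setpoint)
            rate.sleep()

if __name__ == "__main__":
    rospy.init_node("px4_control_sketch")
    SetpointRelay().run()
```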
The SVGS is not a real-time sensor and has a high degree of variability in its update rate [14]. SVGS data are non-deterministic for two reasons: SVGS runs on an Android-based system, and SVGS data are processed on the Ubuntu-based companion computer. Due to the non-deterministic timing inherent to any general-purpose operating system, SVGS samples are not available at a constant sampling rate, but instead follow a statistical distribution [14]. For this reason, attempting to estimate velocity using filtered finite differences of the SVGS position estimate leads to highly noisy and inaccurate estimates. The proposed solution for SVGS velocity estimation is based on a second-order kinematic Kalman filter (Equation (10)), where Δt represents the average sampling time of the SVGS.
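A minimal per-axis sketch of such a kinematic Kalman filter is shown below, using the average SVGS sampling time Δt. The noise covariances are placeholders, and the exact formulation of Equation (10) may differ; the sketch only illustrates the idea of estimating velocity jointly with position rather than by finite differences.

```python
# Per-axis kinematic Kalman filter sketch: estimate position and velocity
# from irregular SVGS position samples using the average sampling time dt.
# Process/measurement noise values are placeholders.
import numpy as np

class KinematicKF(object):
    def __init__(self, dt, q=1e-2, r=1e-3):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity model
        self.H = np.array([[1.0, 0.0]])              # only position is measured
        self.Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                               [dt**3 / 2, dt**2]])  # process noise
        self.R = np.array([[r]])                     # measurement noise
        self.x = np.zeros((2, 1))                    # [position, velocity]
        self.P = np.eye(2)

    def step(self, z):
        # Predict
        self.x = self.F.dot(self.x)
        self.P = self.F.dot(self.P).dot(self.F.T) + self.Q
        # Update with the new SVGS position sample z
        S = self.H.dot(self.P).dot(self.H.T) + self.R
        K = self.P.dot(self.H.T).dot(np.linalg.inv(S))
        self.x = self.x + K.dot(np.array([[z]]) - self.H.dot(self.x))
        self.P = (np.eye(2) - K.dot(self.H)).dot(self.P)
        return float(self.x[0, 0]), float(self.x[1, 0])  # position, velocity
```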
PX4 Flight Controller. The PX4 firmware provides a flight mode called "Offboard Mode", in which commands and setpoints can be sent to the FCU via the MAVLink protocol. This mode is used in the landing mission to send position setpoints via the MAVROS ROS package. In the Pixhawk-4's offboard mode, the firmware provides a cascaded control architecture for the flight vehicle, combining a P controller for position with a PID controller for velocity [24], as shown in Figure 10. In our experiments, the vehicle's position, velocity, and yaw estimates relative to the ground were obtained from the fusion of the VIO sensor output with the Pixhawk's internal IMU readings.
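The cascaded architecture of Figure 10 can be illustrated per axis as follows; the gains are placeholders, and this sketch is not the PX4 firmware implementation.

```python
# Illustrative per-axis sketch of a cascaded controller: an outer P position
# loop produces a velocity setpoint, and an inner PID velocity loop turns it
# into an acceleration command. Gains are assumed values.
class CascadedAxisController(object):
    def __init__(self, kp_pos=0.95, kp_vel=1.8, ki_vel=0.4, kd_vel=0.2):
        self.kp_pos = kp_pos
        self.kp_vel, self.ki_vel, self.kd_vel = kp_vel, ki_vel, kd_vel
        self._int = 0.0
        self._prev_err = 0.0

    def update(self, pos_sp, pos, vel, dt):
        # Outer loop: P controller on position error -> velocity setpoint
        vel_sp = self.kp_pos * (pos_sp - pos)
        # Inner loop: PID controller on velocity error -> acceleration command
        err = vel_sp - vel
        self._int += err * dt
        deriv = (err - self._prev_err) / dt
        self._prev_err = err
        return self.kp_vel * err + self.ki_vel * self._int + self.kd_vel * deriv
```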
2.5. Coordinate Frames and Data Conversion
Two sources of position estimates are used: SVGS readings, which give the position of the target relative to the drone in a coordinate system attached to the camera, and the VIO sensor, which gives the position of the drone relative to a frame fixed to the ground. The VIO estimate is used by the FCU's proprietary sensor fusion algorithm to generate a position estimate of the vehicle relative to the ground during flight. An outline of the PX4 VIO/IMU fusion logic used to calculate position/attitude estimates is presented in Appendix A. The VIO sensor enables automatic flight in a GPS-denied environment: in the absence of a position estimate, the PX4 FCU will not enter automatic mode. The Intel T265 VIO sensor is more accurate than GPS, but would not be suitable for long-range or high-altitude maneuvers.
When using position estimates from the VIO sensor, the PX4 FCU operates in the FLU frame (Front–Left–Up). The origin of this frame is defined as the boot position of the vehicle, with the positive Front axis pointing to the front of the drone, and the positive Left axis pointing to the left of the drone. The orientation of the FLU frame is constant, regardless of the yaw angle of the drone: the FLU frame is always fixed to the ground and does not rotate with the drone.
SVGS position estimates from the Kalman filter (Equation (10)) need to be converted to the FLU frame, as shown in Figure 11. Since the FLU frame orientation remains the same regardless of the drone's heading, the yaw angle from the VIO quaternion estimate is used to rotate SVGS readings into the drone's FLU frame.
Once the flight mode is switched to autonomous, SVGS readings are ignored if the vehicle's velocity exceeds a maximum threshold, because SVGS sends void data packets when the target velocity exceeds this threshold (typically 7.5 cm/s). When an SVGS reading that meets the velocity requirement is captured, it is vectorially added to the local position of the drone in the FLU frame at that moment, as shown in Figure 11. This sum becomes the target's position estimate in the FLU frame and is updated every time an acceptable SVGS reading is captured.
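The conversion can be sketched as follows, assuming for simplicity that the camera axes are aligned with the body axes; only the horizontal (X, Y) components are shown.

```python
# Sketch of the frame conversion described above: rotate the SVGS reading
# (target relative to the camera) by the vehicle yaw extracted from the VIO
# quaternion, then add it to the drone's local FLU position to obtain the
# target position in the FLU frame. Camera/body alignment is assumed.
import math

def yaw_from_quaternion(qx, qy, qz, qw):
    """Yaw (rotation about the Up axis) from a unit quaternion."""
    return math.atan2(2.0 * (qw * qz + qx * qy),
                      1.0 - 2.0 * (qy * qy + qz * qz))

def target_in_flu(svgs_xy, drone_xy_flu, vio_quat):
    yaw = yaw_from_quaternion(*vio_quat)
    x_rel, y_rel = svgs_xy
    # Rotate the SVGS relative reading by the current yaw...
    x_flu = x_rel * math.cos(yaw) - y_rel * math.sin(yaw)
    y_flu = x_rel * math.sin(yaw) + y_rel * math.cos(yaw)
    # ...then add it to the drone's local FLU position (vector addition)
    return (drone_xy_flu[0] + x_flu, drone_xy_flu[1] + y_flu)
```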