1. Introduction
Nowadays, most robots utilize visual information for tasks such as grasping, random bin picking, and autonomous navigation. In these robotic applications, visual information, similar to human vision, provides plenty of important and useful cues to robots, and many computational or artificial vision techniques have therefore been developed. In particular, in the field of autonomous vehicles and mobile robots, visual information with a large field of view (FoV) is advantageous because it reduces the blind areas around the vehicle or robot. In this regard, a fisheye camera is an efficient way for a robot to obtain images with a large FoV; moreover, a small number of fisheye cameras, rather than many normal-FoV cameras, is sufficient to obtain around-view images of the robot. Therefore, fisheye cameras are now commonly used in many robotic applications.
As one of these robotic applications, the vision-based or visual localization of a mobile robot is an essential technology for autonomous mapping and navigation. However, for certain types of robots, such as pallet or cleaning robots, the fisheye camera is inevitably attached close to ground level, as shown in Figure 1. Consequently, the ground area occupies half of the captured fisheye image. In this situation, vision-based localization is challenging for two main reasons. First, half of the image feature extraction and tracking is conducted in the ground area, while the texture pattern on the ground is generally uniform compared with the other, more cluttered areas. Common feature extraction algorithms in computer vision tend to extract fewer features in the uniform ground area than in the surrounding areas. Thus, the ground area in a fisheye image is not cost effective in terms of feature extraction and tracking. Second, the undistortion of a fisheye image to a pinhole image causes inevitable image loss along the boundary of the original image. As a result, only the center area of the original image, half of which is also the ground area, can be used in the undistorted pinhole image for feature extraction and tracking.
Even though there are several disadvantages to using a fisheye camera on a low-profile robot, the ground information can be exploited efficiently for visual navigation by assuming that the ground is a planar surface. Because almost all indoor mobile robots move on a planar ground, the motion direction of a mobile robot can be considered parallel to the ground plane. This study is also motivated by the fact that the low-profile robot moves in a 2D planar space parallel to the ground plane and the fisheye camera captures ground images at a very close distance to the ground.
In this study, we propose a visual odometry method with a fisheye camera for low-profile pallet robots. The pose of the robot is estimated using only the ground plane images generated from the fisheye camera. The proposed method consists of several steps, as shown in Figure 2. First, a fisheye image is converted to a cubemap image [1]. Second, a ground plane image (GPI) is obtained from the cubemap image; the GPI is a virtual top-view image generated from the bottom and front faces of the cubemap. Third, the ground plane image is converted again to an ortho-rectified GPI to compensate for distortion caused by errors in the lens distortion parameters. Fourth, image feature points are extracted and tracked using optical flow in the ortho-rectified GPI, and the robot motion is modeled as a 2D vector in the image space [2]. Finally, as shown in Figure 2, the odometry of the robot is estimated by applying the bicycle kinematic model to the motion vector between consecutive image frames [3]. The proposed visual odometry architecture is named Ground Plane Odometry (GPO). In several experiments using a low-profile pallet robot, we compare the localization performance of T265 stereo odometry, GPO, and an Extended Kalman Filter (EKF) fusion of GPO and T265 [4,5]. By fusing GPO and T265 with the EKF, we achieve a mean localization error of 0.058 m over three indoor odometry experiments.
The structure of this study is as follows. In Section 2, we briefly describe previous studies related to this paper. In Section 3, we explain the details of each step of the proposed method. In Section 4, we present several experiments and evaluation results. Finally, in Section 5, we conclude this paper.
2. Literature Review
In this section, we review previous studies on visual odometry techniques for mobile robots that employ image features on the ground plane surface captured with fisheye or wide-angle lens cameras.
M. Flores et al. [6] compare the visual localization performance of four well-known image features extracted from fisheye images: Oriented FAST and Rotated BRIEF (ORB) [7], Speeded-Up Robust Features (SURF) [8], Features from Accelerated Segment Test (FAST) [9], and KAZE [10]. They measure the translation and rotation errors of visual odometry computed from two different approaches: standard visual odometry and adaptive probability-oriented feature matching. A fisheye image dataset by Zhang et al. [11] is used for the experiments. The experimental results show that ORB and KAZE perform relatively better than SURF and FAST.
Y. Wang et al. [1] propose a novel visual SLAM (Simultaneous Localization and Mapping) method using cubemap images. They employ ORB-SLAM [12] as the baseline of their proposed CubemapSLAM method. Cubemap images are consecutively generated from fisheye images and used as the input of ORB-SLAM. The cubemap camera model can use all image information of the fisheye camera without affecting the performance of feature descriptors. CubemapSLAM is efficiently implemented and runs in real time. Despite the limited angular resolution of the sensor, CubemapSLAM shows better localization accuracy than the pinhole camera model [13].
A. J. Swank [14] utilizes a single downward-pointing camera equipped with a wide-angle lens for the localization of a mobile robot. Image features are obtained from the camera, and the optical flow algorithm is employed to measure the location of the robot. The proposed method specifically targets robotic platforms operating in unexplored areas, and the author suggests applying it to any indoor or outdoor area where other localization sensors are not available.
J. Zhang et al. [15] present a robust localization method based on monocular visual odometry that provides accurate position estimation even when a mobile robot operates on undulating terrain. Their method employs a steering model to estimate the rotation and translation of the robot separately. To address localization on undulating terrain, the terrain is approximated by multiple locally flat surface patches, and the inclination of each local surface is estimated together with the robot's motion. The surface inclination is estimated with approximately 1% error.
J. Jordan et al. [16] present a visual odometry technique for a mobile robot operating on a 2D plane by applying its kinematic model. A monocular camera facing downward captures images of the floor, and the robot motion is estimated from the alignment between two consecutive images. The motion is then combined with the kinematic model of the robot to remove outlier image regions that do not reflect the actual motion. To obtain ortho-rectified floor images, they use a simple image homography method.
D. Zhou et al. [17] address ground plane-based odometry for autonomous vehicle driving. They use a single conventional camera to capture the ground image in front of a road-driving vehicle. To estimate the vehicle motion in real metric space, they propose a divide-and-conquer motion scale estimation method based on the ground plane and the camera height. Their work is a typical example of ground plane-based odometry applied to autonomous driving.
J. Jordan et al. [18] introduce a ground plane-based visual odometry using a downward-facing RGBD camera. The RGBD camera captures ground images and 3D point clouds frame by frame, and the images and points are transformed into a virtual orthogonal camera. The projected data are then split into multiple orthographic grid blocks, and each image block is registered frame by frame using Efficient Second-order Minimization (ESM) [19]. Finally, the robot motion is estimated as the weighted average of all block motions. Because this approach uses an RGBD camera, the 3D ground information helps to estimate the robot motion accurately.
M. Ouyang et al. [20] integrate gyroscope and wheel encoder information into visual odometry to reduce odometry drift during planar motion. The gyroscope information compensates for the orientation error of the wheel encoder. In addition, the ground plane is segmented from the camera images to extract image features on the ground. As the backend of the odometry measurement, graph optimization is used to minimize the measurement residuals of the wheel encoder, gyroscope, vision, and ground feature constraints.
X. Chen et al. [21] introduce a SLAM technique called StreetMap. For the localization and mapping of a mobile robot, they use a downward-facing camera to utilize the ground textures. They categorize the ground textures into two groups, feature based and line based, and propose a different localization and mapping method for each category. The ground texture images are obtained with the downward-facing camera; however, no image transformation, such as ortho-rectification, is applied.
H. Lin et al. [22] address the self-localization of a mobile robot using a single catadioptric camera. The geometric properties of the catadioptric projection model are utilized to obtain the intersections of vertical lines with the ground plane. From the catadioptric image, the ground plane is segmented, followed by vertical line extraction. Using the geometry of the catadioptric projection model, the 3D line equation corresponding to each extracted vertical image line is calculated, and the robot motion between two consecutive frames is estimated from the relationship of the 3D vertical lines.
In our previous investigation [23], we designed a pallet-type mobile robot and mounted a fisheye camera at the front of the robot to calculate the visual odometry. The fisheye image is first converted to a cubemap image and then to a ground plane image to track the image pattern on the ground. The visual odometry of the robot is calculated from the motion of the tracked features.
3. Proposed Approach
3.1. Overview
The steps of the proposed method are shown in Figure 3 and summarized as follows. As the first step, a fisheye image of the front view of the pallet robot is converted to a cubemap image. From the cubemap image, we generate a virtual pinhole camera image, the Ground Plane Image (GPI), which was introduced in our previous study [23]. The GPI contains only the image of the ground plane in front of the robot, as if a virtual pinhole camera were facing perpendicular to the ground plane, as shown in Figure 3. Ideally, the GPI should be the ortho-rectified image captured by this virtual top-view camera. However, due to transformation errors both from the fisheye image to the cubemap and from the cubemap to the GPI, it is not perfectly ortho-rectified. The transformation error mainly comes from errors in the fisheye lens model and in the homography computed between the cubemap and the GPI.
To obtain a perfectly ortho-rectified GPI, we therefore propose to transform the ground plane image once more into an ortho-rectified image (the ortho-rectified GPI), in which the ground plane appears exactly as if the top-view camera captured it perpendicular to the ground. As the next step, we track image features in the ortho-rectified GPI using the Kanade–Lucas–Tomasi (KLT) optical flow algorithm [24] to determine the motion vector of the robot. The motion vector between two GPIs is then scaled to obtain the robot's motion in metric space. From the scaled vector, the velocity and steering angle of a virtual front wheel of the robot are estimated. Because the fisheye camera is mounted at the front of the robot, we employ a kinematic bicycle model for the motion estimation of this virtual front wheel [3].
3.2. Calibration of Fisheye Camera Using EUCM
In conventional visual odometry, using a fisheye image in its original form is not common because of the complexity of its multi-view geometry. Thus, the fisheye camera model is usually converted to the pinhole model to simplify the multi-view geometry. However, transforming the fisheye model to the pinhole model causes serious distortion in the undistorted pinhole image. In addition, only the center part of the undistorted image can be used after the rasterization step, and the remaining image areas are discarded.
Therefore, instead of the pinhole model, we use the cubemap model [1] to utilize all image areas after undistortion. To transform the fisheye image into a cubemap image, all calibration parameters of the fisheye camera must be known. As the calibration model, we use the enhanced unified camera model (EUCM) [25], a nonlinear camera model generally used for omnidirectional cameras such as catadioptric systems or fisheye cameras. The EUCM has six parameters, [fx, fy, cx, cy, α, β]; the curvature of the fisheye lens is modeled using α and β, so the EUCM can represent the fisheye camera distortion.
Figure 4 shows the projection model from 3D space to the image plane through the fisheye lens. A 3D point X is first projected onto the curved surface P, as shown in Equations (1)–(5). The projected point is then orthogonally projected onto the normalized image plane M, as shown in Equation (6). Finally, as shown in Equation (7), the point on the normalized plane is multiplied by the intrinsic matrix K to obtain the corresponding pixel on the fisheye image.
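As a concrete illustration, the following Python sketch implements the EUCM forward projection described above (the chain of Equations (1)–(7)); the parameter values are placeholders for illustration, not our calibration results.

```python
import numpy as np

def eucm_project(X, fx, fy, cx, cy, alpha, beta):
    """Project a 3D point X = (x, y, z) onto the fisheye image with the EUCM.

    Mirrors Equations (1)-(7): project onto the curved surface P,
    orthogonally project onto the normalized plane M, then apply K.
    """
    x, y, z = X
    d = np.sqrt(beta * (x * x + y * y) + z * z)   # distance term of the curved surface P
    denom = alpha * d + (1.0 - alpha) * z         # EUCM normalization denominator
    mx, my = x / denom, y / denom                 # point on the normalized plane M
    return fx * mx + cx, fy * my + cy             # pixel after multiplying by K

# Example with placeholder EUCM parameters (not the calibrated values of our camera).
u, v = eucm_project((0.2, -0.1, 1.0), fx=285.0, fy=285.0,
                    cx=424.0, cy=400.0, alpha=0.57, beta=1.02)
```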
To calibrate the fisheye camera's EUCM in this study, we use the Kalibr calibration toolbox [26,27,28,29,30], which is widely used for camera calibration. Kalibr supports a variety of camera models, including the EUCM, pinhole, omnidirectional, and double sphere models.
3.3. Mapping Fisheye Image to Cubemap Image
After calibrating the fisheye camera, we use the calibration parameters to generate the cubemap image from the fisheye image. As shown in Figure 5a, the cubemap model consists of five image planes generated by virtual pinhole camera models. Each virtual pinhole model has the same intrinsic parameters but different extrinsic parameters because of its different viewing direction; thus, the fisheye image is divided among five virtual pinhole camera images. The intrinsic parameters of the virtual cubemap cameras depend on the size of each cubemap face. We determine the intrinsic parameters [fFace_x, fFace_y, cFace_x, cFace_y] of the virtual pinhole cameras so that each face image resolution is half of the cubemap image resolution. Here, f is the focal length, c is the principal point, and the subscript Face indexes the five cubemap faces.
Figure 5b shows the relationship between the cubemap image plane and the cubemap model. To generate a cubemap image, we first define the transformation from the cubemap image coordinates to the cubemap model coordinates, as shown in Equation (14), where the rotation matrix is the 3D rotation of the cubemap face indexed by Face.
We transform the cubemap image coordinates into model coordinates on one of the cubemap faces, as shown in Figure 6. The idea of the EUCM-cubemap projection model is to use the EUCM parameters to transform a point on a cubemap face to a pixel of the fisheye image, or vice versa. First, the point on a cubemap face is transformed to a point on the curved surface P of the EUCM, as shown in Equations (15)–(18); this point replaces the 3D point in Equation (1), and the back-projection is not required in this step. Next, the point is orthogonally projected onto the EUCM normalized image plane M, as shown in Equation (6). Finally, it is projected to a pixel on the fisheye image by multiplying it with the intrinsic matrix K, as shown in Equation (7).
Following this process, the correspondence between each cubemap image pixel and a fisheye image pixel is established through the backward transformation from the cubemap image to the fisheye image. Because this transformation is computationally expensive, it is performed only once when the proposed method starts; the resulting index table of pixel coordinate mappings is then stored and reused for every frame.
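The sketch below illustrates this lookup-table construction for a single face; the EUCM parameters, face size, and face rotation are illustrative assumptions, and the other four faces would be generated the same way with 90° rotations of the viewing direction.

```python
import numpy as np
import cv2

# Placeholder EUCM parameters and cubemap face size (not our calibration values).
FX, FY, CX, CY, ALPHA, BETA = 285.0, 285.0, 424.0, 400.0, 0.57, 1.02
FACE = 400  # side length of one cubemap face in pixels

def face_to_fisheye_lut(R_face):
    """Map every pixel of one cubemap face to the fisheye pixel it samples."""
    f = FACE / 2.0                                  # virtual pinhole focal length and principal point
    u, v = np.meshgrid(np.arange(FACE), np.arange(FACE))
    rays = np.stack([(u - f) / f, (v - f) / f, np.ones((FACE, FACE))], axis=-1)
    rays = rays @ R_face.T                          # rotate the face rays into the fisheye frame
    x, y, z = rays[..., 0], rays[..., 1], rays[..., 2]
    d = np.sqrt(BETA * (x * x + y * y) + z * z)     # EUCM forward projection (Equations (1)-(7))
    denom = ALPHA * d + (1.0 - ALPHA) * z
    return (FX * x / denom + CX).astype(np.float32), (FY * y / denom + CY).astype(np.float32)

# The lookup table is built once at startup; each frame only needs a cheap remap.
map_x, map_y = face_to_fisheye_lut(np.eye(3))       # front face (identity rotation)
fisheye = cv2.imread("fisheye.png")
front_face = cv2.remap(fisheye, map_x, map_y, cv2.INTER_LINEAR)
```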
Figure 6 shows the EUCM–cubemap projection model, and
Figure 7 shows the difference between the fisheye image and the cubemap image projections.
3.4. Generating GPI from Cubemap Image
This section describes how to generate the proposed ground plane image (GPI) from the cubemap image. As shown in Figure 8, the front and bottom faces of the cubemap model are used to generate the GPI, which is an ortho-rectified image from a virtual top-view camera. To generate the front-face part of the GPI, as shown in Figure 8, the front-face image is warped to the ground plane using a homography matrix: a point q on the front-face image is projected to a homogeneous point Q on the ground plane image using Equation (22).
In Equation (22), the symbol between the two sides denotes homogeneous equality, i.e., equality up to scale. The homography matrix H is calculated from the correspondences of the four corner positions of a chessboard: the chessboard is placed on the ground, as shown in Figure 8, a cubemap image of the chessboard is generated, and the four corners of the chessboard are selected in the cubemap image.
The bottom-face image is simply cropped because the bottom face is already captured by a virtual top-view camera, as shown in Figure 5a. Figure 9 shows an example of GPI generation from a fisheye image. The trapezoidal ground area in the front face of the cubemap image is warped to a rectangular area in the GPI, and the cropped bottom-face area is concatenated at the bottom of the GPI. In the ideal case, the generated ground plane image is the ortho-rectified image from a virtual top-view camera.
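The following sketch illustrates the warp and concatenation. The corner coordinates and output size are illustrative only, and the front_face and bottom_face arrays are assumed to come from the cubemap remapping step above (the bottom face would use its own 90° face rotation).

```python
import numpy as np
import cv2

# Four chessboard corners picked in the cubemap front face and their target
# positions in the GPI (all pixel coordinates below are illustrative only).
front_corners = np.float32([[312, 655], [488, 655], [530, 790], [270, 790]])
gpi_corners   = np.float32([[150,   0], [250,   0], [250, 100], [150, 100]])

# Homography of Equation (22): a point q on the front face maps to Q on the GPI.
H = cv2.getPerspectiveTransform(front_corners, gpi_corners)

GPI_W, GPI_H = 400, 200                                   # assumed size of each GPI part
gpi_front  = cv2.warpPerspective(front_face, H, (GPI_W, GPI_H))
gpi_bottom = bottom_face[:GPI_H, :GPI_W]                  # bottom face is simply cropped
gpi = np.vstack([gpi_front, gpi_bottom])                  # warped front part on top, bottom face below
```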
3.5. Ortho-Rectification of GPI
Ideally, the GPI is the ortho-rectified image obtained from the virtual top-view camera, so there should be no distortion in the image of the ground plane. However, due to the inherent errors in the EUCM calibration parameters and in the homography from the front face of the cubemap model to the ground, image distortion remains in the GPI. An example of this distortion is shown in Figure 10: from left to right, an original fisheye image is converted to a cubemap image and finally to a GPI. In this figure, we intentionally place a chessboard pattern in front of the robot so that the distortion is easy to identify; in the generated GPI, the chessboard pattern is visibly distorted. This distortion must be corrected; otherwise, visual odometry using the distorted GPI yields erroneous results.
To correct the image distortion in the GPI, we transform the GPI again into a new image, the ortho-rectified GPI. The correction procedure is as follows. We place the chessboard in front of the robot, obtain a fisheye image, and generate the GPI, as shown in Figure 11. Then, we select four corners in the image, shown as the red-colored rectangle in the figure. Finally, a homography matrix is calculated so that the red-colored quadrilateral is transformed into a true rectangle whose size matches the real metric dimensions of the 2 × 4 chessboard squares. To convert pixels to metric units, we use the pixel-to-meter scale factor, which is used in Section 3.6.
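A sketch of this correction under assumed values is given below; the square size, scale factor, and corner coordinates are hypothetical, and gpi is the image produced by the previous sketch.

```python
import numpy as np
import cv2

SQUARE_M = 0.02                        # assumed chessboard square size in meters
SCALE_M_PER_PX = 0.0005                # assumed pixel-to-meter scale factor
side_px = SQUARE_M / SCALE_M_PER_PX    # one square side expressed in GPI pixels

# Corners of the 2 x 4 square region selected in the distorted GPI (illustrative),
# and their target positions forming a true rectangle of the correct metric size.
src = np.float32([[182, 40], [342, 36], [348, 118], [176, 122]])
dst = np.float32([[180, 40],
                  [180 + 4 * side_px, 40],
                  [180 + 4 * side_px, 40 + 2 * side_px],
                  [180, 40 + 2 * side_px]])

H_rect = cv2.getPerspectiveTransform(src, dst)
ortho_gpi = cv2.warpPerspective(gpi, H_rect, (gpi.shape[1], gpi.shape[0]))
```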
To evaluate the ortho-rectification accuracy of the GPI, we measure the side lengths and area of the 2 × 4 chessboard squares, as shown in Table 1. As shown in Figure 11, the four sides of the red-colored rectangle, a, b, c, and d, are compared before and after ortho-rectification, and the area S of the rectangle is also compared in Table 1. As the table shows, the ortho-rectification corrects the image distortion in the GPI, and the dimensions of the ortho-rectified GPI are very close to the true values.
3.6. Feature Tracking in Ortho-Rectified GPI
The proposed visual odometry method uses the ground motion observed in the ortho-rectified GPI. Because the ortho-rectified GPI is the top-view image of a virtual camera, the ground motion in the GPI can be measured in real metric dimensions using the focal length and the height of the camera. To obtain the ground motion in the GPI, we use the simple and robust KLT optical flow algorithm [24]. Moreover, to obtain a reliable motion vector that represents the motion of the robot, we use only the median of the motion vectors between the previous and current frames, as shown in Figure 10.
Usually, the height of a pallet robot is very low, and a camera mounted this close to the ground captures the fine details of the floor pattern. The ground pattern in the ortho-rectified GPI is random enough to extract and track features in consecutive image frames, so many features are usually obtained in a single frame and tracked using KLT. In the ideal case, all motion vectors from KLT are identical because the ortho-rectified GPI is a perfectly orthogonal view of the ground. In practice, however, KLT tracking errors occur, so the median of the motion vectors provides a robust estimate of the motion.
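A minimal sketch of this tracking step with OpenCV is given below; the detector and tracker parameters are illustrative defaults rather than the values used in our experiments.

```python
import numpy as np
import cv2

def gpi_motion_vector(prev_gpi, curr_gpi):
    """Median KLT motion vector (in pixels) between two ortho-rectified GPIs."""
    prev_gray = cv2.cvtColor(prev_gpi, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_gpi, cv2.COLOR_BGR2GRAY)

    # Detect corners on the ground texture of the previous frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.zeros(2)

    # Track them into the current frame with pyramidal KLT optical flow.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    flows = (nxt[good] - pts[good]).reshape(-1, 2)

    # The median over all flow vectors is robust to KLT tracking outliers.
    return np.median(flows, axis=0)
```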
3.7. Metric Motion in Ortho-Rectified GPI
As shown in Figure 12, the median motion vector obtained from the ortho-rectified GPI corresponds to the robot motion only up to an unknown scale because the vector is measured in pixel space. Thus, the scale factor from pixel space to metric space is needed to recover the actual robot motion. To find this scale, we assume that the ortho-rectified GPI is obtained by a virtual camera that is perfectly perpendicular to the flat ground; the scale of the motion vector is then calculated from the height z of the camera's lens center above the ground and the focal length f of the camera, as expressed in Equation (23). By multiplying the pixel motion by this scale factor, the robot's actual motion vector is obtained, as shown in Figure 13.
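With the scale factor of Equation (23), the conversion is a single multiplication, as in the short sketch below; the camera height, focal length, and median vector values are illustrative assumptions.

```python
import numpy as np

CAM_HEIGHT_M = 0.10        # assumed height z of the virtual camera above the ground (m)
VIRTUAL_FOCAL_PX = 200.0   # assumed focal length f of the virtual top-view camera (px)
scale = CAM_HEIGHT_M / VIRTUAL_FOCAL_PX   # Equation (23): meters per pixel (0.0005 here)

median_motion_px = np.array([3.0, 14.0])  # hypothetical median KLT vector from Section 3.6
motion_m = scale * median_motion_px       # -> [0.0015, 0.007] meters per frame
```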
3.8. Robot Odometry Using a Kinematic Motion Model
As the motion of the camera mounted on the robot depends on the robot's movement, we apply the motion vector of the ground plane image to a kinematic model of the robot to estimate its odometry. In this step, the motion vector of the ortho-rectified GPI is regarded as the measurement of a virtual front wheel. Thus, we employ the bicycle motion model and estimate the velocity V of the virtual front wheel and the steering angle δ using Equations (25) and (26), whose inputs are the two components of the median motion vector and the time interval between image frames.
Figure 14 shows the bicycle motion model employed in this study. In this model, the robot rotates and moves about the IC, the instantaneous center of rotation in the 2D plane. We use the median motion vector of the GPI to measure the velocity V of the virtual front wheel and the steering angle δ. As a result, the 2D odometry of the robot is estimated; we name it Ground Plane Odometry (GPO). The bicycle motion model relies on three assumptions:
The wheel-based mobile robot moves in a 2D plane;
The center of mass of the mobile robot is located in the center of the robot body;
No slip: neither longitudinal nor lateral slip occurs at the wheels.
Following these assumptions, the robot's pose is estimated by applying the velocity and steering angle of the virtual front wheel to the bicycle model of the robot as follows. All parameters of the bicycle model in Figure 14 are calculated using the following equations, where lr is the distance between the rear wheel axis center and the center of mass of the robot, L is the distance between the rear wheel axis center and the front wheel axis center, and ω is the angular velocity of the robot rotating about the IC. Using these parameters, the velocity of the robot is determined as shown in Equations (32) and (33). Finally, the visual odometry of the mobile robot is estimated using Equation (34).
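To make the update concrete, the sketch below propagates the robot pose with a standard kinematic bicycle model. The way V and δ are derived from the median motion vector follows the spirit of Equations (25) and (26), but the exact form, the axis convention, and all numeric parameters here are assumptions rather than the equations of this paper.

```python
import numpy as np

L_WHEELBASE = 0.60   # assumed distance L between front and rear wheel axes (m)
L_REAR      = 0.30   # assumed distance lr from the rear axis to the center of mass (m)

def gpo_update(pose, motion_m, dt):
    """One GPO step: pose = (x, y, yaw); motion_m is the metric median vector."""
    x, y, yaw = pose
    vx, vy = motion_m

    # Virtual front-wheel measurement: speed from the vector length, steering
    # angle from its direction (assuming forward motion maps to the +y image axis).
    V = np.hypot(vx, vy) / dt
    delta = np.arctan2(vx, vy)

    # Standard kinematic bicycle model referenced at the center of mass; the
    # measured front-wheel speed is used directly as the body speed for simplicity.
    beta = np.arctan2(L_REAR * np.tan(delta), L_WHEELBASE)  # slip angle at the CoM
    x   += V * np.cos(yaw + beta) * dt
    y   += V * np.sin(yaw + beta) * dt
    yaw += (V / L_REAR) * np.sin(beta) * dt                 # rotation about the IC
    return np.array([x, y, yaw])

# Hypothetical usage: 0.007 m forward and 0.0015 m lateral motion at 30 fps.
pose = gpo_update(np.zeros(3), np.array([0.0015, 0.007]), dt=1.0 / 30.0)
```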
5. Conclusions
In this study, we propose a visual odometry method for a low-profile pallet robot using ground plane images obtained from a fisheye camera. To overcome the inherent radial distortion of the fisheye image, we introduce the Ground Plane Image (GPI), which is generated from two faces of the cubemap. We first generate an original GPI and then transform it into an ortho-rectified GPI to obtain a true top-view image from the mobile robot. Consequently, the median of the motion vectors in the ortho-rectified GPI accurately represents the real motion at the front of the robot, and the bicycle motion model is employed to estimate the odometry of the robot. The performance of the proposed Ground Plane Odometry (GPO) is compared with the stereo-based VIO of a T265 camera. The proposed GPO shows better performance than the T265 odometry, and the EKF fusion of GPO and T265 achieves the best performance. In addition, it is shown that using the ortho-rectified GPI is better than using the original GPI. The proposed GPO can be applied to any low-profile robot equipped with a fisheye camera, and it can be combined with other robot sensors for better performance. In future studies, GPO will be extended to multiple fisheye cameras on the pallet robot for more accurate visual odometry.