1. Introduction
Video stabilization is commonly used in unmanned aerial vehicles (UAVs), humanoid robots, binocular robots and other systems for surveillance, navigation, guidance, and control through video data [1]. The jitter and blurring effects in videos are caused mainly by the movement of the camera, called the global motion, and by the motion of moving objects in the scene, termed the local motion. These phenomena must be reduced to obtain high-quality video output while the camera is in motion, which in turn enables other features such as tracking, mapping and recognition. Research in this area divides into mechanical stabilization systems and digital stabilization systems [2].
Mechanical stabilization systems improve the support base of the camera by detecting acceleration and angular velocity while the camera is moving [2]. In the camera market, the Optical Image Stabilization (OIS) system is installed on the camera lens or the image sensor, which is quite expensive [3]. Digital stabilization systems, on the other hand, handle image post-processing by compensating for the movement of the captured image while the camera is moving. They can be divided into three steps, namely motion estimation, motion smoothing and image warping [4,5]. Motion estimation adopts a motion model, e.g., translation, affine, or similarity, to describe how the motion changes between two frames of the image source. The intentional camera motion is then smoothed by employing a Kalman filter [6] or a Gaussian low-pass filter [7,8] to eliminate the unplanned motion. Lastly, the stabilized video is warped onto the final image plane.
Common techniques for motion estimation in digital video stabilization are block matching [9], the KLT (Kanade–Lucas–Tomasi) tracker [10,11], SIFT [12] and SURF [13]. Feature trackers such as SIFT and SURF carry a heavy computational load for real-time digital video stabilization [14]. Recently, however, Dong et al. [14] and Lim et al. [15] used a KLT tracker to estimate the motion in real time with a high frame rate and a low computational cost. The KLT tracker detects feature points with Good Features to Track and estimates the optical flow between consecutive frames with the Lucas–Kanade method. It succeeds in evaluating small movements but fails when exposed to significant changes in both local and global motion [15]. Large motion, however, can be approximated using IMU (Inertial Measurement Unit) data, as shown in [16], which demonstrates the efficiency of the gyroscope in estimating the optical flow during fast rotation. That work adopted only the gyroscope data to aid the optical flow computation, improving the performance of the motion estimation.
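For concreteness, a minimal OpenCV sketch of one step of this KLT pipeline is shown below; the function name and parameter values (corner count, quality level, minimum distance) are illustrative choices, not taken from the cited implementations:

```cpp
#include <cstddef>
#include <vector>
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

// One KLT step: detect corners in the previous frame with Good Features
// to Track, then follow them into the current frame with pyramidal
// Lucas-Kanade optical flow, keeping only successfully tracked pairs.
void kltStep(const cv::Mat& prevGray, const cv::Mat& currGray,
             std::vector<cv::Point2f>& prevPts,
             std::vector<cv::Point2f>& currPts)
{
    std::vector<cv::Point2f> candidates, tracked;
    cv::goodFeaturesToTrack(prevGray, candidates, /*maxCorners=*/200,
                            /*qualityLevel=*/0.01, /*minDistance=*/10.0);

    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, candidates, tracked,
                             status, err);

    prevPts.clear(); currPts.clear();
    for (std::size_t i = 0; i < status.size(); ++i) {
        if (status[i]) {              // point was found in the new frame
            prevPts.push_back(candidates[i]);
            currPts.push_back(tracked[i]);
        }
    }
}
```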
As the literature review shows, approximating both global and local motion with a single motion estimation method is limited. Therefore, we propose a hybrid function that switches the motion estimation algorithm used to determine the transformation between two consecutive frames for stabilizing video in a real environment. In the case of low movement, we apply a KLT tracker to compute the optical flow of the moving objects across two consecutive frames. In the case of fast rotation, the rotational data from an IMU sensor is used to estimate the movement by calculating the motion from a predefined moving point and a reference point. The motion flow from either method then yields a rigid transformation. We reduce the motion noise with a Kalman filter, which smoothens the trajectory, and finally the stabilized video is warped.
The paper is organized as follows. In Section 2, we present related work in video stabilization. In Section 3, we introduce our proposed framework, covering the methodology of video stabilization using both the hybrid method and calibrated sensors. Next, we describe the hybrid method in Section 4, which estimates the motion by swapping between a KLT tracker and an IMU-aided motion flow, together with the homography approximation of subsequent frames. The motion filter used to reduce noise is explained in Section 5. Section 6 discusses the multi-threaded approach. The efficiency of the hybrid method and the performance of the multi-threaded approach are presented in Section 7. Finally, conclusions are provided in the last section.
2. Related Work
IMU sensors are used to estimate the motion of moving objects. To increase motion tracking performance, an IMU sensor is often integrated with other types of sensors: Zhao et al. [17] used a GPS/INS (inertial navigation system) to correct positions compared with a standalone INS; Gadeke et al. [18] proposed fusing an IMU sensor with a barometric sensor for tracking the position of smartphones in case of lost connection; Carlos A. et al. [19] presented motion tracking of humans using an IMU sensor and a laser rangefinder to interact with robots; and Lake et al. [20] established an interface device between an IMU sensor and an EMG sensor to detect motion from muscles. Moreover, an IMU sensor can track movement accurately by combining an accelerometer and a gyroscope, as shown in [21,22]. Our work, however, uses only the gyroscope data of an IMU sensor to measure the camera rotation for basic stabilization of videos.
In video stabilization applications, an IMU sensor is used in mechanical systems to reduce the jitter effect in devices, as demonstrated by Antonello et al. [23], who installed an IMU on a pan-tilt-zoom (PTZ) camera and used a two-level cascade to control a pneumatic hexapod's support base. In digital video stabilization systems, by contrast, the IMU sensor is used alongside the camera to estimate motion. Odelga et al. [24] successfully stabilized a video with a low computational load by using the orientation data of the support base relative to a horizontal frame, with no feature tracking applied in the design. Drahansky et al. [25] used an accelerometer to estimate a local motion vector for calculating a smoothed global motion vector, while Karpenko et al. [3] used only gyroscope data to create rolling shutter warping for video stabilization. Other researchers have integrated IMU data with feature tracking, e.g., Ryu et al. [26], who proposed motion estimation that incorporates the rotational motion into KLT tracking, using the initial position from an IMU to predict the next frame in a robot's eye application, thereby demonstrating speed and accuracy. For real-time video stabilization, Lim et al. [15] proposed a multi-threaded approach that improved the processing speed compared to Dong et al. [27]: the latter achieved 40 fps on a notebook computer with a 2.5 GHz Intel Core Duo, whereas Lim et al.'s method averaged 50 fps on a notebook computer with a dual-core 1.70 GHz processor.
3. Proposed Framework
The challenge in this paper is how to estimate the motion between consecutive frames captured by a moving camera, e.g., under large rotations of the camera itself and local movements of objects in the scene. Following this objective, we apply a hybrid method that decides which method to use to estimate the motion of the moving camera, as illustrated by the flowchart in Figure 1.
Firstly, the rotational velocity of the camera is checked to decide which function estimates the motion parameters: a KLT tracker or an IMU-aided motion estimator. The motion estimate is then used to approximate the homography, using a rigid transformation to warp the stabilized frame. Because undesired motion may remain from the previous step, the motion noise is reduced by a motion filter; in this paper, we use a Kalman filter, which suits the dynamic model. Lastly, the final stabilized frame is created with the resulting affine matrix.
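As a sketch of the estimation-and-warping steps just described, the following assumes OpenCV's estimateAffinePartial2D as the rigid-transform estimator (the paper does not name a specific routine), and omits the Kalman smoothing of the transform parameters for brevity:

```cpp
#include <vector>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

// Approximate the inter-frame motion with a rigid transform (rotation,
// uniform scale, translation) and warp the frame with the resulting
// 2x3 affine matrix. In the full pipeline, the transform parameters
// would be smoothed by the Kalman filter before warping.
cv::Mat warpWithRigidTransform(const cv::Mat& frame,
                               const std::vector<cv::Point2f>& currPts,
                               const std::vector<cv::Point2f>& prevPts)
{
    // Robustly estimate a 4-DoF partial affine transform with RANSAC.
    cv::Mat T = cv::estimateAffinePartial2D(currPts, prevPts);
    if (T.empty())
        return frame.clone();  // estimation failed; leave frame as-is

    cv::Mat stabilized;
    cv::warpAffine(frame, stabilized, T, frame.size());
    return stabilized;
}
```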
To estimate the motion with the hybrid method, we adopt the rotational velocity of the camera (ωcam), measured by an IMU sensor (ωimu), to decide the estimation method. However, the two devices are at different positions, so we need to transform ωimu to ωcam with the relative orientation (Rci). Hence, the rotational velocity of the camera is defined by:

$$\omega_{cam}(t) = R_{ci}\left(\omega_{imu}(t + t_{off}) - b_{imu}\right) \qquad (1)$$

where bimu is the gyroscope bias, and t and toff represent the real-time measurement instant of the IMU sensor and the temporal time offset, respectively. The camera and the IMU sensor are calibrated as follows: (1) calibrate the camera to find the focal length (f) using the camera calibration module in OpenCV (Open Source Computer Vision); (2) calibrate the gyroscope to prevent gyro drift by averaging the bias offset, which is assessed by measuring the output signal of the gyroscope over a long period while the sensor sits still, with the noise reduced by a Kalman filter; (3) estimate Rci using the relation between gyroscope data and optical flow proposed by Li and Ren [16], namely the CC+LS13 method; and (4) determine toff between the gyroscope and the camera input with a cross-correlation (CC) method [28], synchronizing both sensors so that data is collected accurately in time. We can then correctly estimate ωcam.
The values of ωcam are divided into two cases by a threshold of 0.5 rad/s on the z-axis: below this rate the IMU sensor values diminish, so the video is stabilized with a KLT tracker, which also captures the local motion in the video; above it, the IMU-aided motion estimator is used, as sketched below. Both motion estimation methods are described in detail in the following subsections.
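A minimal sketch of this decision rule, combining Equation (1) with the 0.5 rad/s threshold (the function and variable names are ours, not the paper's), might look as follows:

```cpp
#include <cmath>
#include <opencv2/core.hpp>

// Equation (1): rotate the bias-corrected, time-aligned gyro reading
// into the camera frame, then pick the estimator from the z-axis rate.
// R_ci, b_imu and t_off come from the calibration described above.
enum class Estimator { KLT, IMU_AIDED };

Estimator selectEstimator(const cv::Matx33d& R_ci,
                          const cv::Vec3d& omega_imu,  // sample at t + t_off
                          const cv::Vec3d& b_imu,      // gyroscope bias
                          double thresholdRadPerSec = 0.5)
{
    cv::Vec3d omega_cam = R_ci * (omega_imu - b_imu);
    // Low z-axis rotation rate: gyro signal is weak, trust the KLT tracker.
    return (std::abs(omega_cam[2]) < thresholdRadPerSec)
               ? Estimator::KLT
               : Estimator::IMU_AIDED;
}
```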
6. Multi-Threaded Approach for Video Stabilization
The primary process of video stabilization has three steps: motion estimation, motion smoothing, and image warping. To implement our algorithm in real time, we manage the execution of each step with a multi-threaded approach: the steps run as separate streams of commands within a single process, executed independently while sharing the processing resources. We separate the video stabilization into three threads: thread1 for motion estimation, thread2 for motion smoothing and thread3 for image warping, as shown in Figure 7.
The motion estimation thread takes the video stream from the moving camera as input. The hybrid method switches the estimation algorithm, and the estimated motion of the first five input frames is buffered before being fed to the other stages; after the 5th frame, all steps of the stabilization run simultaneously, so from the 6th frame the motion smoothing and image warping begin their computations. Consequently, thread2 dynamically collects the motion estimates produced by thread1 for the 2nd to the 5th frames.
The multi-threaded approach is delayed at the start because thread2 and thread3 depend on data produced by thread1. Once running, however, the whole process continues until the last frame, and the threaded approach increases the frame rate compared to the single-threaded approach, as demonstrated in the next section. A minimal sketch of such a pipeline is given below.
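The following sketch shows a three-thread pipeline of this shape; the Motion struct, the queue implementation, and the stubbed stage bodies are illustrative assumptions, not the paper's code:

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical per-frame motion parameters (dx, dy, rotation angle).
struct Motion { double dx, dy, da; };

// Minimal thread-safe queue for handing data between pipeline stages.
template <typename T>
class SafeQueue {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        T v = std::move(q_.front()); q_.pop();
        return v;
    }
private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    const int numFrames = 300;      // e.g., one test sequence
    SafeQueue<Motion> raw, smooth;  // thread1->thread2, thread2->thread3

    // thread1: motion estimation (stubbed with dummy values here).
    std::thread t1([&] {
        for (int i = 0; i < numFrames; ++i) raw.push({0.1 * i, 0.0, 0.0});
    });
    // thread2: motion smoothing (stub; a Kalman filter in the paper).
    std::thread t2([&] {
        for (int i = 0; i < numFrames; ++i) smooth.push(raw.pop());
    });
    // thread3: image warping (stub: just report the smoothed motion).
    std::thread t3([&] {
        for (int i = 0; i < numFrames; ++i) {
            Motion m = smooth.pop();
            std::printf("frame %d: dx=%.2f\n", i, m.dx);
        }
    });
    t1.join(); t2.join(); t3.join();
}
```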
7. Experimental Results and Discussion
In the experiment, we implemented our algorithm using the ELP-4025 fisheye camera (produced by Ailipu Technology Co., Shenzhen, China) and a GY-85 IMU, as shown in Figure 8. The GY-85 consists of a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer. The fisheye camera and the IMU sensor were chosen because they are low-cost and applicable to future development of surveillance applications on mobile robots. The resolution of the captured video was 640 × 480 pixels. We developed our algorithm in C++ and used the basic modules of OpenCV.
Before stabilizing the video, the camera and the IMU sensor were calibrated. Firstly, we calibrated the camera with a chessboard using the camera calibration module in OpenCV to determine the focal length. Secondly, we calibrated the gyroscope to prevent the drift problem by averaging the bias offset: the output signal of the gyroscope was measured over a long period while the sensor sat still, with the noise in the original signal reduced by a Kalman filter. We then obtained the average bias offset of Equation (1) from 15 experiments, together with its standard deviation (STD); the calibration data is shown in Table 1. Next, Rci was estimated using the measured gyroscope data from the IMU sensor and the optical flow (see [16] for details). The average Rci over 15 experiments with the CC+LS13 method of [16] is also shown in Table 1, expressed as a quaternion.
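As an illustration of the bias calibration step, a hypothetical helper that averages stationary gyroscope samples into a bias estimate, with its standard deviation as reported in Table 1, might look like this (single axis; the paper averages over 15 such experiments):

```cpp
#include <cmath>
#include <vector>

// Estimate the gyroscope bias as the mean of samples collected while
// the sensor sits still; also return the standard deviation (STD).
void gyroBias(const std::vector<double>& stationarySamples,
              double& bias, double& std_dev)
{
    double sum = 0.0;
    for (double s : stationarySamples) sum += s;
    bias = sum / stationarySamples.size();

    double var = 0.0;
    for (double s : stationarySamples) var += (s - bias) * (s - bias);
    std_dev = std::sqrt(var / stationarySamples.size());
}
```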
Moreover, the camera and the IMU sensor were synchronized with a cross-correlation method so that both sensors read data accurately in time. The temporal synchronization assumes that the delay between the sensors is constant [32]. This is a simple and easy way to define the time lag between the two measurements: the offset is identified by moving the camera around the z-axis with a small sinusoidal motion and then comparing the average magnitude of the optical flow and the gyro data, as shown in Figure 9. The optical flow is estimated with the image interpolation algorithm [33], and the gyro data is calculated from Equation (1). The two signals share the same phase pattern, so the lag that maximizes their cross-correlation gives toff, equal to −0.035 s in our case.
Our proposed method was compared with a standalone KLT tracker and a standalone IMU-aided motion estimator. To evaluate the performance of the stabilized video, we used the inter-frame transformation fidelity (ITF) [34], which represents the quality of the stabilized video in a single value by averaging the peak signal-to-noise ratio (PSNR):

$$\mathrm{ITF} = \frac{1}{N_{max}-1}\sum_{n=1}^{N_{max}-1}\mathrm{PSNR}(n)$$

where Nmax is the number of frames and the PSNR measures the effectiveness of the stabilization method, defined as:

$$\mathrm{PSNR}(n) = 10\log_{10}\frac{I_{max}^{2}}{\mathrm{MSE}(n)}$$

where Imax is the maximum pixel intensity of the video frame, and MSE is the mean square error between two consecutive frames, In and In+1, calculated over every pixel of the consecutive frames of the stabilized video:

$$\mathrm{MSE}(n) = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(I_{n}(i,j)-I_{n+1}(i,j)\right)^{2}$$

where N and M are the frame dimensions. High ITF and PSNR values indicate good quality of the stabilized video.
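The metrics above can be computed directly; a minimal OpenCV sketch for 8-bit grayscale frames, assuming no two consecutive frames are identical (MSE > 0), is:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#include <opencv2/core.hpp>

// Compute the ITF of a stabilized sequence as the mean PSNR over all
// consecutive frame pairs (grayscale frames, I_max = 255).
double computeITF(const std::vector<cv::Mat>& frames)
{
    const double Imax = 255.0;
    double sumPSNR = 0.0;
    for (std::size_t n = 0; n + 1 < frames.size(); ++n) {
        cv::Mat diff;
        cv::absdiff(frames[n], frames[n + 1], diff);
        diff.convertTo(diff, CV_64F);
        // MSE: mean of squared differences over all N*M pixels.
        double mse = cv::mean(diff.mul(diff))[0];
        sumPSNR += 10.0 * std::log10(Imax * Imax / mse);
    }
    return sumPSNR / (frames.size() - 1);
}
```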
We used the IMU sensor data to optimize the switching threshold on the rotation speed. Using a high ωcam threshold to switch resulted in a low ITF, as shown in Figure 10, since motion was then not estimated by the IMU at moderate rotation speeds even in sequences with a high percentage of fast motion, as in Sequences 3–5. The highest ITF was obtained with a threshold of 0.5 rad/s. The ITF also decreased in Sequences 1 and 2 when the switching threshold was raised, because the switch was not sufficiently active given their high percentage of low-ωcam frames.
Figure 11 shows the results of the stabilized video for the different methods when ωcam at the 48th, 168th and 227th frames of the sequence was −7.7 rad/s, 6.07 rad/s and −1.21 rad/s, respectively. In each case the motion was smoothed by the Kalman filter and warped with a rigid transformation. The IMU-aided motion estimator performed better than the KLT tracker in the 48th and 168th frames, where the fast rotation caused the KLT tracker to fail to compute the optical flow. Conversely, at the low rotation of the 227th frame, the KLT tracker was more efficient than the IMU-aided motion estimator because the rotational signal was feeble. Our auto-switching algorithm combines the advantages of both methods, as shown in Figure 11c: it switched to the IMU-aided motion estimator in the 48th and 168th frames (fast rotations) and to the KLT tracker in the 227th frame (low rotation).
Moreover, we evaluated five sequences of 300 frames each to compare the ITF of our algorithm with the other methods, as shown in Table 2 and Table 3, which allows a fair evaluation. The results clearly show our algorithm switching between both methods depending on the motion behavior. In Sequence 1 and Sequence 2, the ITF of our proposed method resembled that of the KLT tracker, because the percentage split between the two estimation methods was similar and the IMU-aided motion estimator is inefficient for low rotations; hence the KLT tracker is preferable for low rotations. However, when fast rotations outnumber low rotations, as in Sequences 3–5, the standalone IMU-aided motion estimator achieved a higher ITF than the KLT tracker. Moreover, in the case of fast rotations, the ITF of our proposed method is better than both the standalone KLT tracker and the IMU-aided motion estimator, because our algorithm can handle all rotational behaviors. A demo of the stabilized video is available at: https://youtu.be/ZS_PLFBBRAM.
Figure 12 presents the PSNR of Sequence 1, which was used to calculate the ITF shown in Table 3, demonstrating the stabilization efficiency. The PSNR of the original video is lower than that of the stabilized video for all motion estimation methods. The PSNR of the KLT tracker is higher than the other methods, consistent with its high ITF in Table 3. However, for some consecutive frames the KLT tracker drops below the original video, meaning the stabilized video is rough in those frames due to KLT tracker failure. In contrast, our proposed method maintains the PSNR above the original video throughout, so the whole video is smooth without any rough segments.
The maximum and average MSE are shown in Table 4 in comparison with the other methods. Most values of our proposed method were lower than those of the standalone KLT tracker and the IMU-aided motion estimator, especially when the camera moved with high rotation in Sequences 3–5.
The effect of the motion filter is illustrated in Figure 13, which shows how effectively our approach smoothens the large motion in Sequence 1 with the Kalman filter. It improves the evenness of the originally measured motion by removing the undesired motion in both the x- and y-trajectories. Both the x- and y-trajectory offsets of the stabilized video are smoother than the original ones, and the filtering is computed in real time.
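As an illustration of this trajectory smoothing, a minimal constant-velocity Kalman filter built on OpenCV's cv::KalmanFilter could be applied per axis; the noise covariances below are illustrative, not the paper's tuned values:

```cpp
#include <opencv2/video/tracking.hpp>

// Smooth a 1-D trajectory (e.g., the accumulated x offset) with a
// constant-velocity Kalman filter; run one instance per trajectory axis.
class TrajectorySmoother {
public:
    TrajectorySmoother() : kf_(2, 1) {              // state: [pos, vel]
        kf_.transitionMatrix = (cv::Mat_<float>(2, 2) << 1, 1, 0, 1);
        cv::setIdentity(kf_.measurementMatrix);     // observe position only
        cv::setIdentity(kf_.processNoiseCov, cv::Scalar::all(1e-4));
        cv::setIdentity(kf_.measurementNoiseCov, cv::Scalar::all(1e-1));
        cv::setIdentity(kf_.errorCovPost, cv::Scalar::all(1.0));
    }
    float smooth(float measuredPos) {
        kf_.predict();
        cv::Mat z = (cv::Mat_<float>(1, 1) << measuredPos);
        return kf_.correct(z).at<float>(0);         // smoothed position
    }
private:
    cv::KalmanFilter kf_;
};
```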
Furthermore, we examined the execution of our algorithm in real time with the two threading approaches. We implemented the algorithm on an Intel NUC with a Core i5-5250U at 1.60 GHz, running ROS (Robot Operating System) on Ubuntu. For the single-threaded approach, we used rotational data recorded in real time and loaded offline to estimate the motion of high rotations with the IMU-aided motion estimator. Table 5 compares the single-threaded and multi-threaded approaches: the average frame rate of the multi-threaded approach was 25.1 fps, an increase of 26.7% over the single-threaded approach.
The multi-threaded approach achieves a higher frame rate than the single-threaded approach because the threads share data, which suits our proposed algorithm for real-time stabilization. It reduces the processing time for smoothing the trajectory, since that computation can begin immediately without waiting for all the motion estimation data. Nevertheless, the frame rate is limited by the firmware of the low-cost camera. Overall, this approach uses little memory and a low computational load to synchronize the threads of the stabilized video in real time, and it could be employed in high-level applications.