1. Introduction
In business centers [1], large public buildings (subways, airports, libraries, etc.), high-risk industrial parks, hospitals, nursing homes [2], and other indoor places where the Global Positioning System (GPS) helps little but where navigation and location services are urgently needed, indoor positioning technology has broad application prospects [3,4]. Common candidate techniques for indoor positioning include infrared (IR), ultrasonic waves, radio-frequency identification (RFID), wireless local area network (WLAN), Bluetooth, ultra-wideband (UWB), etc. Based on these technologies, different implementation schemes have been developed. However, each has obvious drawbacks in positioning accuracy [5], resistance to electromagnetic interference [6], or hardware cost [7], which make them difficult to popularize.
Compared with the indoor positioning technologies mentioned above, visible light positioning (VLP) technology has outstanding advantages: abundant bandwidth resources [8], strong resistance to electromagnetic interference, high positioning accuracy, and simultaneous illumination. Additionally, the hardware cost of VLP systems is relatively low, because complex facilities for precise measurement are unnecessary; a simple VLP system can even be built from an image sensor, some LEDs, and a processor. Therefore, VLP technology has a promising prospect in the field of indoor positioning. There exist two modes of indoor positioning systems based on VLP: photodiode-based (PD-based) positioning and image-sensor-based (IS-based) positioning [9,10]. PD-based VLP systems have been studied in depth. A series of research projects on PD-based VLP systems was completed in our previous works [11,12,13,14,15,16], through which we found that the mobility of the positioning terminal is severely limited by the PD's sensitivity to the direction of the light beam. To deal with the orientations of the PDs, the authors of Reference [17] designed a pyramid receiver (PR) and a hemispheric receiver (HR), which improved the channel capacity and bit-error-rate (BER) performance under various settings and reduced the space usage. However, another significant drawback of PD-based VLP systems is their doubtful robustness: even at the same location, repeated measurements yield fluctuating values of light intensity, and PD-based VLP systems are easily disturbed by ambient light and reflected light [15]. Meanwhile, the received angle and signal strength need to be measured precisely; otherwise, the positioning results will have noticeable errors. In addition, precise instruments must be used, because the received signal strength and the signal's time difference of arrival must be measured to calculate the distance between the reference point and the target [2]. In contrast, IS-based VLP methods are unaffected by the problems mentioned above. What is more, smartphones are nowadays equipped with high-resolution complementary metal-oxide-semiconductor (CMOS) camera sensors, which can easily be combined with IS-based VLP methods, where huge commercial value lies. Furthermore, positioning fails if the LED is shielded in PD-based VLP systems; in comparison, our previous image-sensor-based work [9] solved this problem effectively. Therefore, image sensors are better than PDs as the receiving terminal for VLP systems [18,19]. Finally, few research projects on PD-based VLP have explored dynamic positioning, which is significant for real-time indoor navigation; this restricts their application to low-speed motion or static positioning [20].
Theoretically, IS outperforms PD in the field of VLP, but existing studies have not yet yielded satisfactory results in terms of positioning accuracy, real-time ability, and robustness, which are three vital elements for VLP systems; there is still much room for improvement. First, regarding positioning accuracy, a circle-projection-based single-LED system was proposed in Reference [21], whose positioning accuracy was only 25.12 cm. In Reference [22], the positioning error was reduced to less than 10 cm; however, the performance remained unsatisfactory. In Reference [23], a novel method was proposed utilizing the electronic compass and gyroscope to calculate the yaw angle of the positioning terminal; the simulated results showed the error was within 2 cm. However, real-time ability was not considered in that article. Additionally, few studies maintain a good balance between real-time ability and positioning accuracy. The author of Reference [24] used a minimax filter to estimate the trajectory of the terminal; the average velocity simulated in MATLAB was 1 m/s (3.6 km/h), unable to fit higher motion speeds. Finally, the VLP method proposed in Reference [25] considers the system's real-time ability, with a maximum allowable motion speed of the positioning terminal of 18 km/h and a positioning accuracy of 7.5 cm. Though it maintained a relatively good balance between accuracy and real-time ability, the blur effect caused by the high relative speed between IS and LED was ignored: when the target moves at such a speed, the image of the LED blurs. Because the IS-based positioning method in Reference [25] relies on pixel intensity detection, LED–ID recognition through pixel intensity detection becomes difficult when the LED is blurred in the image, which may lead to positioning failure. Ignoring robustness is crippling in the field of VLP, as it can leave research outputs with little practical value, and that article is no exception. Moreover, few papers have considered the common circumstance that the transmitter (i.e., the LED) is shielded or broken. When the light links between the LEDs and the positioning terminal are blocked, positioning fails, because most positioning algorithms rely on two or more LEDs; the absence of even one LED in the image can cause the whole algorithm to fail. The shielding effect is a fatal problem in the field of VLP; to our knowledge, it had not been solved before our previous work [9], where a relatively complete method was proposed. Based on optical flow and Bayesian forecasting, that algorithm addresses all the difficulties of VLP systems mentioned above, with a positioning accuracy of 0.86 cm and a maximum allowable speed of the positioning terminal of up to 48 km/h. The algorithm can also locate the target even if half of the LED is shielded. However, the computational cost of the optical flow method is very large. Furthermore, optical flow alone cannot track the LED under the shielding effect, so Bayesian forecasting was introduced to solve this problem. Finally, the outcomes of the Bayesian forecast algorithm and the optical flow algorithm were combined by a Kalman filter to obtain the final output.
Though our previous work [9] seems powerful enough to address the knotty problems in the field of VLP, it also raised problems that cannot be ignored. First, the optical flow method used in Reference [9] greatly increases the computational cost and running time of the whole system, reducing the real-time ability of the algorithm, whose utility is inversely proportional to its computational cost. Because the function of the optical flow method is to track the LED, in this article it is replaced by the mean shift algorithm to improve the real-time ability of our visible light positioning method, which greatly reduces the computational cost. Moreover, the mean shift algorithm possesses excellent robustness, which can greatly reduce the influence of the shielding effect present in VLP situations. Furthermore, against the simple background typical of VLP, the precision of the mean shift algorithm is not inferior to that of any other target tracking algorithm. Second, the Kalman filter used in Reference [9] simply combines the outcomes of optical flow and Bayesian forecasting without considering the state noise and measurement noise of the whole system, which greatly affects the accuracy of the VLP system. In this article, the unscented Kalman filter is introduced to improve the robustness and accuracy of our visible light positioning method and to handle the noise of the whole VLP system in a principled way.
In Reference [26], the author proved that introducing the unscented Kalman filter into the camshift algorithm can improve the accuracy of the target tracking algorithm while greatly reducing the running time of the whole system. Inspired by Reference [26], our proposed algorithm also uses the unscented Kalman filter (UKF) to improve the performance of the mean shift algorithm, and eventually of our VLP method. However, though the unscented Kalman filter in Reference [26] took noise into account, the noise was only Gaussian white noise, i.e., random noise that ignores the condition of the target. If the LED is not occluded and the measured position of the LED is accurate, the noise considered will reduce the accuracy. Therefore, inspired by chaos theory, a weight measuring the accuracy of the measured LED position was introduced into our algorithm. If the measured LED position is reliable, the weight of the Gaussian white noise will be small, and the measured position will be trusted more by the Kalman filter. Through this method, the bad influence of using Gaussian white noise as the state noise and measurement noise can be reduced.
To address the shortcomings of existing methods and improve the performance of our previous VLP system, in this paper we propose a high-precision, real-time, and robust indoor visible light positioning method based on the mean shift algorithm and the unscented Kalman filter. The function of the LED is to deliver world coordinates using visible light. After the initial position of the positioning terminal is calculated by the LED–ID recognition algorithm, the mean shift algorithm is utilized to track and locate the LED in the image sequence in real time with high robustness. Then the trajectory of the positioning terminal can be calculated by combining the positioning terminal's initial position with its relative position relationship in the subsequent frames, calculated by MS and UKF. The UKF algorithm forecasts the possible location of the LED in the next frame, equipping the mean shift algorithm with the ability to track fast-moving targets and reducing the computational cost of the whole algorithm. Also, when the LED is shielded, the trajectory of the positioning terminal is output by the UKF to reduce the influence of noise.
Previous studies have considerable drawbacks in robustness because they must repeat a complex positioning algorithm to obtain the trajectory of the positioning terminal. In our VLP method, however, only the initial world position of the positioning terminal is calculated by the static positioning algorithm. The trajectory can then be calculated using the relative position relationship between the LED and the positioning terminal in the image sequence, which is accomplished by the target tracking algorithm. Most of the repetition of the static positioning algorithm is replaced by a robust and lightweight target tracking algorithm. The remainder of this paper is organized as follows:
Section 2 provides a detailed description of the proposed positioning and tracking algorithm. The experimental setup and analysis are presented in
Section 3. Eventually, we summarize our work in
Section 4.
2. Theory
When our proposed method starts, the LED–ID recognition algorithm first finds the LED using the image sensor; then the ID of the LED is recognized. The details of the LED–ID recognition algorithm were studied in depth in our previous works [18,19].
Our proposed method utilizes the rolling shutter mechanism of the complementary metal-oxide-semiconductor (CMOS) image sensor. The exposure and data readout are performed row by row: the data of one row are read out immediately after the exposure of that row finishes. The working principle of the rolling shutter mechanism is shown in
Figure 1. Instead of employing traditional LED–ID encoding and decoding methods, our algorithm treats LED–ID detection and recognition as a classification problem in machine learning. Through off-line training of the classifiers and on-line recognition of the LED–ID, high-speed and robust LED–ID recognition is realized. The LED–ID algorithm will not be detailed in this article; readers interested in LED–ID recognition may refer to our previous article [19].
After the LED–ID is recognized, the world coordinates of the LED are known. Then, through single-light positioning technology, the initial real-world position of the positioning terminal can be obtained. Next, the mean shift algorithm and the UKF dynamically track the LED in the image sequence, through which the relative position in pixel coordinates between the LED in the current frame and the LED in the initial frame can be calculated. Utilizing different coordinate systems and the geometric relationship between the LED and the image sensor, a linear mapping from pixel coordinates to world coordinates can be established. With the information above, the relative relationship between the LED's current position and its initial position in the pixel coordinate plane can be transformed into the relative position relationship in world coordinate space. Finally, by combining the positioning terminal's initial position with its relative position relationship in the subsequent frames, the location of the positioning terminal in the real world can be obtained.
Hence, by dynamically tracking the LED in the two-dimensional plane of the IS in the captured image sequences, indoor dynamic positioning can be accomplished. Because the processing time of our lightweight target tracking algorithm is much shorter than that of most static positioning systems, the real-time ability of our algorithm is also good. The steps of the whole algorithm are as follows:
First, the LED–ID recognition algorithm obtains the LED's initial world coordinates and pixel coordinates; the latter also initialize MS and UKF. Then the position predicted by the UKF is treated as the starting position of the iteration in MS. When s(k), the ratio of the target's area factor to its initial value, stays smaller than the threshold for 5 frames, meaning the LED has been lost in the previous tracking process, the LED–ID recognition algorithm runs again to search for the initial position of the LED and start the next tracking cycle. After the LED is tracked and positioned in the pixel coordinate plane, the positioning terminal's world coordinates can be calculated by the proposed algorithm. The flow of the whole algorithm is shown in
Figure 2. In the following parts, these processes are analyzed mathematically, and formulations are given.
After the procedures of LED–ID recognition, the initial position of the LED is obtained, which also initializes MS and UKF. Meanwhile, the different coordinate systems shown in
Figure 3 and the geometrical relationship between the LED and the image sensor shown in
Figure 4 are utilized to obtain the location of the positioning terminal in the real world. Define R(t) as the set of pixels belonging to the LED at time t in the image, with (x_i, y_i) ∈ R(t), where (x_i, y_i) represents the i-th pixel. Because the background's area (such as the ceiling of the room) is much larger than the LED's area, the position of the LED, s(t) = (u(t), v(t)), in the image can be approximated as the centroid of R(t).
The world coordinates of the image sensor are defined as w(t) = (x_w(t), y_w(t), z_w(t)), where dx and dy represent each pixel's physical size on the IS in the two coordinate directions, respectively. The height between the LED and the lens, H, is assumed to be a fixed value already known by the algorithm. With the conditions above, the pixel coordinates (u(t), v(t)) of the image's centroid can be calculated as follows:

u(t) = (1/N) Σ_{(x_i, y_i) ∈ R(t)} x_i,  v(t) = (1/N) Σ_{(x_i, y_i) ∈ R(t)} y_i,   (1)

where N = |R(t)| is the number of pixels in R(t).
Next, the horizontal coordinate x_c of the image sensor (camera) in the camera coordinate system can be calculated by the following equation:

x_c = (u(t) − u_0) · dx · H / f,   (2)

where (u_0, v_0) is the principal point of the image and f is the focal length of the lens; y_c can be calculated by the same method. As can be seen from
Figure 5, we can rotate the coordinates by Equation (3) and realize positioning under any azimuth angle θ. w(t) is transformed from (x_c, y_c) by the equation below:

[x_w(t); y_w(t)] = [cos θ, −sin θ; sin θ, cos θ] [x_c; y_c] + T,   (3)

In the equation above, T represents the translation, and (x_w, y_w) has been denoted by w(t). Let w(t) = h(s(t), L), where h represents the mapping function and L stands for the known world coordinates of the LED.
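As an illustration of the mapping h, the sketch below implements the pixel-to-world transformation under a standard pinhole model; the focal length f, principal point (u0, v0), and azimuth theta are illustrative assumptions, not parameters fixed by the text.

```python
import math

def pixel_to_world(u, v, u0, v0, dx, dy, H, f, theta, t=(0.0, 0.0)):
    """Map a pixel-plane centroid (u, v) to horizontal world coordinates.

    Sketch under the pinhole model: (u0, v0) is the principal point,
    dx/dy the pixel pitch, H the LED-to-lens height, f the focal
    length, theta the azimuth, t the translation of the camera.
    """
    # Similar triangles: image-plane offset scaled by H / f.
    xc = (u - u0) * dx * H / f
    yc = (v - v0) * dy * H / f
    # Rotate by the azimuth to support positioning under any orientation.
    xw = math.cos(theta) * xc - math.sin(theta) * yc + t[0]
    yw = math.sin(theta) * xc + math.cos(theta) * yc + t[1]
    return xw, yw
```

With theta = 0 and no translation, the mapping reduces to the similar-triangle scaling of Equation (2) in each axis.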
2.1. The Mean Shift Algorithm
In some traditional VLP methods based on image recognition alone, every image must be processed to locate the positioning terminal, and the maximum allowable motion speed of the positioning terminal is confined by the blur effect caused by the high relative speed between LEDs and image sensors. The blur effect has much less impact on the mean shift algorithm, which relies on the color histogram. Additionally, even if half of the LED is shielded in the image, the tracking process will not fail. In view of the excellent robustness and real-time ability of the mean shift algorithm, it is used as the main part of our proposed VLP algorithm. Non-parametric density estimation is the basic concept of the mean shift algorithm, with kernel density estimation as the foundation of the whole theory.
One of the most popular density estimation methods is kernel density estimation. Given n data points x_i, i = 1,…,n, in the d-dimensional space R^d, the multivariate kernel density estimator with kernel function K(x) and a symmetric positive definite d × d bandwidth matrix H, computed at the point x, satisfies:

f̂(x) = (1/n) Σ_{i=1}^{n} K_H(x − x_i),   (4)

in which

K_H(x) = |H|^{−1/2} K(H^{−1/2} x).   (5)

The d-variate kernel function K(x) can be generated from a univariate kernel K_1(x) by the following two methods:

K^P(x) = Π_{j=1}^{d} K_1(x_j),   K^S(x) = a_{k,d} K_1(‖x‖),   (6)

in which K^P(x) is obtained as the product of the univariate kernels and K^S(x) is obtained by rotating K_1(x) in R^d, so K^S(x) is radially symmetric. The normalization constant a_{k,d} assures that the integral of K^S(x) is one.
We only need to focus on a special class of kernel functions satisfying:

K(x) = c_{k,d} k(‖x‖²),   (7)

where k(x) is called the profile function of K(x). The normalization constant c_{k,d} is always positive, ensuring that the kernel function K(x) integrates to one.
H is a d × d bandwidth matrix, often computed as a diagonal matrix H = diag[h_1², …, h_d²] or as a proportional identity matrix H = h²I. The proportional identity matrix has only one parameter and can be computed more easily. If we use H = h²I for calculation, then Equation (4) can be rewritten as:

f̂(x) = (1/(n h^d)) Σ_{i=1}^{n} K((x − x_i)/h).   (12)

If the profile-based kernel function K(x) of Equation (7) is brought into Equation (12), we get Equation (13), which the general mean shift algorithm uses to calculate the density estimate of eigenvalues:

f̂_{h,K}(x) = (c_{k,d}/(n h^d)) Σ_{i=1}^{n} k(‖(x − x_i)/h‖²).   (13)
The distribution position with maximum density in the sample data group can be obtained by estimating the density gradient, defined as the gradient of the kernel density estimation function, which can be calculated by Equation (14):

∇f̂_{h,K}(x) = (2c_{k,d}/(h² c_{g,d})) f̂_{h,G}(x) m_{h,G}(x),   (14)

in which f̂_{h,G}(x) is the non-parametric density estimate based on kernel G(x) at point x, and

m_{h,G}(x) = [Σ_{i=1}^{n} x_i g(‖(x − x_i)/h‖²) / Σ_{i=1}^{n} g(‖(x − x_i)/h‖²)] − x

is the mean shift vector. Besides, g(x) = −k′(x) is the profile function of kernel G(x).
In general, the shorter the distance from a sample point to the central point, the more significant that point is for the statistics of the estimated point x. Thus, the concept of the kernel density function is introduced, giving each sample point a different weight associated with its distance to the center point. Also, the mean shift procedure is guaranteed to make the kernel function converge at a nearby point where the estimated density gradient is zero.
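To make the mean shift step concrete, here is a minimal numerical sketch: with the Epanechnikov kernel, the profile derivative makes g(·) constant inside the unit ball, so each step moves the estimate to the centroid of the samples inside the window and iterating converges to a nearby density mode. The data and names are illustrative.

```python
import numpy as np

def mean_shift_step(x, points, h):
    """One mean shift step: move x to the g-weighted mean of the samples.

    With the Epanechnikov kernel, g(t) = 1 inside the unit ball and 0
    outside, so the step lands on the centroid of samples within radius h.
    """
    d2 = np.sum((points - x) ** 2, axis=1) / h**2
    g = (d2 <= 1.0).astype(float)      # g(||(x - x_i)/h||^2)
    if g.sum() == 0:
        return x                       # no samples in the window
    return (points * g[:, None]).sum(axis=0) / g.sum()

# Iterating the step climbs the density gradient to a nearby mode.
pts = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [5.0, 5.0]])
x = np.array([0.5, 0.5])
for _ in range(20):
    x = mean_shift_step(x, pts, h=1.0)
# x ends at the centroid of the three clustered points; the outlier
# at (5, 5) never enters the window.
```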
The Epanechnikov kernel function is used for model description. In the initial frame, suppose there are n pixels {z_i}, i = 1,…,n, in the target region, with center point z_0. If the bandwidth of the kernel function is h, and we uniformly divide the feature space into m subintervals, the probability density estimation of the eigenvalue u = 1,…,m of the target model is:

q̂_u = C Σ_{i=1}^{n} k(‖(z_0 − z_i)/h‖²) δ(b(z_i) − u),   (15)

where C is a normalization constant, and the function k(·) is the profile function of the kernel, weighting each pixel by its distance to the center point z_0, as stated above. δ(b(z_i) − u) judges whether the eigenvalue of pixel z_i belongs to the u-th bin. The candidate regions in the subsequent frames are described in the same way.
Because the Bhattacharyya coefficient fits the mean shift framework particularly well, it is selected to measure the similarity between the target model and the candidate model in our proposed algorithm.
The eigenvalues of each model in the feature space are divided into m parts. When the Bhattacharyya coefficient is computed, its value grows whenever the two models share members in a part. The value of m depends on the range of the eigenvalues in the feature space; the accuracy of the Bhattacharyya coefficient suffers if m is either too large or too small. The Bhattacharyya coefficient is defined as:

ρ(y) ≡ ρ[p̂(y), q̂] = Σ_{u=1}^{m} sqrt(p̂_u(y) q̂_u),   (17)

where ρ(y) ∈ [0,1], whose value represents the similarity between the two models. The candidate region that maximizes ρ(y) is taken as the position of the target.
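A minimal sketch of the Bhattacharyya coefficient between two normalized m-bin histograms (the histograms here are illustrative data):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized m-bin histograms."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

q_model = np.array([0.5, 0.3, 0.2])     # target model histogram
p_same = q_model.copy()                 # identical candidate
p_other = np.array([0.0, 0.1, 0.9])     # dissimilar candidate
# Identical histograms give rho = 1; similarity decreases towards 0.
```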
The value of the Bhattacharyya coefficient ρ(y) should be maximized to obtain the most accurate position of the target. Normally, the center point y_0 of the target in the last frame is treated as the initial position of the target in the current frame, where the algorithm begins searching for the optimal match, whose center point is y. First, calculate the probability density estimation p̂_u(y_0) of the candidate target at point y_0 in the current frame, the current Bhattacharyya coefficient satisfying:

ρ[p̂(y_0), q̂] = Σ_{u=1}^{m} sqrt(p̂_u(y_0) q̂_u),   (18)

whose Taylor expansion is computed by:

ρ[p̂(y), q̂] ≈ (1/2) Σ_{u=1}^{m} sqrt(p̂_u(y_0) q̂_u) + (C_h/2) Σ_{i=1}^{n} w_i k(‖(y − z_i)/h‖²),   (19)

in which w_i is the weight of each pixel, whose definition is:

w_i = Σ_{u=1}^{m} sqrt(q̂_u / p̂_u(y_0)) δ(b(z_i) − u).   (20)
It is easy to notice that only the second term of Equation (19) is associated with y; when it reaches its maximum, the Bhattacharyya coefficient is maximized too.
Thus, we analyze the second term and define:

f̂_h(y) = (C_h/2) Σ_{i=1}^{n} w_i k(‖(y − z_i)/h‖²),   (21)

which is very similar to the definition of the kernel density function. The mean shift vector can also be calculated, pointing towards the center point y of the target's actual position from the candidate point y_0, satisfying:

y_1 = Σ_{i=1}^{n} z_i w_i g(‖(y_0 − z_i)/h‖²) / Σ_{i=1}^{n} w_i g(‖(y_0 − z_i)/h‖²).   (22)
Supposing that the distribution of the target's model is {q̂_u}, u = 1,…,m, the estimated position in the current frame is y_0, and the allowed error is ε, the mean shift algorithm can be implemented by the following steps, as shown in
Figure 6:
(1) Assume that the target's initial position in the current frame is the central point y_0 of the target in the last frame. Compute the probability density estimation {p̂_u(y_0)} of the candidate target in the current image frame using the same method as Equation (15); then utilize Equation (17) to calculate the Bhattacharyya coefficient ρ(y_0);
(2) Compute the weight w_i of each bin with Equation (20);
(3) Update the center point y of the target region with Equation (22);
(4) Calculate the Bhattacharyya coefficient ρ(y);
(5) If ρ(y) > ρ(y_0), the center of the search region transfers to point y;
(6) If |y − y_0| < ε, the point y is considered the center point of the target region in the current frame; otherwise, set y_0 ← y and return to step (1).
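The steps above can be sketched as follows. This is a simplified assumption-laden sketch, not the exact implementation: the feature space is a precomputed bin index per pixel, and g(·) is taken flat inside the window (Epanechnikov profile), so each iteration is a weighted centroid.

```python
import numpy as np

def profile_weights(pixels, center, h):
    # Epanechnikov profile k: 1 - ||(c - z)/h||^2, clipped at zero.
    d2 = np.sum((pixels - center) ** 2, axis=1) / h**2
    return np.clip(1.0 - d2, 0.0, None)

def histogram(pixels, bins_of_pixel, center, h, m):
    # Kernel-weighted m-bin histogram, in the spirit of Equation (15).
    w = profile_weights(pixels, center, h)
    q = np.zeros(m)
    np.add.at(q, bins_of_pixel, w)
    return q / max(q.sum(), 1e-12)

def mean_shift_track(pixels, bins_of_pixel, q_model, y0, h, m,
                     eps=0.5, max_iter=20):
    """Iterate the weighted-centroid shift until the move is below eps."""
    y = y0.astype(float)
    for _ in range(max_iter):
        p = histogram(pixels, bins_of_pixel, y, h, m)
        w_u = np.sqrt(np.divide(q_model, p, out=np.zeros(m),
                                where=p > 0))      # Equation (20) weights
        w_i = w_u[bins_of_pixel]                   # per-pixel weights
        in_win = np.sum((pixels - y) ** 2, axis=1) / h**2 <= 1.0
        w_i = w_i * in_win                         # flat g inside window
        if w_i.sum() == 0:
            break                                  # target lost
        y_new = (pixels * w_i[:, None]).sum(axis=0) / w_i.sum()
        if np.linalg.norm(y_new - y) < eps:        # convergence test
            return y_new
        y = y_new
    return y
```

On a synthetic frame with a small bright blob, starting the search a few pixels away converges onto the blob's centroid in a handful of iterations.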
2.2. The Unscented Kalman Filter
Though the traditional mean shift algorithm possesses excellent real-time ability and accuracy under ideal situations, it cannot track targets with high speed because the algorithm starts searching the candidate target from its central point in the last frame within a small area. If the target moves outside the area within the time of one frame, the algorithm fails. To improve the real-time ability and the robustness of our proposed algorithm, giving it the ability to track objects moving at a high speed as well as to pinpoint the target under shielding effect, we introduced the unscented Kalman filter into our detection algorithm.
Based on the unscented transformation (UT), shown in
Figure 7, UKF abandons the traditional method of linearization of non-linear functions and adopts the framework of the Kalman filter, which is shown in
Figure 8. On the premise that the mean value and covariance of the random vector remain unchanged, a set of Sigma sample points is selected, each of which goes through the non-linear transformation. Then the mean and variance of the random vector after the non-linear transformation are estimated from the statistics of the transformed sample points, avoiding the error caused by linearization. The UKF algorithm has better stability than the extended Kalman filter (EKF) because it no longer calculates the Jacobian matrix of the non-linear equations. Also, the UKF has similar performance and a smaller calculation cost compared with the particle filter.
An ordinary state space model can be divided into two parts: the state transition model and the state observation model. The state of the tracking system is the state of the target, while the observation is the image sequence. Given that the velocity of the target changes, we assume the acceleration a(k) is a random quantity satisfying the Gaussian distribution a(k) ~ N(0, σ_a²). The target's state vector is set to be X(k) = [x(k), y(k), v_x(k), v_y(k), s(k)]^T, where (x, y) is the target's center point; v_x and v_y represent the velocity of the target in the x and y coordinate directions, respectively; and s(k) is the ratio of the target's current area in the tracking window to its initial area. The observation variable is Z(k) = [x_o(k), y_o(k)]^T, in which x_o(k) and y_o(k) are the observation values of the target. The state transition model and observation model are respectively computed by:

X(k + 1) = F X(k) + W(k),  F = [[1, 0, ΔT, 0, 0], [0, 1, 0, ΔT, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]],   (27)

Z(k) = H X(k) + V(k),   (28)

where W(k) and V(k) represent the white Gaussian noise of the state transition model and the observation model, respectively. ΔT is the time interval between adjacent frames, which is one in our algorithm. s(k) is computed from the target's area factor, the zeroth moment of the image in the tracking window. (x_o(k), y_o(k)) is the centroid of the LED obtained from the mean shift algorithm. Supposing that (x_0, y_0) is the initial value of the LED's centroid, calculated by our LED–ID detection algorithm, X(0) = [x_0, y_0, 0, 0, 1]^T. The theory and formulas of the unscented Kalman filter are shown below.
Assume that the mean and variance of the n-dimensional state vector X at time k − 1 are X̂(k − 1) and P(k − 1), respectively. The state transition model and observation model are:

X(k) = f(X(k − 1)) + W(k − 1),  Z(k) = H X(k) + V(k),

where f is the state transition equation and H is the observation equation (specifically, f(X) = FX in our algorithm). W(k) and V(k) are the white Gaussian noise matrices of these two models, whose statistical characteristics satisfy:

E[W(k)] = 0,  Cov[W(k), W(j)] = Q δ_{kj};  E[V(k)] = 0,  Cov[V(k), V(j)] = R δ_{kj},   (29)

in which Q and R are the covariance matrices of the two noise matrices, respectively.
(1) Initialization of the UKF. The Sigma points can be calculated by Equations (30)–(32):

χ_0(k − 1) = X̂(k − 1),   (30)
χ_i(k − 1) = X̂(k − 1) + (√((n + λ) P(k − 1)))_i,  i = 1,…,n,   (31)
χ_i(k − 1) = X̂(k − 1) − (√((n + λ) P(k − 1)))_{i−n},  i = n + 1,…,2n,   (32)

where λ is the scaling parameter, and λ = α²(n + κ) − n.
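The sigma-point construction can be sketched as follows; the Cholesky factor serves as the matrix square root, and the default values of alpha and kappa are conventional illustrative choices, not values from the text.

```python
import numpy as np

def sigma_points(x_mean, P, alpha=1.0, kappa=0.0):
    """Generate the 2n+1 unscented-transform sigma points and mean weights.

    lam = alpha^2 (n + kappa) - n; points sit at the mean and at
    +/- the columns of the scaled matrix square root of P.
    (alpha is often chosen much smaller, e.g. 1e-3, in practice.)
    """
    n = x_mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)   # matrix square root
    pts = [x_mean]
    for i in range(n):
        pts.append(x_mean + S[:, i])
        pts.append(x_mean - S[:, i])
    # Mean weights: W0 = lam/(n+lam), Wi = 1/(2(n+lam)).
    w_m = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w_m[0] = lam / (n + lam)
    return np.array(pts), w_m
```

By construction the weights sum to one and the weighted mean of the sigma points recovers x_mean exactly, which is what makes the subsequent time-update averages consistent.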
(2) Time updating process. Take the Sigma points into the state transition Equation (33) and the observation Equation (36), then compute the state vector's average value by Equation (37):

χ_i(k|k − 1) = f(χ_i(k − 1)),   (33)
X̂(k|k − 1) = Σ_{i=0}^{2n} W_i^m χ_i(k|k − 1),   (34)
P(k|k − 1) = Σ_{i=0}^{2n} W_i^c [χ_i(k|k − 1) − X̂(k|k − 1)][χ_i(k|k − 1) − X̂(k|k − 1)]^T + Q,   (35)
Z_i(k|k − 1) = H χ_i(k|k − 1),   (36)
Ẑ(k|k − 1) = Σ_{i=0}^{2n} W_i^m Z_i(k|k − 1),   (37)

where W_i^m is the weight coefficient of the mean: W_0^m = λ/(n + λ); W_i^m = 1/(2(n + λ)), i = 1,…,2n.
(3) Observation updating equations. Take Equations (34), (36), and (37) into Equations (38) and (39), and compute the Kalman gain by Equation (40):

P_ZZ(k) = Σ_{i=0}^{2n} W_i^c [Z_i(k|k − 1) − Ẑ(k|k − 1)][Z_i(k|k − 1) − Ẑ(k|k − 1)]^T + R,   (38)
P_XZ(k) = Σ_{i=0}^{2n} W_i^c [χ_i(k|k − 1) − X̂(k|k − 1)][Z_i(k|k − 1) − Ẑ(k|k − 1)]^T,   (39)
K(k) = P_XZ(k) P_ZZ(k)^{−1},   (40)

in which W_i^c is the weight coefficient of the covariance: W_0^c = λ/(n + λ) + (1 − α² + β); W_i^c = 1/(2(n + λ)), i = 1,…,2n, where β ≥ 0 and the value of β is 0 here.
Particularly in our algorithm, a confidence coefficient γ(k) is introduced into Equation (38), which can be rewritten as:

P_ZZ(k) = Σ_{i=0}^{2n} W_i^c [Z_i(k|k − 1) − Ẑ(k|k − 1)][Z_i(k|k − 1) − Ẑ(k|k − 1)]^T + γ(k) R.   (41)

If there exists little interference or shielding of the LED in the image, in other words s(k) ≈ 1, which means (x_o(k), y_o(k)) can be viewed as the real centroid of the LED, γ(k) takes its minimum value γ_min to reduce the influence of the noise matrix R, meaning the original observation value is trusted. However, if the value of s(k) is close to 0, which means most of the LED's area is occluded in the image, the outcome of MS cannot be trusted, because the mean shift algorithm will then treat the centroid of the candidate region as the actual centroid of the LED, whose error is considerable. In this condition, it is necessary to give the noise matrix a bigger weight. Originally, s(k) ∈ (0, 1]. To obtain a proper weight, 1 − s(k) is normalized into [γ_min, 1], where γ_min is a small positive constant in our algorithm. Thus, γ(k) decreases as s(k) grows; when s(k) = 1, γ(k) = γ_min in our algorithm.
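The confidence weighting can be sketched as below. The linear mapping and the constant lam_min are illustrative assumptions; the exact normalization constants used in the system are not reproduced here.

```python
import numpy as np

def noise_weight(s, lam_min=0.1):
    """Map the area ratio s(k) in (0, 1] to a weight on R (a sketch).

    A fully visible LED (s = 1) yields the minimum weight lam_min, so
    the measurement is trusted; a mostly occluded LED (s -> 0) yields
    a weight near 1, inflating the measurement-noise covariance.
    """
    s = float(np.clip(s, 0.0, 1.0))
    return lam_min + (1.0 - lam_min) * (1.0 - s)

R = np.diag([4.0, 4.0])            # nominal measurement noise covariance
R_eff = noise_weight(0.2) * R      # LED mostly occluded: noise inflated
```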
The mean and variance of the state vector can then be updated by taking the Kalman gain into Equations (42) and (43):

X̂(k) = X̂(k|k − 1) + K(k) (Z(k) − Ẑ(k|k − 1)),   (42)
P(k) = P(k|k − 1) − K(k) P_ZZ(k) K(k)^T.   (43)
Due to the uncertainty of the moving target and the model, the target's position calculated by our proposed algorithm is compared each time with its last position to update and correct the state model of the unscented Kalman filter.
By introducing the UKF into the mean shift algorithm, our proposed algorithm gains the capacity to track fast-moving targets, because the mean shift algorithm starts searching for the candidate target from the position predicted by the UKF (X̂(k|k − 1), the prediction value of the state equation, Equation (27)) in the current frame instead of from the centroid of the target in the last frame, which also reduces the number of MS iterations. The candidate search region in the current frame is chosen reasonably by the UKF, taking the prior positions of the target into consideration; thus, the problem of a high-velocity target moving outside the candidate search region within the time of one frame is solved. Meanwhile, because of the reduced average iteration number, the real-time ability of the algorithm is enhanced too. When the target is located, its current position is compared with previous positions to estimate its velocity and its most probable position in the next frame, through which the state model of the UKF is updated. The combination of the UKF and the mean shift algorithm forms a closed-loop tracking system which can effectively track fast-moving targets in real time.
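As a compact illustration of this closed predict–correct loop, the sketch below uses a linear Kalman filter with a constant-velocity model standing in for the UKF (the two coincide when the model is linear) and mocks the mean shift output with a synthetic measurement; all matrices and values are illustrative.

```python
import numpy as np

# Constant-velocity model: state [x, y, vx, vy], Delta T = 1 frame.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Hm = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0]], dtype=float)   # positions are observed
Q, R = np.eye(4) * 1e-3, np.eye(2) * 1e-2

x, P = np.array([0.0, 0.0, 1.0, 0.5]), np.eye(4)
for k in range(1, 30):
    # Predict: this position seeds the mean shift search window,
    # so the tracker can follow a target that moves far per frame.
    x, P = F @ x, F @ P @ F.T + Q
    z = np.array([1.0 * k, 0.5 * k])         # mean shift "measurement"
    S = Hm @ P @ Hm.T + R                    # innovation covariance
    K = P @ Hm.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - Hm @ x)                 # correct with MS output
    P = (np.eye(4) - K @ Hm) @ P             # covariance update
```

Each iteration the filter's prediction seeds the tracker and the tracker's output corrects the filter, which is the closed loop described above.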