The Kalman Filter Revisited Using Maximum Relative Entropy
Abstract: In 1960, Rudolf E. Kalman created what is known as the Kalman filter, which is a way to estimate unknown variables from noisy measurements. The algorithm follows the logic that if the previous state of the system is known, it could be used as the best guess for the current state. This information is applied, first, prior to any measurement, through the underlying dynamics of the system. Second, measurements of the unknown variables are taken. These two pieces of information are taken into account to determine the current state of the system. Bayesian inference is specifically designed to accommodate the problem of updating what we think of the world based on partial or uncertain information. In this paper, we present a derivation of the general Bayesian filter, then adapt it for Markov systems. A simple example is shown for pedagogical purposes. We also show that by using the Kalman assumptions or “constraints”, we can arrive at the Kalman filter using the method of maximum (relative) entropy (MrE), which goes beyond Bayesian methods. Finally, we derive a generalized, nonlinear filter using MrE, where the original Kalman filter is a special case. We further show that the variable relationship can be any function, and thus, approximations, such as the extended Kalman filter, the unscented Kalman filter and other Kalman variants, are special cases as well.
1. Introduction
In 1960, Rudolf E. Kalman demonstrated an ingenious way to estimate unknown variables from noisy measurements [1]. He did this by including information about the underlying dynamical system that governed the variables under consideration. With this, the optimal state of the system was inferred. To do this, he had two main assumptions: (1) all noise was Gaussian or normal and linearly additive; (2) the dynamical system was linear. The result was what is known as the Kalman filter.
Essentially, the algorithm follows the logic that if the previous state of the system is known, it could be used as the best guess for the current state. This information is used in two ways. First, prior to any measurement, the underlying dynamics of the system may be known; given this knowledge and the previous state, the new state can be determined. Second, measurements of the unknown variables are taken. These two ways may conflict. Which solution should we believe? The answer is that we should believe them both, with some uncertainty. They should both be taken into account to determine what our new belief is for the state, or what the values of the variables are, after the measurements.
Bayesian inference is specifically designed to accommodate the problem of updating what we think of the world based on partial or uncertain information. It is well remarked that the Kalman filter is a special case of Bayesian inference [2]. We present our own derivation of the general Bayesian filter, then adapt it for Markov systems. A simple example is shown for pedagogical purposes with emphasis on the construction of the Kalman gain. Besides offering a greater pedagogical understanding of the algorithm, this also offers an insight into what to do when the Kalman assumptions are not valid or no longer apply. This allows the enhancement of sophisticated solutions, such as the extended Kalman, unscented Kalman, etc. [3].
However, Bayes rule does not assign probabilities; it is only a rule to manipulate them. The MaxEnt method [4,5] was designed to assign probabilities. This method has evolved to a more general method, the method of maximum (relative) entropy (MrE) [6,7], which has the advantage of not only assigning probabilities but updating them when new information is given in the form of constraints on the family of allowed posteriors. The main purpose of this paper is to show both general and specific examples of how the MrE can be applied using data and moment constraints. It should also be noted that Bayes’ rule is a special case of MrE. This implies that MrE is capable of producing every aspect of orthodox Bayesian inference and proves the complete compatibility of the Bayesian and entropy methods. Further, it opens the door to tackling problems that could not be addressed by either the MaxEnt or orthodox Bayesian methods individually; problems in which one has data and moment constraints. Thus, Kalman filters can be seen as a special case of filters developed using the MrE methods with Kalman assumptions.
In this paper, we will show several things: first, the derivation of the general Bayesian filter as used for problems of the nature that the Kalman filter is intended for, i.e., Markov systems. Second, we will show a simple example illustrating, for pedagogical purposes, that the Kalman filter is a special case of the general Bayesian filter. Third, we will show that using the Kalman assumptions or “constraints”, we can arrive at the Kalman filter from MrE directly. Finally, we will show how the same Kalman logic can be applied to nonlinear dynamical systems using Bayes' rule, avoiding the approximations that are usually applied in the extended Kalman filter and the unscented Kalman filter.
2. Bayesian Filter
Here, we will build the Bayesian filter. We start with Bayes rule,
At this point, we come to our first key assumption for Kalman; if xk is “complete” [9], then yk is not conditionally dependent on Yk−1, or p(yk|xk, Yk−1) = p(yk|xk). In other words, if we have xk, then we do not need Yk−1 to determine the probability of yk. This then yields,
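For reference, the resulting recursion takes the familiar form (a sketch in conventional notation, not a reproduction of the paper's numbered equations):
\[
p(x_k \mid Y_k) = \frac{p(y_k \mid x_k)\, p(x_k \mid Y_{k-1})}{p(y_k \mid Y_{k-1})},
\qquad
p(x_k \mid Y_{k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid Y_{k-1})\, dx_{k-1},
\]
where the second expression is the usual Chapman-Kolmogorov prediction for a Markov system.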
3. Kalman Filter
The second key assumption of the Kalman filter is that we assume that we do not have the past measurements, Yk−1, when trying to determine our belief for xk. This means that we need a form for our prior that allows us not to use past measurements. However, the previous value for the state, xk−1, is known.
Now, we will include the main Kalman assumptions above, first that all noise is Gaussian and linearly additive. Therefore, we will use Gaussians for our density distributions,
The next question is deciding on the value of the means, since we are not inferring those, as can be seen from the posterior. For this, we have to look at the “forward” problems for each density function. For the prior, the forward problem is xk = Fk,k−1xk−1 + ηk−1, where Fk,k−1 is called the “transition matrix”, and for the likelihood, it is yk = Gkxk + εk, where Gk is called the “measurement matrix” [10]. Each has Gaussian noise, ηk−1 and εk, respectively, with zero mean. Therefore, for the prior, we have,
For the likelihood we have,
Note, this is similar to a least squares solution (which itself is a special case of Bayes). There is one more obvious question to be answered: while this may be a solution for the density function with regard to xk, how do we get xk−1? We need a single number. The answer depends on what is considered the “best guess” or point estimate for xk−1. There are many choices, such as the mean, median or mode. However, since we are dealing with a symmetric solution, they are one and the same. Therefore, the easiest point estimate to get is the mode, x̂, where,
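As a sketch in conventional notation (Qk−1 and Rk are our own labels for the covariances of ηk−1 and εk), the Gaussian prior and likelihood and the resulting point estimate take the form:
\[
p(x_k \mid x_{k-1}) \propto \exp\!\left[-\tfrac{1}{2}\,(x_k - F_{k,k-1}x_{k-1})^{T} Q_{k-1}^{-1} (x_k - F_{k,k-1}x_{k-1})\right],
\]
\[
p(y_k \mid x_k) \propto \exp\!\left[-\tfrac{1}{2}\,(y_k - G_k x_k)^{T} R_k^{-1} (y_k - G_k x_k)\right],
\qquad
\hat{x}_k = \arg\max_{x_k}\, p(y_k \mid x_k)\, p(x_k \mid x_{k-1} = \hat{x}_{k-1}),
\]
where the mode obtained at step k is then carried forward as the point estimate for xk−1 at the next iteration.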
3.1. A Simple Example
To illustrate the processing workflow, we present a very simple example. We wish to know our 1D location, given the known dynamical system and a measurement at each time step. First, we let Gk = 1. Then, we apply the dynamical equation,
One last question that needs to be addressed is: what is the value of xk−1? The last assumption of the Kalman filter is that the MAP estimate from the previous step is the best estimate for this value. This is a key assumption and not a trivial one, as it would generally mean that information is lost with each iterative step. This is not the case for the Kalman filter, since the Gaussian is assumed, and therefore, the mode and the variance uniquely identify the distribution.
These solutions can also be manipulated and written in the following alternative form,
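To make the workflow concrete, here is a minimal numerical sketch of the resulting predict/update cycle written in the gain form. This is not code from the paper; the transition and measurement coefficients, the noise variances q and r, and the data values are illustrative assumptions.

```python
def kalman_1d(x_prev, p_prev, y_k, f=1.0, g=1.0, q=0.1, r=0.5):
    """One predict/update cycle of a scalar Kalman filter.

    x_prev, p_prev : previous state estimate and its variance
    y_k            : current measurement
    f, g           : transition and measurement coefficients (g = 1 as in the text's example)
    q, r           : process and measurement noise variances (assumed values)
    """
    # Prediction: propagate the previous estimate through the dynamics.
    x_pred = f * x_prev
    p_pred = f * p_prev * f + q

    # Update: blend prediction and measurement through the Kalman gain.
    k_gain = p_pred * g / (g * p_pred * g + r)
    x_new = x_pred + k_gain * (y_k - g * x_pred)
    p_new = (1.0 - k_gain * g) * p_pred
    return x_new, p_new

# Track a roughly stationary 1D location from noisy readings.
x, p = 0.0, 1.0
for y in [0.9, 1.1, 1.05, 0.95]:
    x, p = kalman_1d(x, p, y)
    print(f"estimate = {x:.3f}, variance = {p:.3f}")
```

Note how the gain weights the measurement against the prediction: a large measurement variance r pulls the estimate toward the prediction, while a large predicted variance pulls it toward the measurement.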
4. Maximum Relative Entropy
First, we present a review of maximum relative entropy. For a more detailed discussion and several examples, please see [6,11]. Our first concern when using the MrE method to update from a prior to a posterior distribution is to define the space in which the search for the posterior will be conducted. We wish to infer something about the values of one or several quantities, θ ∈ Θ, on the basis of three pieces of information: prior information about θ (the prior), the known relationship between x and θ (the model) and the observed values of the data x ∈ 𝒳. Since we are concerned with both x and θ, the relevant space is neither 𝒳 nor Θ, but the product 𝒳 × Θ, and our attention must be focused on the joint distribution, P(x, θ). The selected joint posterior, Pnew(x, θ), is that which maximizes the entropy (in MrE terminology, we “maximize” the negative relative entropy, S, so that S ≤ 0. This is the same as minimizing the relative entropy),
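In conventional notation, the quantity being maximized is the (negative) relative entropy of the joint distribution with respect to the joint prior (a sketch; the paper's own equation numbering is not reproduced here):
\[
S[P, P_{\mathrm{old}}] \;=\; -\int dx\, d\theta \; P(x,\theta)\,\log\frac{P(x,\theta)}{P_{\mathrm{old}}(x,\theta)} \;\le\; 0 .
\]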
The new information is the observed data, x′, which in the MrE framework must be expressed in the form of a constraint on the allowed posteriors. The family of posteriors that reflects the fact that x is now known to be x′ is such that,
We proceed by maximizing Equation (26) subject to the above constraints. The purpose of maximizing the entropy is to find the P for which S is as close to zero (its maximum) as the constraints allow, meaning that we want the P that is closest to Pold given the constraints. The calculus of variations is used to do this by varying P → δP, i.e., setting the derivative with respect to P equal to zero. The Lagrange multipliers, α, β and λ(x), are used so that the P that is chosen satisfies the constraint equations. Their actual values are determined by the values of the constraints themselves. We now provide the detailed steps in this maximization process.
First, we set up the variational form with the Lagrange multipliers,
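A sketch of this variational setup, under the assumption that the new information consists of the observed data x′ plus one generic moment constraint on θ (our placeholder f(θ) with value F), would read:
\[
\delta\left\{ S[P,P_{\mathrm{old}}]
+ \alpha\!\left[\int dx\,d\theta\,P(x,\theta) - 1\right]
+ \beta\!\left[\int dx\,d\theta\,P(x,\theta)\,f(\theta) - F\right]
+ \int dx\,\lambda(x)\!\left[\int d\theta\,P(x,\theta) - \delta(x-x')\right]\right\} = 0 ,
\]
where α enforces normalization, β the moment constraint and λ(x) the data constraint.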
In order to determine the Lagrange multipliers, we substitute our solution Equation (35) into the various constraint equations. The constant α is eliminated by substituting Equation (35) into Equation (29),
The Lagrange multiplier β is determined by first substituting Equation (42) into Equation (30),
5. Maximum Relative Entropy and Kalman
There are works where entropy maximization is used in Kalman filtering [12–14]. For example, [14] uses entropy maximization as one of the approximation tools to reduce uncertainty. Here, we show that if the same assumptions are taken into account, the explicit closed form solution can be derived, so a numerical comparison, as in [14], becomes unnecessary and even limiting. To the best of our knowledge, there is no work that shows a direct link between the original Kalman filter and maximization of the relative entropy and produces the closed form solution. We will now present a more complicated example illustrating the maximum relative entropy (MrE) solution and discuss the assumptions and constraints that lead to the same closed form Kalman filter solution.
This example consists of analyzing a linear system composed of two equations that represent linear motion with constant acceleration ca,k. The dynamics of the velocity, vk, and the position, xk, are encoded in the following two relationships, which we assume are relevant for predicting the linear motion of the next state,
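One common way to parameterize such a constant-acceleration system (a sketch only; the exact form the paper uses, including how the acceleration noise ηa,k−1 enters, is defined by its Equations (48) and (49)) is:
\[
x_k = x_{k-1} + v_{k-1}\,\Delta t + \tfrac{1}{2}\, c_{a,k}\,\Delta t^{2} + \eta_{x,k-1},
\qquad
v_k = v_{k-1} + c_{a,k}\,\Delta t + \eta_{v,k-1},
\]
with Δt the discretization interval and ηx,k−1, ηv,k−1 the additive process noise terms.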
Here, we will derive the so-called “prediction step”, whose posterior follows from the optimization criterion, or entropy, which has the form,
All constraints come from the same Kalman filter assumptions. We derive the first constraint using Equation (48), which is the variance,
- (1) The means of all noise variables are zero;
- (2) The joint distribution function is a multivariate normal distribution;
- (3) The covariance matrix is not only valid for the previous posterior distribution discretization interval, but also for the current posterior distribution discretization interval, which in our case is P̄(xk, vk, ηx,k−1, ηv,k−1, ηa,k−1). In other words, it is implied that in our specific case, we have the following equalities,
- (4) The last, but not least, assumption in Kalman filtering is that our noise variables are independent of our main state variables, i.e., xk and vk. The main benefit of this assumption is that we can manipulate Equation (53) by applying the constraints in Equations (55)–(58). Keep in mind that the means of the noise variables are zero, so many additive terms vanish after integration. Therefore, we finally get Equation (53) in the form of,
Similarly, we can construct two other MrE constraints based on Kalman filter assumptions as,
To be clear, this “posterior” Equation (70) is effectively the “prior” Equation (8) that would be in the Kalman filter.
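As a concrete illustration of what this prediction-step "prior" amounts to in practice, here is a minimal sketch of the familiar covariance-propagation form for the constant-acceleration model. The time step, acceleration input and noise covariance Q below are assumed values for illustration, not quantities taken from the paper.

```python
import numpy as np

dt = 0.1
F = np.array([[1.0, dt],
              [0.0, 1.0]])          # transition matrix for (position, velocity)
B = np.array([0.5 * dt**2, dt])     # how a constant acceleration enters the state
Q = np.diag([1e-3, 1e-2])           # process noise covariance (assumed)

def predict(x_hat, P, accel):
    """Propagate the point estimate and its covariance one step forward."""
    x_pred = F @ x_hat + B * accel
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

x_hat = np.array([0.0, 1.0])        # previous estimate: position 0 m, velocity 1 m/s
P = np.eye(2)                       # previous covariance
print(predict(x_hat, P, accel=0.2))
```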
5.1. Kalman Filter’s Updating Step
Our focus here is the traditional updating step of the Kalman filter and its reproduction by MrE. The measurement distribution needed would be obtained in a manner similar to that of the predictive step, using the following constraints,
5.2. Kalman Filter Revisited
We will now present the Kalman filter solution and show that it is the same closed form solution as in the previous subsection. First, we need to construct our problem in matrix form.
The mathematical model of our state space system, as in Equations (48) and (49), has the following matrix representation:
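In a common notation, such a representation would read as follows (a sketch only; the paper's Equation (90) defines the exact matrices used):
\[
\begin{bmatrix} x_k \\ v_k \end{bmatrix}
=
\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_{k-1} \\ v_{k-1} \end{bmatrix}
+
\begin{bmatrix} \tfrac{1}{2}\Delta t^{2} \\ \Delta t \end{bmatrix} c_{a,k}
+
\begin{bmatrix} \eta_{x,k-1} \\ \eta_{v,k-1} \end{bmatrix}.
\]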
Sometimes, there are discussions [15] about the numerical stability of the Kalman filter, and a different, simplified version of the covariance matrix Equation (97) is used [9],
Summarizing, if our state space system and its corresponding transition matrix are of such a size and/or sparsity that we can obtain the inverse matrix analytically in its reduced solution without numerical iterations, then it does not matter which version (Joseph's stabilized or the original) is selected, because the closed form and the numeric answer would be the same. From a practical point of view, the maximum relative entropy method might be particularly useful for loosely coupled systems (not necessarily small), because its complexity (the total number of Lagrange multipliers) is equal to the total number of transition equations, and it has no explicit difficulty in calculating inverse matrices, because of the variational techniques used.
The fact is that both Equation (97) and Equation (99) return exactly the same closed form solution as MrE does in Equations (80)–(82), and, by definition, these expressions are always numerically stable. However, if one applies the Kalman filter matrix operations and does not continue with further reductions of the final estimates' expressions into their irreducible forms, then, yes, expression Equation (97) might be better to use than Equation (99). This shows one more benefit of MrE: the closed form solutions allow one to avoid numeric instability in certain situations.
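For readers unfamiliar with the two covariance-update forms being compared, here is a generic sketch. We assume, without claiming the paper's exact numbering, that one form is Joseph's stabilized update and the other is the algebraically equivalent simplified update; all matrices below are illustrative.

```python
import numpy as np

def covariance_update(P_pred, G, R, K, joseph=True):
    """Posterior covariance after the measurement step, in two standard forms."""
    I = np.eye(P_pred.shape[0])
    if joseph:
        # Joseph's stabilized form: symmetric and positive semi-definite by construction.
        A = I - K @ G
        return A @ P_pred @ A.T + K @ R @ K.T
    # Simplified form: equivalent for the optimal gain, but more sensitive to round-off.
    return (I - K @ G) @ P_pred

# Quick check that both forms agree for the optimal gain on a toy problem.
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])
G = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
K = P_pred @ G.T @ np.linalg.inv(G @ P_pred @ G.T + R)
print(covariance_update(P_pred, G, R, K, joseph=True))
print(covariance_update(P_pred, G, R, K, joseph=False))
```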
6. Nonlinear Filter
The original Kalman filter has an assumption that the relationships between the state space system's variables are linear. This assumption allows it to be expressed in matrix form. Therefore, by definition, the Kalman filter is a linear filter, and nonlinear relationships have no explicit representation in the transition matrix, Equation (90). For that reason, variants of the Kalman filter were invented. One is called the extended Kalman filter, where any function is approximated locally by calculating a Jacobian at the approximate estimated location. Another variant is the unscented Kalman filter, which locally restricts the data sampling to a set of 2n+1 sigma points, where n is the dimension of the state space. It avoids calculating the Jacobian and has the benefit that the nonlinear transition can be locally approximated by a cubic characteristic, when viewed from a single variable's point of view. In the following subsections, we will derive the Kalman filter expression via a probabilistic approach, applying a one-to-one transition between the state variables, where the transition can be monotonically increasing or decreasing and not necessarily linear, as in the original Kalman filter assumptions. More complex and generalized formulae may be found in [17].
6.1. Generalized Univariate Nonlinear Filter for Monotonic Transitions
In this section, we construct a general transformation of variables. While this can be found in undergraduate texts and advanced literature on Kalman filtering [18,19], we feel the need to include it, as it allows us to point directly to why the Kalman assumptions are necessary. Assume we have a single random variable, X (thus, a univariate approach). A random variable is a function whose value is subject to variation due to some randomness. A value of this function (which we will call a random variable from now on) is associated with some probability (discrete case) or with a probability density (continuous case). In this section, we deal only with continuous, real-valued data, although the approach itself is not restricted to them.
The cumulative distribution function (cdf) of X is the function given by,
Assume we are measuring the distance traveled by a robot in meters (random variable X) and we can learn how many kilometers the robot has traveled (random variable Y); then, there is a physical relationship in the measurable space, because we know the ratio between kilometers and meters (A = 1000; note that this parameter is a known constant), as,
Current one-to-one assumptions are still more general than the original Kalman filter assumptions, because we allow not only the linear equation system of variables, but also the system of any continuously increasing or decreasing functions. Then, the definition or the meaning of Equation (103) can be represented by the following expressions:
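For a one-to-one (monotonic) transformation Y = g(X), the standard change-of-variables result that these expressions encode is, in conventional notation (a sketch, not a reproduction of the paper's equations):
\[
F_Y(y) = P\big(g(X) \le y\big) = F_X\big(g^{-1}(y)\big)
\quad\Longrightarrow\quad
f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d\,g^{-1}(y)}{dy}\right| ,
\]
where the cdf relation is written for an increasing g (for a decreasing g it flips), while the density formula with the absolute value holds in both cases. For the robot example, if we take Y = X/A with A = 1000, this gives f_Y(y) = A f_X(A y).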
6.2. Generalized Multivariate Nonlinear Filter for Monotonic Transitions
We now begin the generalization to a multivariate, nonlinear filter. A system of transformations in the measurement space is,
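For an invertible, differentiable vector transformation Y = g(X), the corresponding multivariate result is (again, a sketch in conventional notation):
\[
f_{\mathbf Y}(\mathbf y) \;=\; f_{\mathbf X}\big(\mathbf g^{-1}(\mathbf y)\big)\,
\left|\det \frac{\partial\, \mathbf g^{-1}(\mathbf y)}{\partial\, \mathbf y}\right| ,
\]
where the determinant of the Jacobian of the inverse transformation replaces the scalar derivative of the univariate case.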
6.3. Kalman Filter Revisited Using a Jacobian
In this subsection, we will revisit the Kalman filter example from the previous section. Our state space system with transformation function g(x) stays the same,
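As a generic illustration of how the Jacobian of a nonlinear, monotonic transition enters the covariance propagation, here is an extended-Kalman-style sketch. The transition g, its Jacobian and all numerical values are assumptions for illustration, not the paper's specific system.

```python
import numpy as np

def g(x):
    """Example nonlinear, monotonic transition of a 2D state (assumed)."""
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.05 * np.tanh(x[1])])

def jacobian_g(x):
    """Jacobian of g, evaluated at the current estimate."""
    return np.array([[1.0, 0.1],
                     [0.0, 1.0 + 0.05 / np.cosh(x[1])**2]])

def predict_nonlinear(x_hat, P, Q):
    """Propagate the estimate through g and linearize to propagate the covariance."""
    J = jacobian_g(x_hat)
    return g(x_hat), J @ P @ J.T + Q

x_hat, P, Q = np.array([0.0, 1.0]), np.eye(2), 1e-3 * np.eye(2)
print(predict_nonlinear(x_hat, P, Q))
```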
7. Summary and Final Remarks
Kalman demonstrated an ingenious way to estimate unknown variables from noisy measurements, in part by making various assumptions. In this paper, we derive the Bayesian filter and, then, show that by applying the Kalman assumptions, we arrive at a solution that is consistent with the original Kalman filter, explicitly showing, for pedagogical purposes, that the “transition” or “predictive” step is the prior information and the “measurement” or “updating” step is the likelihood of Bayes' rule. Further, we showed that the well-known Kalman gain is the new uncertainty associated with the posterior distribution.
Recently, a paper [22] used maximum relative entropy (MrE) with the Kalman assumptions, but did not explicitly state that there is a direct link between these two approaches. Here, we showed that the method of maximum relative entropy (MrE) explicitly produces the same mathematical solutions as the Kalman filter, and thus, Kalman is a special case of MrE. We also showed that the closed form solutions after the application of MrE are immune to real-life numeric problems, which might occur when manipulating the Kalman filter matrix operations.
By applying and manipulating pure probabilistic definitions and techniques used in signal analysis theory, we derived a general, nonlinear filter in which the variables of interest are constrained by continuous, monotonically increasing or decreasing functions, rather than necessarily by a linear set of functions, as in the original Kalman filter. Thus, we can include more information and extend approximation approaches, such as the extended Kalman filter and unscented Kalman filter techniques and other hybrid variants.
In the end, we derived general distributions using MrE for use in Bayes' Theorem for the same purposes as the original Kalman filter and all of its offshoots. However, MrE can do even more. An important future work will be to include simultaneous constraints on the posterior that Bayes' rule alone cannot easily accommodate, such as including macroscopic or coarse-grained relationships between the various variables of interest. This has been demonstrated in [11].
Acknowledgments
We would like to acknowledge valuable discussions with Julian Center and Peter D. Joseph.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45.
- Meinhold, R.J.; Singpurwalla, N.D. Understanding the Kalman Filter. Am. Stat. 1983, 37, 123–127.
- Gibbs, B.P. Advanced Kalman Filtering, Least-Squares and Modeling: A Practical Handbook; Wiley: New York, NY, USA, 2011.
- Rosenkrantz, R.D., Ed. E.T. Jaynes: Papers on Probability, Statistics and Statistical Physics; Reidel: Dordrecht, Holland, 1983.
- Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
- Giffin, A.; Caticha, A. Updating Probabilities with Data and Moments. In AIP Conference Proceedings of Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Knuth, K., Caticha, A., Center, J.L., Giffin, A., Rodríguez, C.C., Eds.; American Institute of Physics: Melville, NY, USA, 2007; Volume 954, p. 74.
- Caticha, A.; Giffin, A. Updating Probabilities. In AIP Conference Proceedings of Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Mohammad-Djafari, A., Ed.; American Institute of Physics: Melville, NY, USA, 2006; Volume 872, p. 31.
- Cox, R.T. Probability, frequency, and reasonable expectation. Am. J. Phys. 1946, 14, 1–13.
- Thrun, S.; Burgard, W.; Fox, D. Probabilistic Robotics; MIT Press: Cambridge, MA, USA, 2006.
- Chen, Z. Bayesian filtering: From Kalman filters to particle filters, and beyond. Statistics 2003, 182, 1–69.
- Giffin, A. From physics to economics: An econometric example using maximum relative entropy. Physica A 2009, 388, 1610–1620.
- Mitter, S.K.; Newton, N.J. A Variational Approach to Nonlinear Estimation. SIAM J. Contr. 2004, 42, 1813–1833.
- Mitter, S.K.; Newton, N.J. Information and Entropy Flow in the Kalman-Bucy Filter. J. Stat. Phys. 2005, 118, 145–176.
- Eyink, G.; Kim, S. A maximum entropy method for particle filtering. J. Stat. Phys. 2006, 123, 1071–1128.
- Joseph, P.D. Kalman Filter Lessons. Available online: http://home.earthlink.net/~pdjoseph/id11.html (accessed on 18 February 2014).
- Crassidis, J.L.; Junkins, J.L. Optimal Estimation of Dynamical Systems; Chapman & Hall/CRC: Boca Raton, FL, USA, 2004.
- Poularikas, A.D. The Handbook of Formulas and Tables for Signal Processing; CRC Press: Boca Raton, FL, USA, 1998.
- Jazwinski, A.H. Stochastic Processes and Filtering Theory; Academic Press: New York, NY, USA, 1970.
- Crisan, D.; Rozovskii, B.L. The Oxford Handbook of Nonlinear Filtering; Oxford University Press: Oxford, UK, 2011.
- Katok, A.; Hasselblatt, B. Introduction to the Modern Theory of Dynamical Systems; Cambridge University Press: Cambridge, UK, 1996.
- Beck, C.; Schlögl, F. Thermodynamics of Chaotic Systems: An Introduction; Cambridge University Press: Cambridge, UK, 1995.
- Urniezius, R. Online robot dead reckoning localization using maximum relative entropy optimization with model constraints. In AIP Conference Proceedings of Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Mohammad-Djafari, A., Ed.; American Institute of Physics: Melville, NY, USA, 2011; Volume 1305, p. 274.
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).