1. Introduction
The PnP problem is a classic computer vision problem that involves estimating the pose of a calibrated camera in 3D space, given a set of correspondences between 3D points in the world and their 2D projections in the camera image. The goal is to determine the camera's position and orientation (i.e., its pose) relative to the 3D points in the world. The PnP problem has many important applications in robotics, augmented reality, and computer vision. Several algorithms have been developed to solve the PnP problem, including the classic Direct Linear Transformation (DLT) [1] algorithm and more recent algorithms such as the Efficient Perspective-n-Point (EPnP) [2] algorithm and the Universal Perspective-n-Point (UPnP) [3] algorithm. These algorithms are based on different optimization techniques, and they have been shown to provide better accuracy and robustness than the classic DLT algorithm. Other work includes [2,4,5,6,7,8].
We next present the notations used hereafter and outline the approach we followed, which yields a polynomial optimization problem. We thus continue a long line of research, starting with [4], with some of the more recent work including [5,6,9,10]. As we shall elaborate in the following sections, our contribution is twofold: we offer a method to solve the optimization problem that is much faster than previous work, as well as a guaranteed approximation, which is also easily parallelizable.
Given 2D projections of 3D points whose real-world coordinates are known, one seeks to determine the camera position and angle, i.e., a translation vector $T$ and a rotation matrix $R$, relative to the world coordinate frame, which provide an optimal fit between the 3D points and their projections on the camera image plane.
Denoting by $v_i$ the unit vectors in the direction of the projections of the 3D points $p_i$, we obtain the following very-well-known expression to minimize [4]:
$$\min_{R,\,T}\ \sum_{i=1}^{n}\left\|\left(I - v_i v_i^{\top}\right)\left(R\,p_i + T\right)\right\|^2. \tag{1}$$
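To make the objective concrete, the following minimal sketch (ours; the function and variable names are illustrative, not from the paper) evaluates Equation (1) for given $R$ and $T$:

```python
import numpy as np

def pnp_objective(R, T, points, dirs):
    """Object-space PnP error of Equation (1): the sum of squared distances
    between the transformed points R @ p + T and their lines of sight.
    points: (n, 3) world coordinates; dirs: (n, 3) unit direction vectors."""
    total = 0.0
    for p, v in zip(points, dirs):
        x = R @ p + T                     # point in camera coordinates
        residual = x - np.dot(v, x) * v   # component orthogonal to span(v)
        total += residual @ residual
    return total
```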
Geometrically (see Figure 1, borrowed from [11]), rotating the point set $\{p_i\}$ by the optimal $R$ and translating by the optimal $T$ maximizes the sum of the squared norms of their respective projections on the lines spanned by the $v_i$ (i.e., the points are optimally aligned with the respective “lines of sight”).
To minimize Equation (1), one first differentiates by $T$ and equates to 0; this yields $T$ as a function of $R$. Substituting back into Equation (1) yields:
$$\min_{r}\ r^{\top} M\, r,$$
where $r$ is the “flattened” ($9 \times 1$) rotation matrix, and $M$ is a $9 \times 9$ semidefinite positive matrix determined by the points $p_i$ and the directions $v_i$.
Therefore, we are now faced with the problem of minimizing a quadratic polynomial in the nine elements of a rotation matrix, which are subject to six quadratic constraints (the rows must form an orthonormal system). The common approach to this problem is to parameterize the set of rotation matrices by unit quaternions $q = (q_1, q_2, q_3, q_4)$, as follows:
$$R(q) = \begin{pmatrix} q_1^2 + q_2^2 - q_3^2 - q_4^2 & 2(q_2q_3 - q_1q_4) & 2(q_2q_4 + q_1q_3) \\ 2(q_2q_3 + q_1q_4) & q_1^2 - q_2^2 + q_3^2 - q_4^2 & 2(q_3q_4 - q_1q_2) \\ 2(q_2q_4 - q_1q_3) & 2(q_3q_4 + q_1q_2) & q_1^2 - q_2^2 - q_3^2 + q_4^2 \end{pmatrix},$$
which transforms the problem into one of minimizing a quartic over the three-dimensional unit sphere $S^3$. This solution is global and does not rely on iterative schemes. In an extensive set of experiments [5], it was found to be more accurate than other methods. The accuracy of this approach is also noted in many other publications, including the recent [9,10]. Since the algorithm in [5] runs faster than those provided in other papers that approach PnP as the polynomial minimization problem described herewith (more on this in Section 5.1), we directly compared our running times with those given in [5]; the improvement, however, is generic, as will be elaborated in Section 3.
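For completeness, here is a small sketch (ours; it assumes the scalar-first quaternion convention) of the parameterization above and of the resulting quartic $r(q)^{\top} M\, r(q)$:

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion q = (q1, q2, q3, q4), scalar first."""
    q1, q2, q3, q4 = q
    return np.array([
        [q1*q1 + q2*q2 - q3*q3 - q4*q4, 2*(q2*q3 - q1*q4),             2*(q2*q4 + q1*q3)],
        [2*(q2*q3 + q1*q4),             q1*q1 - q2*q2 + q3*q3 - q4*q4, 2*(q3*q4 - q1*q2)],
        [2*(q2*q4 - q1*q3),             2*(q3*q4 + q1*q2),             q1*q1 - q2*q2 - q3*q3 + q4*q4],
    ])

def pnp_quartic(M, q):
    """f(q) = r(q)^T M r(q): homogeneous of degree four in q, hence naturally
    minimized over the unit sphere S^3."""
    r = quat_to_rot(q).reshape(9)
    return r @ M @ r
```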
We shall refer to the polynomial that should be minimized as $f$ and, to reduce equation clutter, will denote its variables as $x_1, x_2, x_3, x_4$, and not $q_1, q_2, q_3, q_4$.
Due to the great importance of PnP in real-life applications, the minimization problem has been addressed in numerous papers, starting with [4]; for a recent survey, see [5]. There does not exist, however, a fast solution that is guaranteed to work in all problem instances; solving with Lagrange multipliers leads to a set of equations for which no closed-form solution exists. Therefore, it is customary to apply a convex relaxation approach, which we describe in Section 2. In Section 3, we describe a faster solution to the relaxation approach, which also requires less memory. In Section 4, we present a different type of solution, which achieves a guaranteed approximation to the non-relaxed problem; experimental results are presented in Section 5; conclusions are offered in Section 6; some technical details are provided in Appendix A.
2. The Lasserre Hierarchy
Our point of departure from previous work is the approach for minimizing $f$ on $S^3$. We begin by describing the existing approach; see, for example, [5].
The Lasserre hierarchy [12,13,14] is a powerful tool for solving polynomial optimization problems and has found many applications in a variety of fields, including control theory, robotics, and machine learning. One advantage of the Lasserre hierarchy is that it can be implemented using off-the-shelf semidefinite programming solvers, which makes it accessible to a wide range of users. The main drawback of the Lasserre hierarchy is that the computational complexity of the hierarchy grows rapidly with the degree of the polynomials involved, which limits its applicability to problems with a low to moderate degree. Here, we only describe its application to the problem at hand: minimizing $f$ over the three-dimensional unit sphere $S^3$.
Definition 1 ([15]). Given a fourth-degree (quartic) polynomial $f$ and an even integer $k \ge 4$, define the corresponding $k$-th degree problem by
$$\gamma_k = \max\ \gamma \quad \text{such that} \quad f(x) - \gamma = q(x)\left(1 - \|x\|^2\right) + \sigma(x),$$
where $q(x)$ is an arbitrary polynomial of degree $k - 2$ and $\sigma(x)$ a polynomial of degree $k$, which can be expressed as the Sum Of Squares of polynomials (SOS) of degree $k/2$. The advantage of this so-called “SOS relaxation” is that solving for the maximal $\gamma$ can be reduced to a Semidefinite Programming (SDP) problem, for which there exist efficient numerical algorithms.
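As a toy illustration of Definition 1 (our example, not from [15]), take $f(x) = \|x\|^4$, whose minimum on the unit sphere is clearly 1. The identity
$$f(x) - 1 = \|x\|^4 - 1 = -\left(1 + \|x\|^2\right)\left(1 - \|x\|^2\right) + 0$$
is a certificate of the required form, with $q(x) = -\left(1 + \|x\|^2\right)$ of degree $k - 2 = 2$ and $\sigma(x) \equiv 0$ (trivially an SOS); hence, already the $k = 4$ problem attains the exact minimum $\gamma_4 = 1$.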
Note that, if the polynomial equality
$$f(x) - \gamma = q(x)\left(1 - \|x\|^2\right) + \sigma(x)$$
holds, then, since $1 - \|x\|^2 = 0$ on $S^3$ and obviously $\sigma(x) \ge 0$ everywhere, clearly, $f(x) \ge \gamma$ on $S^3$; hence, $\gamma$ is a lower bound on the sought minimum. Alas, there is no useful upper bound on the value of $k$ for which the resulting $\gamma_k$ is equal to the minimum. This problem is exacerbated by the fact that the complexity of the SDP problem rapidly increases with $k$. To see why, observe the following result (for the proof, see [16]).
A polynomial $p$ of even degree $k$ in $x_1, \dots, x_4$ can be expressed as a sum of squares iff it can be written as $p = v^{\top} B v$, where $v$ is the vector of monomials in $x_1, \dots, x_4$ of a total degree not exceeding $k/2$ and $B$ a semidefinite positive matrix. For example, if $k = 4$, we must have that:
$$p = v^{\top} B v,$$
where $B$ is a $15 \times 15$ semidefinite positive matrix, and
$$v = \left(1,\ x_1,\ x_2,\ x_3,\ x_4,\ x_1^2,\ x_1x_2,\ x_1x_3,\ x_1x_4,\ x_2^2,\ x_2x_3,\ x_2x_4,\ x_3^2,\ x_3x_4,\ x_4^2\right)^{\top}.$$
However, while there are 15 monomials of a total degree no more than 2, for the next case ($k = 6$), we must consider all monomials of a degree no more than 3, of which there are 35, which means that the corresponding $B$ will be of size $35 \times 35$. Generally, there are $\binom{k/2 + 4}{4}$ such monomials.
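These counts are easy to verify with a one-liner (a sketch of ours):

```python
from math import comb

# monomials of total degree <= d in 4 variables: C(d + 4, 4)
assert comb(2 + 4, 4) == 15   # k = 4: B is 15 x 15
assert comb(3 + 4, 4) == 35   # k = 6: B is 35 x 35
print([comb(d + 4, 4) for d in range(2, 7)])  # rapid growth with the degree
```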
In order to obtain reasonable running times, a solution that has been widely adopted by the computer vision community [4,5] is to solve the SOS relaxation for the case of $k = 4$. This immediately raises the following question: Are there PnP instances in which the solution of the fourth-degree problem differs from the correct one (i.e., the global minimum of $f$)? To the best of our knowledge, this question was first answered (in the affirmative) in two recent papers [11,17]. These cases appear to be very rare, as described in [17]. An exact answer to the question of just how rare they are is unknown, as it touches on the very difficult and yet-unsolved problem of the relative measure of SOS polynomials among all positive polynomials, for which only asymptotic results are known [18].
In this paper, we offer two solutions to optimizing $f$:
Algorithm 1: This solves the fourth-degree problem, but in a much faster way than [2,3,4,5,9,10], by relying on the fact that $f$ is homogeneous (i.e., contains only terms of total degree four). While it is not guaranteed to recover the global minimum, neither is the commonly accepted solution described above. In numerous tests run on real PnP data, both solutions gave identical results, with our proposed method being faster by an order of magnitude.
Algorithm 2: In order to obtain a guaranteed approximation to the global minimum, we optimized $f$ on “slices” of $S^3$, each of which is a linear image of the two-dimensional unit sphere $S^2$. Then, we applied a famous theorem of Hilbert, which states that every positive homogeneous fourth-degree polynomial in three variables can be represented as a sum of squares. We offer an analysis, backed by experiments, to determine how many slices are required to obtain a good approximation of the global minimum over $S^3$.
We now proceed to a detailed study of the relaxed problem and offer our improvement.
3. Algorithm 1: Approach and Reduction of Complexity
We began by studying the complexity of the widely used fourth-degree relaxation, sketched in Section 2. Recall the problem:
$$\max\ \gamma \quad \text{such that} \quad f(x) - \gamma = q(x)\left(1 - \|x\|^2\right) + v^{\top} B v,$$
where $B$ is a $15 \times 15$ semidefinite positive matrix and $q(x)$ is an arbitrary second-degree polynomial in $x_1, \dots, x_4$.
This leads to a Semidefinite Programming (SDP) problem, in which $\gamma$ must be maximized, under the following constraints:
$B$ is a semidefinite positive matrix.
There are 70 equalities that must be satisfied, which correspond to the 70 coefficients of a quartic polynomial in four variables.
These 70 equalities also depend on the 15 coefficients of the quadratic polynomial $q(x)$.
In the SDP formulation, this problem is posed as follows:
$$\min_{X}\ C \bullet X \quad \text{such that} \quad A_i \bullet X = b_i,\ i = 1, \dots, m, \quad X \succeq 0,$$
where, for two matrices of the same dimensions $A, B$, $A \bullet B$ is defined as $\sum_{i,j} A_{ij}B_{ij}$ (also equal to $\operatorname{trace}(AB)$), and, for a square and symmetric matrix $P$, $P \succeq 0$ stands for $P$ being semidefinite positive.
This problem is typically solved by formulating the following dual problem, which also allows recovering the point at which the minimum is obtained:
$$\max_{y}\ \sum_{i=1}^{m} b_i\, y_i \quad \text{such that} \quad C - \sum_{i=1}^{m} y_i\, A_i \succeq 0.$$
The SDP problem can be solved using available packages, such as SDPA [19], or by direct optimization [5].
3.1. Improving Algorithm 1
The size of the SDP problem is, naturally, a crucial factor in the complexity of its solution. In the above problem, the matrix $X$ is of size $15 \times 15$, and there are 70 equality constraints, corresponding to the 70 coefficients of a quartic polynomial in four variables. In addition, there are 15 “free” (i.e., without constraints) coefficients of $q(x)$, which must be recovered.
Proceeding as in [19], this entails an optimization procedure based on the famous iterative barrier method, with a Hessian of size $85 \times 85$ and, with each iteration step, solving linear systems of size 85.
We propose a substantially simpler solution, which is an order of magnitude faster than previous work, while producing identical results in numerous tests on real PnP data. Specifically, in our solution, the matrix $X$ is smaller ($10 \times 10$ vs. $15 \times 15$), and there are fewer coefficients to recover (34 vs. 85). Since a major part of the solution is computing the inverse of the Hessian matrix (which is $O(n^3)$ for a matrix of size $n \times n$) and our Hessian is considerably smaller, the proposed solution is about 10-times faster than previous work (it takes about 1 ms, on a relatively weak i7-6500U processor, to find the minimum).
Details of Our Proposed Improvement
Our proposed solution uses the fact that the polynomial $f$ is homogeneous, meaning it contains only monomials of total degree four (all other coefficients are equal to 0).
Suppose that the minimum of $f$ on $S^3$ is $\gamma$. Since, on $S^3$, it holds that $\|x\| = 1$, then on $S^3$, the following trivially holds:
$$f(x) - \gamma\,\|x\|^4 \ge 0.$$
However, $f(x) - \gamma\|x\|^4$ is also homogeneous, as it clearly contains only monomials of total degree four; hence, we have, for every real number $t$:
$$f(tx) - \gamma\,\|tx\|^4 = t^4\left(f(x) - \gamma\,\|x\|^4\right);$$
but, for every $x \ne 0$, $x/\|x\| \in S^3$, and therefore, $f(x) - \gamma\|x\|^4 \ge 0$ on $S^3$ iff $f(x) - \gamma\|x\|^4 \ge 0$ on all of $\mathbb{R}^4$.
Next, we followed the standard SOS relaxation [12]: assume that, if a fourth-degree homogeneous polynomial $h$ is non-negative on $\mathbb{R}^4$, it can be written as $h = v^{\top} B v$ for some $10 \times 10$ matrix $B$ such that $B \succeq 0$. Note that we used the fact that the polynomial is homogeneous; hence, the vector of monomials,
$$v = \left(x_1^2,\ x_1x_2,\ x_1x_3,\ x_1x_4,\ x_2^2,\ x_2x_3,\ x_2x_4,\ x_3^2,\ x_3x_4,\ x_4^2\right)^{\top},$$
contains only 10 entries, and not 15.
Therefore, we can replace the previous optimization problem by
$$\max_{\gamma,\ B \succeq 0}\ \gamma \quad \text{such that} \quad f(x) - \gamma\,\|x\|^4 = v^{\top} B v. \tag{5}$$
The size of the problem can be further reduced as follows. By equating the coefficient of $x_1^4$ on both sides of Equation (5) and denoting the coefficient of $x_1^4$ in $f$ by $c$, we can see that $B_{1,1} = c - \gamma$. Since we wish to maximize $\gamma$, we can minimize $B_{1,1}$ (because $c$ is fixed). Therefore, now, the problem becomes
$$\min_{B \succeq 0}\ B_{1,1} \quad \text{such that} \quad v^{\top} B v = f(x) - \left(c - B_{1,1}\right)\|x\|^4.$$
This problem has no inequality constraints and can be described as the problem of minimizing $C \bullet B$, where $C$ is the $10 \times 10$ matrix with $C_{1,1} = 1$ and all other elements equal to 0, under 34 equality constraints on $B$, corresponding to the 34 coefficients (we do not need one for $x_1^4$), which we write as $A_i \bullet B = b_i$, $i = 1, \dots, 34$. In addition to working with smaller matrices, we do not require any auxiliary variables, such as the 15 coefficients of $q(x)$ in previous work, e.g., [5], and the constraint matrices corresponding to the 34 coefficients are very sparse (see Appendix A), which further reduces the running time [20].
Next, we considered the dual problem, with the well-known method of replacing the equality constraints $A_i \bullet B = b_i$ with dual variables $y_i$, yielding the following problem:
$$\max_{y}\ \sum_{i=1}^{34} b_i\, y_i,$$
such that:
$$C - \sum_{i=1}^{34} y_i\, A_i \succeq 0,$$
i.e., $C - \sum_i y_i A_i$ is a semidefinite positive matrix.
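For prototyping, the whole reduction can be stated in a few lines with an off-the-shelf modeling layer. The sketch below is ours (it is not the dedicated solver discussed in this section): it uses CVXPY and SymPy, assumes an SDP-capable solver such as SCS is installed, and keeps $\gamma$ explicit instead of eliminating it.

```python
import itertools
import cvxpy as cp
import sympy as sp

x = sp.symbols('x1:5')
mons = [x[i] * x[j] for i in range(4) for j in range(i, 4)]  # v: 10 monomials, x1^2 first
norm4 = sp.Poly(sum(xi ** 2 for xi in x) ** 2, *x)           # ||x||^4

def sos_min_on_sphere(f):
    """max gamma s.t. f(x) - gamma*||x||^4 = v(x)^T B v(x), B semidefinite positive;
    returns a lower bound on the minimum of the homogeneous quartic f over S^3."""
    fp = sp.Poly(sp.expand(f), *x)
    B, gamma = cp.Variable((10, 10), PSD=True), cp.Variable()
    cons = []
    for e in itertools.combinations_with_replacement(range(4), 4):  # 35 quartic monomials
        mono = sp.Mul(*[x[i] for i in e])
        # sum of the B entries that multiply this monomial in v^T B v
        lhs = sum(B[i, j] for i in range(10) for j in range(10)
                  if sp.expand(mons[i] * mons[j] - mono) == 0)
        cons.append(lhs ==
                    float(fp.coeff_monomial(mono)) - gamma * float(norm4.coeff_monomial(mono)))
    cp.Problem(cp.Maximize(gamma), cons).solve()
    return gamma.value
```

For instance, `sos_min_on_sphere(sum(xi ** 4 for xi in x))` returns (up to solver accuracy) $1/4$, the minimum of $\sum_i x_i^4$ on $S^3$.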
3.2. Finding a Good Starting Point
It is well known that finding a good initial point greatly affects the running time of SDP algorithms. These algorithms seek a solution in which a matrix $B$ must be semidefinite positive. Typically, they follow the “barrier method”, which imposes a penalty on matrices whose determinant is very small, thus keeping the candidate matrix from crossing the “barrier” defined by the set of singular matrices [14,19,21]. Ideally, an initial point is at the center of the region that contains the viable solutions for $B$. We next describe how this can be achieved for our problem.
Recall that, in the proposed solution, the term containing the matrix $B$ equals $v^{\top} B v$. It is known from the theory of SDP [12,14] that the optimal $B$ consists of the corresponding moments of the solution $x^*$ (and this is what allows extracting the solution from $B$). Since we know that all points of $S^3$ are equally viable as solutions, we can compute the center of mass of the viable $B$-region by
$$B_0[i, j] = \frac{1}{\operatorname{area}\left(S^3\right)}\int_{S^3} v_i(x)\, v_j(x)\, dx.$$
These integrals can be computed as follows. Firstly, note that, from the symmetry and parity considerations, all of them are equal to 0, except
$$\int_{S^3} x_i^4\, dx \quad \text{and} \quad \int_{S^3} x_i^2 x_j^2\, dx,$$
i.e., all those containing a fourth power of a variable or the product of the squares of two variables. Next, we computed $\int_{S^3} x_1^4\, dx$ using the well-known parameterization of $S^3$:
$$x_1 = \cos\theta_1,\qquad x_2 = \sin\theta_1\cos\theta_2,\qquad x_3 = \sin\theta_1\sin\theta_2\cos\theta_3,\qquad x_4 = \sin\theta_1\sin\theta_2\sin\theta_3,$$
and the Jacobian, corresponding to the three-dimensional area element, equals $\sin^2\theta_1\sin\theta_2$. Therefore, after normalizing by the surface area of $S^3$, which equals $2\pi^2$, we obtain
$$\frac{1}{2\pi^2}\int_{S^3} x_1^4\, dx = \frac{1}{8},$$
and similarly, the matrix elements corresponding to $x_i^2x_j^2$ equal $\frac{1}{24}$. It follows that the center of mass of the viable region, henceforth denoted by $B_0$, is equal to the following $10 \times 10$ matrix: its four diagonal entries corresponding to the squares $x_i^2$ equal $\frac{1}{8}$; the entries corresponding to products of the squares of two different variables (the remaining six diagonal entries and the off-diagonal pairs $(x_i^2, x_j^2)$) equal $\frac{1}{24}$; and all other entries are 0.
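The structure of $B_0$ is easy to tabulate directly; below is a small sketch (ours) that fills in the only nonzero moments:

```python
import numpy as np

def center_of_mass_B0():
    """B0[a, b] = average over S^3 of v_a(x) * v_b(x), where
    v = (x1^2, x1x2, x1x3, x1x4, x2^2, x2x3, x2x4, x3^2, x3x4, x4^2).
    A moment survives only if every variable appears to an even power:
    E[x_i^4] = 1/8 and E[x_i^2 x_j^2] = 1/24 on S^3."""
    pairs = [(i, j) for i in range(4) for j in range(i, 4)]
    B0 = np.zeros((10, 10))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            m = sorted([i, j, k, l])
            if m[0] == m[1] and m[2] == m[3]:           # exponents pair up
                B0[a, b] = 1 / 8 if m[0] == m[3] else 1 / 24
    return B0
```

A quick Monte Carlo average of `np.outer(v, v)` over random points of $S^3$ reproduces the same matrix, which is a convenient sanity check.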
After offering our improvement for the solution of the relaxed problem, we next handle the non-relaxed version and offer a fast, easily parallelizable solution with guaranteed accuracy.
4. Algorithm 2: Solution with “Slices”
In this section, we offer our additional contribution to solving the algebraic optimization problem. While efficient and widely used, the method we discussed in Section 2 and Section 3 does not solve the original problem of minimizing $f$ on the unit sphere $S^3$, but a relaxed version of it; specifically, the relaxation consists of replacing the condition that a polynomial is everywhere positive (which is notoriously difficult to check; actually, it is NP-complete [12,14]) by an easy-to-verify (and impose) condition: that the polynomial is a sum of squares of polynomials or, equivalently, that it can be written as $v^{\top} B v$, where $B$ is a semidefinite positive matrix and $v$ is the vector of monomials with half the degree of the given polynomial (clearly, an odd-degree polynomial cannot be everywhere positive).
Since the entire well-developed and very successful theory of Sum Of Squares (SOS) optimization relies on this relaxation, a great deal of effort has been put into studying the following question: How “tight” is the relaxation, i.e., can the difference between the optimal solution and the relaxed one be bounded? There are still no general answers to this question. In the context of the PnP problem, recent work showed that, in principle, the relaxation may fail [17], and in [11], a concrete example of such a failure was provided, even in the sixth-degree stage of the Lasserre hierarchy (Definition 1).
We propose a very simple approximation algorithm for optimizing $f$, with the following properties:
$f$ is optimized on a pre-set number of “slices” of $S^3$, defined by the intersection of $S^3$ with sub-spaces defined by $x_4 = c\,x_3$ (and, symmetrically, $x_3 = c\,x_4$; see below).
Applying a famous theorem of Hilbert [22] (see the ensuing discussion) to our approach described in Section 3, it turns out that the absolute minimum on every slice can be exactly computed using the relaxation we presented in Section 3.
The optimization for each slice is extremely fast, as the sought positive semidefinite matrices are of size $6 \times 6$.
Since the difference between the coefficients of the polynomials of two nearby slices is very small, the optimization solution for the $i$th slice can be used to find a good starting point for optimizing over the $(i+1)$th slice.
Further speedup of the optimization can be achieved by parallelizing the optimization, by dividing it into separate optimizations run in parallel over sets of adjacent slices.
We next describe the method. First, for a given scalar $c$, denote
$$S_c = \left\{\left(x_1, x_2, x_3, x_4\right) \in S^3 : x_4 = c\,x_3\right\},$$
which we refer to as the $c$-slice. In Figure 2, the equivalent notion of slices of $S^2$ is depicted.
Next, we address the problem of optimizing $f$ on $S_c$. Clearly, the restriction $f_c(x, y, z) = f(x, y, z, c\,z)$ is a fourth-degree homogeneous polynomial in three variables, and by re-scaling $z$, it can be considered as defined on $S^2$. As in Algorithm 1, we can seek its minimum by solving
$$\max\ \gamma \quad \text{such that} \quad f_c(x, y, z) - \gamma\left(x^2 + y^2 + z^2\right)^2 = v^{\top} B v,\qquad B \succeq 0,$$
where $v$ is the vector of the six monomials of degree two in $x, y, z$.
However, due to the following famous theorem of Hilbert, there is a crucial difference between this problem and that of Algorithm 1.
Theorem 1. Every non-negative homogeneous polynomial of degree four in three variables can be expressed as a sum of squares or, equivalently, as
$$p = v^{\top} B v$$
for some semidefinite positive matrix $B$.
The theorem was first proven by Hilbert [22]. Easier-to-follow proofs were later discovered (for example, Chapter 4 in [16]), but they are still quite elaborate. It follows immediately that the corresponding SDP problem will always return the global minimum over each slice $S_c$.
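The restriction to degree four (and three variables) in Theorem 1 is sharp; a standard example (not from the present paper) is the Motzkin form
$$M(x, y, z) = x^4y^2 + x^2y^4 - 3x^2y^2z^2 + z^6,$$
a homogeneous polynomial of degree six in three variables, which is non-negative (by the AM–GM inequality applied to $x^4y^2$, $x^2y^4$, and $z^6$), yet cannot be written as a sum of squares. This is precisely why the exactness enjoyed by the quartic per-slice problems cannot be taken for granted in more general settings.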
Approximation Quality of the Slices Method
In order to estimate how well the minimum over the slices approximates the minimum over $S^3$, we first provide a result that quantifies how well the slices approximate every point on $S^3$.
Lemma 1. It is possible, with $n$ slices, to approximate every point on $S^3$ to within a distance of $O(1/n)$.
Proof. We shall use two types of slices: $x_4 = x_3\tan\theta_k$ and $x_3 = x_4\tan\theta_k$, where $\theta_k = k\pi/(4n)$, $k = -n, \dots, n$. Hence, we can assume without loss of generality that $|x_4| \le |x_3|$. Using rotational symmetry, we can assume that $(x_3, x_4) = r\left(\cos\varphi, \sin\varphi\right)$, for $|\varphi| \le \pi/4$, and sought the closest point to $p = (x_1, x_2, x_3, x_4)$ on the hyperplane defined by $x_4\cos\theta_k - x_3\sin\theta_k = 0$. This point is readily seen to equal the orthogonal projection of $p$ onto the hyperplane, and its squared distance from $p$ equals
$$\left(x_4\cos\theta_k - x_3\sin\theta_k\right)^2.$$
Since we have, for the $\theta_k$-slice, $(x_3, x_4) = r\left(\cos\varphi, \sin\varphi\right)$, the previous expression equals
$$r^2\sin^2\left(\varphi - \theta_k\right);$$
its Taylor expansion around $\theta_k$ equals
$$r^2\left(\varphi - \theta_k\right)^2 + O\left(\left(\varphi - \theta_k\right)^4\right),$$
and since $r \le 1$ and $|\varphi - \theta_k| \le \pi/(8n)$, the result immediately follows. □
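Under this reading of the slice construction, the covering bound is easy to probe numerically; the sketch below (ours) measures, for random points on $S^3$, the distance to the nearest slice hyperplane and compares it with $\pi/(8n)$:

```python
import numpy as np

def max_slice_distance(n, samples=200_000, seed=0):
    """Max over random points of S^3 of the distance to the nearest
    hyperplane x4*cos(t) - x3*sin(t) = 0 (or with x3 and x4 swapped)."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(samples, 4))
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    t = np.arange(-n, n + 1) * np.pi / (4 * n)
    d1 = np.abs(np.outer(p[:, 3], np.cos(t)) - np.outer(p[:, 2], np.sin(t)))
    d2 = np.abs(np.outer(p[:, 2], np.cos(t)) - np.outer(p[:, 3], np.sin(t)))
    return np.minimum(d1.min(axis=1), d2.min(axis=1)).max()

print(max_slice_distance(20), np.pi / (8 * 20))  # observed vs. predicted O(1/n) bound
```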
Next, we analyzed the distance between the minimum over the slices and the minimum over $S^3$. Recall that the function to be minimized is:
$$f(p) = P^{\top} B P \tag{6}$$
for some semidefinite positive matrix $B$, which depends on the input to the PnP problem (i.e., points $p_i$ and directions $v_i$; see Section 1). Therefore, we need to analyze the difference between Equation (6) for a point $p$ and a point $q$ whose distance from $p$ is at most $\varepsilon$. Proceeding as in matrix perturbation theory [23], we denote $q = p + h$, where $h$ denotes a “small” vector, in the sense that all squares of its components are very small and can be ignored in the forthcoming analysis.
We start by estimating the difference between $P$ and $Q$, which denote the length-10 vectors constructed from $p$ and $q$, i.e.,
$$P = \left(p_1^2,\ p_1p_2,\ p_1p_3,\ p_1p_4,\ p_2^2,\ p_2p_3,\ p_2p_4,\ p_3^2,\ p_3p_4,\ p_4^2\right)^{\top},$$
and similarly for $Q$. Since a direct computation shows that:
$$\|P\|^2 = \sum_i p_i^4 + \sum_{i<j} p_i^2p_j^2 = \left(\sum_i p_i^2\right)^2 - \sum_{i<j} p_i^2p_j^2 \le 1,$$
it is clear that $\|P\| \le 1$. Furthermore, we have:
$$Q - P = \left(2p_1h_1,\ p_1h_2 + p_2h_1,\ \dots,\ 2p_4h_4\right)^{\top} + O\!\left(\|h\|^2\right).$$
Ignoring terms of order at least two in $h$, we obtain:
$$\|Q - P\|^2 = 4\sum_i p_i^2h_i^2 + \sum_{i<j}\left(p_ih_j + p_jh_i\right)^2.$$
Expanding and using $\|p\| = 1$, as well as the Cauchy–Schwarz inequality, allows bounding the last expression by $4\|h\|^2$; hence, we have $\|Q - P\| \le 2\|h\| \le 2\varepsilon$. Lastly, denoting $\delta = Q - P$ and again ignoring the higher-order terms, we obtain (using the inequality $\left|\delta^{\top} B P\right| \le \|\delta\|\,\|B\|\,\|P\|$):
$$\left|Q^{\top} B Q - P^{\top} B P\right| = \left|2\,\delta^{\top} B P + \delta^{\top} B \delta\right| \approx 2\left|\delta^{\top} B P\right| \le 2\,\|\delta\|\,\|B\| \le 4\,\varepsilon\,\|B\|,$$
which provides a bound on the error of the approximate minimum.
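The final bound lends itself to a quick numerical sanity check (ours; the bound holds to first order in $\varepsilon$):

```python
import numpy as np

rng = np.random.default_rng(1)

def v10(p):  # the length-10 monomial vector used above
    return np.array([p[i] * p[j] for i in range(4) for j in range(i, 4)])

G = rng.normal(size=(10, 10))
B = G @ G.T                                   # a random semidefinite positive B
p = rng.normal(size=4); p /= np.linalg.norm(p)
eps = 1e-4
h = rng.normal(size=4); h *= eps / np.linalg.norm(h)
q = p + h                                     # nearby point at distance eps
P, Q = v10(p), v10(q)
print(abs(Q @ B @ Q - P @ B @ P), 4 * eps * np.linalg.norm(B, 2))  # error vs. bound
```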
Next, the results of both our algorithms are presented.
5. Results
Experimental Data
The data we used for comparison with [5], as well as the results for the various methods tested in [5], can be found at https://tinyurl.com/bdudy3z3, accessed on 11 April 2023. These data are described in detail in [5] and consist of webcam and drone data, as well as noisy and synthetic data. Altogether, there are 230 MB of data, consisting of image coordinates and line-of-sight directions $v_i$ (Equation (1)). The results are for all methods and consist of the rotation matrix $R$ and translation $T$ between frames (Equation (1)). Our results were identical to those of the NPnP method introduced in [5], but about ten-times faster on average (Figure 3).
5.1. Experimental Results for Algorithm 1
We chose the experiments in the recent paper [5] as a baseline for comparison, for the following reasons:
It contains extensive experiments, comparing many different algorithms on different types of data, also with varying levels of noise.
To the best of our knowledge, it is the only paper that addresses the solution of PnP as an SDP problem (first suggested in [4]) with dedicated software, as opposed to an off-the-shelf SDP package. The considerable effort to tailor the solution to PnP reduced the running time by a factor of about three relative to standard packages.
The comparison of the global optimization approach to PnP (which we also followed here) proved it to be more accurate than other popular methods, including EPnP [2], DLS [24], RPnP [25], and the iterative method presented in [26].
Additional evidence for the superior performance (in terms of accuracy) of SDP-based methods is provided in other papers, including the recent [9]; this further motivated our effort to reduce the running time of SDP.
Using the approach described in Section 3.1, with $B_0$ (Section 3.2) as an initial point for the SDP iterations, with the SDPA package [19], yielded a reduction in the running time of about an order of magnitude relative to the state-of-the-art results in [5]. This improvement is to be expected, given the reduction in the number of variables relative to previous work, which entails a reduction in the size of the matrices that must be inverted (from $85 \times 85$ to $34 \times 34$). Thus, its advantage depends neither on the specific SDP tools applied nor on the computational platform.
The proposed algorithm was compared to [5] on 10,000 PnP inputs. In all cases, the results were identical up to four decimal points. The distributions of the running times are presented in Figure 3.
5.2. Experimental Results for Algorithm 2
In experiments on 10,000 instances of real PnP data, the error was quite small and decreased rapidly as the number of slices was increased. Table 1 provides the average and maximal relative error of Algorithm 2 vis-à-vis the global minimum (note that the error is one-sided, that is, the global minimum is always smaller than the value provided by Algorithm 2).
Running Time of Algorithm 2
For an individual slice, the solution consists of optimizing a quartic ternary form, i.e., a homogeneous polynomial of degree four in three variables. Using the approach described in Section 3 for the similar problem with four variables leads to an SDP problem with a $6 \times 6$ matrix, which can be solved very quickly. Further speedup was obtained by partitioning the slices into disjoint sets, each consisting of consecutive $c$ values, and solving for each set separately, in parallel, on a standard multi-core processor. Since the coefficients for polynomials of consecutive slices are nearly identical, further speedup can be obtained by using the solution for the previous slice as a starting point for the current one; this is, however, not as trivial as for other optimization problems, as we now elaborate.
Using an Initial Guess for Consecutive Slices
Suppose we have optimized the polynomial $f_{c_i}$ and wish to continue with the consecutive slice, that is, optimize $f_{c_{i+1}}$, where $c_{i+1}$ is very close to $c_i$. Since the coefficients of both polynomials are very close, ostensibly, we could use the solution for $f_{c_i}$ as an initial point for optimizing $f_{c_{i+1}}$; alas, there is a subtle issue at play. Recall that the sought semidefinite positive matrix $B$ appears in the expression we optimized as $v^{\top} B v$; therefore, it is well known from moment theory [14] that, if $x^*$ is the point at which the minimum is attained, the optimal $B$ is of the form:
$$B = v\left(x^*\right) v\left(x^*\right)^{\top}.$$
Evidently, this matrix is highly singular (of rank one), while a good starting point should be in the interior of the feasible region (semidefinite positive matrices). We tested two solutions to this problem, which still allow gaining from the proximity of the two slices:
While solving for the first slice in a set of nearby slices, if $n$ iterations are required (typically a small number for the SDPA package we used), store an intermediate iterate, and use it as a starting point for the other slices. This initial guess, while closer to the solution than a random starting point, is well within the interior.
As in Section 3, compute the center of the feasible region, $B_0$, for our problem. Then, if the solution matrix for the $i$-th slice is $B_i$, choose as an initial point for the $(i+1)$-th slice a convex combination of $B_i$ and $B_0$. Intuitively speaking, we take the (singular) solution for the $i$-th slice and slightly “push it away” from the boundary of the feasible region, thus creating an initial point that is both close to the solution and in the interior of the feasible region.
Method 2 above produced slightly better results in terms of running times. For 40 slices, the average running time was 1.35 ms, i.e., slightly higher than that of Algorithm 1.
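A sketch of Method 2 follows (ours; `solve_slice` stands for a hypothetical wrapper around the per-slice SDP, returning the slice minimum and the optimal matrix, and `B0` is the center of the per-slice feasible region, computed as in Section 3.2):

```python
def sweep_slices(cs, solve_slice, B0, lam=0.1):
    """Optimize consecutive slices, warm-starting each SDP with a convex
    combination of the previous (rank-one) solution and the center B0."""
    minima, B_prev = [], None
    for c in cs:                         # consecutive slice parameters
        init = B0 if B_prev is None else (1 - lam) * B_prev + lam * B0
        gamma_c, B_prev = solve_slice(c, init)
        minima.append(gamma_c)
    return min(minima)                   # Algorithm 2's approximate global minimum
```

The mixing weight `lam` trades proximity to the previous solution against distance from the boundary of the feasible region; the value above is illustrative only.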