The distance from the local polytope can be computed efficiently once we have a sufficiently small set of vertices that is suitable for representing the closest local distribution. The algorithm introduced in this paper solves Problem 1 by iteratively generating a sequence of such sets. At each step, the minimal distance is first computed over the convex hull of the given vertices. Then, the oracle is consulted. If the set does not contain the right vertices, the oracle returns a strictly positive maximal value and a vertex, which is added to the set (after possibly removing vertices with zero weight). Problem 1 is solved once the oracle returns zero, which guarantees that all the optimality conditions of the problem are satisfied. Before discussing the algorithm, let us derive these conditions.
5.1. Necessary and Sufficient Conditions for Optimality
Problem 1 is a convex optimization problem whose constraints satisfy Slater's condition, which requires the existence of a strictly feasible point in the interior of the feasible region. This is the case here, as any strictly positive $\chi$ satisfies all the inequality constraints. Thus, the four Karush–Kuhn–Tucker (KKT) conditions are necessary and sufficient for optimality. Let us briefly summarize these conditions. Given an objective function $F$ of the variables $x$ and equality constraints $h_i(x)=0$, it is well known that $F$ is stationary at a feasible point $x_0$ if the gradient of the Lagrangian is equal to zero at $x_0$ for some value of the Lagrange multipliers $\lambda_i$. This is the first KKT condition. The second condition is primal feasibility; that is, the stationary point must satisfy the constraints. These two conditions are necessary and sufficient when there are only equality constraints. If there are also inequalities, two additional conditions on the associated Lagrange multipliers are required. Given inequality constraints $g_j(x)\ge 0$, with associated Lagrange multipliers $\mu_j$, the third condition is the non-negativity of the multipliers; that is, $\mu_j\ge 0$. This condition says that each constraint acts only in one direction, as a floor acts on objects through an upward force but never a downward one. The last condition states that a Lagrange multiplier can differ from zero only if the corresponding constraint is active; that is, only if $g_j(x_0)=0$. This is like stating that a floor acts on a body only if they are touching (a contact force). This condition can be written concisely as $\mu_j\,g_j(x_0)=0$.
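For reference, the four conditions can be collected compactly in this generic notation (a restatement of standard material; the symbols $x$, $h_i$, $g_j$, $\lambda_i$, $\mu_j$ are illustrative and not the paper's):

```latex
% KKT conditions for  min F(x)  s.t.  h_i(x) = 0,  g_j(x) >= 0
\begin{align*}
 &\nabla F(x_0) + \sum_i \lambda_i \nabla h_i(x_0)
   - \sum_j \mu_j \nabla g_j(x_0) = 0, && \text{(stationarity)}\\
 &h_i(x_0) = 0, \qquad g_j(x_0) \ge 0, && \text{(primal feasibility)}\\
 &\mu_j \ge 0, && \text{(dual feasibility)}\\
 &\mu_j\, g_j(x_0) = 0. && \text{(complementary slackness)}
\end{align*}
```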
Let us characterize the optimal solution of Problem 1 through the four KKT conditions.
The stationarity condition on the gradient of the Lagrangian gives the equality (18). Eliminating the Lagrange multipliers of the equality constraints, this equality and the dual feasibility yield the inequality (19). From Equation (18), we have that the complementary slackness is equivalent to condition (20); that is, the left-hand side of the last inequality is equal to zero whenever the pair of sequences is in the support of $\chi$. The slackness condition (20), the primal constraints, and Equation (19) provide necessary and sufficient conditions for optimality. Let us introduce the function $g$ defined by Equation (21), which is the opposite of the gradient of $F$ with respect to $\chi$, up to a positive factor. Summarizing, the conditions are Equations (22)–(24). The second condition can be rewritten in the more concise form (25). Indeed, using Equations (22) and (24), it is easy to show that condition (23) is satisfied if and only if equality (25) holds, by the definition of the local distribution (Equation (9)).
Condition (22) can be checked by consulting the oracle with $g$ as the query. If the oracle returns a non-positive maximal value, then the condition is satisfied. In fact, at the optimal point, the returned value turns out to be exactly zero, as implied by the other optimality conditions.
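As an illustration, a brute-force oracle can be implemented by enumerating Alice's deterministic strategies; for each of them, Bob's optimal assignment decouples across his inputs. The sketch below is a hypothetical Python realization: it assumes the query $g$ is given as a NumPy array of shape $(R, S, A, B)$, with $R, S$ the numbers of outputs and $A, B$ the numbers of inputs. Its cost is exponential in $A$, consistent with the oracle being the computationally hard part.

```python
import itertools
import numpy as np

def oracle(g):
    """Maximize sum_{a,b} g[r_a, s_b, a, b] over deterministic strategies.

    Returns the maximal value and a maximizing pair of sequences
    (r_1..r_A, s_1..s_B). Brute force: exponential in the number A of
    Alice's inputs.
    """
    R, S, A, B = g.shape
    best_val, best_arg = -np.inf, None
    for ra in itertools.product(range(R), repeat=A):
        # With Alice's outputs fixed, Bob optimizes each input b separately:
        # h[s, b] = sum_a g[ra[a], s, a, b].
        h = sum(g[ra[a], :, a, :] for a in range(A))
        val = h.max(axis=0).sum()
        if val > best_val:
            best_val = val
            best_arg = (ra, tuple(h.argmax(axis=0)))
    return best_val, best_arg
```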
Similar optimality conditions hold if we force $\chi$ to be equal to zero outside some given set $\Omega$ of sequences. Let us introduce the corresponding minimization problem, Problem 2, which is Problem 1 with the additional constraint that the support of $\chi$ is contained in $\Omega$.
The optimal value of this problem gives an upper bound on the optimal value of Problem 1. The two problems are equivalent if the support of a minimizer $\bar\chi$ of Problem 1 is contained in $\Omega$. The necessary and sufficient conditions for optimality of Problem 2 are the same as those of Problem 1, with the only difference that condition (22) has to hold only in the set $\Omega$; that is, the condition is replaced by the weaker requirement that (22) holds for every pair of sequences in $\Omega$. Thus, an optimizer of Problem 2 is a solution of Problem 1 if the value returned by the oracle with query $g$ is equal to zero.
Hereafter, the minimizer and the minimal value of Problem 2 will be denoted by $\chi_\Omega$ and $D_\Omega$, respectively. The associated optimal local distribution, defined by Equation (9), will be denoted by $P_\Omega$.
5.2. Overview of the Algorithm
Problem 1 can be solved iteratively by finding the solution of Problem 2 over a sequence of sets $\Omega_0, \Omega_1, \Omega_2, \dots$. The sets are built according to the answer of the oracle, which is consulted at each step of the iteration. The procedure stops when a desired accuracy is reached or when $\Omega_n$ contains the support of a minimizer $\bar\chi$, in which case the solution of Problem 2 is also the solution of Problem 1. Let us outline the algorithm. Suppose that we choose the initial set as a set of sequences associated with linearly independent vertices (possibly just one). Let us denote this set by $\Omega_0$. We solve Problem 2 with $\Omega = \Omega_0$ and get the optimal value $D_0$ with minimizer $\chi_0$. Let us denote the corresponding (unnormalized) local distribution by $P_0$; that is, $P_0$ is given by Equation (9) with $\chi = \chi_0$.
Since the cardinality of $\Omega_0$ is not greater than the maximal number of linearly independent vertices and the problem is a convex quadratic optimization problem, the corresponding computational complexity is polynomial. Generally, a numerical algorithm provides an optimizer up to some arbitrarily small but finite error. In Section 5.5, we will provide a bound on the accuracy required for the solution of Problem 2. For now, let us assume that Problem 2 is solved exactly. If the support of $\bar\chi$ is contained in $\Omega_0$, then $D_0$ is equal to the optimal value of Problem 1, and we have computed the distance from the local polytope. We can verify whether this is the case by checking the first optimality condition (22), as conditions (23) and (24) are trivially satisfied by the optimizer of Problem 2 for every pair of sequences. The check is made by consulting the oracle with the function $g$ as the query. If the oracle returns a maximal value equal to zero, then we have the solution of Problem 1. Note that if the optimal value of Problem 2 is equal to zero, then the optimal value of the main problem is also equal to zero and the given conditional distribution is local. In this case, there is no need to consult the oracle.
If the optimal value of Problem 2 is different from zero and the oracle returns a strictly positive maximal value, then the minimizer of Problem 2 satisfies all the optimality conditions of Problem 1, except Equation (22) for some pair of sequences. The next step is to add the pair of sequences returned by the oracle to the set $\Omega_0$ and solve Problem 2 with the new set. Let us denote the new set and the corresponding optimal value by $\Omega_1$ and $D_1$, respectively. Once we have solved Problem 2 with $\Omega = \Omega_1$, we consult the oracle again to check whether we have obtained the solution of Problem 1. If we have not, we add the pair of sequences given by the oracle to the set and solve Problem 2 with the new set, say $\Omega_2$. We continue until we get the solution of Problem 1, or its optimal value up to some desired accuracy. This procedure generates a sequence of sets $\Omega_0, \Omega_1, \Omega_2, \dots$ and corresponding optimal values $D_0, D_1, D_2, \dots$. The latter sequence is strictly decreasing, that is, $D_{n+1} < D_n$, until $\Omega_n$ contains the support of $\bar\chi$ and the oracle returns zero as the maximal value.
Let us show that. Suppose that $\chi_n$ is the optimizer of Problem 2 with $\Omega = \Omega_n$, and that $(\mathbf{r}_0, \mathbf{s}_0)$ is the new element in the set $\Omega_{n+1}$. Let us denote by $P_n$ the local distribution associated with $\chi_n$; that is, $P_n$ is given by Equation (9) with $\chi = \chi_n$. The optimal value $D_{n+1}$ of Problem 2 is bounded from above by the value taken by the objective function $F$ at every feasible point, in particular at the point obtained from $\chi_n$ by assigning a positive weight $\epsilon$ to the new element $(\mathbf{r}_0, \mathbf{s}_0)$. Let us set $\epsilon$ equal to the value minimizing $F$ along this direction; this value is proportional to the maximal value returned by the oracle and is, therefore, strictly positive, provided that $\Omega_n$ does not contain the support of $\bar\chi$. Hence, the perturbed $\chi$ is a feasible point, and the corresponding value taken by $F$ is an upper bound on $D_{n+1}$. Since moving by $\epsilon > 0$ along a strict descent direction strictly decreases $F$, this upper bound is strictly smaller than $D_n$; that is, $D_{n+1}$ is strictly smaller than $D_n$.
This procedure generates a sequence of optimal values that converges to the optimal value of Problem 1, as shown in Section 6. For any given accuracy, the computational cost of the procedure is polynomial, provided that we have access to the oracle.
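To make the overall flow concrete, here is a minimal Python sketch of the iteration, assuming the squared Euclidean distance as the objective $F$ and a hypothetical helper `solve_problem2` that solves Problem 2 for a given vertex set; `oracle` is the brute-force routine sketched above. All names are illustrative, not the paper's implementation.

```python
def vertex_distribution(strategy, shape):
    """Deterministic vertex: delta_{r, r_a} * delta_{s, s_b}."""
    R, S, A, B = shape
    ra, sb = strategy
    D = np.zeros(shape)
    for a in range(A):
        for b in range(B):
            D[ra[a], sb[b], a, b] = 1.0
    return D

def local_distance(P, solve_problem2, tol=1e-9):
    """Outer iteration: grow the vertex set with the oracle's answers until
    the optimality condition (22) holds up to the tolerance."""
    A, B = P.shape[2], P.shape[3]
    omega = [((0,) * A, (0,) * B)]     # initial set: one arbitrary vertex
    while True:
        V = [vertex_distribution(st, P.shape) for st in omega]
        # Weights chi >= 0 minimizing ||P - sum_k chi_k V_k||^2 (Problem 2).
        chi = solve_problem2(P, V)
        PL = sum(c * v for c, v in zip(chi, V))
        # Cleaning up: drop zero-weight elements before adding a new vertex.
        omega = [st for st, c in zip(omega, chi) if c > 0.0]
        # Query g: the opposite of the gradient of F, up to a factor.
        gmax, strategy = oracle(P - PL)
        if gmax <= tol:                # condition (22): solution found
            return np.sum((P - PL) ** 2), omega
        omega.append(strategy)         # vertex returned by the oracle
```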
5.3. Cleaning Up
To avoid growth of the cardinality of $\Omega_n$ beyond the maximal number of linearly independent vertices during the iteration and, thus, the introduction of redundant vertices, we have to ensure that the sets $\Omega_n$ contain only points associated with linearly independent vertices of the local polytope. This is guaranteed by the following cleaning-up procedure. First, after the computation of $\chi_n$ at step $n$, we remove the elements of $\Omega_n$ where $\chi_n$ is equal to zero (this can be checked even if the exact $\chi_n$ is not known, as discussed later in Section 5.6). Let us denote the resulting set by $\tilde\Omega_n$. Then, the set $\Omega_{n+1}$ is built by adding the point given by the oracle to the set $\tilde\Omega_n$. Let us denote by $\mathcal{V}_n$ the set of vertices associated with the elements in the support of $\chi_n$. The cleaning up ensures that the optimizer $\chi_n$ is, up to a normalization constant, in the interior of the convex hull of $\mathcal{V}_n$, and that the new vertex returned by the oracle is linearly independent of the ones in $\mathcal{V}_n$. Indeed, we have seen that the introduction of such a vertex allows us to lower the optimal value of Problem 2. This would not be possible if the added vertex were linearly dependent on the vertices in $\mathcal{V}_n$, as the (normalized) optimizer $\chi_n$ of Problem 2 is in the interior of the convex hull of $\mathcal{V}_n$.
This is formalized in Lemma 1.
Lemma 1. Let $(\mathbf{r}_0, \mathbf{s}_0)$ be a pair of sequences satisfying the strict inequality (33). If $\Omega$ is a set of sequences for which the corresponding quantity is equal to zero, then the vertex associated with $(\mathbf{r}_0, \mathbf{s}_0)$ is linearly independent of the vertices associated with the sequences in $\Omega$.
Proof. The proof is by contradiction. Suppose that the vertex associated with $(\mathbf{r}_0, \mathbf{s}_0)$ is linearly dependent on the vertices associated with the sequences in $\Omega$; then, there is a real function $c(\mathbf{r}, \mathbf{s})$ on $\Omega$ such that the expansion in Equation (34) holds. By the definition of the vertices, this expansion implies a corresponding equality between the associated deterministic distributions. From this equality and Equation (34), we obtain a relation which, summed over $r$ and $s$, contradicts Equation (33). □
This lemma and the optimality conditions (22) and (23) imply that the sets $\Omega_n$, built through the previously discussed cleaning-up procedure, always contain points associated with linearly independent vertices and, thus, never contain more than the maximal number of linearly independent vertices. Indeed, the set $\tilde\Omega_n$ contains only the points where the minimizer $\chi_n$ is different from zero, for which condition (22) holds with equality, as implied by condition (23). Furthermore, the sequence returned by the oracle violates condition (22) with a strictly positive value, until the set $\Omega_n$ contains the support of $\bar\chi$ and the iteration generating the sequence of sets is terminated. Hence, the hypotheses of Lemma 1 are satisfied at every step.
The cleaning-up procedure is not strictly necessary for achieving a polynomial running time, but it can speed up the algorithm. Furthermore, it guarantees that the distribution approaching the minimizer during the iterative computation is always represented as a convex combination of a minimal number of vertices. Thus, we have a minimal representation of the distribution at each stage of the iteration.
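As a small illustration, the invariant maintained by the cleaning up can be checked numerically by verifying that the kept vertices, flattened to vectors, have full column rank (a hedged sketch; `vertex_distribution` is the illustrative helper defined above):

```python
def vertices_independent(strategies, shape):
    """Check the cleaning-up invariant: the vertices associated with the
    current set should be linearly independent (cf. Lemma 1)."""
    M = np.stack([vertex_distribution(st, shape).ravel()
                  for st in strategies], axis=1)
    return np.linalg.matrix_rank(M) == M.shape[1]
```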
5.4. Stopping Criterion for Algorithm 1
The lower bound on the optimal value of Problem 1, denoted by $D^{LB}$, is computed by using the dual form of Problem 1. As shown in Section 6.1, any local distribution induces a lower bound in which the correction term is the maximal value returned by the oracle with $g$ as the query. An upper bound is obviously the value of the objective function $F$ at the same local distribution. In the limit of the trial distribution equal to the local distribution minimizing $F$, the lower bound is equal to the optimal value. This can be shown by using the optimality conditions. Indeed, conditions (22) and (25) imply that the maximal value returned by the oracle goes to zero as the trial distribution approaches the minimizer, so that the two bounds coincide in the limit. This is made even more evident by computing the difference between the upper bound and the lower bound. Indeed, given the local distribution computed at Step 3 and the corresponding maximal value returned by the oracle at Step 4, the difference is proportional to that maximal value, and it evidently goes to zero as the distribution approaches the minimizer. Thus, the upper bound on the accuracy computed in Step 5 goes to zero as the numerical solution approaches the exact one. This guarantees that the algorithm stops sooner or later at Step 6, provided that the computed sequence of distributions converges to the solution. If Problem 2 is solved exactly at Step 3, then the computed distribution satisfies condition (25), and the upper bound on the reached accuracy takes a particularly simple form. Even if condition (25) is not satisfied, we can suitably normalize $\chi$ so that the condition holds.
In the following, we assume that this condition is satisfied.
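As a concrete illustration, for the squared-distance objective $F = \|P - P_L\|^2$ with normalized weights, the convexity bound $F(\bar\chi) \ge F(\chi) + \langle \nabla F(\chi), \bar\chi - \chi\rangle$ yields a gap between the upper and lower bounds equal to twice the difference between the oracle value and $\langle g, P_L\rangle$. This is a sketch under that assumed objective, not the paper's exact formula:

```python
def accuracy_bound(P, PL, oracle_value):
    """Upper bound on F(chi) - D (the reached accuracy), derived from the
    convexity of F = ||P - PL||^2 over normalized weights:
        D >= F(chi) - 2 * (G - <g, PL>),   g = P - PL,
    where G is the maximal value returned by the oracle with query g.
    The bound vanishes as PL approaches the closest local distribution.
    """
    g = P - PL
    return 2.0 * (oracle_value - np.sum(g * PL))
```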
5.5. Stopping Criterion for Problem 2 (Optimization at Step 3 of Algorithm 1)
In Algorithm 1, Step 3 is completed when the solution of Problem 2 with a given set $\Omega$ is found. Optimization algorithms find a solution iteratively, up to some accuracy; we could simply stop when the error is of the order of the machine precision. Here, we discuss a more effective stopping criterion. This criterion should preserve the two main features previously described:
The sequence of the exact optimal values $D_n$ of Problem 2, with $\Omega = \Omega_n$, is monotonically decreasing.
The sets $\Omega_n$ contain points associated with linearly independent vertices of the local polytope, implying that the cardinality of $\Omega_n$ is never greater than the maximal number of linearly independent vertices.
To guarantee that the first feature is preserved, it is sufficient to compute a lower bound on the exact optimal value from a given numerical iterate, such that the bound approaches the optimal value as the iterate approaches the optimizer. If the lower bound computed with the set $\Omega_n$ is greater than the upper bound on $D_{n+1}$ (see Equation (31)), then $D_{n+1} < D_n$. Denoting by $D_n^{LB}$ the lower bound on the optimal value $D_n$, the monotonicity of the sequence $D_0, D_1, \dots$ is implied by inequality (43). As shown later by using dual theory, a lower bound on $D_n$ can be computed from any unnormalized local distribution associated with a function $\chi$ with support in $\Omega_n$. This bound becomes equal to $D_n$ in the limit of $\chi$ equal to the minimizer of Problem 2. Equation (43) then gives a verifiable condition on the local distribution computed in Step 3. If this condition is satisfied by the numerical solution found in Step 3, then the sequence of optimal values is monotonically decreasing. As we will see, to prove that the sequence converges to the optimal value of Problem 1, we need a stronger condition, in which the required decrease is scaled by a fixed real number $\gamma$ strictly between zero and one (for instance, $\gamma = 1/2$). If this stronger inequality is satisfied in each iteration of Algorithm 1, the sequence satisfies inequality (47), which turns out to be equal to Equation (32) in the limit of an exact solution of Problem 2. The right-hand side of Equation (47) goes to zero as the iterate approaches the optimizer, as implied by the optimality conditions of Problem 2. Thus, if the set $\Omega_n$ does not contain all the points where $\bar\chi$ is different from zero, then the inequality is surely satisfied at some point of the iteration solving Problem 2, as the left-hand side tends to a strictly positive number. When the inequality is satisfied, the minimization at Step 3 of Algorithm 1 is terminated. If $\Omega_n$ contains the support of $\bar\chi$, the inequality will never be satisfied, and the minimization at Step 3 terminates when the desired accuracy is reached.
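In code, the Step-3 termination test can be phrased as a certified-decrease check. The sketch below is schematic and uses illustrative names: `lb_previous` is a lower bound on the previous exact optimal value $D_n$, `F_value` the current (upper-bounding) objective value for the set $\Omega_{n+1}$, and `certified_decrease` the decrease predicted by the oracle step.

```python
def step3_done(F_value, lb_previous, certified_decrease, gamma=0.5):
    """Certified-decrease test for Step 3 (schematic): stop once the current
    objective value, an upper bound on D_{n+1}, falls below the lower bound
    on D_n by at least gamma times the predicted decrease. With gamma in
    (0, 1) this is a sufficient-decrease test; gamma -> 0 recovers the bare
    monotonicity condition.
    """
    return F_value <= lb_previous - gamma * certified_decrease
```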
5.7. Solving Problem 2
There are standard methods for solving Problem 2, and numerical libraries are available. The interior-point method [19] provides quadratic convergence to the solution, meaning that the number of accurate digits roughly doubles at each iteration step once the iterate is sufficiently close to the minimizer. The algorithm uses the Newton method and needs to solve a set of linear equations. Since this can be computationally demanding in terms of memory, we have implemented the solver by using the conjugate gradient method, which does not use the Hessian. Furthermore, if the Hessian has a small condition number, the conjugate gradient method can be much more efficient than the Newton method, especially if we do not need to solve Problem 2 with high accuracy. This is the case in the initial stage of the computation, when the set $\Omega_n$ is still growing and does not yet contain all the points of the support of $\bar\chi$.
The conjugate gradient method iteratively performs a one-dimensional minimization along directions that are conjugate with respect to the Hessian of the objective function [19]. The directions are computed iteratively, with the first direction set equal to the gradient of the objective function. The conjugate gradient method is generally used for unconstrained problems, whereas Problem 2 has the inequality constraints $\chi \ge 0$. To adapt the method to our problem, we perform the one-dimensional minimization in the region where $\chi$ is non-negative. Whenever an inactive constraint becomes active, or vice versa, we set the search direction equal to the gradient and restart the generation of the directions from that point. Once the procedure terminates, the algorithm provides a list of active constraints, that is, of points where the numerical solution is equal to zero. Numerical simulations show that this list is generally complete and corresponds to the points where the minimizer $\chi_\Omega$ is equal to zero.
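A minimal sketch of this constrained conjugate-gradient strategy is given below, for the quadratic form $\tfrac12 x^T H x - c^T x$ with $H$ the Hessian of Problem 2 (for the squared-distance objective assumed in the earlier sketches, $H = 2 V^T V$ and $c = 2 V^T p$, with $V$ the matrix of flattened vertices and $p$ the flattened distribution). All names are illustrative, and this is not the paper's implementation.

```python
def nonneg_cg(H, c, x0, max_iter=500, tol=1e-10):
    """Conjugate gradient for  min_x 0.5 x^T H x - c^T x  s.t.  x >= 0.

    Exact one-dimensional minimization restricted to the feasible region;
    whenever a constraint becomes active, the direction generation is
    restarted from the (projected) gradient, as described in the text.
    """
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    g = H @ x - c
    d = -g.copy()
    d[(x <= 0.0) & (d < 0.0)] = 0.0    # do not move into active constraints
    for _ in range(max_iter):
        if np.linalg.norm(d) < tol:
            break
        Hd = H @ d
        dHd = d @ Hd
        if dHd <= 0.0:
            break
        alpha = -(g @ d) / dHd         # exact minimizer along d (quadratic)
        neg = d < -1e-15
        alpha_max = np.min(x[neg] / -d[neg]) if neg.any() else np.inf
        hit = alpha >= alpha_max       # a constraint becomes active
        alpha = min(alpha, alpha_max)
        x = np.maximum(x + alpha * d, 0.0)
        g_new = H @ x - c
        if hit:
            d = -g_new.copy()          # restart from the projected gradient
            d[(x <= 0.0) & (d < 0.0)] = 0.0
        else:
            beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
            d = -g_new + beta * d
        g = g_new
    return x
```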
In general, the slackness condition (25) is not satisfied exactly by the numerical solution. However, as previously pointed out, we can suitably rescale $\chi$ so that this condition is satisfied. Thus, we will assume in the following that equality (25) holds for the computed solution and, consequently, for the associated local distribution.
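For the squared-distance objective assumed in the sketches above, this rescaling can be implemented in closed form: minimizing $F$ along the ray $t \mapsto t P_L$ yields a scale at which $\langle P - t P_L,\, t P_L\rangle = 0$, a natural counterpart of the slackness condition (this identification is an assumption of the sketch):

```python
def rescale(P, PL):
    """Rescale the unnormalized local distribution along the ray t * PL.

    For the squared-distance objective, the one-dimensional minimizer
    t = <P, PL> / <PL, PL> makes <P - t*PL, t*PL> = 0, enforcing the
    slackness-type condition assumed in the text.
    """
    t = np.sum(P * PL) / np.sum(PL * PL)
    return t * PL
```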