1. Introduction
Set-constrained and unconstrained optimization use several theorems that include a first-order necessary condition, as well as second-order conditions that are necessary and/or sufficient for a point to be a local minimizer [1,2]. These theorems require the objective function to be once or twice continuously differentiable.
The first-order necessary condition for unconstrained optimization requires the gradient to be zero at a minimizer [3,4,5]. The first-order necessary condition for set-constrained optimization requires the dot product of the gradient and a vector in a feasible direction to be non-negative. When the constraints are defined in terms of differentiable functions, the first-order necessary condition takes the form of the first-order Lagrange and Karush–Kuhn–Tucker (KKT) conditions [2,6]. Reference [5] determined the solution of linear programming problems using a first-order necessary condition. Such conditions have also been studied for control systems governed by ordinary differential equations [7], stochastic differential equations [8], and stochastic evolution equations [9].
Throughout, we assume a nonzero gradient. Our main results present a series of four sufficient conditions and four corresponding necessary conditions for a point to be a local minimizer of a given function $f$ (Theorems 1–8). Each is of the first-order type, including those that assume twice differentiability. Theorems 1 and 2 describe the behavior of $f$ in cones determined by the gradient. Theorems 3 and 4 replace the geometrical viewpoint by analytical conditions involving sequences. The analytical versions use generalizations of the notions of gradient and differentiability that simplify statements and proofs. Theorems 5 and 6 are refinements for the case in which $f$ is twice differentiable. A previous version of this paper included a remark that the analytical conditions are unverifiable. However, the last two results, Theorems 7 and 8, which return to the geometrical view, are proved precisely by verifying the analytical conditions. They replace the original cones by larger regions that we call $\alpha$-cones. An $\alpha$-cone is an ordinary cone when $\alpha = 1$ and a paraboloid when $\alpha = 2$. The results fail for half-planes, and a paraboloid is a limiting case for what is possible. Example 5 illustrates a class of problems that do not meet the criteria for a strict local minimizer in the KKT theory but are covered by Theorem 7.
We remark that the cones used here are different from the cones of descent directions in [2], which are actually half-planes. Our sufficient conditions do not guarantee that a point with a nonzero gradient is a strict local minimizer on a half-plane. Example 1 gives a function with a nonzero gradient at a point that is not a strict local minimizer on a half-plane.
Convex optimization problems require the objective function to be convex, and convexity is essentially a second-order condition because it constrains the values of the second derivatives. The first-order conditions that we propose do not require the objective function to be convex. The requirement that a function be twice continuously differentiable is different from the condition of convexity, which constrains the values of the second derivatives.
Notation and terminology. Points of $\mathbb{R}^n$ are written as column vectors $x = (x_1, \ldots, x_n)^T$ with norm $\|x\| = (x_1^2 + \cdots + x_n^2)^{1/2}$. A subset of $\mathbb{R}^n$ is called a neighborhood of a point $x_0$ if it contains a disk $\{x : \|x - x_0\| < \delta\}$ for some $\delta > 0$, and a neighborhood of a set if it is a neighborhood of every point in the set. A point $x^* \in \Omega$ is a strict local minimizer of a function $f : \Omega \to \mathbb{R}$ if there is an $r > 0$ such that $f(x) > f(x^*)$ whenever $x \in \Omega$ and $0 < \|x - x^*\| < r$, and a local minimizer if the condition holds with $>$ replaced by $\ge$. Local maximizers and strict local maximizers are defined similarly by reversing the inequalities.
2. First-Order Conditions for Local Minimizers
The gradient of a real-valued function $f$ defined on a subset $\Omega$ of $\mathbb{R}^n$ is defined by
$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\Big)^T$$
whenever the partial derivatives exist. Gradients appear in a standard first-order necessary condition for local minimizers. Consider a function $f$ that is $C^1$ on a neighborhood of a set $\Omega \subseteq \mathbb{R}^n$ and a point $x^* \in \Omega$. If $x^*$ is a local minimizer of $f$ on $\Omega$, then $d^T \nabla f(x^*) \ge 0$ for every vector $d$ in a feasible direction, that is, a direction $d \neq 0$ such that some straight line segment with endpoint $x^*$ in the direction $d$ lies entirely within $\Omega$ [1] (Theorem 6.1). Hence, if $d^T \nabla f(x^*) < 0$ for some feasible direction $d$ at $x^*$, the standard necessary condition implies that $x^*$ is not a local minimizer. However, it may occur that $x^*$ has no feasible direction within $\Omega$, and then it is impossible for the standard necessary condition to give such information. For example, feasible directions are impossible for any set $\Omega$ whose points have only rational coordinates.
For an elementary example, consider the objective function $f(x, y) = y$ on the set
$$\Omega = \{(t, -\sqrt{|t|}) : -1 \le t \le 1\}. \tag{1}$$
Then, $f$ attains a maximum value on $\Omega$ at $x^* = (0, 0)$. The point $x^*$ admits no feasible direction within $\Omega$ (proof: a feasible direction $d = (d_1, d_2) \neq 0$ would put the points $x^* + t d = (t d_1, t d_2)$ in $\Omega$ for all sufficiently small positive $t$, so that $t d_2 = -\sqrt{t |d_1|}$; if $d_1 = 0$ this forces $d_2 = 0$, and if $d_1 \neq 0$ it forces $\sqrt{|d_1|} = -d_2 \sqrt{t}$ for all small $t > 0$, which is impossible), and thus the standard necessary condition yields no information.
Feasible directions play no role in our results; instead, all that is needed is that $\nabla f(x^*) \neq 0$. Theorem 2 (FONC 1) is a new first-order necessary condition for local minimizers. It is proved using a corresponding new first-order sufficient condition, Theorem 1 (FOSC 1). Corollary 1 is a companion sufficiency condition for strict local maximizers. In the example $f(x, y) = y$ on the set $\Omega$ defined by (1) with $x^* = (0, 0)$, the gradient is $\nabla f(x^*) = (0, 1)^T$, and therefore:
- (1) Theorem 2 is applicable and implies that $x^*$ is not a strict local minimizer, because every opposite cone $C(x^*, -\nabla f(x^*), \theta)$ contains points of $\Omega$ arbitrarily near $x^*$.
- (2) Corollary 1 is applicable and implies that $x^*$ is a strict local maximizer, because $\Omega$ is entirely contained in the opposite cone $C(x^*, -\nabla f(x^*), \pi/4)$; a numerical check appears in the sketch after this list.
We need a few more preliminaries. A real-valued function $f$ on a subset $\Omega$ of $\mathbb{R}^n$ is said to be differentiable at $x^*$ if the domain of $f$ contains a neighborhood of $x^*$, all partial derivatives of $f$ exist at $x^*$, and the function $r$ defined by
$$f(x) = f(x^*) + (x - x^*)^T \nabla f(x^*) + r(x) \tag{2}$$
satisfies
$$\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|} = 0. \tag{3}$$
When $n = 1$, this is equivalent to the existence of a derivative at the point, but in general the simple existence of partial derivatives does not imply differentiability. A convenient sufficient condition for differentiability is that $f$ is defined and $C^1$ on some neighborhood of $x^*$ [10] (Th. 9 on p. 113). Differentiability is equivalent to the existence of a first-order Taylor approximation, as in [1] (Th. 5.5 on pp. 64–65) or [10] (Th. 2 on p. 160).
Any two nonzero vectors $u, v$ in $\mathbb{R}^n$ determine an angle $\theta(u, v)$ in $[0, \pi]$ such that
$$\cos \theta(u, v) = \frac{u^T v}{\|u\| \, \|v\|}.$$
This notion is implicit in the definition of a cone in $\mathbb{R}^n$.
Definition 1. If $x^* \in \mathbb{R}^n$, $d \in \mathbb{R}^n$, $d \neq 0$, and $0 < \theta < \pi/2$, the set consisting of $x^*$ together with all points $x \neq x^*$ in $\mathbb{R}^n$ that satisfy
$$\theta(x - x^*, d) \le \theta$$
is denoted by $C(x^*, d, \theta)$ and called a cone with vertex $x^*$ and direction $d$. The opposite cone is defined by $C(x^*, -d, \theta)$.
Our first result is a first-order sufficient condition (FOSC) for a local minimizer.
Theorem 1 (FOSC 1). Let $f$ be a real-valued function that is defined and $C^1$ on a neighborhood of a point $x^* \in \mathbb{R}^n$ such that $\nabla f(x^*) \neq 0$. Assume that $\Omega$ is a set in the domain of $f$ that contains $x^*$ and is contained in some cone $C(x^*, \nabla f(x^*), \theta)$, $0 < \theta < \pi/2$. Then, $x^*$ is a strict local minimizer of $f$ on $\Omega$.
We note a consequence that will be useful in Theorem 2.
Corollary 1. For $f$ as in Theorem 1, $x^*$ is a strict local maximizer of $f$ on any set in the domain of $f$ that contains $x^*$ and is contained in $C(x^*, -\nabla f(x^*), \theta)$ for some $0 < \theta < \pi/2$.
The corollary follows by applying Theorem 1 with $f$ replaced by $-f$.
Proof of Theorem 1. Let $\Omega$ be a set in the domain of $f$ that contains $x^*$ and is contained in $C(x^*, \nabla f(x^*), \theta)$, $0 < \theta < \pi/2$. Since $f$ is $C^1$ on a neighborhood of $x^*$, $f$ is differentiable at $x^*$. Thus,
$$f(x) = f(x^*) + (x - x^*)^T \nabla f(x^*) + r(x), \tag{4}$$
where $\lim_{x \to x^*} r(x)/\|x - x^*\| = 0$. Therefore, we may choose $\delta > 0$ so that the punctured disk $0 < \|x - x^*\| < \delta$ is contained in the domain of $f$ and
$$|r(x)| < \|\nabla f(x^*)\| \cos\theta \, \|x - x^*\| \tag{5}$$
whenever $0 < \|x - x^*\| < \delta$. Suppose $x \in \Omega$ and $0 < \|x - x^*\| < \delta$. Then, $x \in C(x^*, \nabla f(x^*), \theta)$ and $x \neq x^*$, so $\theta(x - x^*, \nabla f(x^*)) \le \theta$, and hence
$$(x - x^*)^T \nabla f(x^*) \ge \|x - x^*\| \, \|\nabla f(x^*)\| \cos\theta. \tag{6}$$
By (4),
$$f(x) - f(x^*) = (x - x^*)^T \nabla f(x^*) + r(x).$$
Since $0 < \|x - x^*\| < \delta$, by (5), $r(x) > -\|\nabla f(x^*)\| \cos\theta \, \|x - x^*\|$, and hence by (6),
$$f(x) - f(x^*) > \|x - x^*\| \, \|\nabla f(x^*)\| \cos\theta - \|x - x^*\| \, \|\nabla f(x^*)\| \cos\theta = 0.$$
Therefore, $x^*$ is a strict local minimizer of $f$ on $\Omega$. □
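As an illustration of Theorem 1, the following minimal Python sketch uses a hypothetical linear objective (our own choice, not an example from the text), samples random points, and confirms that $f(x) > f(x^*)$ at every sampled point of the cone $C(x^*, \nabla f(x^*), \pi/3)$:

```python
import numpy as np

# Hypothetical objective f(x) = x1 + 2*x2 with x* = (0, 0); the gradient
# g = (1, 2) is nonzero, so Theorem 1 applies on any cone C(x*, g, theta)
# with 0 < theta < pi/2.
rng = np.random.default_rng(0)
f = lambda x: x[0] + 2.0 * x[1]
g = np.array([1.0, 2.0])
d = g / np.linalg.norm(g)
theta = np.pi / 3

for x in rng.standard_normal((100_000, 2)):
    r = np.linalg.norm(x)
    # keep only points of the cone C(x*, g, theta), excluding the vertex
    if r > 0.0 and x @ d >= np.cos(theta) * r:
        assert f(x) > 0.0       # f(x) > f(x*) = 0, as the theorem asserts
```

Since this objective is linear, the remainder in (4) vanishes and inequality (6) alone gives the conclusion; the content of the theorem is that the remainder of a general $C^1$ function cannot overturn (6) sufficiently near the vertex.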
A corresponding first-order necessary condition (FONC) is deduced with the aid of Corollary 1. We first introduce some useful terminology.
Definition 2. A point $x^*$ of $\Omega$ is called isolated if there exists an $\varepsilon > 0$ such that the punctured disk $0 < \|x - x^*\| < \varepsilon$ contains no points of $\Omega$ or, equivalently, if there is no sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ such that $x_n \to x^*$.
In the extreme case that a set consists of just one point, that point is isolated in the set, since the alternative would imply the existence of other points in the set. Isolated points occur in our setting in the next result, which is illustrated in Figure 1.
Theorem 2 (FONC 1). Let $f$ be a real-valued function that is defined and $C^1$ on a neighborhood of a point $x^* \in \mathbb{R}^n$ such that $\nabla f(x^*) \neq 0$. If $x^*$ is a strict local minimizer of $f$ on some subset $\Omega$ of $\mathbb{R}^n$, then $x^*$ is an isolated point of $\Omega \cap C(x^*, -\nabla f(x^*), \theta)$ for every $0 < \theta < \pi/2$.
Proof. Suppose $x^*$ is a strict local minimizer of $f$ on some subset $\Omega$ of $\mathbb{R}^n$. We argue by contradiction to show that $x^*$ is an isolated point of $\Omega \cap C(x^*, -\nabla f(x^*), \theta)$ for every $0 < \theta < \pi/2$. Assume that this conclusion is false for some $\theta$. Then, the set $\Omega \cap C(x^*, -\nabla f(x^*), \theta)$ contains a sequence $\{x_n\}$ of points different from $x^*$ that converges to $x^*$. Since $x^*$ is a strict local minimizer of $f$ on $\Omega$, there exists $r > 0$ such that $f(x) > f(x^*)$ whenever $x \in \Omega$ and $0 < \|x - x^*\| < r$. By the definition of convergence, $0 < \|x_n - x^*\| < r$ for all sufficiently large $n$, and hence $f(x_n) > f(x^*)$ for all sufficiently large $n$. However, by Corollary 1, $x^*$ is a strict local maximizer for $f$ on $\Omega \cap C(x^*, -\nabla f(x^*), \theta)$, so $f(x_n) < f(x^*)$ for all sufficiently large $n$, which is a contradiction. The result follows. □
3. Analytical Versions of the Conditions and Generalized Differentiability
In this section, we present analytical versions of the conditions for local minimizers given in Section 2. They are stated in a more general setting that uses extensions of the notions of differentiability and gradient to arbitrary sets. The analytical versions are in some ways more transparent and lead to generalizations of Theorems 1 and 2 in Section 5.
Definition 3 (Generalized Differentiability). Let $\Omega$ be a subset of $\mathbb{R}^n$. We say that a function $f : \Omega \to \mathbb{R}$ is differentiable at a point $x^* \in \Omega$ if (1) $x^*$ is not an isolated point of $\Omega$, and (2) there is a vector $g$ in $\mathbb{R}^n$ such that the function $r$ defined by
$$f(x) = f(x^*) + (x - x^*)^T g + r(x), \quad x \in \Omega, \tag{7}$$
satisfies
$$\lim_{\substack{x \to x^* \\ x \in \Omega}} \frac{r(x)}{\|x - x^*\|} = 0 \tag{8}$$
or, equivalently, for every sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ converging to $x^*$, the sequence $\{r_n\}$ defined by
$$r_n = f(x_n) - f(x^*) - (x_n - x^*)^T g \tag{9}$$
satisfies
$$\lim_{n \to \infty} \frac{r_n}{\|x_n - x^*\|} = 0. \tag{10}$$
Any such vector $g$ is denoted by $\nabla f(x^*)$ and called a gradient of $f$ at $x^*$.
By [10] (Th. 9 on p. 113), the condition (8) is automatically met if $f$ has an extension to a function $F$ which is $C^1$ on a neighborhood of $x^*$, and then we can choose $\nabla f(x^*) = \nabla F(x^*)$. In general, gradients are not unique, but for our purpose any choice will work.
For a fixed $f$ and $x^*$, the set of all gradients $\nabla f(x^*)$ is a closed convex set. We shall not need this fact and omit a proof.
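The following Python sketch illustrates Definition 3 and the non-uniqueness just mentioned with a hypothetical restriction of $f(x, y) = x + y$ to the parabola $\Omega = \{(t, t^2)\}$; it tests the ratio in (10) along one sequence only, whereas the definition quantifies over all sequences in $\Omega$:

```python
import numpy as np

f = lambda p: p[0] + p[1]               # f(x, y) = x + y restricted to Omega
x_star = np.zeros(2)

# Both g1 = (1, 1) and g2 = (1, 0) serve as generalized gradients at x*:
# along the parabola, the remainder (9) is o(||x_n - x*||) for either choice.
for g in (np.array([1.0, 1.0]), np.array([1.0, 0.0])):
    ratios = []
    for tn in 2.0 ** -np.arange(1, 25):
        x_n = np.array([tn, tn * tn])                    # sequence in Omega \ {x*}
        r_n = f(x_n) - f(x_star) - (x_n - x_star) @ g    # remainder (9)
        ratios.append(r_n / np.linalg.norm(x_n - x_star))
    print(g, ratios[-1])                # the ratio (10) tends to 0 in both cases
```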
Theorem 3 (FOSC 2). Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be differentiable at some point $x^* \in \Omega$ with gradient $\nabla f(x^*)$. Assume that for every sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ with $x_n \to x^*$ there is an $\varepsilon > 0$ such that
$$(x_n - x^*)^T \nabla f(x^*) \ge \varepsilon \|x_n - x^*\| \tag{11}$$
for all sufficiently large $n$. Then, $x^*$ is a strict local minimizer of $f$ over $\Omega$.
Proof. Argue by contradiction. If $x^*$ is not a strict local minimizer of $f$, there is a sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ with $x_n \to x^*$ such that $f(x_n) \le f(x^*)$ for all $n$. Define $\{r_n\}$ by (9). Then, by (11), there is an $\varepsilon > 0$ such that
$$f(x_n) - f(x^*) = (x_n - x^*)^T \nabla f(x^*) + r_n \ge \varepsilon \|x_n - x^*\| + r_n \tag{12}$$
for all sufficiently large $n$. Since $f$ is differentiable at $x^*$, $r_n/\|x_n - x^*\| \to 0$ as $n \to \infty$ by (10). Dividing by $\|x_n - x^*\|$, we obtain
$$\frac{f(x_n) - f(x^*)}{\|x_n - x^*\|} \ge \varepsilon + \frac{r_n}{\|x_n - x^*\|} > 0 \tag{13}$$
for all sufficiently large $n$. The result that $f(x_n) > f(x^*)$ for all sufficiently large $n$ contradicts our choice of the sequence $\{x_n\}$ to satisfy $f(x_n) \le f(x^*)$ for all $n$. The theorem follows. □
The corresponding necessary condition is conveniently stated in contrapositive form.
Theorem 4 (FONC 2). Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be differentiable at some point $x^* \in \Omega$ with gradient $\nabla f(x^*)$. If there exist a sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ converging to $x^*$ and a number $\varepsilon > 0$ such that
$$(x_n - x^*)^T \nabla f(x^*) \le -\varepsilon \|x_n - x^*\| \tag{14}$$
for all sufficiently large $n$, then $x^*$ is not a local minimizer of $f$ on $\Omega$. Equivalently, if $x^*$ is a local minimizer of $f$ over $\Omega$, then there exist no such sequence $\{x_n\}$ and number $\varepsilon$.
Proof. Assume we are given such a sequence $\{x_n\}$ and number $\varepsilon > 0$. Define $\{r_n\}$ by (9). Then, by (14),
$$f(x_n) - f(x^*) = (x_n - x^*)^T \nabla f(x^*) + r_n \le -\varepsilon \|x_n - x^*\| + r_n \tag{15}$$
for all sufficiently large $n$. Since $f$ is differentiable at $x^*$, $r_n/\|x_n - x^*\| \to 0$ as $n \to \infty$ by (10). Thus, dividing (15) by $\|x_n - x^*\|$, we see that
$$\frac{f(x_n) - f(x^*)}{\|x_n - x^*\|} \le -\varepsilon + \frac{r_n}{\|x_n - x^*\|} < 0 \tag{16}$$
for all sufficiently large $n$. Hence, $f(x_n) < f(x^*)$ for all sufficiently large $n$. Since $x_n \to x^*$, $x^*$ is not a local minimizer. □
Example 1. The inequality (11) in Theorem 3 (FOSC 2) cannot be weakened to
$$(x_n - x^*)^T \nabla f(x^*) > 0 \quad \text{for all sufficiently large } n. \tag{17}$$
Choose $n = 2$, $f(x, y) = y - x^2$, and $x^* = (0, 0)$, $\Omega = \{(x, y) : y > 0\} \cup \{(0, 0)\}$. Then, we can take $\nabla f(x^*) = (0, 1)^T$, and any sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ with $x_n \to x^*$ satisfies (17). However, $x^*$ is not a local minimizer because the sequence in $\Omega$ defined by $x_n = (1/n, 1/(2n^2))$, $n = 1, 2, \ldots$, converges to $x^*$ and satisfies $f(x_n) = -1/(2n^2) < 0 = f(x^*)$ for all $n$.

Example 2. We cannot relax the inequality (14) in Theorem 4 (FONC 2) to
$$(x_n - x^*)^T \nabla f(x^*) < 0 \quad \text{for all sufficiently large } n. \tag{18}$$
Choose $n = 2$, $f(x, y) = y + 2x^2$, $x^* = (0, 0)$, $\Omega = \{(x, y) : y \ge -x^2\}$, and $\nabla f(x^*) = (0, 1)^T$. See Figure 2. The sequence $x_n = (1/n, -1/n^2)$, $n = 1, 2, \ldots$, belongs to $\Omega$ and converges to $x^*$. It satisfies (18) because
$$(x_n - x^*)^T \nabla f(x^*) = -\frac{1}{n^2} < 0.$$
Nevertheless, $f(x^*) = 0$ is the minimum value of $f$ attained on $\Omega$. For, if $(x, y) \in \Omega$ and $(x, y) \neq (0, 0)$, then $y \ge -x^2$, and hence $f(x, y) = y + 2x^2 \ge x^2 \ge 0$, with equality only when $x = 0$ and $y = 0$.
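A quick Python check of the sequences used in these two examples:

```python
import numpy as np

# Example 1: the weakened condition (17) holds along x_n = (1/n, 1/(2n^2)),
# yet f(x_n) < f(x*) = 0, so x* is not a local minimizer.
f1 = lambda p: p[1] - p[0] ** 2
grad1 = np.array([0.0, 1.0])
for n in range(1, 8):
    x_n = np.array([1.0 / n, 0.5 / n ** 2])
    assert x_n @ grad1 > 0.0            # (17) holds
    assert f1(x_n) < 0.0                # but f decreases along the sequence

# Example 2: (18) holds along x_n = (1/n, -1/n^2) on Omega = {y >= -x^2},
# yet f attains its minimum over Omega at x* = (0, 0).
f2 = lambda p: p[1] + 2.0 * p[0] ** 2
grad2 = np.array([0.0, 1.0])
for n in range(1, 8):
    x_n = np.array([1.0 / n, -1.0 / n ** 2])
    assert x_n @ grad2 < 0.0            # (18) holds
    assert f2(x_n) > 0.0                # yet f(x_n) > f(x*) = 0
print("Examples 1 and 2 behave as claimed")
```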
4. Refinements Based on Twice Differentiability
The additional smoothness of f yields stronger results.
Definition 4 (Generalized Twice Differentiability). Let $\Omega$ be a subset of $\mathbb{R}^n$. We say that a function $f : \Omega \to \mathbb{R}$ is twice differentiable at a point $x^* \in \Omega$ if (1) $x^*$ is not an isolated point of $\Omega$, and (2) there exist a vector $g \in \mathbb{R}^n$ and a symmetric matrix $H \in \mathbb{R}^{n \times n}$ such that the function $s$ defined by
$$f(x) = f(x^*) + (x - x^*)^T g + \tfrac{1}{2}(x - x^*)^T H (x - x^*) + s(x), \quad x \in \Omega, \tag{19}$$
satisfies
$$\lim_{\substack{x \to x^* \\ x \in \Omega}} \frac{s(x)}{\|x - x^*\|^2} = 0 \tag{20}$$
or, equivalently, such that for any sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ converging to $x^*$,
$$\lim_{n \to \infty} \frac{s_n}{\|x_n - x^*\|^2} = 0, \tag{21}$$
where
$$s_n = f(x_n) - f(x^*) - (x_n - x^*)^T g - \tfrac{1}{2}(x_n - x^*)^T H (x_n - x^*). \tag{22}$$
Any such vector $g$ is denoted $\nabla f(x^*)$ and called a gradient of $f$ at $x^*$, and any such matrix $H$ is called a Hessian of $f$ at $x^*$.
Twice differentiability implies differentiability according to Lemma 1 below. By Th. 3 on p. 160 of [10], $f$ is twice differentiable at $x^*$ if it has an extension to a function $F$ that is $C^2$ on a neighborhood of $x^*$. In this case, $g$ and $H$ can be chosen as the usual gradient and Hessian of $F$ at $x^*$.
Definition 5. Given sequences $\{a_n\}$ and $\{b_n\}$ of real numbers, we write (1) $a_n = O(b_n)$ to mean that there exists $M > 0$ such that $|a_n| \le M |b_n|$ for all sufficiently large $n$, and (2) $a_n = o(b_n)$ if $b_n \neq 0$ for all sufficiently large $n$ and $\lim_{n \to \infty} a_n/b_n = 0$.
Lemma 1. Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be twice differentiable at some point $x^* \in \Omega$ with gradient $\nabla f(x^*)$. Then, for any sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ converging to $x^*$,
$$f(x_n) - f(x^*) - (x_n - x^*)^T \nabla f(x^*) = O(\|x_n - x^*\|^2). \tag{23}$$
Proof. We start with (22), assuming some choice of Hessian $H$. Let $M$ be a matrix bound for $H$, so that $|u^T H v| \le M \|u\| \|v\|$ for all $u, v \in \mathbb{R}^n$. Then, by the triangle inequality and the Cauchy–Schwarz inequality,
$$|f(x_n) - f(x^*) - (x_n - x^*)^T \nabla f(x^*)| \le \tfrac{1}{2} M \|x_n - x^*\|^2 + |s_n|.$$
By (21), $s_n/\|x_n - x^*\|^2 \to 0$, and hence $|s_n| \le \|x_n - x^*\|^2$ for all sufficiently large $n$. Therefore,
$$|f(x_n) - f(x^*) - (x_n - x^*)^T \nabla f(x^*)| \le \big(\tfrac{1}{2} M + 1\big) \|x_n - x^*\|^2$$
for all sufficiently large $n$. □
Theorem 5 (FOSC 3). Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be twice differentiable at some point $x^* \in \Omega$, with gradient $\nabla f(x^*)$. Assume that for every sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ with $x_n \to x^*$, there is a sequence $\{t_n\}$ of positive numbers such that
$$\lim_{n \to \infty} \frac{\|x_n - x^*\|^2}{t_n} = 0 \quad \text{and} \quad (x_n - x^*)^T \nabla f(x^*) \ge t_n \tag{24}$$
for all sufficiently large $n$. Then, $x^*$ is a strict local minimizer of $f$ over $\Omega$.
Proof. If the conclusion is not true, there exists a sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ with $x_n \to x^*$ such that $f(x_n) \le f(x^*)$ for all $n$. Then, by hypothesis, there is a sequence $\{t_n\}$ of positive numbers satisfying (24). By Lemma 1 and (24), there is an $M > 0$ such that
$$f(x_n) - f(x^*) \ge t_n - M \|x_n - x^*\|^2 = t_n \Big(1 - M \, \frac{\|x_n - x^*\|^2}{t_n}\Big) \tag{25}$$
for all sufficiently large $n$. Hence, since $\|x_n - x^*\|^2/t_n \to 0$, $f(x_n) - f(x^*) > 0$ for all sufficiently large $n$. Thus, $f(x_n) > f(x^*)$ for all sufficiently large $n$, contradicting our choice of the sequence $\{x_n\}$ to satisfy $f(x_n) \le f(x^*)$ for all $n$. The result follows. □
Theorem 6 (FONC 3). Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be twice differentiable at some point $x^* \in \Omega$, with gradient $\nabla f(x^*)$. If there exist a sequence $\{x_n\}$ in $\Omega \setminus \{x^*\}$ converging to $x^*$ and a sequence $\{t_n\}$ of positive numbers such that
$$\lim_{n \to \infty} \frac{\|x_n - x^*\|^2}{t_n} = 0 \quad \text{and} \quad (x_n - x^*)^T \nabla f(x^*) \le -t_n \tag{26}$$
for all sufficiently large $n$, then $x^*$ is not a local minimizer of $f$ on $\Omega$. Equivalently, if $x^*$ is a local minimizer of $f$ on $\Omega$, no such sequences $\{x_n\}$ and $\{t_n\}$ can exist.
Proof. Let $\{x_n\}$ and $\{t_n\}$ be sequences with the properties stated in the theorem. By Lemma 1 and (26), there is an $M > 0$ such that
$$f(x_n) - f(x^*) \le -t_n + M \|x_n - x^*\|^2 = -t_n \Big(1 - M \, \frac{\|x_n - x^*\|^2}{t_n}\Big) \tag{27}$$
for all sufficiently large $n$. Hence, since $\|x_n - x^*\|^2/t_n \to 0$, $f(x_n) - f(x^*) < 0$ for all sufficiently large $n$. Thus, $f(x_n) < f(x^*)$ for all sufficiently large $n$, and therefore $x^*$ is not a local minimizer. □
Remark 1. If $\Omega$, $f$, and $x^*$ satisfy the conditions for a strict local minimizer in Theorem 3 (FOSC 2), they satisfy the conditions in Theorem 5 (FOSC 3) by choosing $t_n = \varepsilon \|x_n - x^*\|$, $n = 1, 2, \ldots$, where $\varepsilon$ is the number in (11). Similarly, if $\Omega$, $f$, and $x^*$ meet the conditions for a local non-minimizer in Theorem 4 (FONC 2), they meet the conditions in Theorem 6 (FONC 3). Examples 3 and 4 show that both converse statements fail.
Example 3. See Figure 3. Set $n = 2$, $f(x, y) = y$ on $\mathbb{R}^2$, and let $\Omega = \{(x, y) : |x|^{3/2} \le y\}$. Then, $f$ is twice differentiable at $x^* = (0, 0)$ with gradient $\nabla f(x^*) = (0, 1)^T$. The point $x^*$ is a strict local minimizer because $f(x^*) = 0$ and $f(x) > 0$ for every other $x \in \Omega$. The condition for a strict local minimizer in Theorem 3 (FOSC 2) is not satisfied. For example, the sequence
$$x_n = \Big(\frac{1}{n}, \frac{1}{n^{3/2}}\Big), \quad n = 1, 2, \ldots,$$
is in $\Omega$, $x_n \to x^*$, and the inequality (11) implies $\varepsilon \le 0$, which is impossible. On the other hand, the condition in Theorem 5 (FOSC 3) is satisfied. Consider any sequence $x_n = (a_n, b_n)$, $x_n \neq x^*$, in $\Omega$ with $x_n \to x^*$. Choosing $t_n = b_n$, we readily find that $(x_n - x^*)^T \nabla f(x^*) = b_n \ge t_n$ and
$$\frac{\|x_n - x^*\|^2}{t_n} = \frac{a_n^2 + b_n^2}{b_n} \le b_n^{4/3} + b_n \to 0.$$

Example 4. See Figure 4. Set $n = 2$, $f(x, y) = y$ on $\mathbb{R}^2$, and let $\Omega = \{(x, y) : y = -|x|^{3/2}\}$. Then, $f$ is twice differentiable at $x^* = (0, 0)$ with gradient $\nabla f(x^*) = (0, 1)^T$. Since $f(x, y) = -|x|^{3/2} < 0 = f(x^*)$ on the curve $y = -|x|^{3/2}$, $x \neq 0$, $x^*$ is not a local minimizer of $f$ on $\Omega$. It is not possible to show this using Theorem 4 (FONC 2), because it is impossible to find a sequence satisfying the conditions there. However, the conditions of Theorem 6 (FONC 3) are met by choosing
$$x_n = \Big(\frac{1}{n}, -\frac{1}{n^{3/2}}\Big), \quad t_n = \frac{1}{n^{3/2}},$$
for all $n = 1, 2, \ldots$
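The sequence conditions in Examples 3 and 4 are easy to check numerically; the Python sketch below prints the relevant ratios along the stated sequences:

```python
import numpy as np

# Example 3: on Omega = {|x|^(3/2) <= y} with f(x, y) = y, the FOSC 2 ratio
# in (11) decays to 0, while the FOSC 3 choice t_n = b_n satisfies (24).
for n in (10, 100, 1000, 10000):
    x_n = np.array([1.0 / n, n ** -1.5])
    print(n, x_n[1] / np.linalg.norm(x_n),          # (11) ratio -> 0
          np.linalg.norm(x_n) ** 2 / x_n[1])        # (24) ratio -> 0

# Example 4: on the curve y = -|x|^(3/2), the choice t_n = n**-1.5 meets (26).
for n in (10, 100, 1000, 10000):
    x_n = np.array([1.0 / n, -(n ** -1.5)])
    t_n = n ** -1.5
    assert x_n[1] <= -t_n                           # (x_n - x*)^T grad f <= -t_n
    print(n, np.linalg.norm(x_n) ** 2 / t_n)        # (26) ratio -> 0
```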
5. Applications of the Analytical Conditions
Example 3 suggests looking for generalizations of Theorems 1 and 2 to larger regions. In this section, we show that such generalizations exist, again assuming twice differentiability.
Definition 6. Let $x^* \in \mathbb{R}^n$, a unit vector $d \in \mathbb{R}^n$, and $c > 0$ be given. For each $\alpha \ge 1$, define $K(x^*, d, \alpha, c)$ as the set consisting of the point $x^*$, together with all points $x \neq x^*$ such that $u > 0$ and $\|w\|^\alpha \le c\, u$, where
$$u = d^T (x - x^*)$$
is the projection of $x - x^*$ in the direction $d$, and
$$w = (x - x^*) - u\, d$$
is the component of $x - x^*$ in the orthogonal direction. We call $K(x^*, d, \alpha, c)$ an $\alpha$-cone. The opposite $\alpha$-cone is the set $K(x^*, -d, \alpha, c)$.
See Figure 5. For $\alpha = 1$, $K(x^*, d, \alpha, c)$ is a cone as in Definition 1. For $\alpha = 2$, it is a paraboloid.
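For computations, membership in an $\alpha$-cone reduces to one projection. A minimal Python helper implementing Definition 6 (our own phrasing of it) is:

```python
import numpy as np

def in_alpha_cone(x, vertex, d, alpha, c):
    """Test x for membership in the alpha-cone K(vertex, d, alpha, c):
    either x is the vertex, or u > 0 and ||w||**alpha <= c*u, where u is
    the component of x - vertex along the unit vector d and w is the
    orthogonal component."""
    v = np.asarray(x, dtype=float) - vertex
    if not v.any():
        return True                     # the vertex belongs to every alpha-cone
    u = v @ d
    w = v - u * d
    return u > 0.0 and np.linalg.norm(w) ** alpha <= c * u

d = np.array([0.0, 1.0])
print(in_alpha_cone([0.1, 0.2], np.zeros(2), d, 1.5, 1.0))   # True
print(in_alpha_cone([0.5, 0.1], np.zeros(2), d, 1.5, 1.0))   # False
```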
Theorem 7 (FOSC 4). Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be twice differentiable at some point $x^* \in \Omega$, with nonzero gradient $\nabla f(x^*)$. If $\Omega \subseteq K(x^*, d, \alpha, c)$ with $d = \nabla f(x^*)/\|\nabla f(x^*)\|$ for some $1 \le \alpha < 2$ and $c > 0$, then $x^*$ is a strict local minimizer of $f$ over $\Omega$.
As in Theorem 1, we can apply Theorem 7 with $f$ replaced by $-f$.
Corollary 2. Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be twice differentiable at some point $x^* \in \Omega$, with nonzero gradient $\nabla f(x^*)$. If $\Omega \subseteq K(x^*, -d, \alpha, c)$ with $d = \nabla f(x^*)/\|\nabla f(x^*)\|$ for some $1 \le \alpha < 2$ and $c > 0$, then $x^*$ is a strict local maximizer of $f$ over $\Omega$.
Remark 2. Theorem 7 fails for $\alpha = 2$. For an example, let $n = 2$, $f(x, y) = y - 2x^2$, and $\Omega = \{(x, y) : x^2 \le y\}$. Then, $f$ is twice differentiable at $x^* = (0, 0)$, $\nabla f(x^*) = (0, 1)^T$, and $\Omega = K(x^*, (0, 1)^T, 2, 1)$ (see Figure 5). If Theorem 7 were true for $\alpha = 2$, then $x^*$ would be a strict local minimizer for $f$ on $\Omega$. However, $f$ assumes positive, negative, and zero values at points of $\Omega$ arbitrarily close to $x^*$: for $x \neq 0$, $f(x, x^2) = -x^2 < 0$, $f(x, 2x^2) = 0$, and $f(x, 3x^2) = x^2 > 0$. This contradicts Theorem 7, and therefore $\alpha = 2$ cannot be allowed in the theorem.

Proof of Theorem 7. We prove the theorem by verifying the condition for a strict local minimizer in Theorem 5 (FOSC 3). Write $g = \nabla f(x^*)$ and $d = g/\|g\|$, and let $\{x_n\}$
be a sequence in $\Omega \setminus \{x^*\}$ such that $x_n \to x^*$. We shall construct a sequence $\{t_n\}$ of positive numbers satisfying (24). For all $n$, set
$$u_n = d^T (x_n - x^*) \quad \text{and} \quad w_n = (x_n - x^*) - u_n d.$$
Then, $u_n > 0$ and $\|w_n\|^\alpha \le c\, u_n$ for all $n$ by the definition of $K(x^*, d, \alpha, c)$. Since $u_n d$ and $w_n$ are orthogonal and $\|d\| = 1$,
$$\|x_n - x^*\|^2 = u_n^2 + \|w_n\|^2 \quad \text{and} \quad (x_n - x^*)^T \nabla f(x^*) = u_n \|g\|.$$
By the Cauchy–Schwarz inequality and the fact that $\|d\| = 1$, $0 < u_n \le \|x_n - x^*\|$, and hence $u_n \to 0$ because $x_n \to x^*$. Setting $t_n = u_n \|g\|$, we obtain $t_n > 0$ and $(x_n - x^*)^T \nabla f(x^*) \ge t_n$ for all $n$, and
$$\frac{\|x_n - x^*\|^2}{t_n} = \frac{u_n^2 + \|w_n\|^2}{u_n \|g\|} \le \frac{u_n^2 + (c\, u_n)^{2/\alpha}}{u_n \|g\|} = \frac{u_n + c^{2/\alpha}\, u_n^{2/\alpha - 1}}{\|g\|} \to 0,$$
because $1 \le \alpha < 2$ implies $2/\alpha - 1 > 0$. We have verified the requirements in Theorem 5, and therefore $x^*$ is a strict local minimizer of $f$ over $\Omega$ by that result. □
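The decisive step in the proof is that $\|x_n - x^*\|^2/t_n \to 0$, and this is exactly where $\alpha < 2$ enters. The following Python sketch evaluates the bounding ratio on the boundary case $\|w_n\|^\alpha = c\, u_n$ and shows it vanishing for $\alpha < 2$ but not for $\alpha = 2$, consistent with Remark 2:

```python
import numpy as np

# The bounding ratio from the proof, evaluated on the boundary
# ||w_n||**alpha = c*u_n with t_n = u_n*||g||: it tends to 0 for alpha < 2
# but approaches the constant c/||g|| when alpha = 2 (compare Remark 2).
c, g_norm = 1.0, 1.0
for alpha in (1.0, 1.5, 2.0):
    for u_n in (1e-2, 1e-4, 1e-6):
        w_sq = (c * u_n) ** (2.0 / alpha)       # ||w_n||^2 on the boundary
        print(alpha, u_n, (u_n ** 2 + w_sq) / (u_n * g_norm))
```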
Theorem 8 (FONC 4). Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f : \Omega \to \mathbb{R}$ be twice differentiable at some point $x^* \in \Omega$, with nonzero gradient $\nabla f(x^*)$. If $x^*$ is a strict local minimizer of $f$ on $\Omega$, then $x^*$ is an isolated point of $\Omega \cap K(x^*, -d, \alpha, c)$, where $d = \nabla f(x^*)/\|\nabla f(x^*)\|$, for every $1 \le \alpha < 2$ and $c > 0$.
Proof. Assume that $x^*$ is a strict local minimizer of $f$ on $\Omega$ and, if possible, that $x^*$ is not an isolated point of $\Omega \cap K(x^*, -d, \alpha, c)$ for some $1 \le \alpha < 2$ and $c > 0$. Then, there is a sequence $\{x_n\}$ of points different from $x^*$ that belongs to both $\Omega$ and $K(x^*, -d, \alpha, c)$ such that $x_n \to x^*$. By Corollary 2, applied on the set $\Omega \cap K(x^*, -d, \alpha, c)$, $f(x_n) < f(x^*)$ for all sufficiently large $n$, contradicting our assumption that $x^*$ is a strict local minimizer of $f$ on $\Omega$. The theorem follows. □
Examples 5 and 6 are set in the context of the Karush–Kuhn–Tucker (KKT) theorem [2,6], which allows constraint conditions to be expressed in terms of inequalities. We follow the account in [1], in which the KKT theorem appears as Theorem 21.1, and Theorem 21.3 is the corresponding second-order sufficient condition (SOSC). Theorems 9 and 10 below are specializations of these results to the cases that concern us here. Theorems 21.1 and 21.3 in [1] allow additional Lagrange-type conditions that play no role in our examples.
Theorem 9 (KKT Theorem). Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be given $C^1$ functions. Assume that $x^*$ is a local minimizer for $f$ subject to the condition $g(x) \le 0$, and that $x^*$ is a regular point for $g$ in the sense that $\nabla g(x^*) \neq 0$. Then, there is a real number $\mu \ge 0$ such that
- (1) $\mu\, g(x^*) = 0$;
- (2) $\nabla f(x^*) + \mu \nabla g(x^*) = 0$.
The corresponding sufficient condition requires the stronger assumption that the given functions are $C^2$. The Hessians of $f$ and $g$ are the matrices $F(x)$ and $G(x)$ of second-order partial derivatives of $f$ and $g$.
Theorem 10 (SOSC). Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be given $C^2$ functions. Assume that $x^*$ satisfies $g(x^*) \le 0$ and that we can find a real number $\mu \ge 0$ satisfying the following conditions:
- (1) $\mu\, g(x^*) = 0$.
- (2) $\nabla f(x^*) + \mu \nabla g(x^*) = 0$.
- (3) If $F(x^*)$ and $G(x^*)$ are the Hessians of $f$ and $g$ at $x^*$ and $L(x^*) = F(x^*) + \mu\, G(x^*)$, then $d^T L(x^*)\, d > 0$ for all $d$ such that $d \neq 0$ and $\nabla g(x^*)^T d = 0$.
Then, $x^*$ is a strict local minimizer for $f$ subject to the condition $g(x) \le 0$.
Example 5. Set $f(x, y) = y$, $x^* = (0, 0)$, and $g(x, y) = |x|^{3/2} - y$ for all $(x, y)$ in $\mathbb{R}^2$. Then, $\nabla f(x^*) = (0, 1)^T$, and the constraint set is $\Omega = \{(x, y) : g(x, y) \le 0\} = \{(x, y) : |x|^{3/2} \le y\}$. The set $\Omega$ is an $\alpha$-cone with $\alpha = 3/2$ in the direction $d = (0, 1)^T$ (see Figure 3). Therefore, by Theorem 7 (FOSC 4), $x^*$ is a strict local minimizer for $f$ subject to the constraint $g(x, y) \le 0$. However, this cannot be shown with Theorem 10 because $g$ is not $C^2$, which is a hypothesis in Theorem 10. To see why this is a problem, consider the form $d^T L(x^*)\, d$ that appears in condition (3). At any point $(x, y)$ with $x \neq 0$, the Hessian of $g$ is given by
$$G(x, y) = \begin{pmatrix} \tfrac{3}{4}\, |x|^{-1/2} & 0 \\ 0 & 0 \end{pmatrix}.$$
The second partial $\partial^2 g/\partial x^2$ does not exist at any point on the line $x = 0$. Thus, $G(x^*)$ is undefined. Hence, the Lagrangian $L(x^*) = F(x^*) + \mu\, G(x^*)$ in condition (3) of Theorem 10 is undefined, and therefore Theorem 10 cannot be applied. We remark that this example is within the scope of Theorem 9 (KKT Theorem), and the conditions (1) and (2) there are satisfied with $\mu = 1$.
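A short Python check of Example 5: the KKT conditions of Theorem 9 hold with $\mu = 1$, while the second partial $\partial^2 g/\partial x^2 = \tfrac{3}{4}|x|^{-1/2}$ grows without bound as $x \to 0$, so no Hessian $G(x^*)$ is available for condition (3) of Theorem 10:

```python
import numpy as np

grad_f = np.array([0.0, 1.0])           # grad f for f(x, y) = y
g = lambda x, y: abs(x) ** 1.5 - y
grad_g = lambda x: np.array([1.5 * np.sign(x) * np.sqrt(abs(x)), -1.0])

mu = 1.0
print(g(0.0, 0.0), grad_f + mu * grad_g(0.0))  # (1): g(x*) = 0; (2): sum = (0, 0)
for x in (1e-2, 1e-4, 1e-6):
    print(x, 0.75 / np.sqrt(x))                # d2g/dx2 blows up as x -> 0
```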
Example 6. Set $f(x, y) = y - 2x^2$, $x^* = (0, 0)$, and $g(x, y) = x^2 - y$ for all $(x, y)$ in $\mathbb{R}^2$. Then, the constraint set is $\Omega = \{(x, y) : g(x, y) \le 0\} = \{(x, y) : x^2 \le y\}$, and $\nabla f(x^*) = (0, 1)^T$. In this example, $x^*$ is not a local minimizer of $f$ subject to the constraint $g(x, y) \le 0$. For example, for points $(x, x^2)$, $x \neq 0$, on the boundary of the constraint set,
$$f(x, x^2) = x^2 - 2x^2 = -x^2 < 0 = f(x^*).$$
Might this example contradict Theorem 7 or Theorem 10? Fortunately, no, and it is instructive to see why. Theorem 7 is not applicable because the constraint set is an $\alpha$-cone with $\alpha = 2$, and it is shown in Remark 2 that Theorem 7 fails for $\alpha = 2$. To see that Theorem 10 is also not applicable, let us check the required conditions (1)–(3):
- (1) Since $g(x^*) = 0$, $\mu\, g(x^*) = 0$ for all $\mu$.
- (2) For $\mu = 1$, $\nabla f(x^*) + \mu \nabla g(x^*) = (0, 1)^T + (0, -1)^T = 0$.
- (3) The Hessians are $F(x^*) = \begin{pmatrix} -4 & 0 \\ 0 & 0 \end{pmatrix}$ and $G(x^*) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$, so $L(x^*) = F(x^*) + G(x^*) = \begin{pmatrix} -2 & 0 \\ 0 & 0 \end{pmatrix}$. Therefore,
$$d^T L(x^*)\, d = -2 d_1^2 \le 0 \tag{28}$$
for every $d = (d_1, d_2)^T \neq 0$ such that $\nabla g(x^*)^T d = 0$, that is, for all $d \neq 0$ such that $d_2 = 0$.
In view of (28), the positive definiteness condition in (3) fails, and hence Theorem 10 cannot be applied to this example.

6. Conclusions
The first-order necessary conditions in this paper contribute to the literature on first-order optimality conditions by introducing stronger results than those in the current literature. The new first-order necessary conditions imply the standard first-order necessary conditions, including those in [1,5]. We introduced first-order sufficient conditions that we did not find elsewhere in the literature. Our explanation of why the new conditions are stronger used examples that were two-dimensional. However, the method is applicable to general n-dimensional problems, including linear programming.
We proposed first-order sufficient conditions for set-constrained optimization that do not require the objective function to be convex or the constraint equations to be differentiable. Conditions that require the function to be convex are essentially second-order conditions. Our conditions only require the gradient of the objective function to be nonzero at a candidate minimizer, and they are essentially first-order conditions even when we apply them to problems where the objective function is twice differentiable.
When the given function is continuously differentiable at $x^*$ and the gradient is nonzero, the simplest form of the sufficient condition says that if the constraint set is contained in a cone with vertex at $x^*$ and axis in the gradient direction, then $x^*$ is a strict local minimizer on the cone. This sufficient condition was employed to prove a corresponding necessary condition that does not use feasible directions and instead uses the topological notion of an isolated point in a set.
We introduced generalized differentiability and reformulated the first-order conditions in terms of convergent sequences. The new notion of differentiability does not require the objective function to be defined on an open neighborhood of $x^*$; it only requires the function to be defined on the constraint set.
We refined the first-order conditions for a minimizer of a twice differentiable function in terms of $\alpha$-cones. The sufficiency version says that a twice differentiable function with a nonzero gradient has a strict local minimizer at the vertex of an $\alpha$-cone, $1 \le \alpha < 2$, whose axis is the gradient direction. We presented a problem with an $\alpha$-cone constraint set where the new sufficiency condition shows that the candidate point is a strict local minimizer. This problem satisfies the necessary condition of the KKT method but not the hypotheses of the corresponding sufficient condition, because the Hessian of the constraint function is undefined at the candidate minimizer.