Article

The Nelder–Mead Simplex Algorithm Is Sixty Years Old: New Convergence Results and Open Questions

Software Engineering Institute, John von Neumann Faculty of Informatics, Óbuda University, 1034 Budapest, Hungary
Algorithms 2024, 17(11), 523; https://doi.org/10.3390/a17110523
Submission received: 7 October 2024 / Revised: 10 November 2024 / Accepted: 13 November 2024 / Published: 14 November 2024
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract

We investigate and compare two versions of the Nelder–Mead simplex algorithm for function minimization. Two types of convergence are studied: the convergence of function values at the simplex vertices and the convergence of the simplex sequence. For the first type of convergence, we generalize the main result of Lagarias, Reeds, Wright and Wright (1998). For the second type of convergence, we also improve recent results, which indicate that Lagarias et al.'s version of the Nelder–Mead algorithm has better convergence properties than the original Nelder–Mead method. This paper concludes with some open questions.
MSC:
65K10; 90C56

1. Introduction

The original Nelder–Mead (NM) simplex algorithm was constructed in 1965 by Nelder and Mead [1] as an improved version of the simplex method of Spendley, Hext and Himsworth [2]. During the almost six decades that have passed, the Nelder–Mead algorithm has become a truly popular and widely used method for the minimization problem
$$f(x) \to \min \qquad \left(f: \mathbb{R}^n \to \mathbb{R}\right)$$
in derivative-free optimization (see, e.g., [3,4,5,6,7,8,9,10,11]).
In spite of its widespread use, only a few convergence results are known. They are of essentially two types. The first type of results are related to the convergence of function values at the vertices of the simplex sequence. They say nothing about the vertices. The most famous results of this type are due to Lagarias, Reeds, Wright and Wright [12]. The other type of results are related to the convergence of simplex vertices or simplices. The first result of this type is due to Kelley [13], who proved that under a certain (Armijo-type) descent condition, the accumulation points of the simplex sequence are critical points of f. Recent results on the convergence of the simplex sequence to a common limit point are given in papers [14,15,16].
Wright [17,18] raised several questions concerning the Nelder–Mead method:
(a)
Do the function values at all vertices necessarily converge to the same value?
(b)
Do all vertices of the simplices converge to the same point?
(c)
Why is it sometimes so effective (compared to other direct search methods) in obtaining a rapid improvement in f?
(d)
One failure mode is known (McKinnon [19])—but are there other failure modes?
(e)
Why, despite its apparent simplicity, should the Nelder–Mead method be difficult to analyze mathematically?
In fact, there are several recent examples that indicate different types of convergence behavior:
  • The function values at the simplex vertices may converge to a common limit value, while the function f has no finite minimum and the simplex sequence is unbounded (Example 5 of [14]; Examples 1 and 2 of [15]; Example 4 of [20]).
  • The simplex vertices may converge to a common limit point, but it is not a stationary point of f (McKinnon [19] and Example 3 of [15]).
  • The simplex sequence may converge to a limit simplex of a positive diameter, resulting in different limit values of f at the vertices of the limit simplex (Theorem 1 of [21]; Example 4 of [14]; Examples 4 and 5 of [15]; Examples 1, 2 and 3 of [20]).
  • The function values at the simplex vertices may converge to a common value, while the simplex sequence converges to a limit simplex with a positive diameter (Example 5 of [20]).
These examples answer questions (a), (b) and (d) of Wright negatively.
There are several variants of the original Nelder–Mead method [1]. Here, we study two main versions of the Nelder–Mead algorithm: the ’original’ (unordered) method of Nelder and Mead [1] (Algorithm 1) and the ’ordered’ version of Lagarias, Reeds, Wright and Wright [12] (Algorithm 2). In Section 2, we introduce both algorithms, while Section 3 contains their matrix representations. In Section 4, we study the connection and spectral properties of these matrices. In Section 5, we prove convergence of the function values at the simplex vertices for both algorithms. These results are more general than those of Lagarias et al. [12], while their proof is quite simple.
In Section 6, we study the convergence of the generated simplex sequence. Here, we give necessary and sufficient conditions for the convergence of the simplex sequence (for both algorithms). This relates the convergence to the convergence of an infinite matrix product. We pay particular attention to the case when the simplex sequence converges to a common limit point. The results show why the convergence problem (questions (c) and (e) of Wright) is difficult. This paper closes with computational results, conclusions and open questions.

2. Two Versions of the Nelder–Mead Method

We first define the original Nelder–Mead simplex method in the following form (see [1,6,22]).
The simplex of iteration k is denoted by $S_k = \left[x_1^k, \ldots, x_{n+1}^k\right] \in \mathbb{R}^{n \times (n+1)}$, where $x_i^k$ denotes vertex i of iteration k. The function values at the vertices are denoted by $f_i^k = f(x_i^k)$ ($i = 1, \ldots, n+1$). We define the index of the worst vertex ($h_k$), the best vertex ($\ell_k$) and the second worst vertex ($m_k$) as follows:
$$h_k:\ f_{h_k}^k = \max_{1 \le i \le n+1} f(x_i^k), \qquad \ell_k:\ f_{\ell_k}^k = \min_{1 \le i \le n+1} f(x_i^k),$$
$$m_k:\ f_{m_k}^k = \max_{1 \le i \le n+1,\ i \ne h_k} f(x_i^k).$$
Note that $1 \le \ell_k, m_k, h_k \le n+1$, and they are not necessarily unique. Define the center $x_c^k$ and the straight line $x^k(\alpha)$ as follows:
$$x_c^k = \frac{1}{n} \sum_{i=1,\ i \ne h_k}^{n+1} x_i^k, \qquad x^k(\alpha) = (1+\alpha)x_c^k - \alpha x_{h_k}^k.$$
The reflection, expansion, outside and inside contraction points of simplex $S_k$ are defined as
$$x_r^k = x^k(1), \quad x_e^k = x^k(2), \quad x_{oc}^k = x^k\left(\tfrac{1}{2}\right), \quad x_{ic}^k = x^k\left(-\tfrac{1}{2}\right).$$
The corresponding function values are denoted by $f_r^k = f(x_r^k)$, $f_e^k = f(x_e^k)$, $f_{oc}^k = f(x_{oc}^k)$ and $f_{ic}^k = f(x_{ic}^k)$, respectively.
The original Nelder–Mead simplex method is given by Algorithm 1.
Algorithm 1: Original Nelder–Mead algorithm.
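One iteration of the unordered method can be sketched in Python. This is a minimal illustration only, assuming the standard decision boundaries (those of Lagarias et al., as noted below) and a shrink step toward the best vertex; the helper `nm_step` and its variable names are ours, not from the original:

```python
import numpy as np

def nm_step(S, f):
    """One iteration of the (unordered) Nelder-Mead method.

    S is an n x (n+1) array whose columns are the simplex vertices."""
    n = S.shape[0]
    fv = np.array([f(S[:, i]) for i in range(n + 1)])
    h = int(np.argmax(fv))                       # worst vertex index h_k
    l = int(np.argmin(fv))                       # best vertex index
    m = int(np.argmax(np.where(np.arange(n + 1) == h, -np.inf, fv)))  # second worst
    xc = (S.sum(axis=1) - S[:, h]) / n           # center of the non-worst vertices
    x = lambda a: (1 + a) * xc - a * S[:, h]     # the line x^k(alpha)
    xr = x(1.0)
    fr = f(xr)
    if fr < fv[l]:                               # try expansion
        xe = x(2.0)
        v = xe if f(xe) < fr else xr
    elif fr < fv[m]:                             # accept reflection
        v = xr
    elif fr < fv[h]:                             # try outside contraction
        xoc = x(0.5)
        v = xoc if f(xoc) <= fr else None
    else:                                        # try inside contraction
        xic = x(-0.5)
        v = xic if f(xic) < fv[h] else None
    if v is None:                                # shrink toward the best vertex
        return 0.5 * (S + S[:, [l]])
    S = S.copy()
    S[:, h] = v                                  # replace the worst vertex
    return S
```

Iterating `nm_step` on a strictly convex quadratic drives the vertex values toward a common limit, in line with the convergence results of Section 5.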
    The “decision boundaries” are the same as in the ordered Nelder–Mead method of Lagarias et al. [12], which is given in Algorithm 2. Suppose that
$$f(x_1^0) \le f(x_2^0) \le \cdots \le f(x_{n+1}^0) \qquad (4)$$
holds for the initial simplex vertices and that this ordering is kept during the subsequent iterations. Here, $\ell_k = 1$, $m_k = n$ and $h_k = n+1$ for all k.
There are two rules of reindexing after each iteration. For a nonshrink step, $x_{n+1}^k$ is replaced by a new point $v \in \{x_r^k, x_e^k, x_{oc}^k, x_{ic}^k\}$. The following cases are possible: $f(v) < f_1^k$, $f_1^k \le f(v) < f_n^k$ and $f_n^k \le f(v) < f_{n+1}^k$. Define
$$j_k = \begin{cases} 1, & \text{if } f(v) < f_1^k, \\ j:\ f_{j-1}^k \le f(v) < f_j^k,\ 2 \le j \le n+1, & \text{otherwise.} \end{cases}$$
Then, the new simplex vertices are
$$x_i^{k+1} = \begin{cases} x_i^k, & 1 \le i \le j_k - 1, \\ v, & i = j_k, \\ x_{i-1}^k, & i = j_k + 1, \ldots, n+1. \end{cases}$$
This ‘insertion rule’ puts v into the ordering with the highest possible index, guaranteeing that $f_i^{k+1} \le f_i^k$ ($i = 1, \ldots, n+1$) and
$$f_1^{k+1} \le f_2^{k+1} \le \cdots \le f_{n+1}^{k+1} \qquad (k \ge 0).$$
If shrinking occurs, then the new vertices must be ordered such that $x_1^{k+1} = x_1^k$ if $f(z_1) \le f(z_i)$ ($i = 2, \ldots, n+1$).
For a given $S_0$ and f, the sequence $\{S_k\}$ is uniquely defined by Algorithm 2. Clearly, this is not true for Algorithm 1.
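The insertion rule can be stated compactly in code. The following sketch (our own helper with 0-based indexing, not from the original) returns the reordered vertex and value arrays after a nonshrink step:

```python
import numpy as np

def insert_vertex(S, fv, v, fval):
    """Insert the accepted point v (with value fval) into the ordered simplex.

    S is n x (n+1) with columns sorted so that fv is nondecreasing; the worst
    vertex x_{n+1} is dropped and v gets the highest possible index j_k."""
    jk = int(np.searchsorted(fv[:-1], fval, side="right"))  # 0-based slot
    S_new = np.concatenate([S[:, :jk], v[:, None], S[:, jk:-1]], axis=1)
    f_new = np.concatenate([fv[:jk], [fval], fv[jk:-1]])
    return S_new, f_new
```

Here `searchsorted(..., side="right")` places ties after equal values, which is exactly the 'highest possible index' part of the rule.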
Algorithm 2: Ordered Nelder–Mead algorithm.

3. Matrix Representations for the Two Algorithms

The steps of both algorithms can be easily described by transformation matrices. We first consider Algorithm 1. If the incoming vertex $v = x^k(\alpha_k)$ ($\alpha_k \in \{\pm\frac{1}{2}, 1, 2\}$) belongs to $\{x_r^k, x_e^k, x_{oc}^k, x_{ic}^k\}$, then
$$S_{k+1} = \left[x_1^k, \ldots, x_{h_k-1}^k, v, x_{h_k+1}^k, \ldots, x_{n+1}^k\right],$$
where v replaces vertex $x_{h_k}^k$. For $j = 1, \ldots, n+1$, define the vector
$$c_j(\alpha) = \left[\tfrac{1+\alpha}{n}, \ldots, \tfrac{1+\alpha}{n}, -\alpha, \tfrac{1+\alpha}{n}, \ldots, \tfrac{1+\alpha}{n}\right]^T \in \mathbb{R}^{n+1}$$
with $-\alpha$ at position j and the matrix
$$T_j(\alpha) = I + (c_j(\alpha) - e_j)e_j^T,$$
where $e_j$ is the j-th unit vector. Then, $v = S_k c_{h_k}(\alpha_k)$ and
$$S_{k+1} = S_k T_{h_k}(\alpha_k) \qquad \left(\alpha_k \in \left\{\pm\tfrac{1}{2}, 1, 2\right\}\right).$$
The shrinking iteration can be written as
$$S_{k+1} = S_k T_{\ell_k}^{shr}, \qquad T_{\ell_k}^{shr} = \frac{1}{2}I + \frac{1}{2}e_{\ell_k}e^T,$$
where $e = [1, 1, \ldots, 1]^T$. It follows that
$$S_{k+1} = S_k T_k = S_0 \prod_{i=0}^{k} T_i = S_0 B_k \qquad \left(T_i \in \tilde{\mathcal{T}}\right),$$
where
$$\tilde{\mathcal{T}} = \left\{T_\ell(\alpha):\ 1 \le \ell \le n+1,\ \alpha = \pm\tfrac{1}{2}, 1, 2\right\} \cup \left\{T_j^{shr}:\ 1 \le j \le n+1\right\}.$$
The set $\tilde{\mathcal{T}}$ has exactly $5(n+1)$ nonsingular matrices. Note that $e^T T_j(\alpha) = e^T$ and $e^T T_{\ell_k}^{shr} = e^T$.
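The identities above are easy to verify numerically. The following sketch (our own helper names) builds $c_j(\alpha)$ and $T_j(\alpha)$ and checks that $S_k T_{h_k}(\alpha)$ indeed replaces the worst vertex by $(1+\alpha)x_c^k - \alpha x_{h_k}^k$ while leaving the other vertices and the relation $e^T T_j(\alpha) = e^T$ intact:

```python
import numpy as np

def c_vec(j, alpha, n):
    """c_j(alpha): entries (1+alpha)/n, with -alpha at (0-based) position j."""
    c = np.full(n + 1, (1 + alpha) / n)
    c[j] = -alpha
    return c

def T_mat(j, alpha, n):
    """T_j(alpha) = I + (c_j(alpha) - e_j) e_j^T (identity with column j replaced)."""
    T = np.eye(n + 1)
    T[:, j] = c_vec(j, alpha, n)
    return T

n, h, alpha = 3, 1, 1.0
S = np.arange(n * (n + 1), dtype=float).reshape(n, n + 1)  # arbitrary simplex
xc = (S.sum(axis=1) - S[:, h]) / n                         # center without worst vertex
S1 = S @ T_mat(h, alpha, n)
assert np.allclose(S1[:, h], (1 + alpha) * xc - alpha * S[:, h])
assert np.allclose(np.delete(S1, h, axis=1), np.delete(S, h, axis=1))
assert np.allclose(np.ones(n + 1) @ T_mat(h, alpha, n), np.ones(n + 1))
```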
For Algorithm 2, we have the following (see, e.g., [14,15]). Assume that simplex $S_k$ satisfies condition (4). Then, for nonshrinking iterations, the new simplex is given by
$$S_{k+1} = S_k T(\alpha) P_{j_k} \qquad \left(\alpha \in \left\{\pm\tfrac{1}{2}, 1, 2\right\}\right),$$
where
$$T(\alpha) = \begin{bmatrix} I_n & \frac{1+\alpha}{n}e \\ 0 & -\alpha \end{bmatrix} \in \mathbb{R}^{(n+1)\times(n+1)}$$
and
$$P_j = [e_1, \ldots, e_{j-1}, e_{n+1}, e_j, \ldots, e_n] \in \mathbb{R}^{(n+1)\times(n+1)} \qquad (j = 1, \ldots, n+1)$$
is a permutation matrix. If shrinking occurs, the new simplex is
$$S_{k+1} = S_k T^{shr} P, \qquad T^{shr} = \frac{1}{2}I_{n+1} + \frac{1}{2}e_1 e^T,$$
where the permutation matrix $P \in \mathcal{P}_{n+1}$ is defined by the ordering condition (4) and $\mathcal{P}_{n+1}$ is the set of all possible permutation matrices of order $n+1$.
Define the sets
$$\mathcal{T}_1 = \left\{T(\alpha)P_j:\ \alpha \in \left\{-\tfrac{1}{2}, \tfrac{1}{2}\right\},\ j = 1, \ldots, n+1\right\},$$
$$\mathcal{T}_2 = \{T(1)P_j:\ j = 1, \ldots, n\} \cup \{T(2)P_1\}, \qquad \mathcal{T}_3 = \{T^{shr}P:\ P \in \mathcal{P}_{n+1}\}$$
and
$$\mathcal{T} = \mathcal{T}_1 \cup \mathcal{T}_2 \cup \mathcal{T}_3.$$
It follows that
$$S_{k+1} = S_k T_k P_k = S_0 \prod_{i=0}^{k} T_i P_i = S_0 B_k \qquad (T_i P_i \in \mathcal{T},\ k \ge 1),$$
where
$$B_k = \prod_{i=0}^{k} T_i P_i \qquad (T_i P_i \in \mathcal{T}).$$
The set $\mathcal{T}$ consists of $3n + 3 + (n+1)!$ nonsingular matrices. For any $T_i P_i \in \mathcal{T}$, $e^T(T_i P_i) = e^T$.
Observe that both algorithms are nonstationary iterations. Note that the number of shrinking matrices for Algorithm 2 increased to $(n+1)!$ compared to the $n+1$ shrinking matrices of Algorithm 1.

4. Connections and Properties

Here, we investigate the connection between the transformation matrices of both algorithms, their consequences and the spectral properties of the matrices.
The corresponding transformation matrices of Algorithms 1 and 2 are connected by orthogonal similarities. For any $1 \le s \le n+1$,
$$P^T T_s(\alpha) P = T_{n+1}(\alpha) = T(\alpha)$$
with $P = [e_1, \ldots, e_{s-1}, e_{s+1}, \ldots, e_{n+1}, e_s]$ and
$$Q^T T_s^{shr} Q = \frac{1}{2}I + \frac{1}{2}e_1 e^T = T^{shr}$$
with $Q = [e_s, e_1, \ldots, e_{s-1}, e_{s+1}, \ldots, e_{n+1}]$.
Let $\tilde{S}_k = [\tilde{x}_1^k, \ldots, \tilde{x}_{n+1}^k]$ and $S_k = [x_1^k, \ldots, x_{n+1}^k]$ denote the kth simplices of Algorithms 1 and 2, respectively. If a permutation matrix $Q_k = [e_{r_1}, \ldots, e_{r_{n+1}}] \in \mathbb{R}^{(n+1)\times(n+1)}$ exists such that $\tilde{S}_k Q_k = S_k$, $x_1^k = \tilde{x}_{\ell_k}^k$ and $x_{n+1}^k = \tilde{x}_{h_k}^k$, then we can ask if $\tilde{S}_{k+1}$ and $S_{k+1}$ define the same simplex. The assumption implies that $e_{r_1} = e_{\ell_k}$ and $e_{r_{n+1}} = e_{h_k}$. Hence, $\tilde{S}_k = S_k Q_k^T$ and
$$\tilde{S}_{k+1} = \tilde{S}_k T_{h_k}(\alpha_k) = S_k\left[Q_k^T + \left(Q_k^T c_{h_k}(\alpha_k) - Q_k^T e_{h_k}\right)e_{h_k}^T\right].$$
Since $e_{h_k} = Q_k e_{n+1}$, $Q_k^T e_{h_k} = e_{n+1}$ and $Q_k^T c_{h_k}(\alpha_k) = [e_{r_1}, \ldots, e_{r_{n+1}}]^T c_{h_k}(\alpha_k) = c_{n+1}(\alpha_k)$, we have
$$\tilde{S}_{k+1} = S_k\left[I + (c_{n+1}(\alpha_k) - e_{n+1})e_{n+1}^T\right]Q_k^T = S_k T(\alpha_k) Q_k^T = S_{k+1} P_{j_k}^T Q_k^T.$$
In the case of shrinking,
$$\tilde{S}_{k+1} = \tilde{S}_k T_{\ell_k}^{shr} = S_k\left(\frac{1}{2}Q_k^T + \frac{1}{2}Q_k^T e_{\ell_k} e^T\right).$$
Note that $Q_k^T e_{\ell_k} = Q_k^T Q_k e_1 = e_1$. Hence,
$$\tilde{S}_{k+1} = S_k\left(\frac{1}{2}Q_k^T + \frac{1}{2}e_1 e^T\right) = S_k\left(\frac{1}{2}I + \frac{1}{2}e_1 e^T Q_k\right)Q_k^T.$$
Since $e^T Q_k = e^T$, we have that
$$\tilde{S}_{k+1} = S_k T^{shr} Q_k^T = S_{k+1} P_k^T Q_k^T.$$
Hence, the simplices of iteration $k+1$ are the same up to the ordering of the vertices. Since the continuation of the original Nelder–Mead method is not uniquely determined, unlike in the case of Algorithm 2, we can expect similar although not identical results for both algorithms.
The following result is obvious. For any $1 \le j \le n+1$,
$$T_j(\alpha)T_j(\beta) = T_j(-\alpha\beta).$$
Lemma 1. 
The eigenvalues of $T_j(\alpha)$ are $\lambda_i(T_j(\alpha)) = 1$ ($i = 1, \ldots, n$) and $\lambda_{n+1}(T_j(\alpha)) = -\alpha$ ($j = 1, \ldots, n+1$). $T_j^{shr}$ has the eigenvalues $\lambda_1(T_j^{shr}) = 1$ and $\lambda_i(T_j^{shr}) = \frac{1}{2}$ ($i = 2, \ldots, n+1$) for $j = 1, \ldots, n+1$. Both $T_j(\alpha)$ and $T_j^{shr}$ have a diagonal Jordan form.
Corollary 1. 
Sequences $\{T_j(\alpha)^k\}_{k=1}^{\infty}$ and $\{(T_j^{shr})^k\}_{k=1}^{\infty}$ are uniformly bounded for $|\alpha| \le 1$ and convergent for $-1 \le \alpha < 1$ (see, e.g., [23,24,25]).
Note that $\|T_j(\alpha)\|_2 = \|T(\alpha)\|_2 = \|T(\alpha)P_j\|_2 > 1$ and $\|T_j^{shr}\|_2 = \|T^{shr}\|_2 = \|T^{shr}P\|_2 > 1$, where
$$\|T(\alpha)\|_2 = \left(s_n + \left(s_n^2 - \alpha^2\right)^{\frac{1}{2}}\right)^{\frac{1}{2}}, \qquad s_n = \frac{1+\alpha^2}{2} + \frac{(1+\alpha)^2}{2n}$$
and
$$\|T^{shr}\|_2 = \left(\frac{n+5}{8} + \left(\frac{(n+5)^2}{64} - \frac{1}{4}\right)^{\frac{1}{2}}\right)^{\frac{1}{2}}.$$
The first expression follows from a result on the singular values of companion matrices (see, e.g., Barnett [26] or Kittaneh [27]), while the second follows from a result of Montano, Salas and Soto [28].
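Both norm expressions can be spot-checked numerically; a small sketch (our own helpers `T_alpha` and `T_shr`, built directly from the definitions above):

```python
import numpy as np

def T_alpha(alpha, n):
    """T(alpha) = [[I_n, ((1+alpha)/n) e], [0, -alpha]]."""
    T = np.eye(n + 1)
    T[:n, n] = (1 + alpha) / n
    T[n, n] = -alpha
    return T

def T_shr(n):
    """T_shr = (1/2) I_{n+1} + (1/2) e_1 e^T."""
    return 0.5 * np.eye(n + 1) + 0.5 * np.outer(np.eye(n + 1)[0], np.ones(n + 1))

for n in (2, 5, 9):
    for alpha in (-0.5, 0.5, 1.0, 2.0):
        sn = (1 + alpha**2) / 2 + (1 + alpha)**2 / (2 * n)
        # spectral norm = largest singular value
        assert np.isclose(np.linalg.norm(T_alpha(alpha, n), 2),
                          np.sqrt(sn + np.sqrt(sn**2 - alpha**2)))
    s = (n + 5) / 8
    assert np.isclose(np.linalg.norm(T_shr(n), 2),
                      np.sqrt(s + np.sqrt(s**2 - 0.25)))
```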
Matrices $T^{shr}$, $T^{shr}P$, $T_j^{shr}$, $T_j(\alpha)$ and $T(\alpha)P_j$ ($-1 < \alpha < 0$) are left-stochastic. Hence,
$$\|T^{shr}\|_1 = \|T^{shr}P\|_1 = \|T_j^{shr}\|_1 = \|T_j(\alpha)\|_1 = \|T(\alpha)P_j\|_1 = 1 \qquad (-1 < \alpha < 0).$$
We note that $T_j(1)$ ($1 \le j \le n+1$) is an involution. If $T(1)$ is multiplied by a permutation matrix $P_j$ ($1 \le j \le n$), this property is lost. However, for $n = 2$, $T(1)P_2$ is a 6-involutory matrix, which is exploited in proving Theorem 3 (for k-involutory matrices, see Trench [29]).
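For $n = 2$, the involution and 6-involutory properties are quickly confirmed; a sketch with the matrices built directly from their definitions:

```python
import numpy as np

n = 2
T1 = np.eye(n + 1)
T1[:n, n] = (1 + 1) / n           # T(1): last column ((1+alpha)/n, ..., -alpha)
T1[n, n] = -1.0
P2 = np.eye(n + 1)[:, [0, 2, 1]]  # P_2 = [e_1, e_3, e_2]
M = T1 @ P2

Tj1 = np.eye(n + 1)
Tj1[:, 0] = [-1.0, 1.0, 1.0]      # T_1(1): column c_1(1)
assert np.allclose(Tj1 @ Tj1, np.eye(n + 1))                        # involution
assert not np.allclose(np.linalg.matrix_power(M, 2), np.eye(n + 1)) # not an involution
assert np.allclose(np.linalg.matrix_power(M, 6), np.eye(n + 1))     # but M^6 = I_3
```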
The multiplication of $T(\alpha)$ or $T^{shr}$ by a permutation matrix $P_j$ or P makes a significant change in the eigenstructure. The following results hold (see Theorem 3 and Corollary 1 of Galántai [15]).
Theorem 1. 
(i) 
$T(\alpha)P_1$ has at least one eigenvalue $\lambda = 1$.
(ii) 
$T(1)P_j$ ($1 \le j \le n+1$) has at least two eigenvalues with $|\lambda| = 1$.
(iii) 
If $\alpha > 1$ and $j \ge 2$, then $T(\alpha)P_j$ has exactly $j-1$ eigenvalues $\lambda = 1$.
(iv) 
For $|\alpha| < 1$, $T(\alpha)P_1$ has exactly one eigenvalue $\lambda_1 = 1$, while the remaining n eigenvalues are in the open unit disk.
(v) 
If $|\alpha| < 1$ and $2 \le j \le n+1$, then $T(\alpha)P_j$ has exactly $j-1$ eigenvalues $\lambda = 1$, while $n+2-j$ eigenvalues are in the open unit disk.
(vi) 
If $\alpha = 1$ and $1 \le j \le n+1$, then all the eigenvalues of $T(1)P_j$ are on the unit circle.
(vii) 
If $\alpha > 1$ and $j = 1$, then $T(\alpha)P_1$ has a second eigenvalue in the interval $\left(1, 1 + \frac{1+\alpha}{n}\right)$ and all its eigenvalues are in the annulus $1 \le |\lambda| \le 1+\alpha$.
Corollary 2. 
The eigenvalues of $T^{shr}P$ ($P \in \mathcal{P}_{n+1}$) are $\lambda_1(T^{shr}P) = 1$ and $|\lambda_i(T^{shr}P)| = \frac{1}{2}$ for $i = 2, \ldots, n+1$.
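Corollary 2 is easy to confirm numerically for a small dimension; a sketch (n = 3 chosen arbitrarily) that checks all $(n+1)!$ permutations:

```python
import numpy as np
from itertools import permutations

n = 3
e = np.ones(n + 1)
Tshr = 0.5 * np.eye(n + 1) + 0.5 * np.outer(np.eye(n + 1)[0], e)  # (1/2)I + (1/2)e_1 e^T
for p in permutations(range(n + 1)):
    P = np.eye(n + 1)[:, list(p)]
    moduli = np.sort(np.abs(np.linalg.eigvals(Tshr @ P)))
    # one eigenvalue 1 plus n eigenvalues of modulus 1/2
    assert np.allclose(moduli, [0.5] * n + [1.0])
```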
The change in the eigenstructure of T α P j is the most striking in contrast to that of T j α . It gives better convergence properties for Algorithm 2, as shown in Section 6 and Section 7.

5. Convergence of the Function Values at the Vertices

Lagarias et al. [12] studied the convergence of function values at the simplex vertices for Algorithm 2. We cite their two most important results before proving more general results for both algorithms.
Lemma 2 
(Lemma 3.3 of [12]). If function f is bounded from below on $\mathbb{R}^n$ and only a finite number of shrink iterations occur, then each sequence $\{f_i^k\}_{k=0}^{\infty}$ (generated by Algorithm 2) converges to some limit $f_i^*$ for $i = 1, \ldots, n+1$ and $f_1^* \le f_2^* \le \cdots \le f_{n+1}^*$.
Lemma 2 guarantees the convergence under rather general conditions. In general, the limit values $\{f_i^*\}_{i=1}^{n+1}$ can be different, as shown by Examples 1 and 2 of [20] (also see [15,21]).
Theorem 2 
(Theorem 5.1 of [12]). Assume that f is a strictly convex function on $\mathbb{R}^2$ with bounded level sets. Assume that $S_0$ is non-degenerate. Then, the Nelder–Mead simplex method (Algorithm 2) converges in the sense that $f_1^* = f_2^* = f_3^*$.
Theorem 2 is generally considered as the main result of [12]. In addition, it can be proved (see [20]) that if f is strictly convex on $\mathbb{R}^2$ with bounded level sets and $S_0$ is non-degenerate, then any accumulation point $S^*$ of the simplex sequence $\{S_k\}$ (generated by Algorithm 2) has the form $S^* = [x^*, x^*, x^*]$.
We need the following concept and results.
Definition 1. 
Let f be defined on a convex set $S \subseteq \mathbb{R}^n$. The function f is said to be quasiconvex on S if
$$f(\lambda x_1 + (1-\lambda)x_2) \le \max\{f(x_1), f(x_2)\}$$
for every $x_1, x_2 \in S$ and for every $\lambda \in [0, 1]$.
Convexity implies quasiconvexity.
Lemma 3 
(Lemma 6 of [20]). If f is quasiconvex and bounded below on $\mathbb{R}^n$, then each sequence $\{f_i^k\}_{k=0}^{\infty}$ is monotone decreasing and converges to some limit $f_i^*$ for $i = 1, \ldots, n+1$. Furthermore, $f_1^* \le f_2^* \le \cdots \le f_{n+1}^*$.
Corollary 3 
(Corollary 2 of [20]). If f is convex on $\mathbb{R}^n$ and an index $1 \le j \le n$ exists such that $f_j^k < f_{n+1}^k$, then $f_{ic}^k < f_{n+1}^k$ and no shrinking occurs in iteration k.
Lemma 4 
(Lemma 10 of [20]). Assume that f is convex and bounded below on $\mathbb{R}^n$. If for some integer ℓ ($1 \le \ell \le n$),
$$f_\ell^* < f_{\ell+1}^*,$$
then there is an index $K > 0$ such that for all $k \ge K$, $j_k > \ell$, that is, the first ℓ vertices do not change.
Lemma 5 
(Lemma 11 of [20]). If f is convex and bounded below on $\mathbb{R}^n$, then $f_n^* = f_{n+1}^*$.
The following generalization of Theorem 2 of Lagarias et al. was proved in [20]. Here, we present it with a new short proof.
Theorem 3. 
Assume that f is a convex function on $\mathbb{R}^2$ and is bounded below. Then, the Nelder–Mead simplex method (Algorithm 2) converges in the sense that $f_1^* = f_2^* = f_3^*$.
Proof. 
It follows (Lemmas 3 and 5) that $f_1^* \le f_2^* = f_3^*$. Assume that $f_1^* < f_2^*$. It follows from Lemma 4 that there exists an index $K > 0$ such that for $k \ge K$, $x_1^k = x_1^K$. Hence, only $x_2^k$ and $x_3^k$ may change. The insertion rule and the impossibility of shrinking (Corollary 3) imply that only the following cases are possible:
(i)
$f_1^k \le f_r^k < f_2^k$;
(ii)
$f_2^k \le f_r^k < f_3^k$ and $f_1^k \le f_{oc}^k \le f_r^k$;
(iii)
$f_3^k \le f_r^k$ and $f_1^k \le f_{ic}^k < f_3^k$.
We assume that $K > 0$ is big enough, so that for $k \ge K$, $f_1^* \le f_1^k < f_1^* + \varepsilon$, $f_2^* \le f_i^k < f_2^* + \varepsilon$ ($i = 2, 3$), where $\varepsilon > 0$ is such that $f_1^* + 4\varepsilon < f_2^*$. Depending on the selected case, $f_r^k \ge f_2^*$, $f_{oc}^k \ge f_2^*$ or $f_{ic}^k \ge f_2^*$ must hold. In case (i), $x_r^k$ replaces $x_2^k$ and $S_{k+1} = S_k T(1)P_2$. In case (ii), since $x_{oc}^k = \frac{1}{2}(x_c^k + x_r^k)$, we have
$$f_{oc}^k \le \frac{1}{4}f_1^k + \frac{1}{4}f_2^k + \frac{1}{2}f_r^k \le \frac{1}{4}(f_1^* + \varepsilon) + \frac{3}{4}(f_2^* + \varepsilon) < f_2^*.$$
Hence, case (ii) cannot occur. Similarly, in case (iii),
$$f_{ic}^k \le \frac{1}{4}f_1^k + \frac{1}{4}f_2^k + \frac{1}{2}f_3^k \le \frac{1}{4}(f_1^* + \varepsilon) + \frac{3}{4}(f_2^* + \varepsilon) < f_2^*,$$
showing that case (iii) cannot occur. Hence, we can only have the simplex sequence
$$S_{k+K} = S_K\left(T(1)P_2\right)^k, \qquad k \ge 0.$$
Since $T(1)P_2$ is a 6-involutory matrix ($(T(1)P_2)^6 = I_3$), $\{S_{k+K}\}$ is a periodic sequence and $S_{6+K} = S_K$, which is impossible by the insertion rule. This is a contradiction; thus, $f_1^* = f_2^* = f_3^*$. □
For Algorithm 1, we prove the following results. First, observe the following facts. For nonshrink steps, we have
(i)
$x_i^{k+1} = x_i^k$ for $i \ne h_k$, $x_{h_k}^{k+1} = v$;
(ii)
$f_i^{k+1} = f_i^k$ ($i \ne h_k$) and $f_{h_k}^{k+1} < f_{h_k}^k$;
(iii)
$f_{\ell_k}^k = \min_i f_i^k \le f_i^k \le \max_i f_i^k = f_{h_k}^k$;
(iv)
$f_{\ell_{k+1}}^{k+1} \le f_{\ell_k}^k$ and $f_{h_{k+1}}^{k+1} \le f_{h_k}^k$.
Hence, $f_i^{k+1} \le f_i^k$ ($i = 1, \ldots, n+1$).
Lemma 6 
(Algorithm 1). If $f: \mathbb{R}^n \to \mathbb{R}$ is bounded below and only a finite number of shrink operations occur, then there exist finite limit values $f_i^*$ such that $f_i^k \to f_i^*$ ($i = 1, 2, \ldots, n+1$).
The only difference to Lemma 2 is that the function values $f_i^k$ and their limits are not ordered.
Lemma 7 
(Algorithm 1). If $f: \mathbb{R}^n \to \mathbb{R}$ is quasiconvex on $\mathbb{R}^n$ and bounded below, then there exist finite limit values $f_i^*$ such that $f_i^k \to f_i^*$ ($i = 1, 2, \ldots, n+1$).
Proof. 
If f is quasiconvex, then in the case of shrinking, the inequality
$$f_i^{k+1} = f\left(\frac{1}{2}\left(x_i^k + x_{\ell_k}^k\right)\right) \le \max\left\{f_i^k, f_{\ell_k}^k\right\} = f_i^k \qquad (i = 1, \ldots, n+1)$$
also holds, and the proof of Lemma 6 applies. □
Lemma 8 
(Algorithm 1). If $f: \mathbb{R}^n \to \mathbb{R}$ is strictly convex, then no shrinking occurs.
Proof. 
Shrinking happens if either $f_{m_k}^k \le f_r^k < f_{h_k}^k$ and $f_{oc}^k > f_r^k$, or $f_r^k \ge f_{h_k}^k$ and $f_{ic}^k \ge f_{h_k}^k$ holds. In general, $f_c^k < \frac{1}{n}\sum_{i=1, i \ne h_k}^{n+1} f_i^k \le f_{m_k}^k$. If $f_{m_k}^k \le f_r^k < f_{h_k}^k$, then $f_{oc}^k = f\left(\frac{1}{2}(x_r^k + x_c^k)\right) < \frac{1}{2}\left(f_r^k + f_{m_k}^k\right) \le f_r^k$. If $f_r^k \ge f_{h_k}^k$, then $f_{ic}^k = f\left(\frac{1}{2}(x_c^k + x_{h_k}^k)\right) < \frac{1}{2}\left(f_c^k + f_{h_k}^k\right) \le f_{h_k}^k$. □
Remark 1 
(Algorithm 1). Assume that $f: \mathbb{R}^n \to \mathbb{R}$ is convex. Then, $f_c^k \le f_{m_k}^k$. If $f_{m_k}^k \le f_r^k < f_{h_k}^k$, then $f_{oc}^k \le f_r^k$. If $f_r^k \ge f_{h_k}^k$, then $f_{ic}^k \le f_{h_k}^k$. However, if at least for one $j \ne h_k$, $f_j^k < f_{h_k}^k$ holds, then $f_c^k \le \frac{1}{n}\sum_{i=1, i \ne h_k}^{n+1} f_i^k < f_{h_k}^k$ and $f_{ic}^k < f_{h_k}^k$. In such a case, there is no shrinking.
Assume that $f: \mathbb{R}^n \to \mathbb{R}$ is bounded below and only a finite number of shrink operations occur. Then, we have finite numbers $f_i^*$ such that $f_i^k \to f_i^*$ as $k \to \infty$, and each sequence $\{f_i^k\}$ is monotone decreasing ($i = 1, \ldots, n+1$). The values $f_i^*$ are unordered. However, there is a permutation $(i_1, i_2, \ldots, i_{n+1})$ of the indices $1, 2, \ldots, n+1$ such that
$$f_{i_1}^* \le f_{i_2}^* \le \cdots \le f_{i_n}^* \le f_{i_{n+1}}^*.$$
Lemma 9 
(Algorithm 1). If f is convex and bounded below on $\mathbb{R}^n$, then $f_{i_n}^* = f_{i_{n+1}}^*$.
Proof. 
Assume that $f_{i_n}^* < f_{i_{n+1}}^*$. Lemma 6 and Remark 1 imply that for $0 < \varepsilon < f_{i_{n+1}}^* - f_{i_n}^*$, there exists an index $K > 0$ such that for $k \ge K$,
$$f_{i_j}^* \le f_{i_j}^k \le f_{i_j}^* + \varepsilon \qquad (j = 1, \ldots, n+1).$$
Note that $f_{i_j}^* \le f_{i_j}^k \le f_{i_n}^* + \varepsilon < f_{i_{n+1}}^* \le f_{i_{n+1}}^k$ ($j = 1, \ldots, n$). Hence, there cannot be shrinking, and only the worst vertex $x_{i_{n+1}}^k$ can change for $k \ge K$ ($f(v) \ge f_{i_{n+1}}^*$). Clearly, $i_{n+1} = h_k$ for $k \ge K$. If $\alpha = 1, 2$, then $f(x_r^k) < f_{m_k}^k \le f_{i_n}^* + \varepsilon$ by definition. Hence, only the cases $\alpha = \pm\frac{1}{2}$ are possible. For simplicity, let $\eta = i_{n+1}$. Then,
$$S_{k+1} = S_k T_\eta(\alpha) \qquad \left(\alpha \in \left\{-\tfrac{1}{2}, \tfrac{1}{2}\right\}\right)$$
and
$$S_{K+j} = S_K \prod_{i=1}^{j} T_\eta(\alpha_i) \qquad \left(\alpha_i \in \left\{-\tfrac{1}{2}, \tfrac{1}{2}\right\}\right).$$
Since $T_j(\alpha)T_j(\beta) = T_j(-\alpha\beta)$ ($j = 1, \ldots, n+1$), it follows that
$$\prod_{i=1}^{j} T_\eta(\alpha_i) = T_\eta\left((-1)^{j+1}\prod_{i=1}^{j}\alpha_i\right) \to T_\eta(0) = \begin{bmatrix} I_{\eta-1} & \frac{1}{n}e & 0 \\ 0 & 0 & 0 \\ 0 & \frac{1}{n}e & I_{n+1-\eta} \end{bmatrix},$$
$$S_{K+j} \to S_K \begin{bmatrix} I_{\eta-1} & \frac{1}{n}e & 0 \\ 0 & 0 & 0 \\ 0 & \frac{1}{n}e & I_{n+1-\eta} \end{bmatrix} = S^*$$
and
$$x_{i_{n+1}}^k \to x_{i_{n+1}}^* = \frac{1}{n}\sum_{j=1, j \ne \eta}^{n+1} x_j^K = \frac{1}{n}\sum_{j=1}^{n} x_{i_j}^K.$$
Since f is convex, we have
$$f_{i_{n+1}}^* = f\left(x_{i_{n+1}}^*\right) = f\left(\frac{1}{n}\sum_{j=1}^{n} x_{i_j}^K\right) \le \frac{1}{n}\sum_{j=1}^{n} f_{i_j}^K < f_{i_{n+1}}^*,$$
which is a contradiction. Hence, $f_{i_n}^* = f_{i_{n+1}}^*$ holds under convexity. □
Theorem 4. 
If f is convex and bounded below on $\mathbb{R}^2$, then Algorithm 1 converges in the sense that $f_1^* = f_2^* = f_3^*$.
Proof. 
Lemmas 6 and 9 imply that $f_{i_1}^* \le f_{i_2}^* = f_{i_3}^*$ for some permutation $(i_1, i_2, i_3)$ of $(1, 2, 3)$. Assume that $f_{i_1}^* < f_{i_2}^*$. Then, there exists an index K such that for $k \ge K$,
$$f_{i_1}^* \le f_{i_1}^k < f_{i_1}^* + \varepsilon < f_{i_2}^* \le f_{m_k}^k \le f_{h_k}^k < f_{i_2}^* + \varepsilon \qquad (f_{i_1}^* + 4\varepsilon < f_{i_2}^*)$$
holds with $m_k, h_k \in \{i_2, i_3\}$. Since $f_{i_1}^k < f_{i_2}^*$, only the vertices $x_{i_2}^k$ and $x_{i_3}^k$ can change for $k \ge K$. Also, note that $f_{i_1}^k < f_{i_2}^*$ implies that no shrinking may occur. It also follows that only $f_r^k, f_{oc}^k, f_{ic}^k \ge f_{i_2}^*$ are possible. Consider now the following ‘possible’ cases:
(i)
$f_{i_2}^* \le f_r^k < f_{m_k}^k$;
(ii)
$f_{m_k}^k \le f_r^k < f_{h_k}^k$ and $f_{oc}^k \le f_r^k$;
(iii)
$f_{h_k}^k \le f_r^k$ and $f_{ic}^k < f_{h_k}^k$.
In the first case, $x_{m_{k+1}}^{k+1} = x_{h_k}^{k+1} = x_r^k$ and $x_{h_{k+1}}^{k+1} = x_{m_k}^k$. Hence, the operation $S_{k+1} = S_k T_{h_k}(1)$ must be followed by $S_{k+2} = S_{k+1} T_{m_k}(\alpha)$ ($h_k \ne m_k$) for some $\alpha \in \{-\frac{1}{2}, \frac{1}{2}, 1\}$. In the second case,
$$f_{oc}^k \le \frac{1}{4}f_{i_1}^k + \frac{1}{4}f_{m_k}^k + \frac{1}{2}f_r^k \le \frac{1}{4}(f_{i_1}^* + \varepsilon) + \frac{3}{4}(f_{i_2}^* + \varepsilon) < f_{i_2}^*,$$
which makes this particular step impossible. Similarly, in the third case,
$$f_{ic}^k \le \frac{1}{4}f_{i_1}^k + \frac{1}{4}f_{m_k}^k + \frac{1}{2}f_{h_k}^k \le \frac{1}{4}(f_{i_1}^* + \varepsilon) + \frac{3}{4}(f_{i_2}^* + \varepsilon) < f_{i_2}^*,$$
so this step is impossible. Hence, the only possible step is $S_{k+1} = S_k T_{h_k}(1)$ with $m_{k+1} = h_k$ and $h_{k+1} = m_k$. In fact, we have $S_{k+2} = S_k T_{h_k}(1) T_{m_k}(1)$ for a pair of $m_k$ and $h_k$, and this is repeated. Consider the following possible cases:
$$(m_k, h_k) \in J = \{(2,3), (3,2), (1,3), (3,1), (1,2), (2,1)\}.$$
It is easy to check that for every $(h_k, m_k) \in J$,
$$\left(T_{h_k}(1) T_{m_k}(1)\right)^6 = I.$$
Here, we have a periodicity, $S_{k+12} = S_k\left(T_{h_k}(1) T_{m_k}(1)\right)^6 = S_k$, which is impossible by the condition $f_r^k < f_{m_k}^k < f_{h_k}^k$. Hence, $f_1^* = f_2^* = f_3^*$ must hold. □
Theorems 3 and 4 are generalizations of Theorem 2 of Lagarias et al. [12], since they require only convexity and boundedness from below. Moreover, their proofs are much simpler.

6. Convergence of the Simplex Sequences

Here, we study the convergence of the simplex sequence $\{S_k\}$ generated by both Nelder–Mead methods. If $S_k \to S^* = [x_1^*, \ldots, x_{n+1}^*]$, then (provided that f is continuous) $f_i^k \to f(x_i^*)$ for $i = 1, \ldots, n+1$ as $k \to \infty$. The limit vertices $x_i^*$ can be different (see, e.g., Examples 1 and 2 of [20]).
Ideally, $\{S_k\}$ should converge to some limit of the form $S^* = [x^*, \ldots, x^*] = x^*e^T$, where $x^* \in \mathbb{R}^n$ is a stationary point of f. McKinnon's example [19] and Example 3 of [15] show that $x^*$ is not always a stationary point.
If $B_k$ converges to a rank-one matrix of the form $B^* = we^T$ ($w \in \mathbb{R}^{n+1}$), then $S_k = S_0 B_k \to S_0 we^T = \hat{x}e^T$ and $\mathrm{diam}(S_k) \to 0$. There are several examples (see, e.g., [14,15,20,21]) where the simplex sequence $\{S_k\}$ converges to a limit $S^*$ with $\mathrm{diam}(S^*) > 0$ and $\mathrm{rank}(S^*), \mathrm{rank}(B^*) \ge 2$.
The following necessary and sufficient result holds for both algorithms.
Lemma 10. 
Assume that $S_0$ is non-degenerate (its vertices are affinely independent). Then, $\{S_k\}$ (generated by Algorithm 1 or 2) converges to some $S^* \in \mathbb{R}^{n \times (n+1)}$ if and only if $B_k$ converges to some $B^*$.
Proof. 
In both cases, $S_k = S_0 \prod_{i=0}^{k} X_i = S_0 B_k$ ($X_i \in \mathcal{T}'$), where $\mathcal{T}' = \tilde{\mathcal{T}}$ (in the case of Algorithm 1) or $\mathcal{T}' = \mathcal{T}$ (in the case of Algorithm 2). If $B_k \to B^*$, then $S_k = S_0 B_k \to S_0 B^* := S^*$. Since $e^T B_k = e^T$, we have
$$\begin{bmatrix} e^T \\ S_k \end{bmatrix} = \begin{bmatrix} e^T \\ S_0 \end{bmatrix} B_k.$$
By assumption, the first matrix on the right is invertible, and so
$$B_k = \begin{bmatrix} e^T \\ S_0 \end{bmatrix}^{-1} \begin{bmatrix} e^T \\ S_k \end{bmatrix}.$$
If $S_k \to S^*$, then
$$B_k = \begin{bmatrix} e^T \\ S_0 \end{bmatrix}^{-1} \begin{bmatrix} e^T \\ S_k \end{bmatrix} \to \begin{bmatrix} e^T \\ S_0 \end{bmatrix}^{-1} \begin{bmatrix} e^T \\ S^* \end{bmatrix} = B^*. \qquad \Box$$
From now on, we assume that $S_0$ is always non-degenerate (its columns are affinely independent). Lemma 10 is formally independent of f. However, the decision mechanism of the NM algorithm determines the next simplex and the resulting matrix product. Certain configurations clearly cannot occur. Such is the case of strictly convex functions, when no shrinking can occur or $T(1)P_2$ cannot be repeated in a sequence more than five times (for the case n = 2). In general, it is hard to tell which sequences (products) are possible for a given f.
Lemma 11. 
If $B_k = \prod_{i=0}^{k} X_i \to B^*$ ($X_i \in \mathcal{T}'$, $\mathcal{T}' = \tilde{\mathcal{T}}$ or $\mathcal{T}' = \mathcal{T}$), then $\mathrm{rank}(B^*) \le n$.
Proof. 
Assume that $\mathrm{rank}(B^*) = n+1$. Then, $B^*$ is invertible. Since every $X_i \in \mathcal{T}'$ is invertible, $\lim_{k \to \infty} B_{k-1}^{-1} = (B^*)^{-1}$. As $X_k = B_{k-1}^{-1} B_k$,
$$\lim_{k \to \infty} X_k = \lim_{k \to \infty} B_{k-1}^{-1} \lim_{k \to \infty} B_k = (B^*)^{-1} B^* = I.$$
Hence, $X_k \to I$ is necessary for an invertible $B^*$. Since $\mathcal{T}'$ is a finite set and its elements are different from I, this cannot happen and $\mathrm{rank}(B^*) \le n$. □
Sylvester's theorem on rank (see, e.g., Mirsky [30]) implies that
$$\mathrm{rank}(S^*) \ge \mathrm{rank}(S_0) + \mathrm{rank}(B^*) - (n+1).$$
Since, by assumption, $\mathrm{rank}(S_0) = n$, we have $n \ge \mathrm{rank}(S^*) \ge \mathrm{rank}(B^*) - 1$. In Example 4 of [15], $\mathrm{rank}(S_0) = n$, $\mathrm{rank}(B^*) = n$ and $\mathrm{rank}(S^*) = n-1$.
The following simple result characterizes the possible limits of infinite matrix products.
Lemma 12. 
Assume that $B_k = \prod_{i=0}^{k} X_i \to B^*$ ($X_i \in \mathcal{T}'$) and that $X_s$ occurs infinitely often in the product $\prod_{i=0}^{\infty} X_i$. Then, every nonzero row of $B^*$ is a left eigenvector of $X_s$ belonging to the eigenvalue $\lambda = 1$.
Proof. 
Since $X_s$ appears infinitely many times in the product $\prod_{i=0}^{\infty} X_i$, there is a subsequence of $\{B_{i_j}\}$ with rightmost factor $X_s$, say
$$B_{i_1} X_s, B_{i_2} X_s, \ldots,$$
where the $B_{i_j}$'s are products of the $X_i$'s. Since $B_{i_j} \to B^*$, we have $B_{i_j} X_s \to B^* X_s = B^*$. □
Define the (left) 1-eigenspace of matrix A by $E_1(A) = \{x: xA = x\}$. The rows of $B^*$ belong to $E_1(X_s)$. If several $X_s$, say $X_{s_i}$ ($i = 1, \ldots, m$), occur infinitely often in the product $\prod_{i=0}^{\infty} X_i$, then the rows of $B^*$ belong to $\cap_{i=1}^{m} E_1(X_{s_i})$.
Since $e^T \in E_1(T_i P_i)$ for all $T_i P_i \in \mathcal{T}$ and $e^T \in E_1(T_i)$ for all $T_i \in \tilde{\mathcal{T}}$, we have $e^T \in \cap_{i=1}^{m} E_1(T_{s_i} P_{s_i})$ and $e^T \in \cap_{i=1}^{m} E_1(T_{s_i})$, respectively. If $\cap_{i=1}^{m} E_1(T_{s_i} P_{s_i}) = \{\lambda e^T: \lambda \in \mathbb{R}\}$ or $\cap_{i=1}^{m} E_1(T_{s_i}) = \{\lambda e^T: \lambda \in \mathbb{R}\}$, then $B^*$ has the form $we^T$ for some $w \in \mathbb{R}^{n+1}$.
Note that for any $T \in \tilde{\mathcal{T}}$ or $T \in \mathcal{T}$, $e^T T = e^T$. Hence, if $B_k \to B^*$, then $e^T B_k = e^T \to e^T B^*$, that is, $e^T = e^T B^*$. If $B^* = we^T$, then $e^T B^* = (e^T w)e^T = e^T$ implies that $e^T w = 1$.
We recall the following definitions and results (see, e.g., Hartfiel [31,32]).
A right infinite matrix product is an expression $A_1 A_2 \cdots A_k A_{k+1} \cdots$. A set $\mathcal{M}$ of $n \times n$ matrices has the right convergence property (RCP) if all possible right infinite products $\prod_{i=1}^{\infty} A_i$ ($A_i \in \mathcal{M}$) converge.
A set $\mathcal{M}$ of $n \times n$ matrices is product-bounded if there is a constant $\beta > 0$ such that $\|A_1 \cdots A_k\| \le \beta$ for all k and all $A_1, \ldots, A_k \in \mathcal{M}$.
If $\mathcal{M}$ is an RCP set, then $\mathcal{M}$ is also product-bounded.
Lemma 13. 
If $\mathcal{M}$ is an RCP set, $A_1, \ldots, A_k \in \mathcal{M}$ and λ is an eigenvalue of $A_1 A_2 \cdots A_k$, then $|\lambda| < 1$ or $\lambda = 1$, and this eigenvalue is simple. Hence, each matrix of $\mathcal{M}$ must satisfy this condition.
The matrices $T(2)P_1$ and $T_j(2)$ have at least one eigenvalue of modulus greater than 1, and $\|T(2)P_1\| > 1$ and $\|T_j(2)\| > 1$ in any induced matrix norm. Hence, $\{(T(2)P_1)^k\}$ and $\{T_j(2)^k\}$ are unbounded. There are also examples when the Nelder–Mead algorithm produces unbounded simplex sequences $S_k = S_0 (T(2)P_1)^k$ or $S_k = S_0 (T(1)P_1)^k$ (see, e.g., [14,15,20]). It is clear that the whole matrix set $\tilde{\mathcal{T}}$ or $\mathcal{T}$ is not an RCP set.
Hence, we must seek subsets $\mathcal{M}$ of $\tilde{\mathcal{T}}$ or $\mathcal{T}$ that are RCP sets. However, Blondel and Tsitsiklis [33] proved that the product boundedness of a finite matrix set $\mathcal{M}$ is algorithmically undecidable, and it remains undecidable even in the special case when $\mathcal{M}$ consists of only two matrices. Since product boundedness is a weaker property than the RCP (it is implied by the RCP) and is already algorithmically undecidable, it seems difficult to decide the RCP property in general. This might answer question (e) of Wright.
However, it is relatively easy to identify one particular RCP set. For both algorithms, the corresponding shrinking matrices form an RCP set.
Lemma 14. 
Both $\mathcal{M}_1 = \{T_j^{shr}: 1 \le j \le n+1\}$ and $\mathcal{M}_2 = \{T^{shr}P: P \in \mathcal{P}_{n+1}\}$ are RCP sets.
Proof. 
By using induction on k, we can prove that for $k \ge 1$,
$$\prod_{i=1}^{k} T_{j_i}^{shr} = \prod_{i=1}^{k}\left(\frac{1}{2}I + \frac{1}{2}e_{j_i}e^T\right) = \frac{1}{2^k}I + \sum_{i=1}^{k}\frac{1}{2^i}e_{j_i}e^T$$
and
$$\prod_{i=1}^{k} T^{shr}P_{j_i} = \prod_{i=1}^{k}\left(\frac{1}{2}I + \frac{1}{2}e_1 e^T\right)P_{j_i} = \frac{1}{2^k}\prod_{i=1}^{k}P_{j_i} + \sum_{i=1}^{k}\frac{1}{2^i}e_{\ell_i}e^T \qquad (\ell_1 = 1).$$
For the last formula, note that if P is a permutation matrix, then $e^T P = e^T$ and $Pe_i = e_j$ for some j. In element-wise partial ordering, $T_j^{shr} \ge 0$ holds for $j = 1, \ldots, n+1$ and
$$0 \le \prod_{i=1}^{k} T_{j_i}^{shr} = \frac{1}{2^k}I + \sum_{i=1}^{k}\frac{1}{2^i}e_{j_i}e^T \le ee^T.$$
Here, $\frac{1}{2^k}I \to 0$ as $k \to \infty$, and the sequence $\left\{\sum_{i=1}^{k}\frac{1}{2^i}e_{j_i}e^T\right\}_{k=1}^{\infty}$ is monotone increasing in component-wise partial ordering. Hence, $\prod_{i=1}^{\infty} T_{j_i}^{shr}$ is convergent. We also have that $0 \le \prod_{i=1}^{k} T^{shr}P_{j_i} \le ee^T$, $\frac{1}{2^k}\prod_{i=1}^{k}P_{j_i} \to 0$ as $k \to \infty$, and $\left\{\sum_{i=1}^{k}\frac{1}{2^i}e_{\ell_i}e^T\right\}_{k=1}^{\infty}$ is monotone increasing in partial ordering. Hence, $\prod_{i=1}^{\infty} T^{shr}P_{j_i}$ is convergent. □
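The closed form of the shrink products and their convergence to a rank-one matrix $we^T$ can be illustrated numerically; a sketch with our own helper names:

```python
import numpy as np

n = 3
e = np.ones(n + 1)

def T_shr(j):
    """T_j^shr = (1/2) I + (1/2) e_j e^T (0-based j)."""
    return 0.5 * np.eye(n + 1) + 0.5 * np.outer(np.eye(n + 1)[j], e)

rng = np.random.default_rng(0)
js = rng.integers(0, n + 1, size=30)       # random sequence of shrink indices
prod = np.eye(n + 1)
for j in js:
    prod = prod @ T_shr(j)
# closed form: (1/2^k) I + sum_i (1/2^i) e_{j_i} e^T
closed = np.eye(n + 1) / 2.0 ** len(js) + sum(
    np.outer(np.eye(n + 1)[j], e) / 2.0 ** (i + 1) for i, j in enumerate(js))
assert np.allclose(prod, closed)
# after many factors the product is numerically of the rank-one form w e^T
assert np.allclose(prod, np.outer(prod[:, 0], e), atol=1e-8)
```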
Remark 2. 
Corollary 2 implies that $E_1(T^{shr}P) = \{\lambda e^T: \lambda \in \mathbb{R}\}$ for all $P \in \mathcal{P}_{n+1}$. If the matrices $T^{shr}P_{s_i}$ ($i = 1, \ldots, m$) occur infinitely often in the product $\prod_{i=1}^{\infty} T^{shr}P_{j_i}$ and $\prod_{i=1}^{k} T^{shr}P_{j_i} \to B^*$ ($k \to \infty$), then the nonzero rows of $B^*$ belong to $\cap_{i=1}^{m} E_1(T^{shr}P_{s_i}) = \{\lambda e^T: \lambda \in \mathbb{R}\}$. Hence, $\prod_{i=1}^{k} T^{shr}P_{j_i} \to we^T$ for some w. Lemma 1 implies that $E_1(T_j^{shr}) = \{\lambda e^T: \lambda \in \mathbb{R}\}$ for all $j = 1, \ldots, n+1$. It also follows from the previous argument that $\prod_{i=1}^{k} T_{j_i}^{shr} \to we^T$ for some w.
The convergence $S_k \to \hat{x}e^T$ for the infinitely repeated shrinking process also follows from Cantor's intersection theorem.
Lemma 15. 
Define the matrix
$$F = \begin{bmatrix} 1 & -e^T \\ 0 & I_n \end{bmatrix}, \qquad F^{-1} = \begin{bmatrix} 1 & e^T \\ 0 & I_n \end{bmatrix}.$$
Then, for all $T \in \mathcal{T}'$ ($\mathcal{T}' = \mathcal{T}$ or $\mathcal{T}' = \tilde{\mathcal{T}}$), the matrix $F^{-1}TF$ has the common block lower triangular form
$$F^{-1}TF = \begin{bmatrix} 1 & 0 \\ b_T & C_T \end{bmatrix},$$
where $b_T \in \mathbb{R}^n$ and $C_T \in \mathbb{R}^{n \times n}$.
The result was proved for $T \in \mathcal{T}$ in [14,15]. The proof of the case $\tilde{\mathcal{T}}$ is similar. Since both $\tilde{\mathcal{T}}$ and $\mathcal{T}$ are finite sets, there exist two constants $\gamma_b, \gamma_C > 0$ such that $\|b_T\| \le \gamma_b$ and $\|C_T\| \le \gamma_C$ for all $T \in \tilde{\mathcal{T}}$ or $T \in \mathcal{T}$. For $T_j(\alpha) \in \tilde{\mathcal{T}}$, the corresponding $C_{T_j(\alpha)}$ has $n-1$ eigenvalues $\lambda = 1$ and $\|C_{T_j(\alpha)}\| \ge 1$. For $T(\alpha)P_j \in \mathcal{T}$ ($j > 2$), $\|C_{T(\alpha)P_j}\| \ge 1$. However, for $T(\alpha)P_j \in \mathcal{T}$ with $|\alpha| < 1$ and $j = 1, 2$, the corresponding $C_{T(\alpha)P_j}$'s have all their eigenvalues in the open unit disk (see Theorem 1). Hence, they can be elements of an RCP set.
If
$$A_i = \begin{bmatrix} 1 & 0 \\ b_i & C_i \end{bmatrix} \in \mathbb{R}^{(n+1)\times(n+1)} \qquad (C_i \in \mathbb{R}^{n \times n},\ i \ge 1),$$
then
$$L_k = \prod_{i=1}^{k} A_i = \begin{bmatrix} 1 & 0 \\ \sum_{i=1}^{k}\left(\prod_{j=1}^{i-1}C_j\right)b_i & \prod_{i=1}^{k}C_i \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ x_k & \prod_{i=1}^{k}C_i \end{bmatrix}.$$
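The product formula for $L_k$ is straightforward to check numerically; a sketch with randomly generated blocks $b_i$ and $C_i$:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
n, k = 3, 5
As = []
for _ in range(k):
    A = np.zeros((n + 1, n + 1))
    A[0, 0] = 1.0
    A[1:, 0] = rng.standard_normal(n)          # b_i
    A[1:, 1:] = rng.standard_normal((n, n))    # C_i
    As.append(A)
Lk = reduce(np.matmul, As)
Cs = [A[1:, 1:] for A in As]
bs = [A[1:, 0] for A in As]
prodC = lambda mats: reduce(np.matmul, mats, np.eye(n))
xk = sum(prodC(Cs[:i]) @ bs[i] for i in range(k))  # x_k = sum_i (C_1...C_{i-1}) b_i
assert np.allclose(Lk[0], [1.0] + [0.0] * n)       # first row stays (1, 0)
assert np.allclose(Lk[1:, 0], xk)
assert np.allclose(Lk[1:, 1:], prodC(Cs))
```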
If the infinite matrix product $\prod_{i=1}^{\infty} T_i$ ($T_i \in \mathcal{T}$ ($i \geq 1$) or $T_i \in \widetilde{\mathcal{T}}$ ($i \geq 1$)) converges to a rank-one matrix of the form $w e^T$, that is, if $\prod_{i=1}^{k} T_i \to w e^T$ ($w^T = [w_1, \hat{w}^T]$, $\hat{w} \in \mathbb{R}^n$), then it follows that
$$\prod_{i=1}^{k} \begin{pmatrix} 1 & 0 \\ b_{T_i} & C_{T_i} \end{pmatrix} \to F^{-1} w e^T F = \begin{pmatrix} e^T w & 0 \\ \hat{w} & 0_{n \times n} \end{pmatrix}.$$
Lemma 16. 
For the convergence of the simplex sequence to a common point, it is necessary and sufficient that both
$$\prod_{i=1}^{k} C_{T_i} \to 0 \quad (k \to \infty)$$
and
$$\sum_{i=1}^{k} \left( \prod_{j=1}^{i-1} C_{T_j} \right) b_{T_i} \to \tilde{x} \quad (k \to \infty)$$
hold for some vector x ˜ .
It is clear that the infinite products of M 1 or M 2 satisfy these conditions. We recall the following simple result.
Lemma 17 
([14,15,16]). Assume that $\left\| \prod_{j=1}^{k} C_j \right\| \leq c_k$, $\sum_{k=1}^{\infty} c_k$ is convergent ($< \infty$) and that $\| b_k \| \leq \gamma$ for all $k$. Then, $L_k = \prod_{j=1}^{k} A_j$ converges and
$$\lim_{k \to \infty} L_k = \begin{pmatrix} 1 & 0 \\ \tilde{x} & 0 \end{pmatrix}$$
for some $\tilde{x}$.
If $\| C_i \| \leq q < 1$ for all $i$, then the speed of convergence is linear (geometric). This follows from $\left\| \prod_{i=1}^{k} C_i \right\| \leq q^k$ and the inequality
$$\| \tilde{x} - x_k \| \leq \gamma \sum_{i=k+1}^{m} \prod_{j=1}^{i-1} \| C_j \| \leq \gamma \frac{q^k}{1-q} \quad (m > k)$$
with $m \to \infty$.
We recall the following result (see Hartfiel [31], Corollary 6.4, and Daubechies and Lagarias [34]).
Lemma 18. 
Let $\mathcal{M}$ be a compact matrix set. Then, every infinite product taken from $\mathcal{M}$ converges to 0 if there is a norm $\| \cdot \|$ such that $\| A \| \leq q$, $q < 1$, for all $A \in \mathcal{M}$.
Hence, there must be a norm $\| \cdot \|$ such that $\| C_i \| \leq q < 1$ for all $T^{shr} P_j$ (or $T_j^{shr}$). Note that Lemma 18 is not constructive. However, for the convergence $\prod_{i=1}^{\infty} T_i \to w e^T$ (for all $T_i \in \mathcal{M}$), the condition $\| C_{T_i} \| \leq q < 1$ ($T_i \in \mathcal{M}$) is necessary and sufficient.
For Algorithm 1, $M_1$ is the only subset that satisfies this requirement. For Algorithm 2, define the set $\widehat{M}_2 = \left\{ T_{\alpha} P_j : \alpha = \pm \frac{1}{2},\ j = 1, 2 \right\} \cup M_2$ and assume that
(A)
there exists a matrix norm $\| \cdot \|_{\vartheta}$ (induced by a vector norm $\| \cdot \|_{\vartheta}$) such that if $T_i P_i \in \widehat{M}_2$, then $\| C_{T_i P_i} \|_{\vartheta} < 1$.
If (A) holds, then $\widehat{M}_2$ is an RCP set and every infinite product $\prod_{i=1}^{\infty} T_i P_i$ ($T_i P_i \in \widehat{M}_2$) has a limit of the form $w e^T$ for some $w$. It follows from Theorem 1 that for $T_i P_i \in \mathcal{T} \setminus \widehat{M}_2$, $\| C_{T_i P_i} \|_{\vartheta} \geq 1$.
The existence of such a norm $\| \cdot \|_{\vartheta}$ of the form $\| A \|_{\vartheta} = \| S^{-1} A S \|_2$ was proved in [14] for $n = 1, 2, 3$ and in [15] for $n = 1, 2, \ldots, 8$. For the cases $n = 9, 10$, the corresponding matrices $S$, found experimentally by direct search methods, are given in Appendix A. Since the eigenvalues of $T_{\alpha} P_j$ ($|\alpha| < 1$, $j = 1, 2$) that lie inside the unit disk converge to 1 as $n \to \infty$ (see Lemma 3 of [15]), it is hard to find a norm with property (A) for greater values of $n$.
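The idea behind a norm of the form $\| A \|_{\vartheta} = \| S^{-1} A S \|_2$ can be illustrated on a single matrix: if all eigenvalues lie in the open unit disk, a suitable similarity scaling pushes the spectral norm below 1. For a set such as $\widehat{M}_2$, one $S$ must work for all members simultaneously, which is the hard part tackled in [14,15] and Appendix A. The matrix and scaling below are illustrative only.

```python
import numpy as np

# A has spectral radius 0.9 but a spectral norm far above 1
A = np.array([[0.9, 10.0],
              [0.0, 0.9]])
assert max(abs(np.linalg.eigvals(A))) < 1.0
assert np.linalg.norm(A, 2) > 1.0

# A diagonal similarity S shrinks the off-diagonal coupling:
# S^{-1} A S = [[0.9, 0.01], [0, 0.9]], so ||S^{-1} A S||_2 < 1
S = np.diag([1.0, 1e-3])
A_theta = np.linalg.inv(S) @ A @ S
assert np.linalg.norm(A_theta, 2) < 1.0
```

In the induced norm $\| x \|_{\vartheta} = \| S^{-1} x \|_2$, the map $x \mapsto A x$ is a strict contraction even though it is not one in the ordinary Euclidean norm.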
The RCP sets $M_1$ and $\widehat{M}_2$ obtained so far are quite narrow. However, we can preserve the convergence of any infinite product from $M_1$ or $\widehat{M}_2$ by inserting an infinite number of matrices from $\mathcal{T}$ under proper conditions (for Algorithm 2, see [14,15]). For Algorithm 2, it was also observed [16] that several fixed-length matrix products have spectral norms less than 1. Based on these observations, the convergence sets of both algorithms can be significantly expanded.
Assume again that $\mathcal{T} = \widetilde{\mathcal{T}}$ or $\mathcal{T} = \mathcal{T}$. Moreover, introduce the notation
$$\mathcal{T}^{\ell} = \left\{ T_{i_1} \cdots T_{i_{\ell}} : T_{i_j} \in \mathcal{T},\ j = 1, \ldots, \ell \right\},$$
and consider the product $\prod_{j=1}^{m\ell} T_{i_j} = \prod_{j=1}^{m} \left( \prod_{t=1}^{\ell} T_{i_{(j-1)\ell + t}} \right)$, where
$$\widehat{T}_j = \prod_{t=1}^{\ell} T_{i_{(j-1)\ell + t}} \in \mathcal{T}^{\ell} \quad (j = 1, \ldots, m).$$
Let
$$F^{-1} T_{i_j} F = \begin{pmatrix} 1 & 0 \\ b_{T_{i_j}} & C_{T_{i_j}} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b_{i_j} & C_{i_j} \end{pmatrix}.$$
Then,
$$F^{-1} \left( \prod_{j=1}^{m\ell} T_{i_j} \right) F = \begin{pmatrix} 1 & 0 \\ \sum_{j=1}^{m\ell} \left( \prod_{t=1}^{j-1} C_{i_t} \right) b_{i_j} & \prod_{j=1}^{m\ell} C_{i_j} \end{pmatrix}$$
and
$$F^{-1} \widehat{T}_j F = \begin{pmatrix} 1 & 0 \\ \sum_{r=1}^{\ell} \left( \prod_{t=1}^{r-1} C_{i_{(j-1)\ell + t}} \right) b_{i_{(j-1)\ell + r}} & \prod_{\tau=1}^{\ell} C_{i_{(j-1)\ell + \tau}} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \hat{b}_j & \widehat{C}_j \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b_{\widehat{T}_j} & C_{\widehat{T}_j} \end{pmatrix},$$
where $\| \hat{b}_j \| \leq \sum_{r=1}^{\ell} \gamma_C^{\, r-1} \gamma_b =: \hat{\gamma}_{\hat{b}}$ and $\| C_{\widehat{T}_j} \| \leq \gamma_C^{\, \ell} =: \hat{\gamma}_{\widehat{C}}$ are uniformly bounded.
Assume that $k = m\ell$ and consider the expression
$$F^{-1} \left( \prod_{j=1}^{m} \widehat{T}_j \right) F = \begin{pmatrix} 1 & 0 \\ \sum_{j=1}^{m} \left( \prod_{t=1}^{j-1} \widehat{C}_t \right) \hat{b}_j & \prod_{j=1}^{m} \widehat{C}_j \end{pmatrix},$$
which is clearly equal to (32).
For $0 < q < 1$, define the sets
$$\mathcal{T}_q^{\ell} = \left\{ T_{i_1} \cdots T_{i_{\ell}} \in \mathcal{T}^{\ell} : \left\| C_{T_{i_1} \cdots T_{i_{\ell}}} \right\| \leq q \right\}, \qquad \mathcal{T}_Q = \bigcup_{\ell=1}^{\infty} \mathcal{T}_q^{\ell}.$$
For any $\widehat{T}_j \in \mathcal{T}_q^{\ell}$, we have the estimates $\| \hat{b}_j \| \leq \hat{\gamma}_{\hat{b}}$ and $\| \widehat{C}_j \| \leq q < 1$. Hence, by Lemma 17, the set $\mathcal{T}_q^{\ell}$ is an RCP set, and the limits of the infinite products $\prod_{j=1}^{\infty} \widehat{T}_j$ are of the form $w e^T$.
Now, consider the product
$$\prod_{j=1}^{k} T_{i_j} = \prod_{j=1}^{m\ell + r} T_{i_j} = \left( \prod_{j=1}^{m\ell} T_{i_j} \right) \left( \prod_{\tau = m\ell + 1}^{m\ell + r} T_{i_{\tau}} \right) = \left( \prod_{j=1}^{m} \widehat{T}_j \right) \left( \prod_{\tau = m\ell + 1}^{m\ell + r} T_{i_{\tau}} \right),$$
where $\widehat{T}_j \in \mathcal{T}^{\ell}$ ($j = 1, \ldots, m$), $T_{i_{\tau}} \in \mathcal{T}$ ($\tau = m\ell + 1, \ldots, m\ell + r$), $0 \leq r < \ell$, and $\ell \geq 1$ is fixed. The corresponding reduced form is
$$F^{-1} \left( \prod_{j=1}^{k} T_{i_j} \right) F = \begin{pmatrix} 1 & 0 \\ x_k & \left( \prod_{j=1}^{m} \widehat{C}_j \right) \left( \prod_{\tau = m\ell + 1}^{m\ell + r} C_{i_{\tau}} \right) \end{pmatrix}.$$
Consider the product $\prod_{j=1}^{m} \widehat{T}_j$ ($\widehat{T}_j \in \mathcal{T}^{\ell}$), where the number of factors $\widehat{T}_j$ that belong to $\mathcal{T}_q^{\ell}$ is $r_1(m)$. Clearly, $0 \leq r_1(m) \leq m$. There exists a $\kappa \in \mathbb{N}$ such that $\frac{1}{q^{\kappa - 1}} \leq \hat{\gamma}_{\widehat{C}} \leq \frac{1}{q^{\kappa}}$. Assume that there is a number $\mu \in (0, 1)$ such that $(1 + \kappa)\, r_1(m) - \kappa m \geq \mu m$. Then,
$$\left\| \left( \prod_{j=1}^{m} \widehat{C}_j \right) \left( \prod_{\tau = m\ell + 1}^{m\ell + r} C_{i_{\tau}} \right) \right\| \leq q^{r_1(m)} \hat{\gamma}_{\widehat{C}}^{\, m - r_1(m)} \gamma_C^{\, r} \leq \gamma_C^{\, \ell - 1} q^{r_1(m) - \kappa (m - r_1(m))} = \gamma_C^{\, \ell - 1} q^{(1 + \kappa) r_1(m) - \kappa m} \leq \gamma_C^{\, \ell - 1} q^{\mu m} \leq \gamma_C^{\, \ell - 1} q^{\mu \left( \frac{k}{\ell} - 1 \right)} = \frac{1}{q^{\mu}} \gamma_C^{\, \ell - 1} \left( q^{\mu / \ell} \right)^{k}.$$
Since $\hat{q} = q^{\mu / \ell} < 1$ and the conditions of Lemma 17 hold, we have proved the following result.
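The norm-bound chain above can be sanity-checked numerically. The parameter values $q$, $\gamma_C$, $\ell$ and $\mu$ below are arbitrary sample choices, not values from the paper; the sketch only verifies that the density condition implies the final geometric bound.

```python
import numpy as np

# Sample parameters (illustrative only)
q, gamma_C, ell, mu = 0.8, 1.5, 4, 0.5

gamma_hat = gamma_C**ell                      # bound on ||C-hat_j||
# kappa with 1/q^(kappa-1) <= gamma_hat <= 1/q^kappa
kappa = int(np.ceil(np.log(gamma_hat) / np.log(1.0 / q)))
assert (1 / q)**(kappa - 1) <= gamma_hat <= (1 / q)**kappa

for m in range(5, 60, 7):
    # Density condition: r_1(m) >= (mu + kappa)/(1 + kappa) * m
    r1 = int(np.ceil((mu + kappa) / (1 + kappa) * m))
    assert (1 + kappa) * r1 - kappa * m >= mu * m - 1e-9
    for r in range(ell):                      # leftover factors, 0 <= r < ell
        k = m * ell + r
        lhs = q**r1 * gamma_hat**(m - r1) * gamma_C**r
        rhs = q**(-mu) * gamma_C**(ell - 1) * (q**(mu / ell))**k
        assert lhs <= rhs * (1 + 1e-9)        # the chained inequality holds
```

Every step of the chain is monotone in the stated assumptions, so the assertions hold for any admissible parameter choice.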
Theorem 5. 
Assume that $n \geq 2$, $S_0$ is non-degenerate, $\ell \geq 1$ is fixed and $\mathcal{T}_q^{\ell}$ is not empty. Let $r_1(k)$ be the number of ℓ-products that belong to $\mathcal{T}_q^{\ell}$ during the first $k$ iterations of the Nelder–Mead method (Algorithm 2). Moreover, assume that for an integer $\kappa \in \mathbb{N}$, $\frac{1}{q^{\kappa - 1}} \leq \hat{\gamma}_{\widehat{C}} \leq \frac{1}{q^{\kappa}}$, and for some $\mu \in (0, 1)$, $r_1(k) \geq \frac{\mu + \kappa}{1 + \kappa} k$. Then, the Nelder–Mead method (Algorithm 2) converges in the sense that
$$\lim_{k \to \infty} x_j^{(k)} = \hat{x} \quad (j = 1, 2, \ldots, n+1)$$
with a convergence speed proportional to $\left( q^{\mu / \ell} \right)^{k}$.
Note that the assumption $r_1(m) \geq \frac{\mu + \kappa}{1 + \kappa} m$, which is a density condition, yields an infinite number of factors from the set $\mathcal{T}_Q$ as $m \to \infty$. The larger the ratio $\left| \mathcal{T}_q^{\ell} \right| / \left| \mathcal{T}^{\ell} \right|$, the wider the convergence set. Theorem 5 is clearly true for any pair of subsets $\mathcal{W}$ and $\mathcal{W}_q$ such that $\mathcal{W}_q \subseteq \mathcal{W}$ and $\mathcal{W} \subseteq \mathcal{T}^{\ell}$.
For Algorithm 1 and some $q < 1$, $\widetilde{\mathcal{T}}_q^{1} = M_1$, to which we cannot add more elements, since $\| C_{T_j^{\alpha}} \| \geq 1$ in any induced matrix norm. For Algorithm 2, $\mathcal{T}_q^{1} = \widehat{M}_2$ under Assumption (A).
Theorem 5 was first proved for Algorithm 2 for $\ell = 1$ in [14,15] and for $\ell > 1$ in [16], in a slightly different way and form. The conditions were verified for the spectral norm and $q = 0.99$ up to dimension $n = 6$.
The computational results of Section 7 show that for $\ell > 1$, $M_1 \subsetneq \widetilde{\mathcal{T}}_q^{\ell}$ and $\widehat{M}_2 \subsetneq \mathcal{T}_q^{\ell}$ (see also [16]). Hence, Theorem 5 indeed extends the convergence sets $M_1$ and $\widehat{M}_2$ quite significantly.
On the basis of the earlier version of Theorem 5, a stochastic convergence result was proved for Algorithm 2 in [16]. Since Theorem 5 holds for both algorithms, this result can be extended to Algorithm 1 as well. However, it will be published later for reasons discussed in Section 7.

7. Computational Results and Conclusions

Here, we consider the cardinality of the set $\mathcal{T}_q^{\ell}$ for both algorithms, with and without shrinking operations.
Define the quantities $\tilde{r}(n, \ell) = \left| \widetilde{\mathcal{T}}_q^{\ell} \right| / \left| \widetilde{\mathcal{T}}^{\ell} \right|$ and $r(n, \ell) = \left| \mathcal{T}_q^{\ell} \right| / \left| \mathcal{T}^{\ell} \right|$. The larger the ratio $\tilde{r}(n, \ell)$ or $r(n, \ell)$, the wider the convergence set. Table 1 and Table 2 contain the computed values of $\tilde{r}(n, \ell)$ and $r(n, \ell)$, respectively.
Note that $\tilde{r}(n, \ell) \geq 1/5$ and $r(n, \ell) \geq \frac{(n+1)!}{3n + 3 + (n+1)!}$. The ratio $r(n, \ell)$ cannot reach 1 (see the examples in Section 1 and Section 6). For increasing $n$, the $(n+1)!$ shrinking matrices somewhat distort the real situation, since it is unlikely that all possible shrinking steps occur for one particular function.
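The ratio computations behind the tables can be sketched on a toy matrix set. The $2 \times 2$ blocks below are synthetic stand-ins, not the actual Nelder–Mead $C$-blocks; the real computation enumerates the ℓ-products of $\mathcal{T}$ in the same way and tests $\| C \|_2 \leq q$.

```python
import numpy as np
from itertools import product as iproduct

# Toy set of C-blocks: two contractive, two with spectral norm >= 1
Cs = [np.array([[0.5, 0.0], [0.0, 0.5]]),
      np.array([[-0.5, 0.1], [0.0, 0.4]]),
      np.array([[1.0, 0.0], [0.5, -0.8]]),
      np.array([[0.0, 1.1], [0.9, 0.0]])]

def ratio(Cs, ell, q=0.99):
    """Fraction of ell-products with spectral norm <= q (analogue of r(n, ell))."""
    total = hits = 0
    for combo in iproduct(Cs, repeat=ell):
        M = combo[0].copy()
        for C in combo[1:]:
            M = M @ C
        total += 1
        hits += np.linalg.norm(M, 2) <= q
    return hits / total

for ell in range(1, 6):
    print(ell, ratio(Cs, ell))
```

For this toy set, half of the single factors already qualify (`ratio(Cs, 1)` is `0.5`), and longer products containing enough contractive factors push the ratio up, mirroring the growth of the tabulated ratios with $\ell$.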
It is clear that Algorithm 2 has a wider convergence set than Algorithm 1.
If we discard the shrinking operations (the case of strictly convex functions), then Theorem 5 also holds for the pair of sets
$$\widetilde{\mathcal{T}}_w^{\ell} = \left\{ T_{i_1} \cdots T_{i_{\ell}} : T_{i_j} \in \widetilde{\mathcal{T}} \setminus M_1,\ j = 1, \ldots, \ell \right\}, \qquad \widetilde{\mathcal{T}}_{wq}^{\ell} = \left\{ T_{i_1} \cdots T_{i_{\ell}} \in \widetilde{\mathcal{T}}_w^{\ell} : \left\| C_{T_{i_1} \cdots T_{i_{\ell}}} \right\| \leq q \right\}$$
and
$$\mathcal{T}_w^{\ell} = \left\{ T_{i_1} \cdots T_{i_{\ell}} : T_{i_j} \in \mathcal{T} \setminus M_2,\ j = 1, \ldots, \ell \right\}, \qquad \mathcal{T}_{wq}^{\ell} = \left\{ T_{i_1} \cdots T_{i_{\ell}} \in \mathcal{T}_w^{\ell} : \left\| C_{T_{i_1} \cdots T_{i_{\ell}}} \right\| \leq q \right\},$$
instead of $\mathcal{T}^{\ell}$ and $\mathcal{T}_q^{\ell}$, respectively. Define again the ratios $\tilde{r}_w(n, \ell) = \left| \widetilde{\mathcal{T}}_{wq}^{\ell} \right| / \left| \widetilde{\mathcal{T}}_w^{\ell} \right|$ and $r_w(n, \ell) = \left| \mathcal{T}_{wq}^{\ell} \right| / \left| \mathcal{T}_w^{\ell} \right|$. Note that for $n \leq 10$ and some $0 < q < 1$, $r_w(n, \ell) \geq \left( \frac{4}{3n+3} \right)^{\ell}$. Table 3 and Table 4 contain the computed values of $\tilde{r}_w(n, \ell)$ and $r_w(n, \ell)$, respectively.
The comparison of the last two tables shows again that Algorithm 1 has a smaller convergence set and a slower convergence rate than Algorithm 2.
We can observe that the larger the $\ell$, the larger the ratio in all four tables. Although the ratios cannot reach 1, it is conjectured that they might come quite close to 1 for large $\ell$.
For both algorithms, we can observe that the convergence sets grow as $\ell$ increases. For larger $n$ values, this growth is slower, especially if no shrinking is allowed. However, checking this requires an enormous amount of computational time, even for modest pairs of $n$ and $\ell$. The reason for this behavior is unknown and has yet to be investigated. Nevertheless, we can draw the following conclusions. Regarding the convergence of the simplex sequence, Algorithm 2 is better than Algorithm 1. Hence, the ordering of Lagarias et al. [12] significantly improved the classical Nelder–Mead simplex method.
Theorem 5 indicates that the generated simplex sequence has a strong tendency to converge to one common limit point. This, together with Lemma 2 (or Lemma 3), may explain why the Nelder–Mead simplex algorithm produces good results in practice.
In the case of the convergence of function values at the vertices, the novel results are essentially the same for both methods. The extension of Theorems 2, 3 or 4 to $n > 2$ is an open question. For $n = 2$, all proofs exploit (implicitly or explicitly) an involutory property, which does not hold for $n \geq 3$. How the convexity assumptions can be relaxed further is also an open question. If $f$ is quasiconvex on $\mathbb{R}^2$ and bounded from below, then $f_2 = f_3$. However, the other parts of the proof do not work.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The author is indebted to J. Abaffy and L. Szeidl for their help and comments. He is also indebted to the unknown referees whose observations and suggestions helped to improve this paper.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

The matrices for the norm $\| A \|_{\vartheta} = \| S^{-1} A S \|_2$ and $n = 9, 10$.
The case where n = 9 :
S =
 1.2571  0.0128  0.5028  0.5047  2.3173  1.2682  1.4404  2.2437  1.1064
 1.1260  1.8225  1.0287  0.0972  1.3367  1.2773  0.3227  0.7005  2.0421
 1.7197  0.0434  0.1394  0.5115  0.1233  0.2385  0.9384  0.4162  2.5873
 0.0424  0.0296  0.2238  1.3031  1.0884  0.0319  1.7164  2.1411  0.7368
 1.1717  1.7668  0.2037  0.6359  1.3285  1.7570  0.1136  0.3241  1.0724
 0.0138  0.9442  0.0342  0.8301  0.0499  0.2340  2.8213  1.0245  0.5072
 0.0718  0.5593  2.5306  0.5053  1.5393  0.9299  0.4054  0.1010  0.0832
 0.5396  0.2878  0.1743  2.5974  0.9844  0.1686  0.0242  1.0429  0.1559
 2.1307  1.4421  0.4763  0.5795  0.0088  0.4682  1.0355  0.0462  0.5015
The case where n = 10 :
S =
 2.0568  4.2600  0.8876  0.1566  1.9386  0.0651  0.2271  1.8056  0.1294  0.0664
 0.4312  0.0886  1.8330  4.6222  0.0374  0.0071  0.0483  0.8386  0.2741  0.0184
 3.4415  1.8884  0.8095  1.5183  1.2164  1.3902  0.0710  0.0837  1.6783  0.0210
 2.1716  0.0643  1.0052  0.0034  1.0715  0.8995  2.6609  0.0733  2.7365  0.5002
 1.3924  0.2185  0.0169  0.3663  0.0497  0.4920  4.0342  0.1023  1.1653  1.0472
 0.1115  0.9027  1.9524  2.1948  2.7482  0.0134  0.0722  1.3448  1.2371  0.3717
 0.5131  0.0069  0.8486  0.0600  0.7613  3.9197  0.5919  0.5482  1.0459  0.4000
 0.9491  1.8435  3.2427  0.4871  0.5311  0.8950  0.6546  0.7932  0.6149  0.3206
 1.0854  0.3223  0.1177  0.5542  0.7593  1.2426  0.0307  0.0804  0.2414  3.5372
 0.0960  1.6263  0.1063  0.4209  1.6908  0.0096  0.1650  2.4456  0.5468  1.1978

References

  1. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313. [Google Scholar] [CrossRef]
  2. Spendley, W.; Hext, G.; Himsworth, F. Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics 1962, 4, 441–461. [Google Scholar] [CrossRef]
  3. Audet, C.; Hare, W. Derivative-free and Blackbox Optimization; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar] [CrossRef]
  4. Conn, A.; Scheinberg, K.; Vicente, L. Introduction to Derivative-Free Optimization; SIAM: Philadelphia, PA, USA, 2009. [Google Scholar] [CrossRef]
  5. Kelley, C. Iterative Methods for Optimization; SIAM: Philadelphia, PA, USA, 1999. [Google Scholar] [CrossRef]
  6. Kochenderfer, M.; Wheeler, T. Algorithms for Optimization; The MIT Press: Cambridge, MA, USA, 2019. [Google Scholar]
  7. Walters, F.; Morgan, S.; Parker, L.; Deming, S. Sequential Simplex Optimization; CRC Press LLC: Boca Raton, FL, USA, 1991. [Google Scholar]
  8. Larson, J.; Menickelly, M.; Wild, S. Derivative-free optimization methods. Acta Numer. 2019, 28, 287–404. [Google Scholar] [CrossRef]
  9. Rykov, A. System Analysis. Models and Methods of Decision Making and Search Engine Optimization (in Russian); MISiS: Moscow, Russia, 2009. [Google Scholar]
  10. Heeman, P. On Derivative-Free Optimisation Methods. Master’s Thesis, University of Bergen, Bergen, Norway, 2023. [Google Scholar]
  11. Rios, L.; Sahinidis, N. Derivative-free optimization: A review of algorithms and comparison of software implementations. J. Glob. Optim. 2013, 56, 1247–1293. [Google Scholar] [CrossRef]
  12. Lagarias, J.; Reeds, J.; Wright, M.; Wright, P. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J. Optimiz. 1998, 9, 112–147. [Google Scholar] [CrossRef]
  13. Kelley, C. Detection and remediation of stagnation in the Nelder-Mead algorithm using a sufficient decrease condition. SIAM J. Optimiz. 1999, 10, 43–55. [Google Scholar] [CrossRef]
  14. Galántai, A. Convergence theorems for the Nelder-Mead method. J. Comput. Appl. Mech. 2020, 15, 115–133. [Google Scholar] [CrossRef]
  15. Galántai, A. Convergence of the Nelder-Mead method. Numer. Algorithms 2022, 90, 1043–1072. [Google Scholar] [CrossRef]
  16. Galántai, A. A stochastic convergence result for the Nelder-Mead simplex method. Mathematics 2023, 11, 1998. [Google Scholar] [CrossRef]
  17. Wright, M. Direct search methods: Once scorned, now respectable. In Numerical Analysis 1995 (Proceedings of the 1995 Dundee Biennial Conference in Numerical Analysis); Griffiths, D., Watson, G., Eds.; Addison-Wesley Longman: Harlow, UK, 1996; pp. 191–208. [Google Scholar]
  18. Wright, M. Nelder, Mead, and the other simplex method. In Documenta Mathematica, Extra Volume: Optimization Stories; FIZ Karlsruhe GmbH: Eggenstein-Leopoldshafen, Germany, 2012; pp. 271–276. [Google Scholar]
  19. McKinnon, K. Convergence of the Nelder-Mead simplex method to a nonstationary point. SIAM J. Optimiz. 1998, 9, 148–158. [Google Scholar] [CrossRef]
  20. Galántai, A. Convergence of the Nelder-Mead method for convex functions. Acta Polytech. Hung. 2024, 21, 185–202. [Google Scholar] [CrossRef]
  21. Galántai, A. A convergence analysis of the Nelder-Mead simplex method. Acta Polytech. Hung. 2021, 18, 93–105. [Google Scholar] [CrossRef]
  22. Hendrix, E.; G.-Tóth, B. Introduction to Nonlinear and Global Optimization; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar] [CrossRef]
  23. Oldenburger, R. Infinite powers of matrices and characteristic roots. Duke Math. J. 1940, 6, 357–361. [Google Scholar] [CrossRef]
  24. Meyer, C. Matrix Analysis and Applied Linear Algebra; SIAM: Philadelphia, PA, USA, 2000. [Google Scholar]
  25. Horn, R.; Johnson, C. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  26. Barnett, S. Matrices: Methods and Applications; Clarendon Press: Oxford, UK, 1990. [Google Scholar]
  27. Kittaneh, F. Singular values of companion matrices and bounds of zeros of polynomials. SIAM J. Matrix. Anal. Appl. 1995, 16, 333–340. [Google Scholar] [CrossRef]
  28. Montano, E.; Salas, M.; Soto, R. Positive matrices with prescribed singular values. Proyecciones 2008, 27, 289–305. [Google Scholar] [CrossRef]
  29. Trench, W. Characterization and properties of matrices with k-involutory symmetries. Linear Algebra Its Appl. 2008, 429, 2278–2290. [Google Scholar] [CrossRef]
  30. Mirsky, L. An Introduction to Linear Algebra; Dover Publications, Inc.: New York, NY, USA, 1990. [Google Scholar]
  31. Hartfiel, D. Nonhomogeneous Matrix Products; World Scientific: Singapore, 2002. [Google Scholar] [CrossRef]
  32. Beyn, W.J.; Elsner, L. Infinite products and paracontracting matrices. Electron. J. Linear Al. 1997, 2, 1–8. [Google Scholar] [CrossRef]
  33. Blondel, V.; Tsitsiklis, J. The boundedness of all products of a pair of matrices is undecidable. Syst. Control Lett. 2000, 41, 135–140. [Google Scholar] [CrossRef]
  34. Daubechies, I.; Lagarias, J. Sets of Matrices All Infinite Products of Which Converge. Linear Algebra Its Appl. 1992, 161, 227–263. [Google Scholar] [CrossRef]
Table 1. Ratios $\tilde{r}(n, \ell)$ for Algorithm 1.

n \ ℓ      2         3         4         5         6         7
2       0.39556   0.52563   0.59953   0.65873   0.70331   0.7403
3       0.28      0.3755    0.45766   0.52766   0.58397   0.63187
4       0.28      0.34304   0.40802   0.46877   0.52489   0.57442
Table 2. Ratios $r(n, \ell)$ for Algorithm 2.

n \ ℓ      2         3         4         5         6         7
2       0.71111   0.83615   0.90204   0.94099   0.9641    0.97795
3       0.85185   0.93746   0.97389   0.98912   0.9956    0.99824
Table 3. Ratios $\tilde{r}_w(n, \ell)$ for Algorithm 1 without shrinking.

n \ ℓ      2          3          4          5          6          7
2       0.18056    0.24537    0.2963     0.32823    0.35566    0.37868
3       0          0.026367   0.062897   0.093401   0.11847    0.13871
4       0          0          0.00555    0.016575   0.029086   0.041108
Table 4. Ratios $r_w(n, \ell)$ for Algorithm 2 without shrinking.

n \ ℓ      2         3         4         5         6         7
2       0.34568   0.46914   0.54687   0.61437   0.67157   0.71873
3       0         0.1169    0.24257   0.3278    0.39074   0.44305