1. Introduction
With the development of science and technology, more and more fields involve the solution of nonlinear equations, such as chemistry, mechanics, economics and product management [1,2,3,4]. For example, decentralized decision models in supply chain management and gas pressure volume models in physics can be converted into the nonlinear equations
$$F(x) = 0, \qquad (1)$$
where $F: \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable function. In particular, symmetric nonlinear equations, in which the Jacobian matrix is symmetric, also have a wide range of applications, such as the gradient mapping of unconstrained optimization problems, the Karush–Kuhn–Tucker (KKT) systems of equality constrained optimization problems, and other fields [5,6].
The steepest descent method, the Newton method, quasi-Newton methods and the Gauss–Newton (GN) method are commonly used iterative methods for solving (1) [7,8,9,10]. The GN method is one of the most famous of them: when the Jacobian matrix is Lipschitz continuous and nonsingular at a solution of (1), the GN method has quadratic convergence. However, when the Jacobian matrix is singular or nearly singular, the GN method may not be well defined. In order to overcome this difficulty, the Levenberg–Marquardt (LM) method [11,12] for solving (1) was proposed. At the k-th iteration, the trial step is
$$d_k = -\left(J_k^T J_k + \lambda_k I\right)^{-1} J_k^T F_k, \qquad (2)$$
where $F_k = F(x_k)$, $J_k = F'(x_k)$ is the Jacobian matrix of $F$ at $x_k$, which may be a symmetric or a non-symmetric matrix, $I$ is the identity matrix and the LM parameter $\lambda_k > 0$.
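For readers who want to experiment, a minimal Python sketch of the trial step computation (2) follows; the test function, the starting point and the parameter value are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lm_trial_step(F_k, J_k, lam):
    """Solve (J^T J + lam * I) d = -J^T F for the LM trial step d, cf. (2)."""
    n = J_k.shape[1]
    A = J_k.T @ J_k + lam * np.eye(n)   # regularized Gauss-Newton matrix
    return np.linalg.solve(A, -J_k.T @ F_k)

# Illustrative example: F(x) = (x1^2 + x2 - 1, x1 + x2^2 - 1)
F = lambda x: np.array([x[0]**2 + x[1] - 1.0, x[0] + x[1]**2 - 1.0])
J = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])

x = np.array([2.0, 2.0])
d = lm_trial_step(F(x), J(x), lam=1e-2)   # lam chosen only for illustration
print(d)
```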
Since $J_k^T J_k + \lambda_k I$ is positive definite, the LM method ensures that the trial step (2) is uniquely defined, and it also has quadratic convergence if $J(x)$ is Lipschitz continuous and nonsingular at the solution and $\lambda_k$ is selected appropriately. In this sense, the update of the LM parameter has a great impact on the performance and efficiency of the algorithm, and many effective LM parameters have been proposed. Yamashita and Fukushima [13] chose the LM parameter as $\lambda_k = \|F_k\|^2$, and proved that the LM method has quadratic convergence under the local error bound condition provided $J(x)$ is Lipschitz continuous near the solution. However, when $x_k$ is far away from the solution set, $\|F_k\|^2$ may be very large, which makes $d_k$ very small and reduces the efficiency of the algorithm; when $x_k$ is sufficiently close to the solution set, $\|F_k\|^2$ may be smaller than the machine epsilon and lose its role.
Based on these observations, Fan and Yuan [14] generalized the LM parameter in [13] to $\lambda_k = \|F_k\|^\delta$ with $\delta \in [1,2]$, and showed that the numerical results for choosing $\delta = 1$ are better than those for choosing $\delta = 2$. Fan [15] first introduced the regularization factor $\mu_k$ into the LM method and chose $\lambda_k = \mu_k \|F_k\|$, with numerical results showing that this choice of $\lambda_k$ provides the best performance. However, when $x_k$ is far away from the solution set, neither LM parameter provides good results. Therefore, to avoid this situation, Fan and Pan [16] chose the LM parameter as $\lambda_k = \mu_k\, \rho(x_k)$, in which $\mu_k$ is updated by a trust region technique and $\rho(x_k)$ is a positive function of $\|F_k\|$. This update strategy can obtain larger LM trial steps, so that the iterative sequence converges quickly to the solution set when $x_k$ is far away from it. Amini et al. [17] chose the LM parameter as
$$\lambda_k = \mu_k \frac{\|F_k\|}{1 + \|F_k\|}.$$
It is clear that when $x_k$ is far away from the solution set and $\|F_k\|$ is very large, $\frac{\|F_k\|}{1+\|F_k\|}$ is close to 1, so $\lambda_k$ is close to $\mu_k$. This choice of $\lambda_k$ speeds up the algorithm more than the previous LM parameters.
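The qualitative behaviour of these parameter choices is easy to check numerically. The following Python sketch (with an illustrative value of $\mu_k$) shows that $\|F_k\|^2$ explodes far from the solution set and underflows near it, while the choice of Amini et al. stays bounded by $\mu_k$:

```python
import numpy as np

def lm_parameter(norm_F, mu=1.0, rule="amini"):
    """Illustrative LM parameter choices discussed in the text."""
    if rule == "yamashita":   # Yamashita & Fukushima: lambda = ||F||^2
        return norm_F**2
    if rule == "fan":         # Fan: lambda = mu * ||F||
        return mu * norm_F
    if rule == "amini":       # Amini et al.: lambda = mu * ||F|| / (1 + ||F||)
        return mu * norm_F / (1.0 + norm_F)
    raise ValueError(rule)

for nF in (1e3, 1e-9):        # far from / close to the solution set
    print([lm_parameter(nF, rule=r) for r in ("yamashita", "fan", "amini")])
```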
In addition to the above choices of the LM parameter, the introduction of adaptive techniques also has a great impact on the LM method. As is well known, the ratio $r_k$ between the actual and the predicted reduction of the objective function reflects the degree to which the approximate quadratic model agrees with the value function. To make more use of the information in this ratio, Fan and Yuan [18] proposed an adaptive LM method in which the regularization factor is updated by a continuous non-negative function $q(r_k)$ of the ratio. The introduction of $q$ avoids discontinuities in the update when the ratio crosses its thresholds, and better numerical results can be obtained.
In fact, similar adaptive techniques have been proposed for trust region algorithms. If $r_k$ is sufficiently greater than 1, the iteration is too successful; if we then reduce $\mu_k$ to a very small value, the algorithm will go on to perform a large number of consecutive unsuccessful iterations. On the other hand, if $r_k$ is too small, $d_k$ is a far-from-satisfactory trial step; if we then increase $\mu_k$ greatly, the successive iteration points will be close to each other and the algorithm will converge slowly. Therefore, Hei [19] proposed an R-function with an adaptive update strategy for the trust region radius $\Delta_k$, i.e., $\Delta_{k+1} = R(r_k)\,\Delta_k$. Furthermore, Walmag and Delhez [20] proposed another ratio-based function $\phi$ to update the trust region radius, i.e., $\Delta_{k+1} = \phi(r_k)\,\Delta_k$, where $\phi$ is a non-negative and bounded function of $r_k$. On this basis, Lu et al. [21] argued that the consistency between the model and the objective function is not good enough in too-successful iterations, so an L-function was proposed to update the trust region radius. They showed that the L-function retains some favorable features of the R-function and of the function of Walmag and Delhez, and that the resulting method is more efficient in too-successful iterations. In this paper, we learn from the construction of the L-function and provide a new adaptive strategy to update the LM parameter. Our innovations mainly include the following:
◊ A new adaptive accelerated LM method is proposed, which improves the consistency between the model and the objective function in too-successful iterations by using the information contained in the ratio of the actual reduction to the predicted reduction;
◊ The new algorithm copes with the situation in which the iterative sequence is far away from the optimal solution set, avoiding long runs of unsuccessful iterations and jumping around in local regions, thus improving the efficiency and stability of the algorithm;
◊ The new adaptive accelerated LM method has global convergence, and quadratic convergence under the local error bound condition.
The rest of this paper is organized as follows. In Section 2, we describe in detail a new adaptive accelerated LM method that makes full use of the ratio information; furthermore, we show that the new algorithm is globally convergent under appropriate conditions and maintains quadratic convergence under the local error bound condition. In Section 3, numerical results are given, indicating that the new algorithm is efficient. Conclusions are given in the last section.
2. Methodology
2.1. The Adaptive Accelerated Levenberg–Marquardt Method
In this section, our main aim is to discuss how to update the LM parameter in order to propose a new adaptive accelerated LM method. It is easy to see from (2) that $d_k$ is the solution of the optimization problem
$$\min_{d \in \mathbb{R}^n}\ \|F_k + J_k d\|^2 + \lambda_k \|d\|^2. \qquad (3)$$
If
$$\Delta_k = \|d_k\|, \qquad (4)$$
then $d_k$ is also the solution of the subproblem
$$\min_{d \in \mathbb{R}^n}\ \|F_k + J_k d\|^2 \quad \text{s.t.} \quad \|d\| \le \Delta_k. \qquad (5)$$
Therefore, the LM method can be regarded as a trust region method that implicitly modifies the trust region radius $\Delta_k$. The difference between the general trust region method and the LM method is that the LM method does not update the trust region radius directly, but instead updates the regularization factor $\mu_k$.
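For completeness, here is the standard one-line justification of this equivalence (a textbook argument, not specific to this paper). Since (3) is a strictly convex quadratic, $d_k$ is characterized by its stationarity condition,
$$\left(J_k^T J_k + \lambda_k I\right) d_k = -J_k^T F_k,$$
which is exactly the linear system defining the LM step (2). These are also the KKT conditions of the subproblem (5) with Lagrange multiplier $\lambda_k > 0$: the constraint $\|d\| \le \Delta_k$ is active at $d_k$, because $\Delta_k = \|d_k\|$ by (4).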
We define the actual reduction and the predicted reduction of the merit function $\Phi(x) = \|F(x)\|^2$ at the k-th iteration as
$$\mathrm{Ared}_k = \|F_k\|^2 - \|F(x_k + d_k)\|^2 \qquad (6)$$
and
$$\mathrm{Pred}_k = \|F_k\|^2 - \|F_k + J_k d_k\|^2. \qquad (7)$$
The ratio between the actual and the predicted reduction of the objective function is defined by
$$r_k = \frac{\mathrm{Ared}_k}{\mathrm{Pred}_k}. \qquad (8)$$
This ratio determines whether the trial step $d_k$ is accepted. Here, we choose the LM parameter as
$$\lambda_k = \mu_k \frac{\|F_k\|}{1 + \|F_k\|}, \qquad (9)$$
where $\mu_k > 0$ is the regularization factor. The usual empirical rules [22,23,24,25] for updating $\mu_k$ can be summarized as follows:
$$\mu_{k+1} = \begin{cases} a_1\,\mu_k, & r_k < p_1, \\ \mu_k, & p_1 \le r_k \le p_2, \\ \max\{a_2\,\mu_k,\ m\}, & r_k > p_2, \end{cases} \qquad (10)$$
where $0 < p_1 \le p_2 < 1$, $a_1 > 1$ and $0 < a_2 < 1$ are constants.
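As a concrete reading of (10), a minimal Python sketch of the classical update follows; the threshold and scaling values ($p_1$, $p_2$, $a_1$, $a_2$, $m$) are typical illustrative choices, not values prescribed by this paper.

```python
def update_mu_classical(mu, r, p1=0.25, p2=0.75, a1=4.0, a2=0.25, m=1e-8):
    """Empirical rule of type (10): grow mu after poor steps, keep it for
    acceptable steps, and shrink it (never below m) after very good steps."""
    if r < p1:
        return a1 * mu          # poor model agreement: enlarge mu
    if r > p2:
        return max(a2 * mu, m)  # very successful step: reduce mu, keep >= m
    return mu                   # acceptable step: leave mu unchanged

# The jump from a1 to a2 at the thresholds is abrupt; the adaptive K-function
# introduced below replaces these fixed factors with a ratio-dependent one.
```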
Iterations with $r_k$ greater than $p_2$ are very successful iterations. In this case, it is usually assumed that the approximation of the model function to the objective function is accurate, and $\mu_k$ should be reduced. However, at too-successful iterations, i.e., when $r_k$ is sufficiently greater than 1, the consistency between the model and the objective function is not good enough. Thus, we use an adaptive strategy to update the factor $\mu_k$, i.e., $\mu_{k+1} = K(r_k)\,\mu_k$, where $K$ is a function of the ratio $r_k$.
We construct $K(r_k)$ as given in (11), where the thresholds and scaling factors appearing in its definition are constants. Here, $K$ is non-negative and bounded, and attains its minimum when $r_k$ is close to 1.
If we obtain a satisfactory trial step and ratio, then we accept the trial step and reduce $\mu_k$; otherwise, we reject the trial step and increase $\mu_k$. At too-successful iterations, the actual reduction of the objective function obtained at iteration k is clearly greater than the predicted reduction. Although the current iteration allows the algorithm to progress towards the optimum, the approximation of the model function to the objective function is poor. Therefore, to avoid reducing $\mu_k$ too quickly, we use the K-function to update $\mu_k$.
According to the properties of the K-function, the rate of reduction of $\mu_k$ is fastest when $r_k$ is close to 1, i.e., when the model function provides an accurate local approximation of the objective function. The new idea we propose is to allow $\mu_k$ to be updated at a variable rate according to $r_k$, which improves the efficiency and stability of the algorithm.
Based on the above analysis, we now give a description of the new adaptive accelerated LM method (Algorithm 1).
In Algorithm 1, m is a given lower bound of the parameter $\mu_k$; it is introduced to prevent the step from becoming too large when the sequence is near the solution.
Algorithm 1 NAALM.
Step 0. Given $x_0 \in \mathbb{R}^n$, $\mu_0 > m > 0$, $0 < p_0 \le p_1 \le p_2 < 1$, $\varepsilon \ge 0$. Let $k := 0$.
Step 1. Compute $F_k$ and $J_k$. If $\|J_k^T F_k\| \le \varepsilon$, stop. Otherwise, compute $\lambda_k$ by (9).
Step 2. Solve the following system
$$\left(J_k^T J_k + \lambda_k I\right) d = -J_k^T F_k \qquad (12)$$
to determine $d_k$.
Step 3. Compute $\mathrm{Ared}_k$, $\mathrm{Pred}_k$ and $r_k$ by (6)–(8), respectively.
Step 4. Set
$$x_{k+1} = \begin{cases} x_k + d_k, & r_k \ge p_0, \\ x_k, & \text{otherwise}. \end{cases} \qquad (13)$$
Step 5. Choose $\mu_{k+1}$ as
$$\mu_{k+1} = \max\{K(r_k)\,\mu_k,\ m\}, \qquad (14)$$
where $K(r_k)$ is given by (11). Set $k := k + 1$ and go to Step 1.
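To summarize the method operationally, here is a compact Python sketch of the whole NAALM loop. Since the concrete K-function is defined by (11) in the paper, the `K` below is only a hypothetical stand-in with the qualitative properties described above (continuous, bounded, with its strongest reduction of $\mu_k$ near $r_k = 1$); all constants are illustrative.

```python
import numpy as np

def K(r):
    """Hypothetical stand-in for the paper's K-function (11): continuous,
    greater than 1 for poor ratios (mu grows), smallest near r = 1 (mu
    shrinks fastest), and rising again for too-successful ratios r >> 1."""
    return (0.25 + 3.75 * np.exp(-8.0 * max(r, 0.0))
                 + 0.5 * (1.0 - np.exp(-(r - 1.0) ** 2)))

def naalm(F, J, x, mu=1.0, m=1e-8, p0=1e-4, eps=1e-10, max_iter=200):
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        if np.linalg.norm(g) <= eps:                    # Step 1: stopping test
            break
        nF = np.linalg.norm(Fk)
        lam = mu * nF / (1.0 + nF)                      # LM parameter (9)
        d = np.linalg.solve(Jk.T @ Jk + lam * np.eye(x.size), -g)   # (12)
        F_new = F(x + d)
        lin = Fk + Jk @ d
        ared = Fk @ Fk - F_new @ F_new                  # actual reduction (6)
        pred = Fk @ Fk - lin @ lin                      # predicted reduction (7)
        r = ared / pred                                 # ratio (8)
        if r >= p0:                                     # Step 4: accept/reject
            x = x + d
        mu = max(K(r) * mu, m)                          # Step 5: update (14)
    return x

# Illustrative run on F(x) = (x1^2 + x2 - 1, x1 + x2^2 - 1)
F = lambda x: np.array([x[0] ** 2 + x[1] - 1.0, x[0] + x[1] ** 2 - 1.0])
J = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])
print(naalm(F, J, np.array([2.0, 2.0])))
```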
2.2. Global Convergence
In this section, to obtain the global convergence of the NAALM algorithm, we make the following assumption.
Assumption 1. $F(x)$ is continuously differentiable, and $F(x)$ and the Jacobian matrix $J(x)$ are Lipschitz continuous, i.e., there exist positive constants $L_1$ and $L_2$ such that
$$\|J(x) - J(y)\| \le L_1\,\|x - y\|, \quad \forall x, y \in \mathbb{R}^n, \qquad (15)$$
and
$$\|F(x) - F(y)\| \le L_2\,\|x - y\|, \quad \forall x, y \in \mathbb{R}^n. \qquad (16)$$
Lemma 1. Let $d_k$ be computed by (12); then the inequality
$$\mathrm{Pred}_k \ge \|J_k^T F_k\| \min\left\{\|d_k\|,\ \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\} \qquad (17)$$
holds for all k.
Proof. From (7) and the fact that $d_k$ minimizes $\|F_k + J_k d\|^2 + \lambda_k \|d\|^2$, comparing $d_k$ with the scaled steepest descent step $-\alpha\, \frac{J_k^T F_k}{\|J_k^T F_k\|} \min\left\{\|d_k\|,\ \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\}$ for $\alpha \in [0, 1]$ yields
$$\mathrm{Pred}_k = \|F_k\|^2 - \|F_k + J_k d_k\|^2 \ge \|J_k^T F_k\| \min\left\{\|d_k\|,\ \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\};$$
then (17) holds. The proof is complete. □
Theorem 1. Under the conditions of Assumption 1, the sequence generated by the NAALM algorithm satisfies
$$\lim_{k \to \infty} \|J_k^T F_k\| = 0. \qquad (20)$$
Proof. If the theorem is not true, then there exist a positive $\bar\varepsilon$ and infinitely many k such that
$$\|J_k^T F_k\| \ge \bar\varepsilon. \qquad (21)$$
Let $S_1$ and $S_2$ be the sets of all indices that satisfy
$$\|J_k^T F_k\| \ge \bar\varepsilon \quad\text{and}\quad r_k \ge p_0,$$
respectively; that is, $S_1$ collects the indices satisfying (21) and $S_2$ collects the accepted iterations. Then, $S_1$ is an infinite set. In the following, we derive a contradiction according to whether $S_2$ is finite or infinite.
Case (I): $S_2$ is finite.
It follows from the definition of $S_2$ that the set of accepted iterations is finite. Let $\bar k$ be the largest index of $S_2$. Then, every trial step with index $k > \bar k$ is rejected, so that $x_k = x_{\bar k + 1}$ holds for all $k > \bar k$; by induction, $F_k$ and $J_k$, and therefore (21), remain unchanged for all $k > \bar k$.
It now follows from Step 3 of the NAALM algorithm that every iteration with $k > \bar k$ is unsuccessful, which implies, due to (12)–(14), that
$$\mu_k \to \infty \quad\text{and hence}\quad \lambda_k \to \infty \quad \text{as } k \to \infty. \qquad (22)$$
Hence, we have
$$\|d_k\| \le \frac{\|J_k^T F_k\|}{\lambda_k} \to 0. \qquad (23)$$
Furthermore, it follows from (21), (23) and Lemma 1 that
$$|r_k - 1| = \frac{|\mathrm{Ared}_k - \mathrm{Pred}_k|}{\mathrm{Pred}_k} \le \frac{O(\|d_k\|^2)}{\bar\varepsilon \min\{\|d_k\|,\ \bar\varepsilon / \|J_k^T J_k\|\}} \to 0, \qquad (24)$$
that is, $r_k \to 1$. In view of the updating rule of $\mu_k$, we know that there exists a positive constant $M$ such that $\mu_k \le M$ holds for all sufficiently large k, which is a contradiction to (22). Hence, the supposition (21) cannot be true while $S_2$ is finite.
Case (II): $S_2$ is infinite.
It follows from Lemma 1 and the acceptance test $r_k \ge p_0$ that
$$\|F_0\|^2 \ge \sum_{k \in S_2} \mathrm{Ared}_k \ge p_0 \sum_{k \in S_2} \|J_k^T F_k\| \min\left\{\|d_k\|,\ \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\},$$
which gives
$$\sum_{k \in S_2} \|J_k^T F_k\| \min\left\{\|d_k\|,\ \frac{\|J_k^T F_k\|}{\|J_k^T J_k\|}\right\} < \infty. \qquad (26)$$
The above inequality, together with the Lipschitz conditions (15) and (16), implies that
$$\left\|J_{k+1}^T F_{k+1} - J_k^T F_k\right\| \le c\,\|d_k\| \qquad (27)$$
for some positive constant c. Relation (27) and the fact that (21) holds for infinitely many k indicate that there exists an $\tilde\varepsilon$ with $0 < \tilde\varepsilon \le \bar\varepsilon$ such that $\|J_k^T F_k\| \ge \tilde\varepsilon$ holds along the iterations between consecutive indices of $S_1$; by induction, we obtain that this lower bound holds for all sufficiently large k. This result and (26) mean that $\|d_k\| \to 0$ for $k \in S_2$; since rejected steps do not move the iterates, we have $\|d_k\| \to 0$ for all k. It then follows from (12) that $\lambda_k \to \infty$, and hence, by (9), $\mu_k \to \infty$. By the same analysis as (24), we know that $r_k \to 1$. Hence, there exists a positive constant $M$ such that $\mu_k \le M$ holds for all large k, which introduces a contradiction. Therefore, the supposition (21) cannot be true when $S_2$ is infinite. The proof is complete. □
2.3. Local Convergence
In this section, we study the local convergence properties of the NAALM algorithm by using the singular value decomposition (SVD) technique. We assume that the sequence $\{x_k\}$ generated by the NAALM algorithm converges to the nonempty solution set $X^*$ of (1) and lies in some neighborhood of a solution $x^* \in X^*$. Firstly, we present the assumptions that the local convergence theory requires.
Definition 1. Let $N \subseteq \mathbb{R}^n$ be such that $N \cap X^* \neq \emptyset$. We say that $\|F(x)\|$ provides a local error bound on N for (1) if there exists a positive constant $c_0$ such that
$$c_0\ \mathrm{dist}(x, X^*) \le \|F(x)\|, \quad \forall x \in N, \qquad (29)$$
where $\mathrm{dist}(x, X^*)$ is the distance from x to $X^*$.
Assumption 2. (i) $F(x)$ is continuously differentiable, and $J(x)$ is Lipschitz continuous on the neighborhood $N(x^*, b) = \{x : \|x - x^*\| \le b\}$ with $b < 1$, i.e., there exists a positive constant $L_1$ such that
$$\|J(x) - J(y)\| \le L_1\,\|x - y\|, \quad \forall x, y \in N(x^*, b). \qquad (30)$$
(ii) $\|F(x)\|$ provides a local error bound on $N(x^*, b)$ for (1), i.e., there exists a positive constant $c_0$ such that
$$c_0\ \mathrm{dist}(x, X^*) \le \|F(x)\|, \quad \forall x \in N(x^*, b). \qquad (31)$$
By the Lipschitzness of the Jacobian matrix assumed in (30), we have
$$\|F(y) - F(x) - J(x)(y - x)\| \le L_1\,\|y - x\|^2, \quad \forall x, y \in N(x^*, b), \qquad (32)$$
and
$$\|F(y) - F(x)\| \le L_2\,\|y - x\|, \quad \forall x, y \in N(x^*, b), \qquad (33)$$
where $L_2$ is a positive constant.
In the following, we use $\bar x_k$ to denote a vector in $X^*$ that satisfies
$$\|\bar x_k - x_k\| = \mathrm{dist}(x_k, X^*). \qquad (34)$$
To obtain the local convergence rate of the NAALM algorithm, we present some lemmas.
Lemma 2. Under the conditions of Assumption 2, for all sufficiently large k, there exists a constant $c_2 > 0$ such that
$$\|d_k\| \le c_2\ \mathrm{dist}(x_k, X^*).$$
Proof. According to (34), we have $\|\bar x_k - x_k\| = \mathrm{dist}(x_k, X^*) \le \|x^* - x_k\| \le b$, which means that $\bar x_k$ lies in the neighborhood on which (30)–(33) hold. Following from (9), $\mu_k \ge m$ and the error bound (31),
$$\lambda_k = \mu_k \frac{\|F_k\|}{1 + \|F_k\|} \ge \frac{m\,c_0}{1 + L_2 b}\ \mathrm{dist}(x_k, X^*),$$
and we have from (32) that
$$\|F_k + J_k(\bar x_k - x_k)\| \le L_1\,\|\bar x_k - x_k\|^2 = L_1\ \mathrm{dist}(x_k, X^*)^2.$$
As $d_k$ is a minimizer of $\|F_k + J_k d\|^2 + \lambda_k \|d\|^2$, we have
$$\lambda_k \|d_k\|^2 \le \|F_k + J_k(\bar x_k - x_k)\|^2 + \lambda_k \|\bar x_k - x_k\|^2 \le L_1^2\ \mathrm{dist}(x_k, X^*)^4 + \lambda_k\ \mathrm{dist}(x_k, X^*)^2;$$
then there exists a constant $c_2$ such that $\|d_k\| \le c_2\ \mathrm{dist}(x_k, X^*)$. The proof is completed. □
Lemma 3. Under the conditions of Assumption 2, for all sufficiently large k, there exists a positive constant $c_3$ such that
$$\|F_k + J_k d_k\| \le c_3\ \mathrm{dist}(x_k, X^*)^{3/2}.$$
Proof. First, we show that for sufficiently large k, the following inequality holds:
$$\lambda_k \le c_4\ \mathrm{dist}(x_k, X^*). \qquad (40)$$
We consider two cases. In one case, if $\|F_k\| \le 1$, then the definition of $\lambda_k$ and Assumption 2 imply that
$$\lambda_k \le \mu_k \|F_k\| \le \mu_k L_2\ \mathrm{dist}(x_k, X^*). \qquad (41)$$
In the other case, if $\|F_k\| > 1$, then we have
$$\lambda_k \le \mu_k \le \mu_k \|F_k\| \le \mu_k L_2\ \mathrm{dist}(x_k, X^*). \qquad (42)$$
Inequalities (41) and (42), together with Lemma 2, show that the iterations are eventually successful, so that the updating rule (14) keeps $\mu_k$ bounded above; this gives (40). Hence, it follows from (40), Assumption 2 and Lemma 2 that
$$\|F_k + J_k d_k\|^2 \le \|F_k + J_k(\bar x_k - x_k)\|^2 + \lambda_k\,\|\bar x_k - x_k\|^2 \le L_1^2\ \mathrm{dist}(x_k, X^*)^4 + c_4\ \mathrm{dist}(x_k, X^*)^3.$$
Therefore, we have $\|F_k + J_k d_k\|^2 \le (L_1^2 + c_4)\ \mathrm{dist}(x_k, X^*)^3$ for all large k; thus, there exists a constant $c_3$ such that the desired inequality holds. The proof is completed. □
Without loss of generality, we assume $\mathrm{rank}\, J(\bar x_k) = r$ for all sufficiently large k. Suppose the SVD of $J(\bar x_k)$ is
$$J(\bar x_k) = \bar U \bar\Sigma \bar V^T = (\bar U_1, \bar U_2) \begin{pmatrix} \bar\Sigma_1 & \\ & 0 \end{pmatrix} \begin{pmatrix} \bar V_1^T \\ \bar V_2^T \end{pmatrix} = \bar U_1 \bar\Sigma_1 \bar V_1^T, \qquad (44)$$
where $\bar\Sigma_1 = \mathrm{diag}(\bar\sigma_1, \ldots, \bar\sigma_r)$ with $\bar\sigma_1 \ge \cdots \ge \bar\sigma_r > 0$, and $\bar U$, $\bar V$ are orthogonal matrices. Correspondingly, we consider the SVD of $J_k$,
$$J_k = U \Sigma V^T = (U_1, U_2) \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} = U_1 \Sigma_1 V_1^T + U_2 \Sigma_2 V_2^T, \qquad (45)$$
where $U$, $V$ are orthogonal matrices, $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \cdots \ge \sigma_r > 0$, and $\Sigma_2 = \mathrm{diag}(\sigma_{r+1}, \ldots, \sigma_n)$ with $\sigma_r \ge \sigma_{r+1} \ge \cdots \ge \sigma_n \ge 0$.
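The following numpy sketch illustrates the block SVD split (44)–(45) used in the analysis; the matrix and the rank are illustrative assumptions.

```python
import numpy as np

# Block SVD split J = U1 S1 V1^T + U2 S2 V2^T, where S1 carries the r
# "large" singular values and S2 the small ones.
def svd_split(Jk, r):
    U, s, Vt = np.linalg.svd(Jk)
    U1, U2 = U[:, :r], U[:, r:]
    S1, S2 = np.diag(s[:r]), np.diag(s[r:])
    V1t, V2t = Vt[:r, :], Vt[r:, :]
    return (U1, S1, V1t), (U2, S2, V2t)

# Near a rank-r solution, ||S2|| is of the order of dist(x_k, X*) (Lemma 4).
Jk = np.array([[1.0, 0.0], [0.0, 1e-6]])   # nearly rank-1 Jacobian (illustrative)
(U1, S1, V1t), (U2, S2, V2t) = svd_split(Jk, r=1)
print(np.linalg.norm(S2))                   # small: the "Sigma_2" block
```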
Lemma 4. Under the conditions of Assumption 2, for all sufficiently large k, we have
(i) $\|F_k\| \le L_2\ \mathrm{dist}(x_k, X^*)$;
(ii) $\|\Sigma_2\| \le L_1\ \mathrm{dist}(x_k, X^*)$;
(iii) $\|U_2 U_2^T F_k\| \le L_1\ \mathrm{dist}(x_k, X^*)^2$;
(iv) $\|F_k + J_k d_k\| \le c_5\ \mathrm{dist}(x_k, X^*)^2$.
Proof. The result (i) follows immediately from (16), since $\|F_k\| = \|F(x_k) - F(\bar x_k)\| \le L_2\ \mathrm{dist}(x_k, X^*)$. By (15) and the theory of matrix perturbation [26], we have
$$\|\mathrm{diag}(\Sigma_1 - \bar\Sigma_1,\ \Sigma_2)\| \le \|J_k - J(\bar x_k)\| \le L_1\,\|x_k - \bar x_k\|,$$
which implies that
$$\|\Sigma_2\| \le L_1\ \mathrm{dist}(x_k, X^*),$$
i.e., the result (ii). Let $\hat d_k = -J_k^+ F_k$, where $J_k^+$ is the pseudo-inverse of $J_k$. It is easy to see that $\hat d_k$ is the least-squares solution of $\min_d \|F_k + J_k d\|$, so we obtain from (32) that
$$\|F_k + J_k \hat d_k\| \le \|F_k + J_k(\bar x_k - x_k)\| \le L_1\ \mathrm{dist}(x_k, X^*)^2.$$
Due to the orthogonality of U and V, $\|U_2 U_2^T F_k\| = \|F_k + J_k \hat d_k\|$, and we obtain the result (iii).
Using (12) and (45), we obtain
$$d_k = -V_1\left(\Sigma_1^2 + \lambda_k I\right)^{-1}\Sigma_1 U_1^T F_k - V_2\left(\Sigma_2^2 + \lambda_k I\right)^{-1}\Sigma_2 U_2^T F_k$$
and
$$F_k + J_k d_k = \lambda_k U_1\left(\Sigma_1^2 + \lambda_k I\right)^{-1} U_1^T F_k + \lambda_k U_2\left(\Sigma_2^2 + \lambda_k I\right)^{-1} U_2^T F_k. \qquad (46)$$
Following from (9) and (33), the LM parameter satisfies
$$\lambda_k \le \mu_k \|F_k\| \le \mu_k L_2\ \mathrm{dist}(x_k, X^*).$$
Since $\{x_k\}$ converges to the solution set $X^*$, we assume that $\mathrm{dist}(x_k, X^*) \le b/2$ and, by (ii) and matrix perturbation theory, $\sigma_r \ge \bar\sigma_r/2$ hold for all sufficiently large k. Then, it follows from (46) that
$$\|F_k + J_k d_k\| \le \frac{4\lambda_k}{\bar\sigma_r^2}\,\|F_k\| + \|U_2 U_2^T F_k\|.$$
It then follows from Lemma 3 (which bounds $\mu_k$), together with the results (i)–(iii), that
$$\|F_k + J_k d_k\| \le c_5\ \mathrm{dist}(x_k, X^*)^2, \qquad (47)$$
i.e., the result (iv). The proof is completed. □
We can now state the quadratic convergence result of the NAALM algorithm.
Theorem 2. Let the sequence $\{x_k\}$ be generated by the NAALM algorithm. Under Assumption 2, the sequence converges quadratically to a solution of the nonlinear Equation (1).
Proof. It follows from Assumption 2, Lemma 2 and (47) that
$$c_0\ \mathrm{dist}(x_{k+1}, X^*) \le \|F(x_{k+1})\| \le \|F_k + J_k d_k\| + L_1\,\|d_k\|^2 \le \left(c_5 + L_1 c_2^2\right) \mathrm{dist}(x_k, X^*)^2 \qquad (48)$$
for all sufficiently large k. On the other hand, it is clear that
$$\mathrm{dist}(x_{k+1}, X^*) \le \mathrm{dist}(x_k, X^*) + \|d_k\|.$$
It follows from Lemma 2 that, for any sufficiently large k, we have
$$\|d_{k+1}\| \le c_2\ \mathrm{dist}(x_{k+1}, X^*) \le \frac{c_2 \left(c_5 + L_1 c_2^2\right)}{c_0}\ \mathrm{dist}(x_k, X^*)^2 \le \frac{1}{2}\ \mathrm{dist}(x_k, X^*).$$
Therefore, $\sum_k \|d_k\| < \infty$, so $\{x_k\}$ is a Cauchy sequence and converges to some solution $x^*$ of (1). This, along with (48), indicates that
$$\|x_{k+1} - x^*\| \le \sum_{j \ge k+1} \|d_j\| \le 2 c_2\ \mathrm{dist}(x_{k+1}, X^*) \le \frac{2 c_2 \left(c_5 + L_1 c_2^2\right)}{c_0}\ \|x_k - x^*\|^2,$$
which implies that $\{x_k\}$ is quadratically convergent to a solution of the set $X^*$. The proof is completed. □