Article

Improved Gradient Descent Iterations for Solving Systems of Nonlinear Equations

1 Faculty of Sciences and Mathematics, University of Niš, Višegradska 33, 18000 Niš, Serbia
2 Laboratory “Hybrid Methods of Modelling and Optimization in Complex Systems”, Siberian Federal University, Prosp. Svobodny 79, 660041 Krasnoyarsk, Russia
3 Department of Mathematics, Faculty of Applied Sciences, State University of Tetova, St. Ilinden, n.n., 1220 Tetovo, North Macedonia
4 Department of Mathematics, Yusuf Maitama Sule University, Kano 700282, Nigeria
5 Department of Mathematics and Statistics, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
6 Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia
7 Technical Faculty in Bor, University of Belgrade, Vojske Jugoslavije 12, 19210 Bor, Serbia
8 School of Business, Jiangnan University, Lihu Blvd, Wuxi 214122, China
9 Faculty of Science and Engineering, Zienkiewicz Centre for Computational Engineering, Swansea University, Swansea SA1 8EN, UK
* Authors to whom correspondence should be addressed.
Algorithms 2023, 16(2), 64; https://doi.org/10.3390/a16020064
Submission received: 24 November 2022 / Revised: 7 January 2023 / Accepted: 16 January 2023 / Published: 18 January 2023
(This article belongs to the Special Issue Computational Methods and Optimization for Numerical Analysis)

Abstract

This research proposes and investigates some improvements of gradient descent iterations that can be applied for solving systems of nonlinear equations (SNE). In the available literature, such methods are termed improved gradient descent methods. We exploit the verified advantages of various accelerated double direction and double step size gradient methods in solving single scalar equations. Our strategy is to control the speed of convergence of gradient methods through a step size value defined by several parameters. As a result, efficient minimization schemes for solving SNE are introduced. Linear global convergence of the proposed iterative methods is confirmed by theoretical analysis under standard assumptions. Numerical experiments confirm the significant computational efficiency of the proposed methods compared to traditional gradient descent methods for solving SNE.

1. Introduction, Preliminaries, and Motivation

Our intention is to solve a system of nonlinear equations (SNE) of the general form
F(x) = 0,  x ∈ ℝ^n,
where ℝ is the set of real numbers, ℝ^n denotes the set of n-dimensional real vectors, F: ℝ^n → ℝ^n, F(x) = (F_1(x), …, F_n(x))^T, and F_i: ℝ^n → ℝ is the ith component of F. It is assumed that F is a continuously differentiable mapping. The nonlinear problem (1) is equivalent to the minimization of the following goal function f:
min_{x ∈ ℝ^n} f(x),  f(x) = (1/2)‖F(x)‖² = (1/2) Σ_{i=1}^{n} (F_i(x))².
The equivalence of (1) and (2) is widely used in science and practical applications. In such problems, the solution to SNE (1) comes down to solving a related least-squares problem (2). In addition to that, the application of the adequate nonlinear optimization method in solving (1) is a common and efficient technique. Some well-known schemes for solving (1) are based on successive linearization, where the search direction d k is obtained by solving the equation
F(x_k) + F′(x_k) d_k = 0,
where F′(x_k) ≡ J_F(x_k), and J_F(x) = [∂F_i(x)/∂x_j] is the Jacobian matrix of F(x). Therefore, the Newton iterative scheme for solving (1) is defined as
x_{k+1} = x_k + t_k d_k = x_k − t_k F′(x_k)^{−1} F(x_k),
where t k is a positive parameter that stands for the steplength value.

1.1. Overview of Methods for Solving SNE

Most popular iterations for solving (1) use appropriate approximations B_k of the Jacobian matrix F′(x_k). These iterations are of the form x_{k+1} = x_k + t_k d_k, where t_k is the steplength and d_k is the search direction obtained as a solution of the SNE
B k d k + F ( x k ) = 0 .
For simplicity, we will use notations
F_k := F(x_k),  y_k := F_{k+1} − F_k,  s_k := x_{k+1} − x_k.
The BFGS approximations are defined on the basis of the secant equation B k + 1 s k = y k . The BFGS updates
B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)
with an initial approximation B_0 ∈ ℝ^{n×n} were considered in [1].
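For illustration only, the following NumPy sketch applies one update of the rank-two form above to a current approximation B_k; the curvature safeguard and its tolerance are assumptions of this sketch and are not part of the method in [1].

```python
import numpy as np

def bfgs_update(B, s, y, tol=1e-12):
    """One BFGS update of the approximation B from the secant pair
    s = x_{k+1} - x_k, y = F_{k+1} - F_k (a minimal sketch)."""
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    if abs(sBs) < tol or abs(ys) < tol:
        return B  # skip the update when a denominator is too small (assumed safeguard)
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
```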
Further on, we list and briefly describe relevant minimization methods that exploit the equivalence between (1) and (2). The efficiency and applicability of these algorithms highly motivated the research presented in this paper. The number of methods that we mention below confirms the applicability of this direction in solving SNE. In addition, there is an evident need to develop and constantly upgrade the performances of optimization methods for solving (1).
There are numerous methods which can be used to solve the problem (1). Many of them are developed in [2,3,4,5,6,7]. Derivative-free methods for solving SNE were considered in [8,9,10]. These methods are proposed as appropriate adaptations of double direction and double step length methods in nonlinear optimization, combined with an approximation of the Jacobian by a diagonal matrix whose entries are defined using an appropriate parameter. One approach based on various modifications of the Broyden method was proposed in [11,12]. Derivative-free conjugate gradient (CG) iterations for solving SNE were proposed in [13].
A descent Dai–Liao CG method for solving large-scale SNE was proposed in [14]. Novel hybrid and modified CG methods for finding a solution to SNE were originated in [15,16], respectively. An extension of a modified three-term CG method that can be applied for solving equations with convex constraints was presented in [17]. A diagonal quasi-Newton approach for solving large-scale nonlinear systems was considered in [18,19]. A quasi-Newton method, defined based on an improved diagonal Jacobian approximation, for solving nonlinear systems was proposed in [20]. Abdullah et al. in [21] proposed a double direction method for solving nonlinear equations. The first direction is the steepest descent direction, while the second direction is the proposed CG direction. Two derivative-free modifications of the CG-based method for solving large-scale systems F ( x ) = 0 were presented in [22]. These methods are applicable in the case when the Jacobian of F ( x ) is not accessible. An efficient approximation to the Jacobian matrix with a computational effort similar to that of matrix-free settings was proposed in [23]. Such efficiency was achieved when a diagonal matrix generates a Jacobian approximation. This method possesses low memory space requirements because the method is defined without computing exact gradient and Jacobian. Waziri et al. in [24] followed the approach based on the approximation of the Jacobian inverse by a nonsingular diagonal matrix. A fast and computationally efficient method concerning memory requirements was proposed in [25], and it uses an approximation of the Jacobian by an adequate diagonal matrix. A two-step generalized scheme of the Jacobian approximation was given in [26]. Further on, an iterative scheme which is based on a modification of the Dai–Liao CG method, classical Newton iterates, and the standard secant equation was suggested in [27]. A three-step method based on a proper diagonal updating was presented in [28]. A hybridization of FR and PRP conjugate gradient methods was given in [29]. The method in [29] can be considered as a convex combination of the PRP method and the FR method while using the hyperplane projection technique. A diagonal Jacobian method was derived from data from two preceding steps, and a weak secant equation was investigated in [30]. An iterative modified Newton scheme based on diagonal updating was proposed in [31]. Solving nonlinear monotone operator equations via a modified symmetric rank-one update is given in [32]. In [33], the authors used a new approach in solving nonlinear systems by simply considering them in the form of multi-objective optimization problems.
It is essential to mention that the analogous idea of avoiding the second derivative in the classical Newton’s method for solving nonlinear equations is exploited in deriving several iterative methods of various orders for solving nonlinear equations [34,35,36,37]. Moreover, some derivative-free iterative methods were developed for solving nonlinear equations [38,39]. Furthermore, some alternative approaches were conducted for solving complex symmetric linear systems [40] or a Sylvester matrix equation [41].
Trust region methods have become very popular algorithms for solving nonlinear equations and general nonlinear problems [37,42,43,44].
The systems of nonlinear equations (1) have various applications [15,29,45,46,47,48], for example, in solving the ℓ_1-norm problem arising in compressive sensing [49,50,51,52], in variational inequality problems [53,54], and in optimal power flow equations [55], among others.
Viewed statistically, the Newton method and different forms of quasi-Newton methods have been frequently used in solving SNE. Unfortunately, methods of the Newton family are not efficient in solving large-scale SNE problems since they are based on the Jacobian matrix. A similar drawback applies to all methods based on various matrix approximations of the Jacobian matrix in each iteration. Numerous adaptations and improvements of the CG iterative class exist as one solution applicable to large-scale problems. We intend to use the simplest Jacobian approximation using an appropriate diagonal matrix. Our goal is to define computationally effective methods for solving large-scale SNEs using the simplest of Jacobian approximations. The realistic basis for our expectations is the known efficient methods used to optimize individual nonlinear functions.
The remaining sections have the following general structure. The introduction, preliminaries, and motivation are included in Section 1. An overview of methods for solving SNE is presented in Section 1.1 to complete the presentation and explain the motivation. The motivation for the current study is described in Section 1.2. Section 2 proposes several multiple-step-size methods for solving nonlinear equations. Convergence analysis of the proposed methods is investigated in Section 3. Section 4 contains several numerical examples obtained on main standard test problems of various dimensions.

1.2. Motivation

The following standard designations will be used. We adopt the notations g(x) := ∇f(x) for the gradient and G(x) := ∇²f(x) for the Hessian of the objective function f(x). Further, g_k = g(x_k) denotes the gradient of f at the point x_k. An appropriate identity matrix will be denoted by I.
Our research is motivated by two trends in solving minimization problems. These streams are described as two subsequent parts of the current subsection. A nonlinear multivariate unconstrained minimization problem is defined as
min f(x),  x ∈ ℝ^n,
where f(x): ℝ^n → ℝ is a uniformly convex or strictly convex, continuously differentiable function bounded from below.

1.2.1. Improved Gradient Descent Methods as Motivation

The most general iteration for solving (7) is expressed as
x k + 1 = x k + t k d k .
In (8), x_{k+1} denotes the new approximation generated from the previous point x_k. The positive parameter t_k stands for the steplength value, while d_k is the search direction vector, which is generated based on the descent condition
g_k^T d_k < 0.
The direction vector d_k may be defined in various ways. This vital element is often determined using the features of the function gradient. In one of the earliest optimization schemes, the gradient descent (GD) method, this vector is defined as the negative gradient direction, i.e., d_k = −g_k. In the line search variant of the Newton method, the search direction is the solution of the system of linear equations G_k d = −g_k with respect to d, where G_k := G(x_k) = ∇²f(x_k) denotes the Hessian matrix.
Unlike traditional GD algorithms for nonlinear unconstrained minimization, which are defined using a single step size t_k, the class of improved gradient descent (IGD) algorithms defines the final step size using two or more step size scaling parameters. Such algorithms were classified and investigated in [56]. The obtained numerical results confirm that the usage of appropriate additional scaling parameters decreases the number of iterations. Typically, one of the parameters is defined using an inexact line search, while the second one is defined using the first terms of the Taylor expansion of the goal function.
A frequently investigated class of minimization methods that can be applied for solving the problem (7) uses the following iterative rule:
x_{k+1} = x_k − θ_k t_k g_k.
In (9), the parameter t_k represents the step size in the kth iteration. The originality of the iteration (9) is expressed through the acceleration variable θ_k. This type of optimization scheme with an acceleration parameter originated in [57]. Later, in [58], the authors justifiably named such models accelerated gradient descent methods (shortly, AGD methods). Further research on this topic confirmed that the acceleration parameter generally improves the performance of the gradient method.
The Newton method with an included line search technique is defined by the following iterative rule
x_{k+1} = x_k − t_k G_k^{−1} g_k,
wherein G_k^{−1} stands for the inverse of the Hessian matrix G_k. Let B_k be a symmetric positive definite matrix such that ‖B_k − G_k‖ < ϵ for an arbitrary matrix norm ‖·‖ and a given tolerance ϵ. Further, let H_k be a positive definite approximation of the Hessian inverse G_k^{−1}. This approach leads to the relation (11), which is the quasi-Newton method with line search:
x_{k+1} = x_k − t_k H_k g_k.
Updates of H k can be defined as solutions to the quasi-Newton equation
H k + 1 y k = s k ,
where s_k = x_{k+1} − x_k, y_k = g_{k+1} − g_k. There is a class of iterations (11) in which there is no ultimate requirement for H_k to satisfy the quasi-Newton equation. Such a class of iterates is known as modified Newton methods [59].
The idea in [58] is the usage of a proper diagonal approximation of the Hessian
B_k = γ_k I,  γ_k > 0,  γ_k ∈ ℝ.
Applying the approximation (13) of B k , the matrix H k can be approximated by the simple scalar matrix
H_k = γ_k^{−1} I.
In this way, the quasi-Newton line search scheme (11) is transformed into a kind of A G D iteration, called the S M method and presented in [58] as
x_{k+1} = x_k − γ_k^{−1} t_k g_k.
The positive quantity γ k is the convergence acceleration parameter which improves the behavior of the generated iterative loop. In [56], methods of the form (15) are termed as improved gradient descent methods (IGD). Commonly, the primary step size t k is calculated through the features of some inexact line search algorithms. An additional acceleration parameter γ k is usually determined by the Taylor expansion of the goal function. This way of generating acceleration parameter is confirmed as a good choice in [56,58,60,61,62].
The choice γ_k := 1 in the IGD iterations (15) reveals the GD iterations
x_{k+1} = x_k − t_k g_k.
On the other hand, if the acceleration γ_k is well-defined, then the step size t_k := 1 in the IGD iterations (15) is acceptable in most cases [63], which leads to a kind of GD iterative principle:
x_{k+1} = x_k − γ_k^{−1} g_k.
Barzilai and Borwein in [64] proposed two efficient IGD variants, known as BB method variants, where the steplength γ_k^{BB} was defined through the approximation H_k = γ_k^{BB} I. Therefore, the replacement γ_k^{−1} := γ_k^{BB} in (17) leads to the BB iterative rule
x_{k+1} = x_k − γ_k^{BB} g_k.
The scaling parameter γ_k^{BB} in the basic version is defined upon the minimization of the vector norm min_γ ‖s_{k−1} − γ y_{k−1}‖², which gives
γ_k^{BB} = (s_{k−1}^T y_{k−1})/(y_{k−1}^T y_{k−1}).
The steplength γ_k^{BB} in the dual method is produced by the minimization min_γ ‖γ^{−1} s_{k−1} − y_{k−1}‖², which yields
γ_k^{BB} = (s_{k−1}^T s_{k−1})/(s_{k−1}^T y_{k−1}).
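As a small illustration (a sketch, not code from [64]), both BB scaling parameters can be computed directly from the most recent pair of differences; the guard against a vanishing denominator is an added assumption.

```python
import numpy as np

def bb_step_sizes(s, y, eps=1e-12):
    """Return the two Barzilai-Borwein step lengths computed from
    s = x_k - x_{k-1} and y = g_k - g_{k-1}."""
    sy = s @ y
    gamma_bb = sy / max(y @ y, eps)                      # from min_gamma ||s - gamma*y||^2
    gamma_bb_dual = (s @ s) / sy if abs(sy) > eps else gamma_bb   # from min_gamma ||gamma^{-1} s - y||^2
    return gamma_bb, gamma_bb_dual
```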
The B B iterations were modified and investigated in a number of publications [65,66,67,68,69,70,71,72,73,74,75,76,77,78,79]. The so-called Scalar Correction ( S C ) method from [80] proposed the trial steplength in (17) defined by
γ_{k+1}^{SC} = (s_k^T r_k)/(y_k^T r_k) if y_k^T r_k > 0,  γ_{k+1}^{SC} = ‖s_k‖/‖y_k‖ if y_k^T r_k ≤ 0,  where r_k = s_k − γ_k y_k.
The S C iterations are defined as
x_{k+1} = x_k − γ_k^{SC} g_k.
A kind of steepest descent and BB iterations relaxed by a parameter θ_k ∈ (0, 2) were proposed in [81]. The so-called Relaxed Gradient Descent Quasi-Newton methods (shortly, RGDQN and RGDQN1), expressed by
x_{k+1} = x_k − θ_k t_k γ_k^{−1} g_k,
are introduced in [82]. Here, θ_k is the relaxation parameter. This value is chosen randomly within the interval (0, 1) in the RGDQN scheme and by the relation
θ k = γ k t k γ k + 1
in the R G D Q N 1 algorithm.

1.2.2. Discretization of Gradient Neural Networks (GNN) as Motivation

Our second motivation arises from discretizing gradient neural network (GNN) design. A GNN evolution can be defined in three steps. Further details can be found in [83,84].
Step1GNN. 
Define the underlying error matrix E(t) by replacing the unknown matrix in the actual problem with the unknown time-varying matrix V(t), which will be approximated over time t ≥ 0. The scalar objective of a GNN is just the (squared) Frobenius norm of E(t):
ε(t) = ‖E(t)‖_F²/2,  ‖E‖_F = √(Tr(E^T E)).
Step2GNN. 
Compute the gradient ∂ε(t)/∂V := ∇ε(t) of the objective ε(t).
Step3GNN. 
Apply the dynamic GNN evolution, which relates the time derivative V̇(t) and the direction opposite to the gradient of ε(t):
V̇(t) = dV(t)/dt = −γ ∂ε(t)/∂V,  V(0) = V_0.
Here, V(t) is the matrix of activation state variables, t ∈ [0, +∞) is the time, γ > 0 is the gain parameter, and V̇(t) is the time derivative of V(t).
The discretization of V̇(t) by the Euler forward-difference rule is given by
V̇(t) ≈ (V_{k+1} − V_k)/τ,
where τ is the sampling time and V_k = V(t = kτ), k = 1, 2, … [84]. The approximation (23) transforms the continuous-time GNN evolution (22) into the discrete-time iterations
(V_{k+1} − V_k)/τ = −γ ∂ε(t)/∂V = −γ ∇ε(t).
The derived discretization of the GNN design is just a GD method for nonlinear optimization:
V_{k+1} = V_k − β_k ∇ε(t),  β_k = τγ > 0,
where β_k = τγ > 0 is the step size. So, the step size β_k is defined as a product of two parameters, in which the parameter γ should be “as large as possible”, while τ should be “as small as possible”. Such considerations add an additional point of view to gradient optimization methods with multiple parameters.
Our idea is to generalize the IGD iterations considered in [56] to the problem of solving SNE. One observable analogy is that the gain parameter γ from (22) corresponds to the parameter γ_k from (15). In addition, the sampling time τ can be considered as an analogue of the primary step size t_k ∈ (0, 1), which is defined by an inexact line search. Iterations defined as IGD iterations adapted to solve SNE will be called the IGDN class.

2. Multiple Step-Size Methods for Solving SNE

The term “multiple step-size methods” is related to the class of gradient-based iterative methods for solving SNE employing a step size defined using two or more appropriately defined parameters. The final goal is to improve the efficiency of classical gradient methods. Two strategies are used in finding approximate parameters: inexact line search and the Taylor expansion.

2.1. IGDN Methods for Solving SNE

Our aim is to simplify the update of the Jacobian F′(x_k) := J_k. Following (13), it is appropriate to approximate the Jacobian with a diagonal matrix
F′(x_k) ≈ γ_k I.
Then, B_k = γ_k I in (5) produces the search direction d_k = −γ_k^{−1} F_k, and the iterations (8) are transformed into
x_{k+1} = x_k − t_k γ_k^{−1} F_k.
The final step size in iterations (26) is defined using two step size parameters: t k and γ k . Iterations that fulfill pattern (26) are an analogy of I G D methods for nonlinear unconstrained optimization and will be termed as I G D N class of methods.
Using the experience of nonlinear optimization, the steplength parameter γ_k can be defined appropriately using the Taylor expansion of F(x):
F_{k+1} = F_k + F′(ξ_k)(x_{k+1} − x_k),  ξ_k ∈ [x_k, x_{k+1}].
On the basis of (25), it is appropriate to use F′(ξ_k) ≈ γ_k I, which implies
F_{k+1} − F_k = γ_k (x_{k+1} − x_k).
Using (27) and applying notation (6), one obtains the following update of γ_k:
γ_k = (y_k^T y_k)/(y_k^T s_k) = (s_k^T y_k)/(s_k^T s_k).
It can be noticed that the iterative rule (26) matches the BB iteration [64]. So, we have introduced the BB method for solving SNE. Our further contribution is the introduction of appropriate restrictions on the scaling parameter. To that end, Theorem 1 reveals values of γ_k which decrease the objective functions included in F_k. The inequality |F_{k+1}| ≤ |F_k| is understood componentwise, i.e., |(F_{k+1})_i| ≤ |(F_k)_i|, i = 1, …, n.
Theorem 1.
If the condition γ_{k+1} ≤ γ_k/t_k is satisfied, then the IGDN iterations (26) satisfy |F_{k+1}| ≤ |F_k|.
Proof. 
As a consequence of (26) and (27), one can verify
F_{k+1} = F_k − t_k γ_{k+1} γ_k^{−1} F_k = (1 − t_k γ_{k+1} γ_k^{−1}) F_k.
In view of t_k, γ_{k+1}, γ_k ≥ 0, it follows that 1 − t_k γ_{k+1} γ_k^{−1} ≤ 1. On the other hand, the inequality 1 − t_k γ_{k+1} γ_k^{−1} ≥ 0 is satisfied in the case γ_{k+1} ≤ γ_k/t_k. Now, (28) implies |(F_{k+1})_i| ≤ |(F_k)_i|, i = 1, …, n, which needed to be proven.    □
So, an appropriate update γ_{k+1} can be defined as follows:
γ_{k+1} = (y_k^T y_k)/(y_k^T s_k) = (s_k^T y_k)/(s_k^T s_k)  if y_k^T s_k ≥ 0,  and  γ_{k+1} = γ_k/t_k  if y_k^T s_k < 0.
Now, we are able to generate the value of the next approximation in the form
x_{k+2} = x_{k+1} − t_{k+1} γ_{k+1}^{−1} F_{k+1}.
The step size t_{k+1} in (30) can be determined using a nonmonotone line search. More precisely, t_k is chosen as the largest value from the set {s^i : i = 0, 1, 2, …}, where s ∈ (0, 1), satisfying the line search condition
f(x_k + t_k d_k) − f(x_k) ≤ −ω_1 ‖t_k F(x_k)‖² − ω_2 ‖t_k d_k‖² + η_k f(x_k),
wherein ω_1 > 0, ω_2 > 0 are constants, and η_k is a positive sequence such that
Σ_{k=0}^{∞} η_k < ∞.
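A possible realization of this backtracking procedure is sketched below; the function interface, the parameter values (taken from Section 4), and the cap on the number of backtracking steps are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def nonmonotone_backtracking(F, x, d, eta_k, omega1=1e-4, omega2=1e-4,
                             s=0.2, max_steps=30):
    """Search t in {1, s, s^2, ...} satisfying condition (31) for
    f(x) = 0.5 * ||F(x)||^2 and a given search direction d."""
    f = lambda z: 0.5 * np.dot(F(z), F(z))
    fx = f(x)
    Fx = F(x)
    t = 1.0
    for _ in range(max_steps):
        decrease = f(x + t * d) - fx
        bound = (-omega1 * np.linalg.norm(t * Fx) ** 2
                 - omega2 * np.linalg.norm(t * d) ** 2
                 + eta_k * fx)
        if decrease <= bound:
            break
        t *= s
    return t
```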
The equality (28) can be rewritten in the equivalent form
y_k = −t_k γ_{k+1} γ_k^{−1} F_k,
which gives
γ_{k+1} = −γ_k (F_k^T y_k)/(t_k F_k^T F_k).
Further, an application of Theorem 1 gives the following additional update for the acceleration parameter γ_k:
γ_{k+1} = −γ_k (F_k^T y_k)/(t_k F_k^T F_k)  if F_k^T y_k/(F_k^T F_k) ∈ (−1, 0),  and  γ_{k+1} = γ_k/t_k  if F_k^T y_k/(F_k^T F_k) ∉ (−1, 0).
Corollary 1.
IGDN iterations (26) determined by (34) satisfy |F_{k+1}| ≤ |F_k|.
Proof. 
Clearly, (34) initiates γ_{k+1} ≤ γ_k/t_k, and the proof follows from Theorem 1.    □
Further on, the implementation framework of the I G D N method is presented in Algorithm 1.
Algorithm 1 The IGDN iterations based on  (29),  (30) or (34), (30).
Require: Vector function F(x), ϵ > 0, and an initialization x_0 ∈ ℝ^n.
1: For k = 0, choose γ_0 = 1 and compute F(x_0).
2: Check the stopping criterion; if ‖F(x_k)‖ ≤ ϵ is fulfilled, then stop the algorithm; else, continue with the next step.
3: (Line search) Compute t_k ∈ (0, 1] using (31).
4: Compute x_{k+1} using (30).
5: Determine γ_{k+1} using (29) or (34).
6: k := k + 1.
7: Return to Step 2.
8: Outputs: x_{k+1}, F(x_{k+1}).
Remark 1.
The IGDN algorithm defined by (29) (resp. by (34)) will be denoted by I G D N (29) (resp. by I G D N (34)). Mathematically, I G D N (29) and I G D N (34) are equivalent. The numerical comparison of these algorithms will be performed later.
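To make the flow of Algorithm 1 concrete, the following self-contained Python sketch implements the IGDN(29) variant for the goal function f(x) = (1/2)‖F(x)‖². The choice η_k = 1/(k+1)² from Section 4, the inlined backtracking for (31), the iteration cap, and the small-denominator safeguards are assumptions of this illustration, not prescriptions of the pseudocode.

```python
import numpy as np

def igdn29(F, x0, eps=1e-4, omega1=1e-4, omega2=1e-4, s=0.2, max_iter=500):
    """A sketch of Algorithm 1 with the gamma-update (29)."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    gamma = 1.0
    f = lambda v: 0.5 * (v @ v)               # f evaluated on the vector F(x)
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= eps:
            break
        d = -Fx / gamma                        # d_k = -gamma_k^{-1} F_k
        eta_k = 1.0 / (k + 1) ** 2
        t, fx = 1.0, f(Fx)
        while True:                            # backtracking for condition (31)
            F_new = F(x + t * d)
            if (f(F_new) - fx <= -omega1 * np.linalg.norm(t * Fx) ** 2
                    - omega2 * np.linalg.norm(t * d) ** 2 + eta_k * fx) or t < 1e-10:
                break
            t *= s
        x_new = x + t * d                      # iteration (30)
        y, s_vec = F_new - Fx, x_new - x
        if y @ s_vec >= 1e-12:                 # update (29)
            gamma = (s_vec @ y) / (s_vec @ s_vec)
        else:
            gamma = gamma / t
        x, Fx = x_new, F_new
    return x, Fx

# Usage example on the strictly convex function F_i(x) = exp(x_i) - 1 (cf. Problem 3)
if __name__ == "__main__":
    sol, res = igdn29(lambda x: np.exp(x) - 1.0, 0.1 * np.ones(1000))
    print(np.linalg.norm(res))
```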

2.2. A Class of Accelerated Double Direction (ADDN) Methods

In [61], an optimization method was defined by the iterative rule
x_{k+1} = x_k + t_k d_k + t_k² c_k,
where t_k denotes the value of the steplength parameter, and d_k, c_k are the search direction vectors. The vector d_k is defined as in the SM method from [58], which gives d_k = −γ_k^{−1} g_k, and further
x_{k+1} = x_k − t_k γ_k^{−1} g_k + t_k² c_k.
We want to apply this strategy in solving (1). First of all, the vector c k can be defined according to [85]. An appropriate definition of c k is still open.
Assuming again B_k = γ_k I, the vector d_k from (5) becomes d_k = −γ_k^{−1} F_k, which transforms (35) into
x_{k+1} = x_k − t_k γ_k^{−1} F_k + t_k² c_k.
We propose the steplength γ_{k+1} arising from the Taylor expansion (27) and defined as in (29). In addition, it is possible to use an alternative approach. More precisely, in this case, (27) yields
F_{k+1} = F_k + γ_{k+1} (−t_k γ_k^{−1} F_k + t_k² c_k).
As a consequence, γ_{k+1} can be defined utilizing
γ_{k+1} = γ_k (y_k^T y_k)/(y_k^T (−t_k F_k + γ_k t_k² c_k)).
The problem γ_{k+1} < 0 in (38) is solved using γ_{k+1} = 1.
We can easily conclude that the next iteration is then generated by
x_{k+2} = x_{k+1} − t_{k+1} γ_{k+1}^{−1} F_{k+1} + t_{k+1}² c_{k+1}.
The A D D N iterations are defined in Algorithm 2.
Algorithm 2 The A D D N iterations based on (37), (38).
Require: Function F(x), ϵ > 0, and a given initial vector x_0 ∈ ℝ^n.
1: For k = 0, choose γ_0 = 1 and compute F(x_0).
2: Check the stopping criterion; if ‖F(x_k)‖ ≤ ϵ is satisfied, then stop the algorithm; else, continue with Step 3.
3: (Line search) Find t_k ∈ (0, 1] using an inexact line search procedure.
4: Compute x_{k+1} using (37).
5: Determine γ_{k+1} using (38).
6: In case γ_{k+1} < 0, apply γ_{k+1} = 1.
7: k := k + 1.
8: Return to Step 2.
9: Outputs: x_{k+1}, F(x_{k+1}).
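Since the concrete definition of c_k is left open in the text (it can be chosen according to [85]), the following sketch of Algorithm 2 takes c_k as a user-supplied callable; this interface, the simple backtracking stand-in for the inexact line search, and the safeguards are assumptions of the illustration.

```python
import numpy as np

def addn(F, x0, c_fun, eps=1e-4, s=0.2, max_iter=500):
    """A sketch of the ADDN iterations (37)-(38).

    c_fun(x, F_value, k) must return the second direction c_k; its choice
    is not fixed by the paper and is an assumption here."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    gamma = 1.0
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= eps:
            break
        c = c_fun(x, Fx, k)
        t = 1.0                          # crude backtracking stand-in for the line search
        while (np.linalg.norm(F(x + t * (-Fx / gamma) + t**2 * c))
               > np.linalg.norm(Fx) and t > 1e-10):
            t *= s
        x_new = x + t * (-Fx / gamma) + t**2 * c           # iteration (37)
        F_new = F(x_new)
        y = F_new - Fx
        denom = y @ (-t * Fx + gamma * t**2 * c)           # denominator of update (38)
        gamma = gamma * (y @ y) / denom if abs(denom) > 1e-12 else 1.0
        if gamma < 0:                                      # Step 6 of Algorithm 2
            gamma = 1.0
        x, Fx = x_new, F_new
    return x, Fx
```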

2.3. A Class of Accelerated Double Step Size (ADSSN) Methods

If the steplength t_k² in (35) is replaced by another steplength l_k, the following iteration is obtained:
x k + 1 = x k + t k d k + l k c k .
Here, the parameters t_k, l_k ≥ 0 are two independent step size values, and the vectors d_k, c_k define the search directions of the proposed iterative scheme (39).
Motivation for this type of iterations arises from [60]. The author of this paper suggested a model of the form (39) with two-step size parameters. This method is actually defined by substituting the parameter t k 2 from (35) with another step size parameter l k . Both step size values are computed by independent inexact line search algorithms.
Since we aim to unify search directions, it is possible to use
d_k := −γ_k^{−1} F_k,  c_k := −F_k.
The substitution of the chosen parameters (40) into (39) produces
x_{k+1} = x_k − (t_k γ_k^{−1} + l_k) F_k.
The final step size, t_k γ_k^{−1} + l_k, in the iterations (41) is defined by combining three step size parameters: t_k, l_k, and γ_k. Again, the parameter γ_{k+1} is defined using the Taylor series of the form
F(x_{k+1}) = F(x_k) − γ_{k+1} (t_k γ_k^{−1} + l_k) F(x_k).
As a consequence, γ_{k+1} can be computed by
γ_{k+1} = −γ_k (F_k^T y_k)/((t_k + γ_k l_k) F_k^T F_k).
Theorem 2.
If the condition γ_{k+1} ≤ γ_k/(t_k + γ_k l_k) holds, then the iterations (41) satisfy |F_{k+1}| ≤ |F_k|.
Proof. 
Taking (27) in conjunction with (41), one can verify
F_{k+1} = F_k − γ_{k+1} (t_k γ_k^{−1} + l_k) F_k = F_k (1 − γ_{k+1} (t_k γ_k^{−1} + l_k)).
Clearly, γ_{k+1} ≤ γ_k/(t_k + γ_k l_k) implies 1 − γ_{k+1} (t_k γ_k^{−1} + l_k) ≥ 0. The proof follows from t_k ≥ 0 and γ_{k+1}, γ_k ≥ 0, which ensures 1 − γ_{k+1} (t_k γ_k^{−1} + l_k) ≤ 1.    □
In view of Theorem 2, it is reasonable to define the following update for γ_{k+1} in the ADSSN method:
γ_{k+1} = −γ_k (F_k^T y_k)/((t_k + γ_k l_k) F_k^T F_k)  if F_k^T y_k/(F_k^T F_k) ∈ (−1, 0),  and  γ_{k+1} = γ_k/(t_k + γ_k l_k)  if F_k^T y_k/(F_k^T F_k) ∉ (−1, 0).
Once the acceleration parameter γ_{k+1} > 0 is determined, the values of the step size parameters t_{k+1} and l_{k+1} are defined. Then, it is possible to generate the next point:
F_{k+2} = F_{k+1} − γ_{k+2} (t_{k+1} γ_{k+1}^{−1} + l_{k+1}) F_{k+1}.
In order to derive appropriate values of the parameters t_{k+1} and l_{k+1}, we investigate the function
Φ_{k+1}(t, l) = F_{k+1} − γ_{k+2} (γ_{k+1}^{−1} t + l) F_{k+1}.
The gradient of Φ_{k+1}(t, l) is equal to
∇Φ_{k+1}(t, l) = (∂Φ_{k+1}(t, l)/∂t, ∂Φ_{k+1}(t, l)/∂l) = (−γ_{k+2} γ_{k+1}^{−1} F_{k+1}, −γ_{k+2} F_{k+1}).
Therefore,
Φ_{k+1}(0, 0) = F_{k+1}.
In addition,
∇Φ_{k+1}(t, l) = {0, 0} if and only if F_{k+1} = 0.
Therefore, the function Φ_{k+1}(t, l) is well-defined.
Step scaling parameters t k and l k can be determined using two successive line search procedures (31).
Corollary 2.
The ADSSN iterations determined by (41) satisfy |F_{k+1}| ≤ |F_k|.
Proof. 
Clearly, the definition of γ_{k+1} in (42) implies γ_{k+1} ≤ γ_k/(t_k + γ_k l_k), and the proof follows from Theorem 2.    □
The A D S S N iterations are defined in Algorithm 3.
Algorithm 3 The A D S S N iteration based on (41) and (42).
Require: Chosen F(x), ϵ > 0, and an initialization x_0 ∈ ℝ^n.
1: For k = 0, choose γ_0 = 1 and compute F(x_0).
2: Check the stopping criterion; if ‖F(x_k)‖ ≤ ϵ holds, then stop; else, continue with Step 3.
3: Find t_k using an inexact line search.
4: Find l_k using an inexact line search.
5: Compute x_{k+1} using (41).
6: Determine the scalar γ_{k+1} using (42).
7: k := k + 1.
8: Return to Step 2.
9: Outputs: x_{k+1} and F(x_{k+1}).
Remark 2.
Step 6 of Algorithm 3 is defined according to Theorem 2.
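The following Python sketch mirrors Algorithm 3; the simple norm-reduction backtracking used here to produce t_k and l_k, as well as the iteration cap and small-denominator safeguards, are illustrative assumptions standing in for the two inexact line search procedures.

```python
import numpy as np

def adssn(F, x0, eps=1e-4, s=0.2, max_iter=500):
    """A sketch of the ADSSN iterations (41)-(42)."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    gamma = 1.0

    def backtrack(step_dir, base):
        # stand-in for an inexact line search: shrink t until ||F|| does not grow
        t = 1.0
        while np.linalg.norm(F(base + t * step_dir)) > np.linalg.norm(F(base)) and t > 1e-10:
            t *= s
        return t

    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= eps:
            break
        t = backtrack(-Fx / gamma, x)          # t_k for the direction d_k = -gamma^{-1} F_k
        l = backtrack(-Fx, x)                  # l_k for the direction c_k = -F_k
        x_new = x - (t / gamma + l) * Fx       # iteration (41)
        F_new = F(x_new)
        y = F_new - Fx
        ratio = (Fx @ y) / (Fx @ Fx)
        if -1.0 < ratio < 0.0:                 # first branch of update (42)
            gamma = -gamma * (Fx @ y) / ((t + gamma * l) * (Fx @ Fx))
        else:                                  # fallback branch of (42)
            gamma = gamma / (t + gamma * l)
        x, Fx = x_new, F_new
    return x, Fx
```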

2.4. Simplified ADSSN

Applying the relation
t k + l k = 1
between the step size parameters t_k and l_k in the ADSSN iterative rule (41), the ADSSN iteration is transformed into
x_{k+1} = x_k − (t_k (γ_k^{−1} − 1) + 1) F(x_k).
The convex combination (45) of the step size parameters t_k and l_k that appear in the ADSSN scheme (41) was originally proposed in [62] and applied in an iterative method for solving the unconstrained optimization problem (7). The assumption (45) represents a trade-off between the steplength parameters t_k and l_k. In [62], it was shown that the induced single step size method shows better performance characteristics in general. The constraint (45) initiates the reduction of the two-parameter ADSSN rule into a single step size transformed ADSSN (shortly, TADSSN) iterative method (46).
We can observe that the TADSSN method is a modified version of the IGDN iterations, based on the replacement of the product t_k γ_k^{−1} from the classical IGDN iteration by the multiplying factor t_k (γ_k^{−1} − 1) + 1.
The substitution ϕ_k := t_k (γ_k^{−1} − 1) + 1 will be used to simplify the presentation. Here, the acceleration parameter value γ_{k+1} is calculated by (29).
Corollary 3.
Iterations (46) satisfy
F_{k+1} = F_k − γ_{k+1} ϕ_k F_k.
Proof. 
It follows from (27) and (46).    □
In view of (47), it is possible to conclude
γ_{k+1} = −(F_k^T y_k)/(ϕ_k F_k^T F_k).
Corollary 4 gives some useful restrictions on this rule.
Corollary 4.
If the condition γ_{k+1} ≤ γ_k/(t_k + γ_k (1 − t_k)) holds, then the iterations (41) satisfy |F_{k+1}| ≤ |F_k|.
Proof. 
It follows from Theorem 1 and l_k = 1 − t_k.    □
In view of Corollary 4, it is reasonable to define the following update for γ_{k+1} in the TADSSN method:
γ_{k+1} = −(F_k^T y_k)/(ϕ_k F_k^T F_k)  if F_k^T y_k/(ϕ_k F_k^T F_k) ≤ 0,  and  γ_{k+1} = γ_k/(t_k + γ_k (1 − t_k))  if F_k^T y_k/(ϕ_k F_k^T F_k) > 0.
Then, x_{k+2} is equal to
x_{k+2} = x_{k+1} − ϕ_{k+1} F_{k+1}.
Algorithm 4 The TADSSN iteration based on (46) and (48).
Require: Chosen F(x), ϵ > 0, and x_0 ∈ ℝ^n.
1: For k = 0, choose γ_0 = 1 and compute F(x_0).
2: Check the termination criterion; if ‖F(x_k)‖ ≤ ϵ holds, then stop; else, go to Step 3.
3: (Line search) Apply (31) and generate the step size value t_k.
4: Compute l_k = 1 − t_k.
5: Compute x_{k+1} using (46).
6: Determine the scaling factor γ_{k+1} using (48).
7: k := k + 1.
8: Return to Step 2.
9: Output: x_{k+1}, F(x_{k+1}).
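A compact sketch of Algorithm 4 is given below; it replaces the line search (31) by a simplified monotone backtracking and folds the boundary case of the update (48) into the fallback branch for numerical safety, both of which are assumptions of this illustration.

```python
import numpy as np

def tadssn(F, x0, eps=1e-4, s=0.2, max_iter=500):
    """A sketch of the TADSSN iterations (46) and (48) with l_k = 1 - t_k."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    gamma = 1.0
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= eps:
            break
        t = 1.0                               # simplified stand-in for the line search (31)
        while np.linalg.norm(F(x - t * Fx / gamma)) > np.linalg.norm(Fx) and t > 1e-10:
            t *= s
        phi = t * (1.0 / gamma - 1.0) + 1.0    # phi_k = t_k(gamma_k^{-1} - 1) + 1
        x_new = x - phi * Fx                   # iteration (46)
        F_new = F(x_new)
        y = F_new - Fx
        ratio = (Fx @ y) / (phi * (Fx @ Fx))
        if ratio < 0.0:                        # first branch of (48)
            gamma = -ratio
        else:                                  # fallback branch of (48), also used when ratio == 0
            gamma = gamma / (t + gamma * (1.0 - t))
        x, Fx = x_new, F_new
    return x, Fx
```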

3. Convergence Analysis

The level set is defined as
Ω = {x ∈ ℝ^n : ‖F(x)‖ ≤ ‖F(x_0)‖},
where x_0 ∈ ℝ^n is an initial approximation.
Therewith, the next assumptions are needed:
( A 1 )
The level set Ω defined in (49) is bounded below.
( A 2 )
Lipschitz continuity holds for the vector function F, i.e., ‖F(x) − F(y)‖ ≤ r ‖x − y‖ for all x, y ∈ ℝ^n and some r > 0.
( A 3 )
The Jacobian F′(x) is bounded.
Lemma 1.
Suppose the assumption (A2) holds. If the sequence {x_k} is obtained by the IGDN(29) iterations, then
y_k^T s_k ≤ r ‖s_k‖²,  r > 0.
Proof. 
Obviously,
y_k^T s_k = s_k^T y_k = s_k^T (F_{k+1} − F_k).
Therefore, assuming (A2), it is possible to derive
y_k^T s_k ≤ ‖s_k‖ ‖F_{k+1} − F_k‖ ≤ r ‖s_k‖².
The previous estimation confirms that (50) is satisfied with r defined by the Lipschitz condition in (A2). □
For the convergence results of the remaining algorithms, we need to prove the finiteness of γ k , d k , and the remaining results follow trivially.
Lemma 2.
The γ k generated by I G D N (29) is bounded by the Lipschitz constant r.
Proof. 
Clearly, the complemental step size γ_k defined by (29) satisfies
γ_{k+1} = (y_k^T s_k)/(s_k^T s_k) ≤ (r ‖s_k‖²)/‖s_k‖² = r,
which leads to the conclusion γ_k ≤ r. □
Lemma 3.
The additional step size γ_k generated by IGDN(34) is bounded as follows:
γ_k ≤ 1/(∏_{i=0}^{k−1} t_i).
Proof. 
The updating rule (34) satisfies γ_{k+1} ≤ γ_k/t_k. Continuing in the same way, one concludes
γ_{k+1} ≤ γ_0/(∏_{i=0}^{k} t_i).
The proof can be finished using γ_0 = 1. □
Lemma 4.
The additional scaling parameter γ_k generated by (42) is bounded as follows:
γ_k ≤ 1/(∏_{i=0}^{k−1} (t_i + γ_i l_i)).
Lemma 5.
The directions d k used in I G D N (29) and I G D N (34) algorithms are descent directions.
Proof. 
Since
d_k = −γ_k^{−1} F_k,
an application of the scalar product of both sides in (56) with F_k^T, in conjunction with Lemma 2, leads to the following conclusion for the IGDN(29) iterations:
F_k^T d_k = −γ_k^{−1} F_k^T F_k ≤ −(1/r) ‖F_k‖² < 0.
With Lemma 3, it can be concluded that the IGDN(34) iterations imply the following:
F_k^T d_k = −γ_k^{−1} F_k^T F_k ≤ −(∏_{i=0}^{k−1} t_i) ‖F_k‖² < 0.
The proof is complete. □
Lemma 6.
The direction d k used in A D S S N algorithms is a descent direction.
Proof. 
Since
d_k = −(t_k γ_k^{−1} + l_k) F_k,
after using the scalar product of both sides in (59) with F_k^T and taking into account Lemma 4, we obtain
F_k^T d_k = −(t_k γ_k^{−1} + l_k) F_k^T F_k = −(1/γ_k)(t_k + l_k γ_k) F_k^T F_k ≤ −(∏_{i=0}^{k−1} (t_i + γ_i l_i))(t_k + l_k γ_k) F_k^T F_k = −(∏_{i=0}^{k} (t_i + γ_i l_i)) ‖F_k‖² < 0.
The proof is complete. □
Theorem 3.
The vector F k + 1 generated by I G D N (34) is a descent direction.
Proof. 
According to (34), it follows that
γ_{k+1} = −γ_k (F_k^T y_k)/(t_k F_k^T F_k) = −γ_k (F_k^T (F_{k+1} − F_k))/(t_k F_k^T F_k) = −γ_k (F_k^T F_{k+1})/(t_k ‖F_k‖²) + γ_k/t_k.
As a consequence, γ_{k+1} ≤ γ_k/t_k implies F_k^T F_{k+1} ≥ 0, which means that F_{k+1} is a descent direction. □
Theorem 4.
The vector F k + 1 generated by A D S S N iterations (41) is a descent direction.
Lemma 7.
If the assumptions ( A 1 ) and ( A 2 ) are valid, then the norm of the direction vector d k generated by I G D N (29) is bounded.
Proof. 
The norm ‖d_k‖ can be estimated as
‖d_k‖ = ‖−γ_k^{−1} F_k‖ ≤ γ_k^{−1} ‖F_k‖.
As an implication of (A1), one can conclude ‖F_k‖ ≤ M, which in conjunction with Lemma 2 further bounds ‖d_k‖ in (61) by ‖d_k‖ ≤ w, w = (1/r) M > 0. □
Lemma 8.
If the assumptions ( A 1 ) and ( A 2 ) hold, then the norm of the direction vector d k generated by I G D N (34) is bounded.
Proof. 
As an implication of ( A 1 ) , one can conclude F k M , which in conjunction with (54) and (61) further approximates d k in (61) by d k w , w = i = 0 k 1 t i M > 0 .
Lemma 9.
If the assumptions ( A 1 ) and ( A 2 ) are active, then the norm of the direction vector d k generated by A D S S N is bounded.
Proof. 
Following the proof used in Lemma 8, it can be verified that
‖d_k‖ ≤ (∏_{i=0}^{k−1} (t_i γ_i^{−1} + l_i)) M > 0. □
Now, we are going to establish the global convergence of the IGDN(29), IGDN(34), and ADSSN iterations.
Theorem 5.
If the assumptions (A2) and (A3) are satisfied and {x_k} are iterations generated by IGDN(29), then
lim_{k→∞} ‖F(x_k)‖ = 0.
Proof. 
The search direction is defined by d_k = −γ_k^{−1} F_k. Starting from the apparent relation
F_k^T d_k = −γ_k^{−1} ‖F_k‖²,
we can conclude
‖F_k‖² = −γ_k F_k^T d_k.
Finally, (57) implies F_k^T d_k < 0, which further implies −F_k^T d_k > 0. From Lemma 2, using (63) and −F_k^T d_k > 0, it follows that
‖F_k‖² = γ_k |F_k^T d_k| ≤ r |F_k^T d_k| ≤ r ‖F_k‖ ‖d_k‖.
Based on Lemma 7, it can be concluded that
‖F_k‖² ≤ r ‖F_k‖ ‖d_k‖ ≤ r w ‖F_k‖.
By Lemma 5, we can deduce that the norm of the function F(x_k) is decreasing along the direction d_k, which means that ‖F(x_{k+1})‖ ≤ ‖F(x_k)‖ holds for every k. Based on this fact, it follows that
0 ≤ ‖F_k‖² ≤ r w ‖F_k‖ → 0,
which directly implies
lim_{k→∞} ‖F(x_k)‖ = 0
and completes the proof. □
Theorem 6.
If the assumptions ( A 2 ) and ( A 3 ) are satisfied and x k are iterations generated by I G D N (34), then (62) is valid.
Proof. 
The search direction of I G D N (34) satisfies (63). Finally, since γ k is bounded as in (54), and d k is a descent direction (Lemma 8). iIt can be concluded
0 F k 2 1 i = 0 k 1 t i F k T d k 1 i = 0 k 1 t i F k d k 1 i = 0 k 1 t i w F k 0 ,
which implies the desired result. □
Theorem 7.
If the assumptions ( A 2 ) and ( A 3 ) are satisfied and x k are iterations generated by A D S S N iterations (41), then (62) is valid.

4. Numerical Experience

In order to confirm the efficiency of the presented I G D N and A D S S N processes, we compare them with the E M F D iterations from [8]. We explore performances of both I G D N variants defined by Algorithm 1, depending on chosen acceleration parameter γ k . These variants are denoted as I G D N (29) and I G D N (34).
The following values of needed parameters are used:
  • IGDN algorithms are defined using ω_1 = ω_2 = 10^{−4}, α_0 = 0.01, s = 0.2, ϵ = 10^{−4}, and η_k = 1/(k+1)².
  • The EMFD method is defined using ω_1 = ω_2 = 10^{−4}, α_0 = 0.01, s = 0.2, ϵ = 10^{−4}, and η_k = 1/(k+1)².
We use the following initial points (IP shortly) for the iterations:
x 1 = o n e s ( 1 , , 1 ) , x 2 = 1 , 1 2 , 1 3 , , 1 n , x 3 = ( 0.1 , 0.1 , , 0.1 ) , x 4 = ( 1 n , 2 n , , 1 ) ,
x 5 = 1 1 n , 1 2 n , , 0 , x 6 = ( 1 , , 1 ) , x 7 = n 1 n , n 2 n , , n 1 , x 8 = ( 1 2 , 1 , 2 3 , , 2 n ) .
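For reproducibility, a small sketch generating several of these initial points follows; only the points whose definitions are fully determined by the list above (x_1 through x_5) are constructed, and the dimension n is a parameter of the sketch.

```python
import numpy as np

def initial_points(n):
    """Construct the initial points x1-x5 from the list above."""
    i = np.arange(1, n + 1, dtype=float)
    return {
        "x1": np.ones(n),          # (1, 1, ..., 1)
        "x2": 1.0 / i,             # (1, 1/2, 1/3, ..., 1/n)
        "x3": 0.1 * np.ones(n),    # (0.1, 0.1, ..., 0.1)
        "x4": i / n,               # (1/n, 2/n, ..., 1)
        "x5": 1.0 - i / n,         # (1 - 1/n, 1 - 2/n, ..., 0)
    }
```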
The considered nine test problems are listed below.
Problem 1 (P1) [86] Nonsmooth Function
F ( x i ) = 2 x i sin x i , for i = 1 , 2 , , n .
Problem 2 (P2) [87]
F ( x i ) = min min ( x i , x i 2 ) , max ( x i , x i 3 ) , i = 2 , 3 , , n .
Problem 3 (P3) [87] Strictly Convex Function I
F(x_i) = exp(x_i) − 1, for i = 1, 2, …, n.
Problem 4 (P4) [87]
F 1 ( x ) = h x 1 + x 2 1 ,
F i ( x ) = x i 1 + h x i + x i 1 1 , i = 2 , 3 , , n 1 , h = 2.5
F n ( x ) = x n 1 + h x n 1 .
Problem 5 (P5) [87]
F 1 ( x ) = x 1 + exp ( cos ( h x 1 + x 2 ) ) ,
F i ( x ) = x i + exp ( cos ( h x i 1 + x i + x i + 1 ) ) , for i = 2 , 3 , , n 1 , h = 1 n + 1
F n ( x ) = x n + exp ( cos ( h x n 1 + x n ) )
Problem 6 (P6) [87]
F 1 ( x ) = 2 x 1 + sin ( x ) 1 ,
F i ( x ) = 2 x i 1 + 2 x i + 2 sin ( x i ) 1 , for i = 2 , 3 , , n 1 , h = 2.5
F n ( x ) = 2 x n + sin ( x n ) 1 .
Problem 7 (P7) [87]
F 1 ( x ) = 3 x 1 3 + x 2 5 + sin ( x 1 x 2 ) sin ( x 1 + x 2 ) ,
F i ( x ) = 3 x i 3 + 2 x i + 1 5 sin ( x i x i + 1 ) + 4 x i x i 1 exp ( x i 1 x i ) 3 , for i = 2 , 3 , , n 1 ,
F n ( x ) = x n 1 exp ( x n 1 x n ) + 4 x n 3 .
Problem 8 (P8) [86]
F ( x i ) = x i sin x i 1 , for i = 1 , 2 , , n .
Problem 9 (P9) [86]
F ( x i ) = 2 x i sin x i , for i = 1 , 2 , , n .
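To illustrate how a test problem and the goal function (2) enter the solvers, a short sketch based on Problem 3 (Strictly Convex Function I) follows; the initial point and dimension in the usage example are illustrative choices, not a claim about the experimental configuration.

```python
import numpy as np

def problem3(x):
    """Problem 3 (Strictly Convex Function I): F_i(x) = exp(x_i) - 1."""
    return np.exp(x) - 1.0

def goal(F, x):
    """Goal function (2): f(x) = 0.5 * ||F(x)||^2."""
    Fx = F(x)
    return 0.5 * float(Fx @ Fx)

# Usage example with the initial point x3 = (0.1, ..., 0.1) and n = 1000
x0 = 0.1 * np.ones(1000)
print(goal(problem3, x0))
```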
All tested methods are analyzed with respect to three main computational aspects: number of iterations (iter), number of function evaluations (fval), and CPU time (CPU). The performances of the analyzed models are investigated on the nine listed problems, applied at the eight marked initial points, for five problem dimensions: 1000, 5000, 10,000, 50,000, and 100,000.
According to the obtained results, IGDN(29) and IGDN(34) perform better than the EMFD method from [8]. Both variants of the IGDN algorithm outperform the EMFD method with respect to all considered performance measures. In Table 1 (IGDN-EMFD comparisons), we summarize, for each method, the number of cases in which it achieves the best result with respect to the three tested profiles: iter, fval, and CPU.
The IGDN(29) variant gives the best result in 52 out of 360 cases, considering the minimal number of iterations. Further, IGDN(34) gives the lowest outcomes in 33 out of 360 cases. These variants attain the same minimal number of iterations in 181 out of 360 cases. All three models require an equal minimal number of iterations in 23 out of 360 cases, while the EMFD method gives the minimal number of iterations in 71 out of 360 cases. Considering the required number of iterations, the IGDN variants reach the minimal values in 265 out of 360 cases, as stated in the column IGDN total.
Regarding the fval metric, the results are as follows: 52 out of 360 cases are in favor of IGDN(29), 33 out of 360 in favor of IGDN(34), 180 out of 360 cases in which both IGDN variants have the same minimal fval, 24 out of 360 cases in which all three methods give equal minimal fval values, and 71 out of 360 in favor of the EMFD method. The total number of minimal fval values achieved by some IGDN variant is the same as the total number of minimal iter values, i.e., 265 out of 360.
Concerning the CPU time, the numerical outcomes are strongly in favor of the IGDN variants, i.e., in 355 out of 360 cases, while EMFD is faster in only 5 out of 360 cases.
The obtained numerical results justify the better performance characteristics of the ADSSN method, defined by Algorithm 3, compared to the EMFD method. Actually, the ADSSN scheme outperforms the EMFD iteration with respect to all analyzed metrics: iter, fval, CPU time, and, additionally, the norm of the objective function. A summary of the obtained numerical values is presented in Table 2 (ADSSN-EMFD comparisons).
The results arranged in Table 2 confirm the strong dominance of the ADSSN scheme in comparison with the EMFD method. Considering the number of iterations, the ADSSN method obtains 282 minimal values, while EMFD wins in only 55 instances. Similar outcomes are recorded for the fval profile. The most convincing results are achieved with respect to the CPU time metric, by which the ADSSN model outperforms EMFD in 359 out of 360 cases.
This section finishes with a graphical analysis of the performance features of the considered methods. In the subsequent Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6, we display Dolan and Moré [88] performance profiles of compared models in relation to tested metrics: iter, fval, and CPU.
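The Dolan and Moré performance profile used for these figures can be reproduced from a matrix of per-problem costs; the sketch below assumes rows indexed by problem instances, columns by solvers, and np.inf marking failures, all of which are conventions of the illustration rather than details reported in the paper.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs: (num_problems, num_solvers) array of a metric (iter, fval, or CPU),
           with np.inf marking failures.
    taus:  1-D array of performance-ratio thresholds (tau >= 1).
    Returns rho with shape (len(taus), num_solvers), where rho[j, s] is the
    fraction of problems on which solver s is within a factor taus[j] of the
    best solver."""
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)     # best cost per problem
    ratios = costs / best                        # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Example with illustrative numbers only: three solvers on four problems
costs = np.array([[10, 12, 20],
                  [ 5,  5,  9],
                  [ 7,  6, np.inf],
                  [ 3,  4,  4]])
print(performance_profile(costs, taus=np.array([1.0, 2.0, 4.0])))
```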
Figure 1, Figure 2 and Figure 3 exhibit the clear superiority of I G D N (29) and I G D N (34) iterations compared to corresponding E M F D iterations regarding the analyzed characteristics iter (resp. fval, CPU time). Further, the theoretical equivalence between I G D N (29) and I G D N (34) implies their identical responses on testing criteria iter and fval, represented in Figure 1 and Figure 2. However, Figure 3 demonstrates slightly better performances of I G D N (34) with respect to I G D N (29), which implies that the updating rule (34) is slightly better compared to (29) concerning the execution time. So, I G D N (34) is computationally the most effective algorithm.
In the rest of this section, we compare A D S S N and E M F D .
Figure 4, Figure 5 and Figure 6 exhibit clear superiority of A D S S N iterations compared to corresponding E M F D iterations regarding all three analyzed performance profiles, iter, fval, and CPU.

5. Conclusions

The traditional gradient descent optimization schemes for solving SNE form a class of methods termed the G D N class. A single step size parameter characterizes methods belonging to that class. We aim to upgrade the traditional G D N iterates by introducing the improved gradient descent iterations ( I G D N ), which include complex steplength values defined by several parameters. In this way, we justified the assumption that applying two or more quantities in defining the composed step size parameters generally improves the performance of an underlying iterative process.
Numerical results confirm the evident superiority of I G D N methods in comparison with E M F D iterations from [8], which indicates the superiority of I G D N methods over traditional G D N methods considering all three analyzed features: iter, fval, and CPU. Confirmation of excellent performance of the presented models is also given through graphically displayed Dolan and Moré’s performance profiles.
The problem of solving SNE by applying some efficient accelerated gradient optimization models is of great interest to the optimization community. In that regard, the question of further upgrading I G D N , A D D N , and A D S S N type of methods is still open.
One possibility for further research is proper exploitation of the results presented in Theorems 1–2 in defining proper updates of the scaling parameter γ k . In addition, it will be interesting to examine and exploit similar results in solving classical nonlinear optimization problems.

Author Contributions

Conceptualization, P.S.S. and M.J.P.; methodology, P.S.S., M.J.P. and B.I.; software, B.I. and J.S.; validation, B.I. and J.S.; formal analysis, P.S.S., B.I., A.S. (Abdullah Shah) and J.S.; investigation, X.C., S.L. and J.S.; data curation, B.I., J.S. and A.S. (Abdullah Shah); writing—original draft preparation, P.S.S., J.S. and B.I.S.; writing—review and editing, M.J.P., B.I.S., X.C., A.S. (Alena Stupina) and S.L.; visualization, B.I., J.S. and B.I.S.; project administration, A.S. (Alena Stupina); funding acquisition, A.S. (Alena Stupina). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Grant No. 075-15-2022-1121).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Predrag Stanimirović is supported by the Science Fund of the Republic of Serbia, (No. 7750185, Quantitative Automata Models: Fundamental Problems and Applications-QUAM). Predrag Stanimirović acknowledges support Grant No. 451-03-68/2022-14/200124 given by Ministry of Education, Science and Technological Development, Republic of Serbia. Milena J. Petrović acknowledges support Grant No.174025 given by Ministry of Education, Science and Technological Development, Republic of Serbia. Milena J. Petrović acknowledges support from the internal-junior project IJ-0202 given by the Faculty of Sciences and Mathematics, University of Priština in Kosovska Mitrovica, Serbia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yuan, G.; Lu, X. A new backtracking inexact BFGS method for symmetric nonlinear equations. Comput. Math. Appl. 2008, 55, 116–129. [Google Scholar] [CrossRef] [Green Version]
  2. Abubakar, A.B.; Kumam, P. An improved three–term derivative–free method for solving nonlinear equations. Comput. Appl. Math. 2018, 37, 6760–6773. [Google Scholar] [CrossRef]
  3. Cheng, W. A PRP type method for systems of monotone equations. Math. Comput. Model. 2009, 50, 15–20. [Google Scholar] [CrossRef]
  4. Hu, Y.; Wei, Z. Wei–Yao–Liu conjugate gradient projection algorithm for nonlinear monotone equations with convex constraints. Int. J. Comput. Math. 2015, 92, 2261–2272. [Google Scholar] [CrossRef]
  5. La Cruz, W. A projected derivative–free algorithm for nonlinear equations with convex constraints. Optim. Methods Softw. 2014, 29, 24–41. [Google Scholar] [CrossRef]
  6. La Cruz, W. A spectral algorithm for large–scale systems of nonlinear monotone equations. Numer. Algorithms 2017, 76, 1109–1130. [Google Scholar] [CrossRef]
  7. Papp, Z.; Rapajić, S. FR type methods for systems of large–scale nonlinear monotone equations. Appl. Math. Comput. 2015, 269, 816–823. [Google Scholar] [CrossRef]
  8. Halilu, A.S.; Waziri, M.Y. An enhanced matrix-free method via double steplength approach for solving systems of nonlinear equations. Int. J. Appl. Math. Res. 2017, 6, 147–156. [Google Scholar] [CrossRef] [Green Version]
  9. Halilu, A.S.; Waziri, M.Y. A transformed double steplength method for solving large-scale systems of nonlinear equations. J. Numer. Math. Stochastics 2017, 9, 20–32. [Google Scholar]
  10. Waziri, M.Y.; Muhammad, H.U.; Halilu, A.S.; Ahmed, K. Modified matrix-free methods for solving system of nonlinear equations. Optimization 2021, 70, 2321–2340. [Google Scholar] [CrossRef]
  11. Osinuga, I.A.; Dauda, M.K. Quadrature based Broyden-like method for systems of nonlinear equations. Stat. Optim. Inf. Comput. 2018, 6, 130–138. [Google Scholar] [CrossRef]
  12. Muhammad, K.; Mamat, M.; Waziri, M.Y. A Broyden’s-like method for solving systems of nonlinear equations. World Appl. Sci. J. 2013, 21, 168–173. [Google Scholar]
  13. Ullah, N.; Sabi’u, J.; Shah, A. A derivative–free scaling memoryless Broyden–Fletcher–Goldfarb–Shanno method for solving a system of monotone nonlinear equations. Numer. Linear Algebra Appl. 2021, 28, e2374. [Google Scholar] [CrossRef]
  14. Abubakar, A.B.; Kumam, P. A descent Dai–Liao conjugate gradient method for nonlinear equations. Numer. Algorithms 2019, 81, 197–210. [Google Scholar] [CrossRef]
  15. Aji, S.; Kumam, P.; Awwal, A.M.; Yahaya, M.M.; Kumam, W. Two Hybrid Spectral Methods With Inertial Effect for Solving System of Nonlinear Monotone Equations With Application in Robotics. IEEE Access 2021, 9, 30918–30928. [Google Scholar] [CrossRef]
  16. Dauda, M.K.; Usman, S.; Ubale, H.; Mamat, M. An alternative modified conjugate gradient coefficient for solving nonlinear system of equations. Open J. Sci. Technol. 2019, 2, 5–8. [Google Scholar] [CrossRef]
  17. Zheng, L.; Yang, L.; Liang, Y. A conjugate gradient projection method for solving equations with convex constraints. J. Comput. Appl. Math. 2020, 375, 112781. [Google Scholar] [CrossRef]
  18. Waziri, M.Y.; Aisha, H.A. A diagonal quasi-Newton method for system of nonlinear equations. Appl. Math. Comput. Sci. 2014, 6, 21–30. [Google Scholar]
  19. Waziri, M.Y.; Leong, W.J.; Hassan, M.A.; Monsi, M. Jacobian computation-free Newton’s method for systems of nonlinear equations. J. Numer. Math. Stochastics 2010, 2, 54–63. [Google Scholar]
  20. Waziri, M.Y.; Majid, Z.A. An improved diagonal Jacobian approximation via a new quasi-Cauchy condition for solving large-scale systems of nonlinear equations. J. Appl. Math. 2013, 2013, 875935. [Google Scholar] [CrossRef] [Green Version]
  21. Abdullah, H.; Waziri, M.Y.; Yusuf, S.O. A double direction conjugate gradient method for solving large-scale system of nonlinear equations. J. Math. Comput. Sci. 2017, 7, 606–624. [Google Scholar]
  22. Yan, Q.-R.; Peng, X.-Z.; Li, D.-H. A globally convergent derivative-free method for solving large-scale nonlinear monotone equations. J. Comput. Appl. Math. 2010, 234, 649–657. [Google Scholar] [CrossRef] [Green Version]
  23. Leong, W.J.; Hassan, M.A.; Yusuf, M.W. A matrix-free quasi-Newton method for solving large-scale nonlinear systems. Comput. Math. Appl. 2011, 62, 2354–2363. [Google Scholar] [CrossRef] [Green Version]
  24. Waziri, M.Y.; Leong, W.J.; Mamat, M. A two-step matrix-free secant method for solving large-scale systems of nonlinear equations. J. Appl. Math. 2012, 2012, 348654. [Google Scholar] [CrossRef] [Green Version]
  25. Waziri, M.Y.; Leong, W.J.; Hassan, M.A.; Monsi, M. A new Newton’s Method with diagonal Jacobian approximation for systems of nonlinear equations. J. Math. Stat. 2010, 6, 246–252. [Google Scholar] [CrossRef]
  26. Waziri, M.Y.; Leong, W.J.; Mamat, M.; Moyi, A.U. Two-step derivative-free diagonally Newton’s method for large-scale nonlinear equations. World Appl. Sci. J. 2013, 21, 86–94. [Google Scholar]
  27. Yakubu, U.A.; Mamat, M.; Mohamad, M.A.; Rivaie, M.; Sabi’u, J. A recent modification on Dai–Liao conjugate gradient method for solving symmetric nonlinear equations. Far East J. Math. Sci. 2018, 103, 1961–1974. [Google Scholar] [CrossRef]
  28. Uba, L.Y.; Waziri, M.Y. Three-step derivative-free diagonal updating method for solving large-scale systems of nonlinear equations. J. Numer. Math. Stochastics 2014, 6, 73–83. [Google Scholar]
  29. Zhou, Y.; Wu, Y.; Li, X. A New Hybrid PRPFR Conjugate Gradient Method for Solving Nonlinear Monotone Equations and Image Restoration Problems. Math. Probl. Eng. 2020, 2020, 6391321. [Google Scholar] [CrossRef]
  30. Waziri, M.Y.; Leong, W.J.; Mamat, M. An efficient solver for systems of nonlinear equations with singular Jacobian via diagonal updating. Appl. Math. Sci. 2010, 4, 3403–3412. [Google Scholar]
  31. Waziri, M.Y.; Leong, W.J.; Hassan, M.A. Diagonal Broyden-like method for large-scale systems of nonlinear equations. Malays. J. Math. Sci. 2012, 6, 59–73. [Google Scholar]
  32. Abubakar, A.B.; Sabi’u, J.; Kumam, P.; Shah, A. Solving nonlinear monotone operator equations via modified SR1 update. J. Appl. Math. Comput. 2021, 67, 343–373. [Google Scholar] [CrossRef]
  33. Grosan, C.; Abraham, A. A new approach for solving nonlinear equations systems. IEEE Trans. Syst. Man Cybern. 2008, 38, 698–714. [Google Scholar] [CrossRef]
  34. Dehghan, M.; Hajarian, M. New iterative method for solving nonlinear equations with fourth-order convergence. Int. J. Comput. Math. 2010, 87, 834–839. [Google Scholar] [CrossRef]
  35. Dehghan, M.; Hajarian, M. Fourth-order variants of Newton’s method without second derivatives for solving nonlinear equations. Eng. Comput. 2012, 29, 356–365. [Google Scholar] [CrossRef]
  36. Kaltenbacher, B.; Neubauer, A.; Scherzer, O. Iterative Regularization Methods for Nonlinear Ill-Posed Problems; De Gruyter: Berlin, Germany; New York, NY, USA, 2008. [Google Scholar]
  37. Wang, Y.; Yuan, Y. Convergence and regularity of trust region methods for nonlinear ill-posed problems. Inverse Probl. 2005, 21, 821–838. [Google Scholar] [CrossRef] [Green Version]
  38. Dehghan, M.; Hajarian, M. Some derivative free quadratic and cubic convergence iterative formulas for solving nonlinear equations. Comput. Appl. Math. 2010, 29, 19–30. [Google Scholar] [CrossRef]
  39. Dehghan, M.; Hajarian, M. On some cubic convergence iterative formulae without derivatives for solving nonlinear equations. Int. J. Numer. Methods Biomed. Eng. 2011, 27, 722–731. [Google Scholar] [CrossRef]
  40. Dehghan, M.; Shirilord, A. Accelerated double-step scale splitting iteration method for solving a class of complex symmetric linear systems. Numer. Algorithms 2020, 83, 281–304. [Google Scholar] [CrossRef]
  41. Dehghan, M.; Shirilord, A. A generalized modified Hermitian and skew-Hermitian splitting (GMHSS) method for solving complex Sylvester matrix equation. Appl. Math. Comput. 2019, 348, 632–651. [Google Scholar] [CrossRef]
  42. Bellavia, S.; Gurioli, G.; Morini, B.; Toint, P.L. Trust-region algorithms: Probabilistic complexity and intrinsic noise with applications to subsampling techniques. EURO J. Comput. Optim. 2022, 10, 100043. [Google Scholar] [CrossRef]
  43. Bellavia, S.; Krejić, N.; Morini, B.; Rebegoldi, S. A stochastic first-order trust-region method with inexact restoration for finite-sum minimization. Comput. Optim. Appl. 2023, 84, 53–84. [Google Scholar] [CrossRef]
  44. Bellavia, S.; Krejić, N.; Morini, B. Inexact restoration with subsampled trust-region methods for finite-sum minimization. Comput. Optim. Appl. 2020, 76, 701–736. [Google Scholar] [CrossRef]
  45. Eshaghnezhad, M.; Effati, S.; Mansoori, A. A Neurodynamic Model to Solve Nonlinear Pseudo-Monotone Projection Equation and Its Applications. IEEE Trans. Cybern. 2017, 47, 3050–3062. [Google Scholar] [CrossRef]
  46. Meintjes, K.; Morgan, A.P. A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 1987, 22, 333–361. [Google Scholar] [CrossRef]
  47. Crisci, S.; Piana, M.; Ruggiero, V.; Scussolini, M. A regularized affine-scaling trust-region method for parametric imaging of dynamic PET data. SIAM J. Imaging Sci. 2021, 14, 418–439. [Google Scholar] [CrossRef]
48. Bonettini, S.; Zanella, R.; Zanni, L. A scaled gradient projection method for constrained image deblurring. Inverse Probl. 2009, 25, 015002.
49. Liu, J.K.; Du, X.L. A gradient projection method for the sparse signal reconstruction in compressive sensing. Appl. Anal. 2018, 97, 2122–2131.
50. Liu, J.K.; Li, S.J. A projection method for convex constrained monotone nonlinear equations with applications. Comput. Math. Appl. 2015, 70, 2442–2453.
51. Xiao, Y.; Zhu, H. A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 2013, 405, 310–319.
52. Awwal, A.M.; Wang, L.; Kumam, P.; Mohammad, H.; Watthayu, W. A Projection Hestenes–Stiefel Method with Spectral Parameter for Nonlinear Monotone Equations and Signal Processing. Math. Comput. Appl. 2020, 25, 27.
53. Fukushima, M. Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Program. 1992, 53, 99–110.
54. Qian, G.; Han, D.; Xu, L.; Yang, H. Solving nonadditive traffic assignment problems: A self-adaptive projection–auxiliary problem method for variational inequalities. J. Ind. Manag. Optim. 2013, 9, 255–274.
55. Ghaddar, B.; Marecek, J.; Mevissen, M. Optimal power flow as a polynomial optimization problem. IEEE Trans. Power Syst. 2016, 31, 539–546.
56. Ivanov, B.; Stanimirović, P.S.; Milovanović, G.V.; Djordjević, S.; Brajević, I. Accelerated multiple step-size methods for solving unconstrained optimization problems. Optim. Methods Softw. 2021, 36, 998–1029.
57. Andrei, N. An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numer. Algorithms 2006, 42, 63–73.
58. Stanimirović, P.S.; Miladinović, M.B. Accelerated gradient descent methods with line search. Numer. Algorithms 2010, 54, 503–520.
59. Sun, W.; Yuan, Y.-X. Optimization Theory and Methods: Nonlinear Programming; Springer: New York, NY, USA, 2006.
60. Petrović, M.J. An Accelerated Double Step Size model in unconstrained optimization. Appl. Math. Comput. 2015, 250, 309–319.
61. Petrović, M.J.; Stanimirović, P.S. Accelerated Double Direction method for solving unconstrained optimization problems. Math. Probl. Eng. 2014, 2014, 965104.
62. Stanimirović, P.S.; Milovanović, G.V.; Petrović, M.J.; Kontrec, N. A Transformation of accelerated double step size method for unconstrained optimization. Math. Probl. Eng. 2015, 2015, 283679.
63. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 1999.
64. Barzilai, J.; Borwein, J.M. Two-point step size gradient method. IMA J. Numer. Anal. 1988, 8, 141–148.
65. Dai, Y.H. Alternate step gradient method. Optimization 2003, 52, 395–415.
66. Dai, Y.H.; Fletcher, R. On the asymptotic behaviour of some new gradient methods. Math. Program. 2005, 103, 541–559.
67. Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 2002, 22, 1–10.
68. Dai, Y.H.; Yuan, J.Y.; Yuan, Y. Modified two-point step-size gradient methods for unconstrained optimization. Comput. Optim. Appl. 2002, 22, 103–109.
69. Dai, Y.H.; Yuan, Y. Alternate minimization gradient method. IMA J. Numer. Anal. 2003, 23, 377–393.
70. Dai, Y.H.; Yuan, Y. Analysis of monotone gradient methods. J. Ind. Manag. Optim. 2005, 1, 181–192.
71. Dai, Y.H.; Zhang, H. Adaptive two-point step size gradient algorithm. Numer. Algorithms 2001, 27, 377–385.
72. Raydan, M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 1993, 13, 321–326.
73. Raydan, M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 1997, 7, 26–33.
74. Vrahatis, M.N.; Androulakis, G.S.; Lambrinos, J.N.; Magoulas, G.D. A class of gradient unconstrained minimization algorithms with adaptive step-size. J. Comput. Appl. Math. 2000, 114, 367–386.
75. Yuan, Y. A new step size for the steepest descent method. J. Comput. Math. 2006, 24, 149–156.
76. Frassoldati, G.; Zanni, L.; Zanghirati, G. New adaptive step size selections in gradient methods. J. Ind. Manag. Optim. 2008, 4, 299–312.
77. Serafino, D.; Ruggiero, V.; Toraldo, G.; Zanni, L. On the steplength selection in gradient methods for unconstrained optimization. Appl. Math. Comput. 2018, 318, 176–195.
78. Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Spectral properties of Barzilai–Borwein rules in solving singly linearly constrained optimization problems subject to lower and upper bounds. SIAM J. Optim. 2020, 30, 1300–1326.
79. Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Hybrid limited memory gradient projection methods for box-constrained optimization problems. Comput. Optim. Appl. 2023, 84, 151–189.
80. Miladinović, M.; Stanimirović, P.S.; Miljković, S. Scalar Correction method for solving large scale unconstrained minimization problems. J. Optim. Theory Appl. 2011, 151, 304–320.
81. Raydan, M.; Svaiter, B.F. Relaxed steepest descent and Cauchy-Barzilai-Borwein method. Comput. Optim. Appl. 2002, 21, 155–167.
82. Djordjević, S.S. Two modifications of the method of the multiplicative parameters in descent gradient methods. Appl. Math. Comput. 2012, 218, 8672–8683.
83. Zhang, Y.; Yi, C. Zhang Neural Networks and Neural-Dynamic Method; Nova Science Publishers, Inc.: New York, NY, USA, 2011.
84. Zhang, Y.; Ma, W.; Cai, B. From Zhang neural network to Newton iteration for matrix inversion. IEEE Trans. Circuits Syst. I Regul. Pap. 2009, 56, 1405–1415.
85. Djuranovic-Miličić, N.I.; Gardašević-Filipović, M. A multi-step curve search algorithm in nonlinear optimization - nondifferentiable case. Facta Univ. Ser. Math. Inform. 2010, 25, 11–24.
86. Zhou, W.J.; Li, D.H. A globally convergent BFGS method for nonlinear monotone equations without any merit functions. Math. Comput. 2008, 77, 2231–2240.
87. La Cruz, W.; Martínez, J.; Raydan, M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Math. Comput. 2006, 75, 1429–1448.
88. Dolan, E.; Moré, J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
Figure 1. Performance profile of IGDN versus EMFD [8] with respect to iter.
Figure 2. Performance profile of IGDN versus EMFD [8] with respect to fval.
Figure 3. Performance profile of IGDN versus EMFD [8] with respect to CPU.
Figure 4. Performance profile of ADSSN versus EMFD [8] with respect to iter.
Figure 5. Performance profile of ADSSN versus EMFD [8] with respect to fval.
Figure 6. Performance profile of ADSSN versus EMFD [8] with respect to CPU.
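The profiles in Figures 1–6 follow the Dolan–Moré methodology [88]: for each test problem, a solver's cost (iter, fval, or CPU) is divided by the best cost over all solvers, and ρ_s(τ) records the fraction of problems on which solver s stays within a factor τ of the best. The following is a minimal Python sketch of such a computation; it is not the authors' code, and the function name and array layout are illustrative assumptions.

    # Minimal sketch of a Dolan-More performance profile (illustrative helper, not from the paper).
    import numpy as np

    def performance_profile(T, taus):
        # T    : (n_problems, n_solvers) array of a cost measure per problem and solver
        #        (iterations, function evaluations, or CPU time); np.inf marks a failure.
        # taus : 1-D array of thresholds tau >= 1.
        # Returns an (n_taus, n_solvers) array whose entry [i, s] is rho_s(taus[i]),
        # i.e., the fraction of problems on which solver s is within a factor taus[i] of the best.
        best = T.min(axis=1, keepdims=True)   # best (smallest) cost on each problem
        ratios = T / best                     # performance ratios r_{p,s}
        n_problems = T.shape[0]
        return np.array([(ratios <= tau).sum(axis=0) / n_problems for tau in taus])

    # Hypothetical usage with two solvers on three problems:
    # T = np.array([[10.0, 12.0], [5.0, np.inf], [7.0, 7.0]])
    # print(performance_profile(T, np.array([1.0, 2.0, 5.0])))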
Table 1. IGDN-EMFD comparisons.

Methods     (29)   (34)   (29) = (34)   (29) = (34) = EMFD   EMFD   IGDN Total
iter          52     32           181                    23     72          265
fval          52     33           180                    24     71          265
CPU (sec)    214    141             0                     0      5          355
Table 2. IADSSN-EMFD comparisons.

Methods     ADSSN   EMFD   ADSSN = EMFD
iter          282     55             23
fval          281     56             23
CPU (sec)     359      1              0
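The entries of Tables 1 and 2 are counts: for each criterion they report on how many test problems a given method achieved the strictly better result and on how many the results coincided. A small sketch of such a tally is given below; it is a hypothetical helper (not the authors' code), assuming the per-problem results are available as NumPy arrays.

    # Sketch of how the per-criterion win/tie counts in Tables 1 and 2 could be tallied
    # (illustrative helper, not from the paper).
    import numpy as np

    def tally(a, b):
        # a, b : per-problem results (e.g., iteration counts) of two solvers.
        # Returns (#problems where a is strictly better, #where b is strictly better, #ties).
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        a_wins = int(np.sum(a < b))
        b_wins = int(np.sum(b < a))
        ties = a.size - a_wins - b_wins
        return a_wins, b_wins, ties

    # Hypothetical usage with per-problem iteration counts of ADSSN and EMFD:
    # print(tally(adssn_iter, emfd_iter))   # -> a triple like the iter row of Table 2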
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
