Article

A New Hybrid Descent Algorithm for Large-Scale Nonconvex Optimization and Application to Some Image Restoration Problems

1 School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
2 Department of Mathematics and Science, School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
3 School of Mathematics, Liaoning Normal University, Dalian 116026, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(19), 3088; https://doi.org/10.3390/math12193088
Submission received: 18 August 2024 / Revised: 25 September 2024 / Accepted: 27 September 2024 / Published: 2 October 2024
(This article belongs to the Special Issue Optimization Algorithms: Theory and Applications)

Abstract

Conjugate gradient methods are widely used and attractive for large-scale unconstrained smooth optimization problems, thanks to their simple computation, low memory requirements, and useful curvature information. Based on the strong convergence property of the Dai–Yuan method and the attractive numerical performance of the Hestenes–Stiefel method, a new hybrid descent conjugate gradient method is proposed in this paper. The proposed method satisfies the sufficient descent property independent of the accuracy of the line search strategy. Under standard conditions, the trust region property and the global convergence are established, respectively. Numerical results on 61 problems, each tested in 9 large-scale dimensions, and on 46 ill-conditioned matrix problems reveal that the proposed method is more effective, robust, and reliable than the compared methods. Additionally, the hybrid method also demonstrates reliable results for some image restoration problems.

1. Introduction

In this paper, we consider the following unconstrained problem:
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, bounded below, and its gradient is available. There are many effective methods for problem (1), such as Newton-type methods, quasi-Newton-type methods, spectral gradient methods, and conjugate gradient (CG for abbreviation) methods [1,2,3,4,5,6,7,8,9,10,11], etc. Meanwhile, there are also various derivative-free optimization tools, such as Nelder–Mead, generalized simulated annealing, and genetic algorithms [12,13,14], etc., for problem (1). In this paper, we focus on CG methods and propose a new hybrid CG method for the large-scale problem (1). Actually, CG methods are among the most effective methods for unconstrained problems, especially for large-scale cases, due to their low storage and global convergence properties [3]. The iterative point is usually generated by
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \qquad (2)$$
where $x_k$ is the current iterate; the scalar $\alpha_k > 0$ is the step length, determined by some line search strategy; and $d_k$ is the search direction, defined by
$$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases} \qquad (3)$$
where $g_k := g(x_k) = \nabla f(x_k)$ and $\beta_k$ is called the conjugate parameter. A number of CG methods have been proposed by various modifications of the direction $d_k$ and the parameter $\beta_k$; see [4,5,6,7,8,9,10,11,15,16,17,18,19,20], etc. Some CG methods have strong convergence properties, but their numerical performance may not be good in practice due to the jamming phenomenon [4]. These methods include Fletcher–Reeves (FR) [5], Dai–Yuan (DY) [6], and Fletcher (CD) [7], with the following conjugate parameters:
$$\beta_{k+1}^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad \beta_{k+1}^{DY} = \frac{\|g_{k+1}\|^2}{y_k^T d_k}, \qquad \beta_{k+1}^{CD} = -\frac{\|g_{k+1}\|^2}{g_k^T d_k},$$
where $g_{k+1} = \nabla f(x_{k+1})$, $y_k = g_{k+1} - g_k$, and $\|\cdot\|$ stands for the Euclidean norm. On the other hand, some other CG methods may perform well in practice, but their convergence may not be guaranteed, especially for nonconvex functions. These methods include Hestenes–Stiefel (HS) [8], Polak–Ribière–Polyak (PRP) [9,10], and Liu–Storey (LS) [11], with the following conjugate parameters:
$$\beta_{k+1}^{HS} = \frac{g_{k+1}^T y_k}{y_k^T d_k}, \qquad \beta_{k+1}^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}, \qquad \beta_{k+1}^{LS} = -\frac{g_{k+1}^T y_k}{g_k^T d_k}.$$
In fact, these methods possess an automatic approximate restart feature which can avoid the jamming phenomenon; that is, when the step $s_k$ is small, the factor $y_k$ tends to zero, so the conjugate parameter $\beta_{k+1}$ becomes small and the new direction $d_{k+1}$ approaches the steepest descent direction $-g_{k+1}$.
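As a quick illustration of the generic CG iteration (2)–(3), the following Python sketch (not part of the paper; it assumes the denominators are nonzero and omits all safeguards) evaluates a few of the classical conjugate parameters listed above and returns the corresponding search direction.

```python
import numpy as np

def cg_direction(g_new, g_old, d_old, rule="PRP"):
    """Return d_{k+1} = -g_{k+1} + beta_{k+1} * d_k for a classical beta choice."""
    y = g_new - g_old                       # y_k = g_{k+1} - g_k
    if rule == "FR":
        beta = g_new.dot(g_new) / g_old.dot(g_old)
    elif rule == "DY":
        beta = g_new.dot(g_new) / y.dot(d_old)
    elif rule == "HS":
        beta = g_new.dot(y) / y.dot(d_old)
    elif rule == "PRP":
        beta = g_new.dot(y) / g_old.dot(g_old)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return -g_new + beta * d_old
```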
To attain good computational performance and maintain the attractive feature of strong global convergence, many scholars have paid special attention to hybridizing these CG methods. Specifically, the authors in [21] proposed a hybrid PRP-FR CG method (the H1 method in [22]), whose conjugate parameter was defined as $\beta_{k+1}^{H1} = \max\{0, \min\{\beta_{k+1}^{FR}, \beta_{k+1}^{PRP}\}\}$. Moreover, based on the above hybrid conjugate parameter, a new form was proposed in [23], where the parameter was defined by $\beta_{k+1} = \max\{-\beta_{k+1}^{FR}, \min\{\beta_{k+1}^{FR}, \beta_{k+1}^{PRP}\}\}$, and the global convergence property was established for general functions without the convexity assumption. In [24], a hybrid of the HS method and the DY method was proposed, in which the conjugate parameter was defined by $\beta_{k+1}^{H2} = \max\{0, \min\{\beta_{k+1}^{HS}, \beta_{k+1}^{DY}\}\}$. The numerical results indicated that this hybrid method was more effective than the PRP algorithm. In the above hybrid CG methods, the search direction was in the form of (3). Moreover, the authors in [25] proposed a new hybrid three-term method in which the conjugate parameter is $\beta_{k+1}^{H2}$ and the direction is $d_{k+1} = -g_{k+1} + (1-\lambda_{k+1})\beta_{k+1}^{H2} d_k + \lambda_{k+1}\theta_{k+1} g_k$, where $\lambda_{k+1}$ is a convex combination parameter. This hybrid method demonstrates attractive numerical performance. Furthermore, in [22], the authors proposed two new hybrid methods based on the above conjugate parameters with different search directions. Concretely, the directions have the following common form:
$$d_{k+1} = -\left(1 + \beta_{k+1}\frac{g_{k+1}^T d_k}{\|g_{k+1}\|^2}\right) g_{k+1} + \beta_{k+1} d_k, \qquad (4)$$
where $\beta_{k+1} = \beta_{k+1}^{H1}$ or $\beta_{k+1} = \beta_{k+1}^{H2}$. A remarkable feature of the above directions is that the sufficient descent property is automatically satisfied, independent of the accuracy of the line search strategy.
Motivated by the above discussions, in this paper, we propose a new hybrid descent CG method for large-scale nonconvex problems. The proposed hybrid method automatically enjoys the sufficient descent property independent of the accuracy of the line search technique. Furthermore, the global convergence for the general functions without convexity is established under the standard conditions. Numerical results of 549 large-scale problems and 46 ill-conditioned matrix problems indicate the proposed method is attractive and promising. Finally, we also apply the proposed method to some image restoration problems, which also verifies its reliability and effectiveness.
The rest of the paper is organized as follows. In Section 2, we propose a descent hybrid CG method which is based on the MHS method and DY method. Moreover, the sufficient descent property is satisfied independent of the accuracy of the line search techniques. Global convergence is established for the general function in Section 3. Numerical results are given in Section 4 to indicate the effectiveness and reliability of the proposed algorithm. Finally, some conclusions are presented.

2. Motivation, Algorithm, and Sufficient Descent Property

As mentioned in the above section, the HS method is generally regarded as one of the most effective CG methods, but its global convergence for general nonlinear functions is still not guaranteed. Additionally, the HS method does not guarantee the descent property during the iterative process; that is, the condition $g_k^T d_k < 0$ may not be satisfied for $k \ge 1$. Therefore, many researchers have devoted themselves to designing descent HS conjugate gradient methods [4,24,26,27,28,29,30], etc. Specifically, to obtain an intuitively modified conjugate parameter, the authors in [26] approximated the direction $d_{k+1}^{THS}$ by the two-term direction (3), where $d_{k+1}^{THS}$ was defined by (4) with $\beta_{k+1} = \beta_{k+1}^{HS}$. Concretely, the least squares problem $\min_{\beta}\|-g_{k+1} + \beta d_k - d_{k+1}^{THS}\|^2$ was solved. After some algebraic manipulations, the unique solution was found to be
$$\beta_{k+1}^{MHS} = \frac{g_{k+1}^T y_k}{y_k^T d_k}\left(1 - \frac{(g_{k+1}^T d_k)^2}{\|g_{k+1}\|^2 \|d_k\|^2}\right) = \beta_{k+1}^{HS}\,\vartheta_k, \qquad (5)$$
where
$$\vartheta_k = 1 - \frac{(g_{k+1}^T d_k)^2}{\|g_{k+1}\|^2 \|d_k\|^2}. \qquad (6)$$
The above parameter and its modifications have some nice theoretical properties [26], and the method with (5) and (3) performs well. Meanwhile, it is clear that if the exact line search is adopted (i.e., $g_{k+1}^T d_k = 0$), it holds that $\beta_{k+1}^{MHS} = \beta_{k+1}^{HS} = \beta_{k+1}^{PRP}$.
To attain attractive computational performance and good theoretical properties, many researchers have proposed hybrid CG methods. Among these methods, hybridizations of the HS method and the DY method have shown promising numerical performance [31,32,33,34], etc. The HS method has the nice property of automatically satisfying the conjugate condition $d_{k+1}^T y_k = 0$ for $k \ge 0$, independent of the accuracy of the line search strategy and the convexity of the objective function, and it performs well in practice. On the other hand, the DY method has remarkable convergence properties. These characteristics motivate us to propose new hybridizations of the HS method and the DY method which not only have attractive theoretical properties but also better numerical performance for large-scale nonconvex problems.
In the following, we focus on the conjugate parameter $\beta_{k+1}^{MHS}$ and propose a new hybrid conjugate parameter combining $\beta_{k+1}^{DY}$ and $\beta_{k+1}^{MHS}$:
$$\beta_{k+1}^{N} = \max\left\{0, \min\left\{\beta_{k+1}^{DY}, \beta_{k+1}^{MHS}\right\}\right\}. \qquad (7)$$
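The hybrid parameter (7), together with (5) and (6), can be computed in a few lines. The helper below is a minimal sketch (not from the paper): it guards the division by $y_k^T d_k$ with a small tolerance, which is an implementation choice of ours rather than part of the method.

```python
import numpy as np

def beta_hybrid(g_new, g_old, d_old, tol=1e-12):
    """beta^N of Eq. (7): max{0, min{beta^DY, beta^MHS}}, with beta^MHS = beta^HS * theta."""
    y = g_new - g_old
    yTd = y.dot(d_old)
    if abs(yTd) < tol:                       # safeguard, not part of the paper
        return 0.0
    theta = 1.0 - g_new.dot(d_old) ** 2 / (g_new.dot(g_new) * d_old.dot(d_old))  # Eq. (6)
    beta_dy = g_new.dot(g_new) / yTd         # Dai-Yuan parameter
    beta_mhs = g_new.dot(y) / yTd * theta    # modified HS parameter, Eq. (5)
    return max(0.0, min(beta_dy, beta_mhs))
```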
Now, based on the new hybrid conjugate parameter $\beta_{k+1}^{N}$ and the modified descent direction (8), we propose our hybrid algorithm (NMHSDY) in detail.
It should be noted that the line search technique in Algorithm 1 is not fixed: It can be selected by the users. Next, we show that the search direction d k generated by Algorithm 1 automatically has a sufficient descent property independent of any line search strategy.
Algorithm 1 New descent hybrid algorithm of MHS and DY methods (NMHSDY) for nonconvex functions.
Step 0. Input and Initialization. Select an initial point $x_0 \in \mathbb{R}^n$ and a parameter $\varepsilon \ge 0$, and compute $f_0 = f(x_0)$ and $g_0 = g(x_0)$. Set $d_0 = -g_0$ and $k = 0$;
Step 1. If $\|g_k\| \le \varepsilon$, then stop;
Step 2. Compute the step length $\alpha_k$ along the direction $d_k$ by some line search strategy;
Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$;
Step 4. Compute the conjugate parameter $\beta_{k+1}^{N}$ by (7) and the search direction $d_{k+1}$ by
$$d_{k+1} = -\left(1 + \beta_{k+1}^{N}\frac{g_{k+1}^T d_k}{\|g_{k+1}\|^2}\right) g_{k+1} + \beta_{k+1}^{N} d_k; \qquad (8)$$
Step 5. Set $k := k + 1$ and go to Step 1.
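The following Python sketch strings the steps of Algorithm 1 together, reusing the beta_hybrid helper from the sketch above. It is illustrative only: SciPy's line_search implements the strong Wolfe conditions (a slightly stricter variant of (13)), the fallback step on line-search failure is our own choice, and no acceleration strategy is included.

```python
import numpy as np
from scipy.optimize import line_search  # (strong) Wolfe line search

def nmhsdy(f, grad, x0, eps=1e-6, max_iter=5000):
    """Illustrative sketch of Algorithm 1 (NMHSDY)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # Step 0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:              # Step 1
            break
        alpha = line_search(f, grad, x, d, gfk=g, c1=0.2, c2=0.85)[0]  # Step 2
        if alpha is None:                         # line search failed: crude fallback
            alpha = 1e-4
        x_new = x + alpha * d                     # Step 3
        g_new = grad(x_new)
        gn2 = g_new.dot(g_new)
        if gn2 <= eps ** 2:                       # already (numerically) stationary
            return x_new
        beta = beta_hybrid(g_new, g, d)           # Step 4: Eq. (7)
        d = -(1.0 + beta * g_new.dot(d) / gn2) * g_new + beta * d      # Eq. (8)
        x, g = x_new, g_new                       # Step 5
    return x

# usage on a simple strictly convex quadratic
f = lambda x: 0.5 * x.dot(x)
grad = lambda x: x
print(nmhsdy(f, grad, np.array([3.0, -4.0])))     # approaches the minimizer at 0
```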
Theorem 1.
Let the search direction $d_k$ be defined by (8) in Algorithm 1. Then, for any line search strategy, the sufficient descent property holds for the nonconvex function $f(x)$, that is,
$$g_k^T d_k = -\|g_k\|^2, \quad \forall k \ge 0. \qquad (9)$$
Proof. 
By the definition of $d_{k+1}$ in (8), we have
$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 - \beta_{k+1}^{N} g_{k+1}^T d_k + \beta_{k+1}^{N} g_{k+1}^T d_k = -\|g_{k+1}\|^2.$$
Since $d_0 = -g_0$, we have $d_0^T g_0 = -\|g_0\|^2$. All in all, (9) holds. This completes the proof. □
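The identity (9) is easy to verify numerically. The snippet below (an illustrative check, not part of the paper) builds a direction via (8) from random data, using the beta_hybrid helper sketched earlier, and confirms that $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$ up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
g_old, g_new, d_old = rng.normal(size=(3, 5))
beta = beta_hybrid(g_new, g_old, d_old)                       # Eq. (7)
d_new = -(1.0 + beta * g_new.dot(d_old) / g_new.dot(g_new)) * g_new + beta * d_old  # Eq. (8)
assert np.isclose(g_new.dot(d_new), -g_new.dot(g_new))        # Eq. (9)
```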

3. Convergence for General Nonlinear Functions

In this section, the global convergence of the NMHSDY method is presented. Before that, some common assumptions are listed.
Assumption 1.
The level set $\mathcal{L} = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded, where $x_0$ is the initial point, i.e., there exists a positive constant $D > 0$ such that
$$\|x\| \le D, \quad \forall x \in \mathcal{L}. \qquad (10)$$
Assumption 2.
In some neighborhood $\mathcal{N}$ of $\mathcal{L}$, the gradient $g(x) = \nabla f(x)$ is Lipschitz continuous, i.e., there exists a constant $L_1 > 0$ such that
$$\|g(x) - g(y)\| \le L_1 \|x - y\|, \quad \forall x, y \in \mathcal{N}. \qquad (11)$$
Based on the above assumptions, we further obtain that there exists a constant M > 0 such that
$$\|g(x)\| \le M, \quad \forall x \in \mathcal{L}. \qquad (12)$$
In fact, it holds that $\|g(x)\| = \|g(x) - g(x_0) + g(x_0)\| \le \|g(x) - g(x_0)\| + \|g(x_0)\| \le L_1\|x - x_0\| + \|g(x_0)\| \le 2 L_1 D + \|g(x_0)\|$; hence, $M$ can be taken as $2 L_1 D + \|g(x_0)\|$ or larger.
The line search strategy is another important element in iterative methods. In this part, we take the standard Wolfe line search strategy:
$$f(x_k + \alpha_k d_k) \le f_k + \sigma_1 \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma_2 g_k^T d_k, \qquad (13)$$
where $0 < \sigma_1 < \sigma_2 < 1$. By property (9) and line search (13), it is satisfied that
$$f_{k+1} \le f_k - \sigma_1 \alpha_k \|g_k\|^2 \le f_k,$$
that is, the sequence $\{f_k\}$ is non-increasing and the sequence $\{x_k\}$ generated by Algorithm 1 is contained in the level set $\mathcal{L}$. Since $f$ is continuously differentiable and the set $\mathcal{L}$ is bounded, there exists a constant $f^*$ such that
$$\lim_{k \to \infty} f(x_k) = f^*.$$
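For completeness, a trial step can be checked against the two Wolfe conditions (13) directly. The function below is a small sketch under the assumption that x and d are NumPy arrays; the default values of sigma1 and sigma2 mirror those used later in the experiments.

```python
def satisfies_wolfe(f, grad, x, d, alpha, sigma1=0.2, sigma2=0.85):
    """Check the standard Wolfe conditions (13) for a trial step length alpha."""
    g_d = grad(x).dot(d)                                   # g_k^T d_k (should be < 0)
    sufficient_decrease = f(x + alpha * d) <= f(x) + sigma1 * alpha * g_d
    curvature = grad(x + alpha * d).dot(d) >= sigma2 * g_d
    return sufficient_decrease and curvature
```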
The Zoutendijk condition [35] plays an essential role in the global convergence of nonlinear CG methods. For completeness, we here state the lemma but omit its proof.
Lemma 1.
Suppose that Assumptions 1 and 2 hold. Consider any nonlinear CG method in which $\alpha_k$ is obtained by the standard Wolfe line search (13) and $d_k$ is a descent direction ($g_k^T d_k < 0$). Then, we have
$$\sum_{k=1}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \qquad (14)$$
Thereafter, the convergence property is presented in the following theorem for the general functions without convexity assumption.
Theorem 2.
Let Assumptions 1 and 2 hold and the sequence $\{x_k\}$ be generated by the NMHSDY algorithm. Set $l_{k+1} = \beta_{k+1}^{N}/\beta_{k+1}^{DY}$. If $l_{k+1} \in \left[\frac{1-\sigma_2}{1+\sigma_2}, 1-\sigma_2\right]$ holds, then we have
$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (15)$$
Proof. 
We now prove (15) by contradiction and assume that there exists a constant $\mu > 0$ such that
$$\|g_k\| \ge \mu, \quad \forall k \ge 0. \qquad (16)$$
Let $\gamma_{k+1} = 1 + \beta_{k+1}^{N}\frac{g_{k+1}^T d_k}{\|g_{k+1}\|^2}$; then, the direction (8) can be rewritten as
$$d_{k+1} = -\gamma_{k+1} g_{k+1} + \beta_{k+1}^{N} d_k.$$
After some algebraic manipulation, we have
$$\|d_{k+1}\|^2 = (\beta_{k+1}^{N})^2 \|d_k\|^2 - 2\gamma_{k+1} g_{k+1}^T d_{k+1} - \gamma_{k+1}^2 \|g_{k+1}\|^2.$$
Dividing both sides of the above equality by $(g_{k+1}^T d_{k+1})^2$ and using (9), we have
$$\begin{aligned}
\frac{\|d_{k+1}\|^2}{\|g_{k+1}\|^4} &= (\beta_{k+1}^{N})^2\frac{\|d_k\|^2}{\|g_{k+1}\|^4} + \frac{2\gamma_{k+1}}{\|g_{k+1}\|^2} - \frac{\gamma_{k+1}^2}{\|g_{k+1}\|^2} = (\beta_{k+1}^{N})^2\frac{\|d_k\|^2}{\|g_{k+1}\|^4} + \frac{1}{\|g_{k+1}\|^2} - \frac{(\gamma_{k+1}-1)^2}{\|g_{k+1}\|^2} \\
&= l_{k+1}^2\frac{\|d_k\|^2}{(d_k^T y_k)^2} + \frac{1}{\|g_{k+1}\|^2} - \frac{(\gamma_{k+1}-1)^2}{\|g_{k+1}\|^2} \\
&\le \frac{l_{k+1}^2}{(1-\sigma_2)^2}\frac{\|d_k\|^2}{\|g_k\|^4} + \frac{1}{\|g_{k+1}\|^2} \le \frac{\|d_k\|^2}{\|g_k\|^4} + \frac{1}{\|g_{k+1}\|^2}, \qquad (17)
\end{aligned}$$
where the first inequality holds by $d_k^T y_k \ge (\sigma_2 - 1)g_k^T d_k = (1-\sigma_2)\|g_k\|^2$ and the last inequality holds by the bound on the scalar $l_{k+1}$. By (17) and $\|d_0\|^2 = \|g_0\|^2$, it holds that
$$\frac{\|d_k\|^2}{\|g_k\|^4} \le \sum_{i=0}^{k}\frac{1}{\|g_i\|^2}.$$
Then, by the above inequality and (16), it follows that
$$\frac{\|g_k\|^4}{\|d_k\|^2} \ge \frac{\mu^2}{k+1},$$
which indicates that
$$\sum_{k=1}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} = \sum_{k=1}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} = \infty,$$
which contradicts the Zoutendijk condition (14). So, (15) holds. □
Remark 1.
In [24], the authors presented a class of hybrid conjugate parameters, one of which is $\beta_{k+1} = \max\{0, \min\{\beta_{k+1}^{DY}, \beta_{k+1}^{HS}\}\}$, with the corresponding interval for $l_{k+1}$ being $\left[\frac{1-\sigma_2}{1+\sigma_2}, 1\right]$. It is reasonable that the interval in our paper is smaller, since we take $\beta_{k+1}^{MHS}$ instead of $\beta_{k+1}^{HS}$ and $0 < \vartheta_k \le 1$.
In the following, we discuss the global convergence of Algorithm 1 for general nonlinear functions in the case where $l_{k+1} \notin \left[\frac{1-\sigma_2}{1+\sigma_2}, 1-\sigma_2\right]$. Motivated by the modified secant conditions in [36,37], in this part, based on the Wolfe line search strategy (13), we consider the following setting:
$$\bar{y}_k = y_k + m s_k, \qquad (18)$$
where m > 0 is a constant. With the above setting, the modified conjugate parameter becomes
$$\beta_{k+1}^{NN} = \max\left\{0, \min\left\{\beta_{k+1}^{NDY}, \beta_{k+1}^{NMHS}\right\}\right\}, \qquad (19)$$
where $\beta_{k+1}^{NDY}$ and $\beta_{k+1}^{NMHS}$ are, respectively,
$$\beta_{k+1}^{NDY} = \frac{\|g_{k+1}\|^2}{\bar{y}_k^T s_k}, \qquad \beta_{k+1}^{NMHS} = \frac{g_{k+1}^T \bar{y}_k}{\bar{y}_k^T s_k}\left(1 - \frac{(g_{k+1}^T s_k)^2}{\|g_{k+1}\|^2 \|s_k\|^2}\right).$$
Meanwhile, the corresponding direction turns to
$$d_{k+1}^{N} = -\left(1 + \beta_{k+1}^{NN}\frac{g_{k+1}^T s_k}{\|g_{k+1}\|^2}\right) g_{k+1} + \beta_{k+1}^{NN} s_k. \qquad (20)$$
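The modified parameter (19) and direction (20) differ from (7) and (8) only in replacing $y_k$ by $\bar{y}_k$ and $d_k$ by $s_k$. A minimal sketch follows; the value of m is a hypothetical choice for illustration, since the text only requires m > 0.

```python
import numpy as np

def beta_hybrid_modified(g_new, g_old, s, m=0.01):
    """beta^{NN} of Eq. (19), built from y_bar = y + m*s as in Eq. (18)."""
    y_bar = (g_new - g_old) + m * s
    ybTs = y_bar.dot(s)                      # >= m * ||s||^2 under Wolfe, cf. Lemma 2
    theta = 1.0 - g_new.dot(s) ** 2 / (g_new.dot(g_new) * s.dot(s))
    beta_ndy = g_new.dot(g_new) / ybTs
    beta_nmhs = g_new.dot(y_bar) / ybTs * theta
    return max(0.0, min(beta_ndy, beta_nmhs))
```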
The following lemma indicates the properties of the scalar $\bar{y}_k^T s_k$ and the vector $\bar{y}_k$.
Lemma 2.
Let $\bar{y}_k$ be defined by (18); then, adopting the Wolfe line search strategy (13), we obtain
$$\bar{y}_k^T s_k \ge m \|s_k\|^2, \qquad (21)$$
and
$$\|\bar{y}_k\| \le (L_1 + m)\|s_k\|. \qquad (22)$$
Proof. 
By the Wolfe line search strategy (13), we have
$$y_k^T s_k = (g_{k+1} - g_k)^T s_k \ge (\sigma_2 - 1) g_k^T s_k = (1 - \sigma_2)\alpha_k \|g_k\|^2 \ge 0,$$
which indicates $y_k^T s_k \ge 0$. Therefore, it holds that
$$\bar{y}_k^T s_k = y_k^T s_k + m\|s_k\|^2 \ge m\|s_k\|^2.$$
Hence, (21) holds. Meanwhile, we also have
$$\|\bar{y}_k\| = \|y_k + m s_k\| \le \|y_k\| + m\|s_k\| \le (L_1 + m)\|s_k\|,$$
where the second inequality holds by Assumption 2. Hence, (22) holds. This completes the proof. □
In the following, we assume that Algorithm 1 never stops and there exists a constant μ > 0 such that for all k, (16) holds.
Lemma 3.
Suppose that Assumptions 1 and 2 and (16) hold. Let the sequences $\{x_k\}$ and $\{d_k^N\}$ be generated by Algorithm 1 with the conjugate parameter $\beta_k^{NN}$ and the Wolfe line search technique (13). Then, there exists a positive constant $\Gamma$ such that
$$\|g_k\| \le \|d_k^N\| \le (1 + 2\Gamma)\|g_k\|. \qquad (23)$$
Proof. 
Based on the Wolfe line search technique (13), it holds that
$$\bar{y}_k^T d_k^N = y_k^T d_k^N + m\alpha_k\|d_k^N\|^2 \ge y_k^T d_k^N \ge (\sigma_2 - 1) g_k^T d_k^N = (1-\sigma_2)\|g_k\|^2,$$
where the first inequality holds by the non-negativity of $\alpha_k$ and the last equality holds by the sufficient descent property (9). Meanwhile, by (9) and the Cauchy–Schwarz inequality, it holds that, for $k \ge 0$,
$$\|g_k\|\,\|d_k^N\| \ge -g_k^T d_k^N = \|g_k\|^2,$$
which, together with condition (16), implies that
$$\|d_k^N\| \ge \|g_k\| \ge \mu, \quad \forall k \ge 0. \qquad (25)$$
By the definition of $\beta_{k+1}^{NN}$, we obtain that
$$\begin{aligned}
|\beta_{k+1}^{NN}| &\le \max\left\{\frac{\|g_{k+1}\|^2}{|\bar{y}_k^T s_k|}, \frac{|g_{k+1}^T \bar{y}_k|}{|\bar{y}_k^T s_k|}\right\} \le \max\left\{\frac{\|g_{k+1}\|^2}{m\|s_k\|^2}, \frac{\|g_{k+1}\|\,\|\bar{y}_k\|}{m\|s_k\|^2}\right\} \le \max\left\{\frac{\|g_{k+1}\|}{m\|s_k\|}, \frac{L_1 + m}{m}\right\}\frac{\|g_{k+1}\|}{\|s_k\|} \\
&\le \max\left\{\frac{M}{m\bar{\alpha}\|d_k^N\|}, \frac{L_1 + m}{m}\right\}\frac{\|g_{k+1}\|}{\|s_k\|} \le \max\left\{\frac{M}{m\bar{\alpha}\|g_k\|}, \frac{L_1 + m}{m}\right\}\frac{\|g_{k+1}\|}{\|s_k\|} \le \max\left\{\frac{M}{m\bar{\alpha}\mu}, \frac{L_1 + m}{m}\right\}\frac{\|g_{k+1}\|}{\|s_k\|} := \Gamma\frac{\|g_{k+1}\|}{\|s_k\|},
\end{aligned}$$
where the second inequality holds by (21), the third inequality holds by (22), the fourth inequality holds by $\|g_{k+1}\| \le M$ and the condition $\alpha_k \ge \bar{\alpha} > 0$ for all $k \ge 0$, the fifth inequality holds by (25), and the last inequality holds by condition (16). Furthermore, we have
$$\left|\beta_{k+1}^{NN}\frac{g_{k+1}^T s_k}{\|g_{k+1}\|^2}\right| \le |\beta_{k+1}^{NN}|\frac{\|g_{k+1}\|\,\|s_k\|}{\|g_{k+1}\|^2} \le \Gamma\frac{\|g_{k+1}\|}{\|s_k\|}\cdot\frac{\|s_k\|}{\|g_{k+1}\|} = \Gamma.$$
By the definition of d k N in (20) and the above discussions, it holds that
$$\|d_{k+1}^N\| \le \|g_{k+1}\| + \left|\beta_{k+1}^{NN}\frac{g_{k+1}^T s_k}{\|g_{k+1}\|^2}\right|\|g_{k+1}\| + |\beta_{k+1}^{NN}|\,\|s_k\| \le \|g_{k+1}\| + \Gamma\|g_{k+1}\| + \Gamma\frac{\|g_{k+1}\|}{\|s_k\|}\|s_k\| = (1 + 2\Gamma)\|g_{k+1}\|.$$
With the help of (25), we conclude that
$$\|g_{k+1}\| \le \|d_{k+1}^N\| \le (1 + 2\Gamma)\|g_{k+1}\|.$$
Hence, (23) holds. This completes the proof. □
Theorem 3.
Suppose that Assumptions 1 and 2 hold. Let the sequences $\{x_k\}$ and $\{d_k^N\}$ be generated by Algorithm 1 with the conjugate parameter $\beta_k^{NN}$ and the Wolfe line search technique (13). Then, Algorithm 1 converges in the sense of (15).
Proof. 
We prove the conclusion by contradiction and assume that there exists a positive constant μ such that (16) holds. Otherwise, Algorithm 1 converges in the sense of (15). From (9), we conclude that the new direction enjoys the sufficient descent property. Therefore, Lemma 1 holds, which implies that
$$+\infty = \sum_{k=1}^{+\infty}\frac{\mu^2}{(1+2\Gamma)^2} \le \sum_{k=1}^{+\infty}\frac{\|g_k\|^2}{(1+2\Gamma)^2} = \sum_{k=1}^{+\infty}\frac{\|g_k\|^4}{(1+2\Gamma)^2\|g_k\|^2} \le \sum_{k=1}^{+\infty}\frac{(g_k^T d_k^N)^2}{\|d_k^N\|^2} < +\infty,$$
where the first inequality holds by (16), the second inequality holds by (9) and (23), and the last inequality holds by Lemma 1. This is a contradiction, so the assumption does not hold. Therefore, $\liminf_{k \to +\infty}\|g_k\| = 0$ holds. This completes the proof. □

4. Numerical Performance

In this section, we focus on the numerical performance of Algorithm 1 and compare it with several effective CG methods. In the experiments, we code these algorithms in Matlab 2016b and run them on a PC with an AMD 2.10 GHz processor, 16.00 GB of RAM, and the Windows 10 operating system.

4.1. Performance on Benchmark Problems

In this subsection, we check the performance of the NMHSDY method and compare it with two effective modified HS methods in [26,28] and the hybrid method in [24]. In [26], the authors proposed an effective modified HS method (MHSCG method for abbreviation) in which the conjugate parameter is
$$\beta_{k+1} = \max\{0, \bar{\beta}_{k+1}^{MHS}\}, \qquad \bar{\beta}_{k+1}^{MHS} = \beta_{k+1}^{MHS} - \lambda\frac{\|y_k\|^2\vartheta_k}{(y_k^T d_k)^2}\,g_{k+1}^T d_k,$$
where $\lambda > 1/4$ is a parameter. The direction in [26] is in the form of (3), and the corresponding method has attractive numerical performance. Dai and Kou in [28] introduced another effective class of CG schemes (DK+ method for abbreviation) depending on a parameter $\tau_k$, where the corresponding conjugate parameter $\beta_{k+1}$ is defined by
$$\beta_{k+1}^{DK}(\tau_k) = \beta_{k+1}^{HS} - \left(\tau_k + \frac{\|y_k\|^2}{y_k^T s_k} - \frac{y_k^T s_k}{\|s_k\|^2}\right)\frac{g_{k+1}^T s_k}{y_k^T d_k}, \qquad s_k = x_{k+1} - x_k.$$
The direction in [28] is also in the form of (3). To establish global convergence for general nonlinear functions, a truncated strategy is used, that is,
$$\beta_{k+1}^{DK+}(\tau_k) = \max\left\{\beta_{k+1}^{DK}(\tau_k),\ \eta\frac{g_{k+1}^T d_k}{\|d_k\|^2}\right\},$$
where $\eta \in [0, 1)$ is a parameter. The numerical results indicated that the DK+ method has good and reliable numerical performance. Dai and Yuan, in [24], proposed an effective hybrid CG method (HSDY method for abbreviation) in which the conjugate parameter is
$$\beta_{k+1} = \max\{0, \min\{\beta_{k+1}^{DY}, \beta_{k+1}^{HS}\}\}.$$
The hybrid method also has global convergence and attractive numerical performance.
In the following, we focus on the numerical performance on the large-scale unconstrained problems listed in Table 1 (see [38] for details). In order to improve numerical performance, Andrei, in [39], proposed an accelerated strategy which modifies the step length in a multiplicative manner. In this part, we also equip Algorithm 1 with this accelerated strategy and report it as a separate variant. To compare the conjugate parameters and the search directions fairly, we adopt the Wolfe line search technique (13) for all methods.
In the experiments, for each problem we consider nine large-scale dimensions with 300, 600, 900, 3000, 6000, 9000, 30,000, 60,000, and 90,000 variables. The parameters used in the Wolfe line search are $\sigma_1 = 0.20$ and $\sigma_2 = 0.85$. The other parameters for the MHSCG method and the DK+ method are set to their default values.
During the process, the Himmelblau stopping rule is adopted: if $|f(x_k)| > \varepsilon_1$, let $stop1 = \frac{|f(x_k) - f(x_{k+1})|}{|f(x_k)|}$; otherwise, $stop1 = |f(x_k) - f(x_{k-1})|$. If the condition $\|g_k\| \le \varepsilon$ or $stop1 \le \varepsilon_2$ is satisfied, the process is stopped, where the parameter values are $\varepsilon = 10^{-6}$ and $\varepsilon_1 = \varepsilon_2 = 10^{-5}$. Meanwhile, we also stop the algorithm when the number of iterations exceeds 5000.
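A direct transcription of this stopping test reads as follows (an illustrative sketch; the tolerance values are those stated above, and f_adj stands for the function value at the adjacent iterate used by the rule):

```python
import numpy as np

def should_stop(f_k, f_adj, g_k, eps=1e-6, eps1=1e-5, eps2=1e-5):
    """Himmelblau-type stopping rule combined with the gradient-norm test."""
    if abs(f_k) > eps1:
        stop1 = abs(f_k - f_adj) / abs(f_k)
    else:
        stop1 = abs(f_k - f_adj)
    return np.linalg.norm(g_k) <= eps or stop1 <= eps2
```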
In order to present the performance of the methods more intuitively, the tool in [40] is adopted to plot their performance profiles. Robustness and efficiency rates are readable on the right and left vertical axes of the corresponding performance profiles, respectively. To present a detailed numerical comparison, two different scales are considered for the $\tau$-axis. One is $\tau \in [1, 1.5]$, which shows what happens for values of $\tau$ near 1. The other is used to present the trend for large values of $\tau$. In Figure 1, Figure 2 and Figure 3, we show, respectively, the performance of these methods relative to the number of iterations ($NI$), the number of function and gradient evaluations ($NFG$, the sum of the number of function evaluations and gradient evaluations), and the CPU time consumed in seconds.
From Figure 1, Figure 2 and Figure 3, we see that Algorithm 1 is comparable to, and slightly more effective than, the HSDY method, the DK+ method, and the MHSCG method for the above problems. Meanwhile, Algorithm 1 with the accelerated strategy is much more effective and performs best in the experiment, which indicates that the accelerated technique indeed works and reduces the number of iterations and the number of function and gradient evaluations.

4.2. Comparison for Stability

In this subsection, we consider the numerical stability of Algorithm 1 for ill-conditioned matrix problems and compare it with the MHSCG method in [26]. In fact, the quadratic objective function $f(x) = x^T I x$ of (1) is ill-conditioned if the matrix $I \in \mathbb{R}^{n \times n}$ is of the form
$$I_{i,j} = \frac{1}{i + j - 1}, \quad i, j = 1, 2, \ldots, n.$$
It is clear that the matrix $I$ is ill-conditioned and positive definite [41]; we consider different dimensions $n = 5, 6, \ldots, 50$. Furthermore, the authors in [42] show that the norm condition number of the Hessian matrix gradually increases from $9.4366 \times 10^{5}$ for $n = 5$ to $6.9007 \times 10^{20}$ for $n = 50$. In the following, we explore the numerical performance. The experimental environment, the parameter values, and the stopping rule remain the same as in the above subsection. Meanwhile, the initial point is selected as $x_0 = (10, 10, \ldots, 10)^T$. The corresponding numerical results are presented in Table 2 and Table 3, in which $Dim$ is the dimension of $x$, $NI$ denotes the number of iterations, $NFG$ is the sum of the number of function and gradient evaluations, $Time$ denotes the CPU time consumed in seconds, and $f^*$ denotes the optimal function value obtained by the methods.
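For reference, the ill-conditioned quadratic used in this subsection is easy to set up. The sketch below (illustrative; n = 10 is an arbitrary choice) builds the Hilbert-type matrix $I$, the objective $f(x) = x^T I x$, its gradient, and the initial point, and prints the condition number of $I$.

```python
import numpy as np

def hilbert_like(n):
    """Matrix with entries I_{i,j} = 1/(i+j-1), i, j = 1,...,n."""
    idx = np.arange(1, n + 1)
    return 1.0 / (idx[:, None] + idx[None, :] - 1.0)

n = 10
I = hilbert_like(n)
f = lambda x: x @ I @ x                 # objective f(x) = x^T I x
grad = lambda x: 2.0 * (I @ x)          # gradient; the Hessian is 2I
x0 = 10.0 * np.ones(n)                  # initial point used in the experiments
print(np.linalg.cond(I))                # grows rapidly with n, as noted above
```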
From Table 2 and Table 3, it can be found that, for the dimensions from 5 to 50, Algorithm 1 and the MHSCG method successfully solve all of the problems and obtain reasonable optimal function values, which are all not greater than $10^{-5}$. For most problems, Algorithm 1 needs fewer iterations and fewer function and gradient evaluations and obtains better optimal values. In order to show the numerical performance intuitively, we also adopt the performance profiles in [40] for the NI and NFG cases. The corresponding performance profiles are given in Figure 4 and Figure 5.
Figure 4 shows that Algorithm 1 and the MHSCG method solve these testing problems with the least total number of iterations in 63% and 48% of cases, respectively. Figure 5 indicates that Algorithm 1 and the MHSCG method solve these testing problems with the least total number of function and gradient evaluations in 61% and 50% of cases, respectively. All in all, the numerical results show that Algorithm 1 is more effective and stable than the MHSCG method for these ill-conditioned matrix problems.

4.3. Application to Image Restoration

In this subsection, we apply Algorithm 1 to some image restoration problems [43,44,45]. During the process, the normal Wolfe line search technique is adopted, the corresponding parameters remain unchanged, and two noise level cases for the Barbara.png ( 512 × 512 ) and Baboon.bmp ( 512 × 512 ) images are considered. In this part, we stop the process when the following criteria are both satisfied:
$$\frac{|f(x_{k+1}) - f(x_k)|}{|f(x_k)|} < 10^{-3}, \qquad \|g(x_k)\| < 10^{-3}\left(1 + |f(x_k)|\right).$$
Meanwhile, to assess the restoration performance quantitatively, we also utilize the peak signal-to-noise ratio [45] (PSNR), which is defined by $PSNR = 10\log_{10}\frac{255^2}{\frac{1}{M \times N}\sum_{i,j}(u_{i,j}^r - u_{i,j}^*)^2}$, where $M \times N$ is the size of the image, and $u_{i,j}^r$ and $u_{i,j}^*$ denote the pixel values of the restored image and the original image, respectively. For the noise level, we consider two cases: $20\%$ (a low-level case) and $60\%$ (a high-level case). The consumed CPU time and the corresponding PSNR values are given in Table 4. Meanwhile, the detailed performances are presented in Figure 6 and Figure 7, respectively.
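The PSNR values reported in Table 4 can be computed directly from the above definition; the helper below is a minimal sketch assuming 8-bit grayscale images stored as NumPy arrays.

```python
import numpy as np

def psnr(restored, original):
    """Peak signal-to-noise ratio between a restored image and the original (8-bit)."""
    diff = restored.astype(np.float64) - original.astype(np.float64)
    mse = np.mean(diff ** 2)            # (1/(M*N)) * sum of squared pixel errors
    return 10.0 * np.log10(255.0 ** 2 / mse)
```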
From Table 4 and Figures 6 and 7, we see that both Algorithm 1 and the MHSCG method solve the image restoration problems successfully within a suitable time, and Algorithm 1 seems to perform slightly better than the MHSCG method.

5. Conclusions

Conjugate gradient methods are attractive and effective for large-scale smooth unconstrained optimization problems due to their simple computation and low memory requirements. The Dai–Yuan conjugate gradient method has good theoretical properties and generates a descent direction in each iteration. In contrast, the Hestenes–Stiefel conjugate gradient method automatically satisfies the conjugate condition $y_k^T d_{k+1} = 0$ without any line search technique and performs well in practice. Based on the above discussions, we propose a new descent hybrid conjugate gradient method. The proposed method has the sufficient descent property independent of any line search technique. Under some mild conditions, the proposed method is globally convergent. In the experiments, we first consider 61 unconstrained problems with 9 different dimensions of up to 90,000. Thereafter, 46 ill-conditioned matrix problems are also tested. The primary numerical results show that the proposed method is more effective and stable. Finally, we apply the hybrid method to some image restoration problems. The results indicate that our method is attractive and reliable.

Author Contributions

Conceptualization, S.W., X.W., Y.T. and L.P.; methodology, S.W. and X.W.; software, X.W.; validation, X.W., L.P. and Y.T.; formal analysis, X.W., Y.T. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by Science Foundation of Zhejiang Sci-Tech University (ZSTU) under Grant No. 21062347-Y.

Data Availability Statement

All data included in this study are available upon reasonable request.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Li, D.; Fukushima, M. A global and superlinear convergent Gauss-Newton-based BFGS method for symmetric nonlinear equations. SIAM J. Numer. Anal. 1999, 37, 152–172. [Google Scholar] [CrossRef]
  2. Yuan, G.; Wei, Z.; Wang, Z. Gradient trust region algorithm with limited memory BFGS update for nonsmooth convex minimization. Comput. Optim. Appl. 2013, 54, 45–64. [Google Scholar] [CrossRef]
  3. Dai, Y.; Han, J.; Liu, G.; Sun, D.; Yin, H.; Yuan, Y. Convergence properties of nonlinear conjugate gradient methods. SIAM J. Optim. 2000, 10, 345–358. [Google Scholar] [CrossRef]
  4. Hager, W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192. [Google Scholar] [CrossRef]
  5. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154. [Google Scholar] [CrossRef]
  6. Dai, Y.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182. [Google Scholar] [CrossRef]
  7. Fletcher, R. Practical Methods of Optimization; Unconstrained Optimization; John Wiley & Sons: New York, NY, USA, 1987; Volume 1. [Google Scholar]
  8. Hestenes, R.; Stiefel, L. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
  9. Polyak, B.T. The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 1969, 9, 94–112. [Google Scholar] [CrossRef]
  10. Polak, E.; Ribière, G. Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Informat Rech. Opér. 1969, 16, 35–43. [Google Scholar] [CrossRef]
  11. Liu, Y.; Storey, C. Efficient generalized conjugate gradient algorithms Part 1: Theory. J. Optim. Theory Appl. 1991, 69, 129–137. [Google Scholar] [CrossRef]
  12. Xiang, Y.; Gong, X.G. Efficiency of generalized simulated annealing. Phys. Rev. E 2000, 62, 4473–4476. [Google Scholar] [CrossRef] [PubMed]
  13. Yuan, Q.; Qian, F. A hybrid genetic algorithm for twice continuously differentiable NLP problems. Comput. Chem. Eng. 2010, 34, 36–41. [Google Scholar] [CrossRef]
  14. Gao, F.C.; Han, L.X. Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Comput. Optim. Appl. 2012, 51, 259–277. [Google Scholar] [CrossRef]
  15. Yuan, G.; Wang, X.; Sheng, Z. The projection technique for two open problems of unconstrained optimization problems. J. Optim. Theory Appl. 2020, 186, 590–619. [Google Scholar] [CrossRef]
  16. Yuan, G.; Wang, X.; Sheng, Z. Family weak conjugate gradient algorithms and their convergence analysis for nonconvex functions. Numer. Algorithms 2020, 84, 935–956. [Google Scholar] [CrossRef]
  17. Mousavi, A.; Esmaeilpour, M.; Sheikhahmadi, A. A new family of Polak-Ribière-Polyak conjugate gradient method for impulse noise removal. Soft Comput. 2023, 27, 17515–17524. [Google Scholar] [CrossRef]
  18. Polyak, B.T. Introduction to Optimization; Optimization Software Inc., Publications Division: New York, NY, USA, 1987. [Google Scholar]
  19. Wang, X.; Yuan, G.; Pang, L. A class of new three-term descent conjugate gradient algorithms for large-scale unconstrained optimization and applications to image restoration problems. Numer. Algorithms 2023, 93, 949–970. [Google Scholar] [CrossRef]
  20. Wang, X. A class of spectral three-term descent Hestenes-Stiefel conjugate gradient algorithms for large-scale unconstrained optimization and image restoration problems. Appl. Numer. Math. 2023, 192, 41–56. [Google Scholar] [CrossRef]
  21. Touati-Ahmed, D.; Storey, C. Efficient hybrid conjugate gradient techniques. J. Optim.Theory Appl. 1990, 64, 379–397. [Google Scholar] [CrossRef]
  22. Zhang, L.; Zhou, W. Two descent hybrid conjugate gradient methods for optimization. J. Comput. Appl. Math. 2008, 216, 251–264. [Google Scholar] [CrossRef]
  23. Gilbert, J.; Nocedal, J. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 1992, 2, 21–42. [Google Scholar] [CrossRef]
  24. Dai, Y.; Yuan, Y. An efficient hybrid conjugate gradient method for unconstrained optimization. Ann. Oper. Res. 2001, 103, 33–47. [Google Scholar] [CrossRef]
  25. Jiang, X.; Liao, W.; Yin, J.; Jian, J. A new family of hybrid three-term conjugate gradient methods with applications in image restoration. Numer. Algorithms 2022, 91, 161–191. [Google Scholar] [CrossRef]
  26. Amini, K.; Faramarzi, P.; Pirfalah, N. A modified Hestenes-Stiefel conjugate gradient method with an optimal property. Optim. Methods Softw. 2019, 34, 770–782. [Google Scholar] [CrossRef]
  27. Narushima, Y.; Yabe, H.; Ford, J. A three-term conjugate gradient method with sufficient descent property for unconstrained optimization. SIAM J. Optim. 2011, 21, 212–230. [Google Scholar] [CrossRef]
  28. Dai, Y.; Kou, C. A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 2013, 23, 296–320. [Google Scholar] [CrossRef]
  29. Woldu, T.; Zhang, H.; Zhang, X.; Yemane, H. A modified nonlinear conjugate gradient algorithm for large-scale nonsmooth convex optimization. J. Optim. Theory Appl. 2020, 185, 223–238. [Google Scholar] [CrossRef]
  30. Yuan, G.; Meng, Z.; Li, Y. A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 2016, 168, 129–152. [Google Scholar] [CrossRef]
  31. Babaie-Kafaki, S.; Fatemi, M.; Mahdavi-Amiri, N. Two effective hybrid conjugate gradient algorithms based on modified BFGS updates. Numer. Algorithms 2011, 58, 315–331. [Google Scholar] [CrossRef]
  32. Livieris, I.; Tampakas, V.; Pintelas, P. A descent hybrid conjugate gradient method based on the memoryless BFGS update. Numer. Algorithms 2018, 79, 1169–1185. [Google Scholar] [CrossRef]
  33. Khoshgam, Z.; Ashrafi, A. A new hybrid conjugate gradient method for large-scale unconstrained optimization problem with non-convex objective function. Comp. Appl. Math. 2019, 38, 186. [Google Scholar] [CrossRef]
  34. Narayanan, S.; Kaelo, P. A linear hybridization of Dai-Yuan and Hestenes-Stiefel conjugate gradient method for unconstrained optimization. Numer. Math. Theory Methods Appl. 2021, 14, 527–539. [Google Scholar]
  35. Zoutendijk, G. Nonlinear programming, computational methods. In Integer and Nonlinear Programming; Amsterdam, The Netherlands, 1970; pp. 37–86. [Google Scholar]
  36. Li, D.; Fukushima, M. A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 2001, 129, 15–35. [Google Scholar] [CrossRef]
  37. Babaie-Kafaki, S.; Ghanbari, R. A modified scaled conjugate gradient method with global convergence for nonconvex functions. Bull. Belg. Math. Soc. Simon Stevin 2014, 21, 465–477. [Google Scholar] [CrossRef]
  38. Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161. [Google Scholar]
  39. Andrei, N. An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numer. Algorithms 2006, 42, 63–73. [Google Scholar] [CrossRef]
  40. Dolan, E.; Moré, J. Benchmarking optimization software with performance profiles. Math. Program 2002, 91, 201–213. [Google Scholar] [CrossRef]
  41. Watkins, S. Fundamentals of Matrix Computations; John Wiley and Sons: New York, NY, USA, 2002. [Google Scholar]
  42. Babaie-Kafaki, S. A hybrid scaling parameter for the scaled memoryless BFGS method based on the matrix norm. Int. J. Comput. Math. 2019, 96, 1595–1602. [Google Scholar] [CrossRef]
  43. Yu, G.; Huang, J.; Zhou, Y. A descent spectral conjugate gradient method for impulse noise removal. Appl. Math. Lett. 2010, 23, 555–560. [Google Scholar] [CrossRef]
  44. Yuan, G.; Lu, J.; Wang, Z. The PRP conjugate gradient algorithm with a modified WWP line search and its application in the image restoration problems. Appl. Numer. Math. 2020, 152, 1–11. [Google Scholar] [CrossRef]
  45. Bovik, A. Handbook of Image and Video Processing; Academic: New York, NY, USA, 2000. [Google Scholar]
Figure 1. Performance profiles of the methods in the number of iterations case.
Figure 2. Performance profiles of the methods in the function and gradient case.
Figure 3. Performance profiles of the methods in the CPU time consumed case.
Figure 4. Performance profiles of Algorithm 1 and the MHSCG method in the NI case.
Figure 5. Performance profiles of Algorithm 1 and the MHSCG method in the NFG case.
Figure 6. The noisy Barbara image, corrupted by salt-and-pepper noise (the first column); the images restored via Algorithm 1 (the second column) and via the MHSCG method (the third column).
Figure 7. The noisy Baboon image, corrupted by salt-and-pepper noise (the first column); the images restored via Algorithm 1 (the second column) and via the MHSCG method (the third column).
Table 1. The test problems.

No. | Problem | No. | Problem
1 | Extended Freudenstein and Roth Function | 32 | ARWHEAD (CUTE)
2 | Extended Trigonometric Function | 33 | NONDIA (Shanno-78) (CUTE)
3 | Extended Rosenbrock Function | 34 | DQDRTIC (CUTE)
4 | Extended Beale Function | 35 | EG2 (CUTE)
5 | Extended Penalty Function | 36 | DIXMAANA (CUTE)
6 | Perturbed Quadratic Function | 37 | DIXMAANB (CUTE)
7 | Raydan 1 Function | 38 | DIXMAANC (CUTE)
8 | Raydan 2 Function | 39 | DIXMAANE (CUTE)
9 | Diagonal 3 Function | 40 | Broyden Tridiagonal
10 | Generalized Tridiagonal-1 Function | 41 | Almost Perturbed Quadratic
11 | Extended Tridiagonal-1 Function | 42 | Tridiagonal Perturbed Quadratic
12 | Extended Three Exponential Terms | 43 | EDENSCH Function (CUTE)
13 | Generalized Tridiagonal-2 | 44 | VARDIM Function (CUTE)
14 | Diagonal 4 Function | 45 | LIARWHD (CUTE)
15 | Diagonal 5 Function | 46 | DIAGONAL 6
16 | Extended Himmelblau Function | 47 | DIXMAANF (CUTE)
17 | Generalized PSC1 Function | 48 | DIXMAANG (CUTE)
18 | Extended PSC1 Function | 49 | DIXMAANH (CUTE)
19 | Extended Powell Function | 50 | DIXMAANI (CUTE)
20 | Extended Cliff Function | 51 | DIXMAANJ (CUTE)
21 | Quadratic Diagonal Perturbed Function | 52 | DIXMAANK (CUTE)
22 | Extended Wood Function | 53 | DIXMAANL (CUTE)
23 | Extended Hiebert Function | 54 | DIXMAAND (CUTE)
24 | Quadratic Function QF1 | 55 | ENGVAL1 (CUTE)
25 | Extended Quadratic Penalty QP1 Function | 56 | COSINE (CUTE)
26 | Extended Quadratic Penalty QP2 Function | 57 | Extended DENSCHNB (CUTE)
27 | A Quadratic Function QF2 Function | 58 | Extended DENSCHNF (CUTE)
28 | Extended EP1 Function | 59 | SINQUAD (CUTE)
29 | Extended Tridiagonal-2 Function | 60 | Scaled Quadratic SQ1
30 | BDQRTIC (CUTE) | 61 | Scaled Quadratic SQ2
31 | TRIDIA (CUTE) | |
Table 2. Numerical results of the MHSCG method and Algorithm 1 in 5–40 dimensions.

MHSCG Method | Algorithm 1
Dim | NI | NFG | Time | f* | NI | NFG | Time | f*
5 | 5 | 22 | 0.000000 | 0.000000 | 5 | 22 | 0.000000 | 0.000000
6 | 13 | 48 | 0.031250 | 0.000000 | 13 | 48 | 0.000000 | 0.000000
7 | 21 | 72 | 0.000000 | 0.000000 | 21 | 72 | 0.000000 | 0.000000
8 | 21 | 72 | 0.000000 | 0.000000 | 21 | 72 | 0.015625 | 0.000000
9 | 27 | 92 | 0.000000 | 0.000006 | 24 | 82 | 0.000000 | 0.000000
10 | 41 | 143 | 0.000000 | 0.000007 | 29 | 97 | 0.000000 | 0.000000
11 | 75 | 246 | 0.000000 | 0.000009 | 55 | 192 | 0.000000 | 0.000000
12 | 58 | 203 | 0.000000 | 0.000003 | 70 | 249 | 0.000000 | 0.000008
13 | 57 | 190 | 0.000000 | 0.000004 | 70 | 246 | 0.000000 | 0.000000
14 | 86 | 276 | 0.031250 | 0.000008 | 123 | 429 | 0.000000 | 0.000000
15 | 75 | 264 | 0.031250 | 0.000009 | 89 | 314 | 0.000000 | 0.000009
16 | 98 | 330 | 0.000000 | 0.000008 | 52 | 177 | 0.000000 | 0.000001
17 | 24 | 82 | 0.000000 | 0.000002 | 66 | 232 | 0.000000 | 0.000002
18 | 65 | 215 | 0.031250 | 0.000009 | 28 | 97 | 0.000000 | 0.000000
19 | 31 | 105 | 0.000000 | 0.000007 | 27 | 94 | 0.000000 | 0.000000
20 | 111 | 373 | 0.031250 | 0.000009 | 131 | 452 | 0.000000 | 0.000010
21 | 117 | 379 | 0.031250 | 0.000008 | 31 | 108 | 0.000000 | 0.000000
22 | 96 | 333 | 0.000000 | 0.000010 | 66 | 236 | 0.000000 | 0.000000
23 | 68 | 233 | 0.000000 | 0.000009 | 85 | 303 | 0.031250 | 0.000000
24 | 50 | 175 | 0.031250 | 0.000000 | 43 | 151 | 0.000000 | 0.000000
25 | 96 | 318 | 0.000000 | 0.000009 | 103 | 377 | 0.000000 | 0.000000
26 | 113 | 370 | 0.000000 | 0.000009 | 112 | 410 | 0.000000 | 0.000004
27 | 22 | 78 | 0.000000 | 0.000000 | 129 | 473 | 0.031250 | 0.000001
28 | 104 | 372 | 0.000000 | 0.000009 | 24 | 87 | 0.000000 | 0.000000
29 | 156 | 513 | 0.000000 | 0.000010 | 130 | 449 | 0.000000 | 0.000001
30 | 54 | 185 | 0.000000 | 0.000001 | 121 | 446 | 0.000000 | 0.000007
31 | 39 | 144 | 0.000000 | 0.000003 | 97 | 341 | 0.000000 | 0.000008
32 | 26 | 97 | 0.000000 | 0.000001 | 79 | 292 | 0.031250 | 0.000001
33 | 87 | 309 | 0.000000 | 0.000010 | 66 | 234 | 0.031250 | 0.000002
34 | 58 | 213 | 0.000000 | 0.000006 | 60 | 217 | 0.031250 | 0.000000
35 | 97 | 327 | 0.000000 | 0.000009 | 25 | 95 | 0.000000 | 0.000002
36 | 24 | 86 | 0.000000 | 0.000000 | 132 | 481 | 0.000000 | 0.000003
37 | 23 | 84 | 0.000000 | 0.000004 | 55 | 192 | 0.000000 | 0.000001
38 | 93 | 312 | 0.046875 | 0.000010 | 40 | 142 | 0.000000 | 0.000003
39 | 148 | 487 | 0.046875 | 0.000005 | 124 | 442 | 0.140625 | 0.000010
40 | 66 | 231 | 0.000000 | 0.000007 | 136 | 496 | 0.031250 | 0.000007
Table 3. Numerical results of the MHSCG method and Algorithm 1 in 41–50 dimensions.

MHSCG Method | Algorithm 1
Dim | NI | NFG | Time | f* | NI | NFG | Time | f*
41 | 76 | 267 | 0.031250 | 0.000005 | 65 | 239 | 0.000000 | 0.000005
42 | 130 | 438 | 0.046875 | 0.000008 | 85 | 302 | 0.000000 | 0.000010
43 | 164 | 546 | 0.031250 | 0.000009 | 136 | 488 | 0.000000 | 0.000009
44 | 184 | 594 | 0.031250 | 0.000007 | 29 | 103 | 0.000000 | 0.000006
45 | 17 | 64 | 0.000000 | 0.000000 | 17 | 64 | 0.000000 | 0.000000
46 | 155 | 519 | 0.031250 | 0.000010 | 62 | 222 | 0.000000 | 0.000000
47 | 123 | 421 | 0.031250 | 0.000010 | 107 | 379 | 0.031250 | 0.000010
48 | 20 | 74 | 0.000000 | 0.000000 | 68 | 250 | 0.015625 | 0.000010
49 | 135 | 473 | 0.031250 | 0.000010 | 102 | 375 | 0.031250 | 0.000010
50 | 178 | 594 | 0.062500 | 0.000010 | 151 | 542 | 0.031250 | 0.000009
Table 4. Test results for Algorithm 1 and the MHSCG method.

Algorithm 1 | MHSCG Method
Image | Noise Level | PSNR | CPU Time | PSNR | CPU Time
Barbara | 20% | 29.6638 | 2.265625 | 29.5831 | 2.578125
Baboon | 20% | 27.9223 | 2.609375 | 27.8455 | 3.250000
Barbara | 60% | 23.1256 | 3.593750 | 23.1103 | 3.765625
Baboon | 60% | 21.1836 | 3.484375 | 21.1582 | 3.671875
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation: Wang, S.; Wang, X.; Tian, Y.; Pang, L. A New Hybrid Descent Algorithm for Large-Scale Nonconvex Optimization and Application to Some Image Restoration Problems. Mathematics 2024, 12, 3088. https://doi.org/10.3390/math12193088