Article

Bounded Perturbation Resilience and Superiorization of Proximal Scaled Gradient Algorithm with Multi-Parameters

College of Science, Civil Aviation University of China, Tianjin 300300, China
* Author to whom correspondence should be addressed.
Mathematics 2019, 7(6), 535; https://doi.org/10.3390/math7060535
Submission received: 17 April 2019 / Revised: 8 June 2019 / Accepted: 9 June 2019 / Published: 12 June 2019
(This article belongs to the Special Issue Fixed Point, Optimization, and Applications)

Abstract

In this paper, a multi-parameter proximal scaled gradient algorithm with outer perturbations is presented in a real Hilbert space. The strong convergence of the generated sequence is proved. The bounded perturbation resilience and the superiorized version of the original algorithm are also discussed. The validity of the proposed algorithms, with and without superiorization, is illustrated by solving the ℓ1-ℓ2 problem.

1. Introduction

The superiorization method, introduced by Censor in 2010 [1], can solve a broad class of nonlinear constrained optimization problems arising from practical applications such as computed tomography [2], medical image recovery [3,4], convex feasibility problems [5,6], and inverse problems of radiation therapy [7]. It produces an automatic procedure that exploits the bounded perturbation resilience of a basic algorithm, so that lower values of the objective function can be expected. In recent years, some researchers have focused on finding further applications of the superiorization methodology, while others have investigated the bounded perturbation resilience of algorithms; see, for example, [8,9,10,11,12,13,14,15,16,17].
In this paper, we study the bounded perturbation resilience property and the corresponding superiorization of a proximal scaled gradient algorithm with multi-parameters for solving the following non-smooth composite optimization problem of the form
$$\min_{x\in H}\big[f(x)+g(x)\big]=:\min_{x\in H}\Phi(x),\qquad(1)$$
where H is a real Hilbert space endowed with an inner product ⟨·,·⟩ and the induced norm ‖·‖, and f, g ∈ Γ₀(H), where Γ₀(H) is defined by
$$\Gamma_0(H):=\{\,f:H\to(-\infty,+\infty]\;|\;f\ \text{is proper, lower semicontinuous and convex}\,\}.$$
In addition, f has an L-Lipschitz continuous gradient ∇f on H with L > 0.
The proximal gradient method is one of the most popular iterative methods for solving problem (1); it has received much attention recently owing to its fast theoretical convergence rates and strong practical performance. Given an initial value x_0 ∈ H, the proximal gradient method generates the sequence {x_n} by
$$x_{n+1}=\mathrm{prox}_{\gamma g}\,(I-\gamma\nabla f)(x_n),\qquad n\ge 0,$$
where γ > 0 is the step size and prox_{γg} is the proximal operator of g of order γ (see Definition 2 in Section 2). The generated sequence {x_n} converges weakly to a solution of problem (1) provided that the solution set S := Argmin_{x∈H}[f(x) + g(x)] is nonempty and 0 < γ < 2/L (see, for instance, [18], Theorem 25.8).
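For illustration, here is a minimal NumPy sketch of this iteration; the callables grad_f and prox_g stand for ∇f and prox_{γg} and must be supplied by the user, so they are assumptions of the sketch rather than part of the cited results.

```python
import numpy as np

def proximal_gradient(x0, grad_f, prox_g, gamma, n_iter=500):
    """Iterate x_{n+1} = prox_{gamma*g}(x_n - gamma*grad_f(x_n)).

    grad_f(x)        : gradient of the smooth part f (assumed L-Lipschitz)
    prox_g(x, gamma) : proximal operator of g of order gamma
    gamma            : step size, assumed to satisfy 0 < gamma < 2/L
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = prox_g(x - gamma * grad_f(x), gamma)
    return x
```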
Xu [19] proposed the following more general proximal gradient algorithm:
$$x_{n+1}=\mathrm{prox}_{\gamma_n g}\,(I-\gamma_n\nabla f)(x_n).$$
The weak convergence of the generated sequence {x_n} was obtained; if dim H = ∞, strong convergence cannot be guaranteed.
The scaled method was proposed by Strand [20] to increase the rate of convergence of certain algorithms. In a finite-dimensional space, the selection of the scaling matrices depends on the particular problem [21,22]. Jin, Censor and Jiang [13] introduced the following projected scaled gradient (PSG) algorithm:
$$x_{n+1}=P_C\,(I-\gamma_n D\nabla f)(x_n),\qquad n\ge 0,$$
where D(x_n) is a diagonal scaling matrix for each x_n and P_C: ℝ^n → C (⊆ ℝ^n) is the metric projection onto C, P_C(x) := argmin_{y∈C}‖x − y‖, for solving the following convex minimization problem:
$$\min\ J(x)\qquad\text{subject to}\qquad x\in C,$$
where C ⊆ ℝ^n is a nonempty, closed and convex set and the objective function J: C → ℝ is convex. Under the assumption that
$$\sum_{n=0}^{\infty}\big\|\nabla f(x_n)-D(x_n)\nabla f(x_n)\big\|<\infty$$
and other conditions, the convergence of the PSG method in the presence of bounded perturbations was proved.
Motivated by [13], Guo, Cui and Guo [23] discussed the proximal gradient algorithm with perturbations:
$$x_{n+1}=\mathrm{prox}_{\lambda_n g}\,(I-\lambda_n D\nabla f+e)(x_n).$$
They proved that the generated sequence {x_n} converges weakly to a solution of problem (1). After that, Guo and Cui [15] applied a convex combination of a contraction operator and the proximal gradient operator to obtain strong convergence of the generated sequence and discussed the bounded perturbation resilience of the exact algorithm.
In this paper, we will study the following proximal scaled gradient algorithm with multi-parameters:
$$x_{n+1}:=t_n h(x_n)+\gamma_n x_n+\lambda_n\,\mathrm{prox}_{\alpha_n g}\big(x_n-\alpha_n D(x_n)\nabla f(x_n)\big)+e_n,\qquad n\ge 0,\qquad(8)$$
which is a further generalization of the algorithms above. We will discuss the strong convergence of (8) and the bounded perturbation resilience of its exact counterpart, as was done for the algorithms named above. In addition, we also study the superiorized version of the exact form of (8).
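To make the structure of (8) concrete, the following sketch performs one exact step (i.e., with e_n = 0); the callables h, grad_f, prox_g, the scaling operator D, and the parameter sequences are placeholders assumed only for illustration.

```python
import numpy as np

def multi_parameter_step(x, n, h, grad_f, prox_g, D, t, gamma, lam, alpha):
    """One exact step of (8):
    x_{n+1} = t_n*h(x_n) + gamma_n*x_n
              + lambda_n*prox_{alpha_n g}(x_n - alpha_n*D(x_n)@grad_f(x_n)),
    where t_n + gamma_n + lambda_n = 1.
    """
    tn, gn, ln, an = t(n), gamma(n), lam(n), alpha(n)
    forward = x - an * (D(x) @ grad_f(x))   # scaled gradient (forward) step
    return tn * h(x) + gn * x + ln * prox_g(forward, an)
```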
The rest of this paper is organized as follows. In the next section, we introduce some basic concepts and lemmas. In Section 3, we discuss the strong convergence of the exact and inexact algorithms. In Section 4, we provide two numerical examples illustrating the performance of the iterations. Finally, we summarize the main points of this paper in Section 5.

2. Preliminaries

Let H be a real Hilbert space endowed with an inner product ⟨·,·⟩ and the induced norm ‖·‖. Let {x_n} be a sequence in H. A point z ∈ H is said to be a weak cluster point of {x_n} if there exists a subsequence {x_{n_j}} of {x_n} that converges weakly to it. The set of all weak cluster points of {x_n} is denoted by ω_w(x_n). Let T: H → H be a nonlinear operator and set Fix(T) := {x ∈ H : Tx = x}.
The following definitions are needed in proving our main results.
Definition 1.
([24], Proposition 2.1)
(i)
T is non-expansive if
$$\|Tx-Ty\|\le\|x-y\|,\qquad\forall x,y\in H.$$
(ii)
T is L-Lipschitz continuous with L ≥ 0 if
$$\|Tx-Ty\|\le L\|x-y\|,\qquad\forall x,y\in H.$$
We call T a contractive mapping if 0 ≤ L < 1.
(iii)
T is firmly non-expansive if
$$\|Tx-Ty\|^2\le\|x-y\|^2-\|(I-T)x-(I-T)y\|^2.$$
(iv)
T is α-averaged if there exist a non-expansive operator S: H → H and α ∈ (0, 1) such that
$$T=(1-\alpha)I+\alpha S.$$
In particular, a firmly non-expansive mapping is 1/2-averaged.
(v)
T is v-inverse strongly monotone (v-ism) with v > 0 if
$$\langle Tx-Ty,\;x-y\rangle\ge v\,\|Tx-Ty\|^2,\qquad\forall x,y\in H.$$
Definition 2.
([25], Proximal Operator) Let g ∈ Γ₀(H). The proximal operator of g is defined by
$$\mathrm{prox}_{g}(x):=\arg\min_{y\in H}\Big\{\frac{\|y-x\|^2}{2}+g(y)\Big\},\qquad x\in H.$$
This definition is meaningful since ‖y − x‖²/2 + g(y) has exactly one minimizer on H for each x ∈ H and each given g ∈ Γ₀(H) (see [18], Proposition 12.15).
The proximal operator of g of order α > 0 is defined as
$$\mathrm{prox}_{\alpha g}(x):=\arg\min_{y\in H}\Big\{\frac{\|y-x\|^2}{2\alpha}+g(y)\Big\},\qquad x\in H.$$
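For example, when H = ℝ^N and g = μ‖·‖₁ (the weighted ℓ1 norm used later in Section 4), prox_{αg} reduces to componentwise soft-thresholding; the sketch below is only an illustration of Definition 2 under that assumption.

```python
import numpy as np

def prox_l1(x, alpha, mu=1.0):
    """prox_{alpha*g}(x) for g = mu*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - alpha * mu, 0.0)
```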
The following Lemmas 1–3 describe the properties of proximal operators.
Lemma 1.
([19,26], Lemma 2.4, Lemma 3.3) Let g ∈ Γ₀(H) and α > 0, μ > 0. Then,
$$\mathrm{prox}_{\alpha g}(x)=\mathrm{prox}_{\mu g}\Big(\frac{\mu}{\alpha}x+\Big(1-\frac{\mu}{\alpha}\Big)\mathrm{prox}_{\alpha g}x\Big).$$
Moreover, if α < μ, we also have
$$\|x-\mathrm{prox}_{\alpha g}(I-\alpha\nabla f)x\|\le 2\,\|x-\mathrm{prox}_{\mu g}(I-\mu\nabla f)x\|,\qquad\forall x\in H.$$
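The first identity in Lemma 1 can be checked numerically; the snippet below does so for g = ‖·‖₁ (an assumption made only for this check), using the soft-thresholding form of prox_{αg}.

```python
import numpy as np

def prox_l1(x, alpha):
    # prox_{alpha*||.||_1}: componentwise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
alpha, mu = 0.7, 0.3
lhs = prox_l1(x, alpha)
rhs = prox_l1((mu / alpha) * x + (1.0 - mu / alpha) * prox_l1(x, alpha), mu)
print(np.allclose(lhs, rhs))   # expected: True
```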
Lemma 2.
[18] (Non-expansiveness) Let g ∈ Γ₀(H) and α > 0. Then, the proximal operator prox_{αg} is 1/2-averaged. In particular, the proximal operator is non-expansive:
$$\|\mathrm{prox}_{\alpha g}(x)-\mathrm{prox}_{\alpha g}(y)\|\le\|x-y\|,\qquad\forall x,y\in H.$$
Lemma 3.
([19], Proposition 3.2) Let f, g ∈ Γ₀(H), with f differentiable, and let z ∈ H and α > 0. Then, z is a solution of (1) if and only if z solves the fixed point equation
$$z=\mathrm{prox}_{\alpha g}(I-\alpha\nabla f)\,z.$$
The following lemmas play important roles in proving the strong convergence result.
Lemma 4.
([18], Corollary 4.18) Let T: H → H be a non-expansive mapping with Fix(T) ≠ ∅. If {x_n} is a sequence in H converging weakly to x and if {(I − T)x_n} converges strongly to 0, then (I − T)x = 0.
Lemma 5.
([27], Lemma 2.5) Assume that {s_n} is a sequence of nonnegative real numbers satisfying
$$s_{n+1}\le(1-\gamma_n)s_n+\gamma_n\delta_n+\beta_n,\qquad n\ge 0,$$
where {γ_n} ⊆ [0, 1] and {δ_n} ⊆ ℝ are such that
(i)
$\sum_{n=0}^{\infty}\gamma_n=\infty$;
(ii)
$\limsup_{n\to\infty}\delta_n\le 0$;
(iii)
$\sum_{n=0}^{\infty}\beta_n<\infty$.
Then, $\lim_{n\to\infty}s_n=0$.
Lemma 6.
([28], Lemma 2.4) Let x, y ∈ H and α, β ∈ ℝ. Then,
(i)
$\|x+y\|^2\le\|x\|^2+2\langle y,\;x+y\rangle$;
(ii)
$\|\alpha x+\beta y\|^2=\alpha(\alpha+\beta)\|x\|^2+\beta(\alpha+\beta)\|y\|^2-\alpha\beta\|x-y\|^2$;
(iii)
$\|\alpha x+(1-\alpha)y\|^2=\alpha\|x\|^2+(1-\alpha)\|y\|^2-\alpha(1-\alpha)\|x-y\|^2$.
Lemma 7.
([23], Proposition 3.3) Let f, g ∈ Γ₀(H) and let 0 < α < 2/L, where L is the Lipschitz constant of ∇f. Then, prox_{αg}(I − α∇f) is (αL + 2)/4-averaged; hence, it is non-expansive.

3. The Convergence Analysis and the Superiorized Version

In this section, we first prove that the sequence {v_n} generated by the exact form of (8) converges strongly to a solution of problem (1). Then, we discuss the strong convergence of algorithm (8). Finally, we investigate the bounded perturbation resilience of the exact iteration by viewing its perturbed form as a special case of algorithm (8). The superiorized version is presented at the end of this section.

3.1. The Exact Form of Algorithm (8)

Setting the errors e_n ≡ 0, n ≥ 0, in (8), we get the exact version of (8):
$$v_{n+1}=t_n h(v_n)+\gamma_n v_n+\lambda_n\,\mathrm{prox}_{\alpha_n g}\big(v_n-\alpha_n D(v_n)\nabla f(v_n)\big)=:t_n h(v_n)+\gamma_n v_n+\lambda_n\,\mathrm{prox}_{\alpha_n g}(I-\alpha_n D\nabla f)\,v_n,\qquad(21)$$
where {t_n}, {γ_n}, {λ_n} ⊆ [0, 1] are such that inf_n γ_n > 0 and t_n + γ_n + λ_n = 1 for all n ≥ 0; h: H → H is a ρ-contraction for some ρ ∈ [0, 1); f, g ∈ Γ₀(H), and f has a Lipschitz continuous gradient ∇f with Lipschitz constant L > 0; D(x): H → H is a bounded linear operator for each x ∈ H, with norm bounded by N_x, and satisfies
$$\sum_{n=0}^{\infty}\big\|\nabla f(v_n)-D(v_n)\nabla f(v_n)\big\|=:\sum_{n=0}^{\infty}\theta(v_n)<\infty.\qquad(22)$$
Provided that {t_n}, {λ_n} and {α_n} satisfy some additional conditions, we obtain the following strong convergence result for algorithm (21).
Theorem 1.
Suppose that the solution set S of (1) is nonempty and that (22) and the following conditions hold:
(i)
0 < α := inf_n α_n ≤ α_n < 2/L for all n;
(ii)
lim_{n→∞} t_n = 0 and ∑_{n=0}^∞ t_n = ∞;
(iii)
lim_{n→∞} t_n/λ_n = 0.
Then, the sequence {v_n} generated by algorithm (21) converges strongly to a point z ∈ S, where z is the unique solution of the following variational inequality problem:
$$\langle(I-h)z,\;v-z\rangle\ge 0,\qquad\forall v\in S.$$
Proof of Theorem 1.
We complete the proof in three steps.
Step 1. { v n } is a bounded sequence in H.
Let z ∈ S; then z = prox_{α_n g}(I − α_n∇f)z by Lemma 3. In view of Lemmas 2 and 7, the operators prox_{α_n g} and prox_{α_n g}(I − α_n∇f) are non-expansive for every n ≥ 0. Now, let us estimate
$$
\begin{aligned}
\|v_{n+1}-z\| &=\big\|t_n\big(h(v_n)-z\big)+\gamma_n(v_n-z)+\lambda_n\big(\mathrm{prox}_{\alpha_n g}(I-\alpha_n D\nabla f)v_n-z\big)\big\|\\
&\le t_n\big(\|h(v_n)-h(z)\|+\|h(z)-z\|\big)+\gamma_n\|v_n-z\|+\lambda_n\big\|\mathrm{prox}_{\alpha_n g}(I-\alpha_n D\nabla f)v_n-\mathrm{prox}_{\alpha_n g}(I-\alpha_n\nabla f)z\big\|\\
&\le t_n\rho\|v_n-z\|+t_n\|h(z)-z\|+\gamma_n\|v_n-z\|\\
&\qquad+\lambda_n\big(\|\mathrm{prox}_{\alpha_n g}(I-\alpha_n D\nabla f)v_n-\mathrm{prox}_{\alpha_n g}(I-\alpha_n\nabla f)v_n\|+\|\mathrm{prox}_{\alpha_n g}(I-\alpha_n\nabla f)v_n-\mathrm{prox}_{\alpha_n g}(I-\alpha_n\nabla f)z\|\big)\\
&\le \big(1-t_n(1-\rho)\big)\|v_n-z\|+t_n\|h(z)-z\|+\lambda_n\alpha_n\theta(v_n)\\
&= \big(1-t_n(1-\rho)\big)\|v_n-z\|+t_n(1-\rho)\frac{\|h(z)-z\|}{1-\rho}+(1-t_n-\gamma_n)\alpha_n\theta(v_n)\\
&\le \big(1-t_n(1-\rho)\big)\|v_n-z\|+t_n(1-\rho)\frac{\|h(z)-z\|}{1-\rho}+(1-t_n+t_n\rho)\alpha_n\theta(v_n)\\
&= \big(1-t_n(1-\rho)\big)\big(\|v_n-z\|+\alpha_n\theta(v_n)\big)+t_n(1-\rho)\frac{\|h(z)-z\|}{1-\rho}\\
&\le \max\Big\{\|v_n-z\|+\alpha_n\theta(v_n),\ \frac{\|h(z)-z\|}{1-\rho}\Big\}.
\end{aligned}
$$
An induction argument shows that
$$\|v_{n+1}-z\|\le\max\Big\{\|v_0-z\|+\sum_{k=0}^{n}\alpha_k\theta(v_k),\ \frac{\|h(z)-z\|}{1-\rho}\Big\}\le\max\Big\{\|v_0-z\|+M,\ \frac{\|h(z)-z\|}{1-\rho}\Big\},$$
where M := ∑_{n=0}^∞ α_nθ(v_n) < ∞, because {α_n} is bounded and ∑_{n=0}^∞ θ(v_n) < ∞. Hence, {v_n} is bounded; consequently, {h(v_n)} is bounded since h is a ρ-contraction.
Step 2. There exists a subsequence {v_{n_j}} ⊆ {v_n} such that ω_w(v_{n_j}) ⊆ S.
For brevity, we write
$$D_n:=\mathrm{prox}_{\alpha_n g}(I-\alpha_n D\nabla f),\qquad T_n:=\mathrm{prox}_{\alpha_n g}(I-\alpha_n\nabla f).\qquad(26)$$
Using the notation ρ_n := λ_n/(1 − t_n), one has ρ_n ∈ (0, 1),
$$u_n:=\frac{1}{1-t_n}\Big(\gamma_nv_n+\lambda_n\,\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)v_n\Big)=(1-\rho_n)v_n+\rho_nD_nv_n,\qquad v_{n+1}=t_nh(v_n)+(1-t_n)u_n.$$
Given z ∈ S, we obtain, by utilizing Lemma 6 (i) and (iii),
$$
\begin{aligned}
\|v_{n+1}-z\|^2 &=\big\|t_n\big(h(v_n)-z\big)+(1-t_n)(u_n-z)\big\|^2\\
&=\big\|t_n\big(h(v_n)-h(z)\big)+(1-t_n)(u_n-z)+t_n\big(h(z)-z\big)\big\|^2\\
&\le\big\|t_n\big(h(v_n)-h(z)\big)+(1-t_n)(u_n-z)\big\|^2+2t_n\langle h(z)-z,\;v_{n+1}-z\rangle\\
&\le t_n\rho^2\|v_n-z\|^2+(1-t_n)\|u_n-z\|^2+2t_n\langle h(z)-z,\;v_{n+1}-z\rangle.\qquad(28)
\end{aligned}
$$
Meanwhile, we derive
$$
\begin{aligned}
\|u_n-z\|^2 &=\|(1-\rho_n)v_n+\rho_nD_nv_n-z\|^2=\|v_n-z+\rho_n(D_nv_n-v_n)\|^2\\
&=\|v_n-z\|^2+\rho_n^2\|D_nv_n-v_n\|^2-2\rho_n\langle v_n-z,\;v_n-D_nv_n\rangle\\
&=\|v_n-z\|^2+\rho_n^2\|D_nv_n-v_n\|^2-\rho_n\big(\|v_n-z\|^2+\|D_nv_n-v_n\|^2-\|D_nv_n-z\|^2\big)\\
&=(1-\rho_n)\|v_n-z\|^2-\rho_n(1-\rho_n)\|D_nv_n-v_n\|^2+\rho_n\|D_nv_n-z\|^2.\qquad(29)
\end{aligned}
$$
Notice that
$$\|D_nv_n-v_n\|^2=\|(D_n-T_n)v_n+T_nv_n-v_n\|^2\le\alpha_n^2\theta(v_n)^2+\|T_nv_n-v_n\|^2+2\alpha_n\theta(v_n)\|T_nv_n-v_n\|,\qquad(30)$$
and that
$$\|D_nv_n-z\|^2=\|D_nv_n-T_nv_n+T_nv_n-z\|^2\le\alpha_n^2\theta(v_n)^2+\|v_n-z\|^2+2\alpha_n\theta(v_n)\|T_nv_n-z\|\le\alpha_n^2\theta(v_n)^2+\|v_n-z\|^2+2\alpha_n\theta(v_n)\|v_n-z\|,\qquad(31)$$
since T_n is non-expansive and z = T_nz (see Lemma 3). Substituting (30) and (31) into (29), we then obtain
$$
\begin{aligned}
\|u_n-z\|^2 &=(1-\rho_n)\|v_n-z\|^2-\rho_n(1-\rho_n)\|D_nv_n-v_n\|^2+\rho_n\|D_nv_n-z\|^2\\
&\le(1-\rho_n)\|v_n-z\|^2-\rho_n(1-\rho_n)\big[\alpha_n^2\theta(v_n)^2+\|T_nv_n-v_n\|^2+2\alpha_n\theta(v_n)\|T_nv_n-v_n\|\big]\\
&\qquad+\rho_n\big[\alpha_n^2\theta(v_n)^2+\|v_n-z\|^2+2\alpha_n\theta(v_n)\|v_n-z\|\big]\\
&=\|v_n-z\|^2-\rho_n(1-\rho_n)\|T_nv_n-v_n\|^2+\rho_n^2\alpha_n^2\theta(v_n)^2-2\alpha_n\rho_n(1-\rho_n)\theta(v_n)\|T_nv_n-v_n\|+2\rho_n\alpha_n\theta(v_n)\|v_n-z\|.\qquad(32)
\end{aligned}
$$
Combining (28) and (32), we get
$$
\begin{aligned}
\|v_{n+1}-z\|^2 &\le t_n\rho^2\|v_n-z\|^2+(1-t_n)\|u_n-z\|^2+2t_n\langle h(z)-z,\;v_{n+1}-z\rangle\\
&\le t_n\rho^2\|v_n-z\|^2+(1-t_n)\Big[\|v_n-z\|^2-\rho_n(1-\rho_n)\|T_nv_n-v_n\|^2+\rho_n^2\alpha_n^2\theta(v_n)^2\\
&\qquad-2\alpha_n\rho_n(1-\rho_n)\theta(v_n)\|T_nv_n-v_n\|+2\rho_n\alpha_n\theta(v_n)\|v_n-z\|\Big]+2t_n\langle h(z)-z,\;v_{n+1}-z\rangle\\
&=\big(t_n\rho^2+(1-t_n)\big)\|v_n-z\|^2-\rho_n(1-\rho_n)(1-t_n)\|T_nv_n-v_n\|^2+\rho_n^2(1-t_n)\alpha_n^2\theta(v_n)^2\\
&\qquad-2\alpha_n\lambda_n(1-\rho_n)\theta(v_n)\|T_nv_n-v_n\|+2\lambda_n\alpha_n\theta(v_n)\|v_n-z\|+2t_n\langle h(z)-z,\;v_{n+1}-z\rangle\\
&\le\big(1-t_n(1-\rho^2)\big)\|v_n-z\|^2-\lambda_n(1-\rho_n)\|T_nv_n-v_n\|^2+\rho_n^2(1-t_n)\alpha_n^2\theta(v_n)^2\\
&\qquad+2\lambda_n\alpha_n\theta(v_n)\|v_n-z\|+2t_n\langle h(z)-z,\;v_{n+1}-z\rangle\\
&=(1-\bar t_n)\|v_n-z\|^2+\rho_n\lambda_n\alpha_n^2\theta(v_n)^2+2\lambda_n\alpha_n\theta(v_n)\|v_n-z\|\\
&\qquad+\bar t_n\Big[\frac{2}{1-\rho^2}\langle h(z)-z,\;v_{n+1}-z\rangle-\frac{\lambda_n\gamma_n}{t_n(1-t_n)(1-\rho^2)}\|T_nv_n-v_n\|^2\Big]\\
&=(1-\bar t_n)\|v_n-z\|^2+\bar t_n\zeta_n+U_n,\qquad(33)
\end{aligned}
$$
where $\bar t_n:=t_n(1-\rho^2)$,
$$\zeta_n:=\frac{2}{1-\rho^2}\langle h(z)-z,\;v_{n+1}-z\rangle-\frac{\lambda_n\gamma_n}{t_n(1-t_n)(1-\rho^2)}\|T_nv_n-v_n\|^2,\qquad U_n:=\rho_n\lambda_n\alpha_n^2\theta(v_n)^2+2\lambda_n\alpha_n\theta(v_n)\|v_n-z\|.\qquad(34)$$
Since the second term in the definition of ζ_n is nonpositive, we have
$$\zeta_n\le\frac{2}{1-\rho^2}\|h(z)-z\|\,\|v_{n+1}-z\|<\infty,$$
which implies that lim sup_{n→∞} ζ_n < +∞. Thus, there exists a subsequence {ζ_{n_j}} ⊆ {ζ_n} such that lim sup_{n→∞} ζ_n = lim_{j→∞} ζ_{n_j}. In addition, since {v_n} is bounded, we may assume without loss of generality that v_{n_j} converges weakly to some v* ∈ H as j → ∞. Notice that
$$
\begin{aligned}
\|v_{n_j+1}-v_{n_j}\| &=\big\|t_{n_j}\big(h(v_{n_j})-v_{n_j}\big)+\lambda_{n_j}\big(D_{n_j}v_{n_j}-v_{n_j}\big)\big\|\\
&\le t_{n_j}\|h(v_{n_j})-v_{n_j}\|+\lambda_{n_j}\big(\|D_{n_j}v_{n_j}-T_{n_j}v_{n_j}\|+\|T_{n_j}v_{n_j}-v_{n_j}\|\big)\\
&\le t_{n_j}\big(\|h(v_{n_j})\|+\|v_{n_j}\|\big)+\lambda_{n_j}\alpha_{n_j}\theta(v_{n_j})+\lambda_{n_j}\|T_{n_j}v_{n_j}-v_{n_j}\|\;\longrightarrow\;0,\quad\text{as }j\to\infty.
\end{aligned}
$$
We conclude that {v_{n_j+1}} also converges weakly to v*. As a result, lim_{j→∞} ⟨h(z) − z, v_{n_j+1} − z⟩ exists. Hence, we have
$$\limsup_{n\to\infty}\zeta_n=\lim_{j\to\infty}\zeta_{n_j}=\lim_{j\to\infty}\Big[\frac{2}{1-\rho^2}\langle h(z)-z,\;v_{n_j+1}-z\rangle-\frac{\lambda_{n_j}\gamma_{n_j}}{t_{n_j}(1-t_{n_j})(1-\rho^2)}\|T_{n_j}v_{n_j}-v_{n_j}\|^2\Big]=\frac{2}{1-\rho^2}\langle h(z)-z,\;v^*-z\rangle-\lim_{j\to\infty}\frac{\lambda_{n_j}\gamma_{n_j}}{t_{n_j}(1-t_{n_j})(1-\rho^2)}\|T_{n_j}v_{n_j}-v_{n_j}\|^2.$$
In light of the fact that {γ_{n_j}/(1 − t_{n_j})} is bounded and inf_j γ_{n_j}/(1 − t_{n_j}) > 0, the sequence {(λ_{n_j}/t_{n_j})‖T_{n_j}v_{n_j} − v_{n_j}‖²} is bounded. Then, condition (iii) implies
$$\lim_{j\to\infty}\|T_{n_j}v_{n_j}-v_{n_j}\|=0.$$
Set T := prox_{αg}(I − α∇f) with α = inf_n α_n > 0 and apply Lemma 1. We get
$$\lim_{j\to\infty}\|Tv_{n_j}-v_{n_j}\|\le 2\lim_{j\to\infty}\|T_{n_j}v_{n_j}-v_{n_j}\|=0.$$
Then, Lemma 4 guarantees that ω_w(v_{n_j}) ⊆ S.
Step 3. {v_n} converges strongly to z ∈ S.
Let us show that (33) satisfies the conditions of Lemma 5. By Step 2, v_{n_j+1} converges weakly to v* ∈ S as j → ∞. Therefore,
$$\limsup_{n\to\infty}\zeta_n\le\frac{2}{1-\rho^2}\limsup_{n\to\infty}\langle h(z)-z,\;v_{n+1}-z\rangle=\frac{2}{1-\rho^2}\lim_{j\to\infty}\langle h(z)-z,\;v_{n_j+1}-z\rangle=\frac{2}{1-\rho^2}\langle h(z)-z,\;v^*-z\rangle\le 0.$$
In addition, it is obvious that ∑_{n=0}^∞ U_n < ∞ (see (34)), since {ρ_n}, {λ_n}, {α_n} and {‖v_n − z‖} are bounded sequences and ∑_{n=0}^∞ θ(v_n) < ∞. Finally, we apply Lemma 5 to (33) and conclude that ‖v_n − z‖ → 0 as n → ∞. This ends the proof. □

3.2. The Strong Convergence of Algorithm (8)

Theorem 2.
Suppose that the solution set S of (1) is nonempty and that (22) and the following conditions hold:
(i)
0 < α := inf_n α_n ≤ α_n < 2/L for all n;
(ii)
lim_{n→∞} t_n = 0 and ∑_{n=0}^∞ t_n = ∞;
(iii)
lim_{n→∞} t_n/λ_n = 0;
(iv)
∑_{n=0}^∞ ‖e_n‖ < ∞.
Then, the sequence {x_n} generated by algorithm (8) converges strongly to a point z ∈ S.
Proof of Theorem 2.
Let {x_n} and {v_n} be generated by (8) and (21), respectively. By Theorem 1, {v_n} converges strongly to a solution of problem (1), so we only need to prove that ‖x_n − v_n‖ → 0 as n → ∞.
We again write D_n := prox_{α_n g}(I − α_n D∇f) and T_n := prox_{α_n g}(I − α_n∇f); by Lemma 7, T_n is non-expansive. Then, we have
$$
\begin{aligned}
\|x_{n+1}-v_{n+1}\| &=\big\|t_n\big(h(x_n)-h(v_n)\big)+\gamma_n(x_n-v_n)+\lambda_n\big[\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)x_n-\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)v_n\big]+e_n\big\|\\
&=\big\|t_n\big(h(x_n)-h(v_n)\big)+\gamma_n(x_n-v_n)+\lambda_n\big((D_n-T_n)(x_n)-(D_n-T_n)(v_n)+T_n(x_n)-T_n(v_n)\big)+e_n\big\|\\
&\le t_n\|h(x_n)-h(v_n)\|+\gamma_n\|x_n-v_n\|+\lambda_n\big(\|(D_n-T_n)(x_n)\|+\|(D_n-T_n)(v_n)\|+\|T_n(x_n)-T_n(v_n)\|\big)+\|e_n\|\\
&\le t_n\rho\|x_n-v_n\|+\gamma_n\|x_n-v_n\|+\lambda_n\big[\alpha_n\big(\theta(x_n)+\theta(v_n)\big)+\|x_n-v_n\|\big]+\|e_n\|\\
&\le\big(1-t_n(1-\rho)\big)\|x_n-v_n\|+\lambda_n\alpha_n\big(\theta(x_n)+\theta(v_n)\big)+\|e_n\|.\qquad(41)
\end{aligned}
$$
Applying Lemma 5 to inequality (41), we get ‖x_n − v_n‖ → 0 as n → ∞, which, together with Theorem 1, completes the proof. □

3.3. Bounded Perturbation Resilience

This subsection is devoted to verifying the bounded perturbation resilience of algorithm (21) and to presenting its superiorized version.
Given a problem Ψ, let A: H → H be a basic algorithmic operator.
Definition 3.
[9] An algorithmic operator A is called bounded perturbation resilient if the following holds: whenever the sequence {v_n} generated by v_{n+1} = Av_n, with v_0 ∈ H, converges to a solution of Ψ, then any sequence {x_n} generated by x_{n+1} = A(x_n + β_ny_n), with any x_0 ∈ H, also converges to a solution of Ψ, where {y_n}_{n=0}^∞ ⊆ H is bounded and {β_n}_{n=0}^∞ ⊆ ℝ satisfies β_n ≥ 0 for all n ≥ 0 and ∑_{n=0}^∞ β_n < ∞.
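A small sketch of the perturbed iteration in Definition 3: any basic algorithmic operator A is applied to x_n + β_ny_n with a summable {β_n}; here the operator A, the direction oracle y_seq and the choice β_n = c^n are illustrative assumptions.

```python
import numpy as np

def perturbed_iteration(x0, A, y_seq, c=0.5, n_iter=200):
    """Generate x_{n+1} = A(x_n + beta_n*y_n) with beta_n = c**n (summable for 0 < c < 1)."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        x = A(x + (c ** n) * y_seq(x, n))
    return x
```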
If we take algorithm (21) as the basic algorithm A, the following iteration is the bounded perturbation of it:
$$x_{n+1}:=t_nh(x_n+\beta_ny_n)+\gamma_n(x_n+\beta_ny_n)+\lambda_n\,\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)(x_n+\beta_ny_n),\qquad n\ge 0.\qquad(42)$$
We have the following result.
Theorem 3.
Let H be a real Hilbert space, h: H → H a ρ-contractive operator with ρ ∈ (0, 1), and f, g ∈ Γ₀(H). Assume that the solution set S of (1) is nonempty and that f has a Lipschitz continuous gradient ∇f on H with Lipschitz constant L > 0. Let {β_n} and {y_n} satisfy the conditions in Definition 3, and let {t_n}, {γ_n}, {λ_n} and {α_n} satisfy the conditions in Theorem 1. Then, any sequence {x_n} generated by (42) converges strongly to a point x* ∈ S. Thus, algorithm (21) is bounded perturbation resilient.
Proof of Theorem 3.
We can rewrite algorithm (42) as
$$x_{n+1}=t_nh(x_n)+\gamma_nx_n+\lambda_n\,\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)\,x_n+\tilde e_n$$
with
$$\tilde e_n:=t_n\big(h(x_n+\beta_ny_n)-h(x_n)\big)+\gamma_n\beta_ny_n+\lambda_n\big[\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)(x_n+\beta_ny_n)-\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)\,x_n\big],$$
which is of the same form as (8) provided we verify that ∑_{n=0}^∞ ‖ẽ_n‖ < ∞. In fact, we have
$$
\begin{aligned}
\|\tilde e_n\| &=\big\|t_n\big(h(x_n+\beta_ny_n)-h(x_n)\big)+\gamma_n\beta_ny_n+\lambda_n\big[\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)(x_n+\beta_ny_n)-\mathrm{prox}_{\alpha_n g}(I-\alpha_nD\nabla f)x_n\big]\big\|\\
&=\big\|t_n\big(h(x_n+\beta_ny_n)-h(x_n)\big)+\gamma_n\beta_ny_n+\lambda_n\big[(D_n-T_n)(x_n+\beta_ny_n)-(D_n-T_n)x_n+T_n(x_n+\beta_ny_n)-T_nx_n\big]\big\|\\
&\le t_n\rho\beta_n\|y_n\|+\gamma_n\beta_n\|y_n\|+\lambda_n\big[\alpha_n\big(\theta(x_n+\beta_ny_n)+\theta(x_n)\big)+\beta_n\|y_n\|\big]\\
&=\big(1-t_n(1-\rho)\big)\beta_n\|y_n\|+\lambda_n\alpha_n\big(\theta(x_n+\beta_ny_n)+\theta(x_n)\big),
\end{aligned}
$$
where D_n and T_n are defined in (26). It is then easy to conclude that
$$\sum_{n=0}^{\infty}\|\tilde e_n\|<\infty,$$
since {β_n}, {θ(x_n + β_ny_n)} and {θ(x_n)} are all summable. Hence, Theorem 2 guarantees that any sequence {x_n} generated by (42) converges strongly to a solution of (1); that is, algorithm (21) is bounded perturbation resilient. □
The superiorized version is equipped with an optimization criterion, usually a function ϕ: H → ℝ, with the convention that, for x ∈ H, a smaller value ϕ(x) is considered superior. It also requires the notion of a nonascending direction for ϕ at x: a vector v ∈ H is called nonascending for ϕ at x ∈ H if ‖v‖ ≤ 1 and there exists a constant δ > 0 such that ϕ(x + λv) ≤ ϕ(x) for all λ ∈ [0, δ]. At least one such v always exists, namely the zero vector. The superiorization method then provides an automatic way of turning the original iterative algorithm for solving problem (1) into an algorithm whose objective function value at each iteration is not larger than that produced by the original iterative algorithm; at the same time, the value of ϕ is smaller than it would be under the original algorithm. Superiorization does this by assuming that there are a summable sequence {β_k} of positive real numbers and a bounded vector sequence {v_k} ⊆ H (each v_k is a nonascending direction for ϕ at some x ∈ H, and β_kv_k, added to the current iterate, generates a new iterate), and by performing I steering steps aimed at reducing the values of ϕ at these iterates. It also makes use of a logical variable called loop. In this paper, we choose the optimization criterion ϕ to be the objective function of problem (1). The superiorized version of (21) is then given in Algorithm 1.

4. Numerical Experiments

In this section, we solve the ℓ1-ℓ2 norm problem through two numerical examples that illustrate the performance of the proposed iterations. The algorithms concerned are Algorithm 1 (MPGAS), the bounded perturbation algorithm (42) (MPGAB) and the basic algorithm (21) (MPGA). All experiments were run on a quad-core Intel i7-8550U CPU @ 1.8 GHz with 16 GB DDR4 memory.
Algorithm 1: Superiorized Version of (21)
1: Given x_0
2: set k = 0, l = 1
3: set x^k = x_0
4: set Error = Constant
5: while Error > ε
6:      set n = 0
7:      set x^{k,n} = x^k
8:      while n < I
9:          set y^n to be a nonascending vector for ϕ at x^{k,n}
10:         set loop = true
11:         while loop
12:             l = l + 1
13:             set β_n = c^l
14:             set z^n = x^{k,n} + β_n y^n
15:             if ϕ(z^n) ≤ ϕ(x^{k,n}) and Φ(z^n) ≤ Φ(x^{k,n})
16:                 set n = n + 1
17:                 set x^{k,n} = z^n
18:                 set loop = false
19:             end if
20:         end while
21:     end while
22:     set x^{k+1} = t_k h(x^{k,I}) + γ_k x^{k,I} + (1 − t_k − γ_k) prox_{α_k g}(I − α_k D∇f)(x^{k,I})
23:     set Error = ‖x^{k+1} − x^k‖
24:     set k = k + 1
25: end while
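The following Python sketch mirrors Algorithm 1; phi, Phi, nonascending and basic_step stand for ϕ, Φ, the choice of a nonascending vector (line 9) and the update of line 22, and the inner safeguard cap is a practical assumption not present in the pseudocode.

```python
import numpy as np

def superiorized(x0, basic_step, phi, Phi, nonascending,
                 c=0.5, I=10, eps=1e-4, max_outer=10_000):
    """Superiorized version of the basic iteration (21), cf. Algorithm 1."""
    x = np.asarray(x0, dtype=float)
    k, l = 0, 1
    err = np.inf
    while err > eps and k < max_outer:
        z = x.copy()
        for _ in range(I):                       # I steering steps
            y = nonascending(z)                  # nonascending direction for phi at z
            for _ in range(60):                  # safeguard cap (not in Algorithm 1)
                l += 1
                beta = c ** l
                cand = z + beta * y
                if phi(cand) <= phi(z) and Phi(cand) <= Phi(z):
                    z = cand
                    break
        x_new = basic_step(z, k)                 # line 22: one step of the basic algorithm
        err = np.linalg.norm(x_new - x)
        x = x_new
        k += 1
    return x
```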

4.1. The ℓ1-ℓ2 Norm Problem

Let b_k, 1 ≤ k ≤ N, be an orthogonal basis of ℝ^N, let μ_k, 1 ≤ k ≤ N, be strictly positive real numbers, let A ∈ ℝ^{M×N}\{0}, and let d ∈ ℝ^M. The ℓ1-ℓ2 problem has the following form:
$$\min_{x\in\mathbb{R}^N}\;\sum_{k=1}^{N}\mu_k\,|x^Tb_k|+\frac{1}{2}\|Ax-d\|_2^2.\qquad(47)$$
In signal recovery problems, d is the observed signal and the original signal x is known to have a sparse representation.
We take f(x) = ½‖Ax − d‖²₂ and g(x) = ∑_{k=1}^N μ_k|x^Tb_k|. Then, ∇f(x) = A^T(Ax − d) is Lipschitz continuous with Lipschitz constant L = ‖A^TA‖, where A^T denotes the transpose of A. The above ℓ1-ℓ2 problem is thus a special case of problem (1).
In this case (taking b_k to be the standard basis of ℝ^N), the k-th component of prox_{α_n g}(x) is
$$[\mathrm{prox}_{\alpha_n g}(x)]_k=\begin{cases}x_k+\alpha_n\mu_k, & x_k<-\alpha_n\mu_k,\\ 0, & x_k\in[-\alpha_n\mu_k,\;\alpha_n\mu_k],\\ x_k-\alpha_n\mu_k, & x_k>\alpha_n\mu_k,\end{cases}$$
where x = (x_1, x_2, …, x_N)^T ∈ ℝ^N. Then, for arbitrary v_0, x_0 ∈ ℝ^N, the sequences {v_n} generated by (21) and {x_n} generated by (42) can be rewritten as
$$v_{n+1}=t_nh(v_n)+\gamma_nv_n+\lambda_n\,\mathrm{prox}_{\alpha_n g}\big(v_n-\alpha_nD(v_n)A^T(Av_n-d)\big),$$
$$x_{n+1}=t_nh(x_n+\beta_ny_n)+\gamma_n(x_n+\beta_ny_n)+\lambda_n\,\mathrm{prox}_{\alpha_n g}\big(x_n+\beta_ny_n-\alpha_nD(x_n+\beta_ny_n)A^T\big(A(x_n+\beta_ny_n)-d\big)\big),$$
respectively.
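Putting the pieces together, here is a compact NumPy sketch of the basic iteration (21) specialized to (47); the concrete choices h(x) = x/3, D(x_n) = (1 + 1/n²)I, t_n = 1/(3n), γ_n = 0.01 + 1/3ⁿ and α_n = n/(L(n+1)) follow the parameter readings of Example 1 below and should be treated as assumptions of this sketch.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def mpga(A, d, mu, x0, n_iter=200):
    """Basic iteration (21) for the l1-l2 problem (47) with the Example 1 parameters."""
    L = np.linalg.norm(A.T @ A, 2)             # Lipschitz constant of grad f
    x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        t_n = 1.0 / (3.0 * n)
        g_n = 0.01 + 1.0 / 3.0 ** n
        l_n = 1.0 - t_n - g_n
        a_n = n / (L * (n + 1.0))
        D_n = 1.0 + 1.0 / n ** 2               # diagonal scaling D(x_n) = (1 + 1/n^2) I
        grad = A.T @ (A @ x - d)
        x = t_n * (x / 3.0) + g_n * x + l_n * soft_threshold(x - a_n * D_n * grad, a_n * mu)
    return x
```

For the data of Example 1, `mpga(np.array([[1., 2.], [0., 1.]]), np.array([1., 2.]), 1.0, np.zeros(2))` should approach the reported minimizer (0, 0.6)^T.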

4.2. Numerical Examples

Example 1.
Let H = ℝ², μ₁ = μ₂ = 1,
$$A=\begin{pmatrix}1 & 2\\ 0 & 1\end{pmatrix},\qquad d=\begin{pmatrix}1\\ 2\end{pmatrix}.$$
A straightforward calculation shows that the solution set of (47) is S = {(0, 0.6)^T} and that the minimum value of the objective function of (47) is 1.6. We solve this problem with the algorithms proposed in this paper; the numerical results can be found in Table 1.
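A quick check of the stated minimizer and minimum value, assuming the matrix A and vector d as read above:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])
d = np.array([1.0, 2.0])
x_star = np.array([0.0, 0.6])
obj = np.abs(x_star).sum() + 0.5 * np.linalg.norm(A @ x_star - d) ** 2
print(obj)   # approximately 1.6
```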
Suppose that the contraction is h(x) = x/3 and that the diagonal scaling matrix is D(x_n) = diag{d_ii(x_n)} = diag{1 + 1/n²}. We choose t_n = 1/(3n), γ_n = 0.01 + 1/3ⁿ, λ_n = 1 − t_n − γ_n, and the step size sequence α_n = n/(L(n+1)). For algorithm (21) with bounded perturbations, we choose the bounded sequence {y_n} as
$$y_n=\begin{cases}\dfrac{x_n}{\|x_n\|}, & \text{if } x_n\ne 0,\\[4pt] 0, & \text{if } x_n=0,\end{cases}$$
and the summable nonnegative real sequence {β_n} as β_n = cⁿ for some c ∈ (0, 1). For the superiorized version of (21), we take the function ϕ to be the objective function of problem (47), that is,
$$\phi(x)=\sum_{k=1}^{2}|x^Tb_k|+\frac{1}{2}\|Ax-d\|_2^2.$$
The iteration numbers ("Iter"), the values of x_n ("x_n") and the values of the objective function ("Obj") are reported in Table 1, obtained when the stopping criterion
$$\mathrm{Err}:=\|x_n-(0,0.6)^T\|_2<10^{-3}$$
is reached.
Figure 1 below shows the results for x_0 = 2 rand(2, 1).
From Table 1 and Figure 1, we see that the superiorized version and the bounded perturbation algorithm of (21) reached the minimum value and the unique minimizer within nine iterations, while the original algorithm (21) took 47 iterations to attain the same minimum with the zero initial value. Similar results were obtained with the initial value x_0 drawn from a uniform distribution.
We now discuss a general case of problem (47) by the above-mentioned algorithms.
Example 2.
Let the system matrix A ∈ ℝ^{50×200} be generated from the standard Gaussian distribution, and let μ_k = 1, k = 1, 2, …, 200. Let the vector d ∈ ℝ^{50} be generated from a uniform distribution on the interval [−2, 2]. Solve the optimization problem (47) with the above-mentioned algorithms.
We take the parameters in the algorithms as follows:
1. 
Algorithm parameters:
The contraction h(x) = x/3. The diagonal scaling matrix
$$D(x_n)=\mathrm{diag}\{d_{ii}(x_n)\}=\mathrm{diag}\Big\{1-\frac{1}{n^2}\Big\}.$$
We take t_n = 1/(3n), γ_n = 0.01 + 1/2ⁿ, and hence λ_n = 1 − t_n − γ_n. The step size sequence is α_n = n/(L(n+1)).
2. 
Algorithm parameters for the superiorized version:
The summable nonnegative real sequence {β_n}: β_n = cⁿ for some c ∈ (0, 1), and I = 10. We set ϕ to be the objective function of problem (47).
The iteration numbers ("Iter"), the computing time in seconds ("CPU(s)") and the objective function values ("Obj") are reported in Table 2 with a random initial guess x_0 = 2 rand(200, 1), obtained when the stopping criterion
$$\mathrm{Err}:=\|x_{n+1}-x_n\|_2<\varepsilon$$
is reached, where ε is a given small positive constant.
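A sketch of how the test data of Example 2 and the initial guess can be generated (the seed and the use of NumPy's default generator are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2019)           # arbitrary seed
A = rng.standard_normal((50, 200))          # entries from the standard Gaussian distribution
d = rng.uniform(-2.0, 2.0, size=50)         # observed vector with entries in [-2, 2]
x0 = 2.0 * rng.random(200)                  # random initial guess, analogue of 2*rand(200,1)
```

These arrays can then be passed to an implementation of (21), (42) or Algorithm 1, such as the sketches given earlier.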
We find from Table 2 that running the superiorized version MPGAS of the original algorithm (21) does not increase the execution time. On the contrary, compared with the algorithms MPGAB and MPGA, MPGAS even reduces the computing time while reaching a smaller objective function value under the same stopping criterion and the same initial value x_0.

5. Conclusions

In this paper, we have proposed a proximal scaled gradient algorithm with multi-parameters and studied its strong convergence in a real Hilbert space for solving a composite optimization problem. We have also investigated its bounded perturbation resilience and its superiorized version. The validity of the proposed algorithm and the comparison among the original iteration, its bounded perturbation form and its superiorized version were illustrated by numerical examples. The results and numerical examples in this paper are a new application of the recently developed superiorization method, and they show that this method works well for the proposed algorithm.

Author Contributions

All authors contributed equally and significantly to this paper. Conceptualization, Y.G.; Data curation, Y.G. and X.Z.; Formal analysis, Y.G. and X.Z.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities (Grant No. 3122018L004) and China Scholarship Council (Grant No. 201807315013).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Censor, Y.; Davidi, R.; Herman, G.T. Perturbation resilience and superiorization of iterative algorithms. Inverse Probl. 2010, 26, 065008.
2. Davidi, R.; Schulte, R.W.; Censor, Y.; Xing, L. Fast superiorization using a dual perturbation scheme for proton computed tomography. Trans. Am. Nucl. Soc. 2012, 106, 73–76.
3. Davidi, R.; Herman, G.T.; Censor, Y. Perturbation-resilient block-iterative projection methods with application to image reconstruction from projections. Int. Trans. Oper. Res. 2009, 16, 505–524.
4. Nikazad, T.; Davidi, R.; Herman, G.T. Accelerated perturbation-resilient block-iterative projection methods with application to image reconstruction. Inverse Probl. 2012, 28, 035005.
5. Censor, Y.; Chen, W.; Combettes, P.L.; Davidi, R.; Herman, G.T. On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints. Comput. Optim. Appl. 2012, 51, 1065–1088.
6. Censor, Y.; Zaslavski, A.J. Strict Fejér monotonicity by superiorization of feasibility-seeking projection methods. J. Optim. Theory Appl. 2015, 165, 172–187.
7. Davidi, R.; Censor, Y.; Schulte, R.W.; Geneser, S.; Xing, L. Feasibility-seeking and superiorization algorithm applied to inverse treatment planning in radiation therapy. Contemp. Math. 2015, 636, 83–92.
8. Censor, Y.; Zaslavski, A.J. Convergence and perturbation resilience of dynamic string-averaging projection methods. Comput. Optim. Appl. 2013, 54, 65–76.
9. Censor, Y.; Davidi, R.; Herman, G.T.; Schulte, R.W.; Tetruashvili, L. Projected subgradient minimization versus superiorization. J. Optim. Theory Appl. 2014, 160, 730–747.
10. Dong, Q.L.; Lu, Y.Y.; Yang, J. The extragradient algorithm with inertial effects for solving the variational inequality. Optimization 2016, 65, 2217–2226.
11. Garduño, E.; Herman, G.T. Superiorization of the ML-EM algorithm. IEEE Trans. Nucl. Sci. 2014, 61, 162–172.
12. He, H.; Xu, H.K. Perturbation resilience and superiorization methodology of averaged mappings. Inverse Probl. 2017, 33, 040301.
13. Jin, W.; Censor, Y.; Jiang, M. Bounded perturbation resilience of projected scaled gradient methods. Comput. Optim. Appl. 2016, 63, 365–392.
14. Schrapp, M.J.; Herman, G.T. Data fusion in X-ray computed tomography using a superiorization approach. Rev. Sci. Instrum. 2014, 85, 055302.
15. Guo, Y.N.; Cui, W. Strong convergence and bounded perturbation resilience of a modified proximal gradient algorithm. J. Inequal. Appl. 2018, 2018, 103.
16. Zhu, J.H.; Penfold, S. Total variation superiorization in dual-energy CT reconstruction for proton therapy treatment planning. Inverse Probl. 2017, 33, 044013.
17. Zibetti, M.V.W.; Lin, C.A.; Herman, G.T. Total variation superiorized conjugate gradient method for image reconstruction. Inverse Probl. 2018, 34, 034001.
18. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011.
19. Xu, H.K. Properties and iterative methods for the Lasso and its variants. Chin. Ann. Math. 2014, 35, 501–518.
20. Strand, O.N. Theory and methods related to the singular-function expansion and Landweber iteration for integral equations of the first kind. SIAM J. Numer. Anal. 1974, 11, 798–825.
21. Piana, M.; Bertero, M. Projected Landweber method and preconditioning. Inverse Probl. 1997, 13, 441–463.
22. Helou Neto, E.S.; De Pierro, Á.R. Convergence results for scaled gradient algorithms in positron emission tomography. Inverse Probl. 2005, 21, 1905–1914.
23. Guo, Y.N.; Cui, W.; Guo, Y.S. Perturbation resilience of proximal gradient algorithm for composite objectives. J. Nonlinear Sci. Appl. 2017, 10, 5566–5575.
24. Xu, H.K. Iterative methods for the split feasibility problem in infinite-dimensional Hilbert spaces. Inverse Probl. 2010, 26, 105018.
25. Moreau, J.J. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 1965, 93, 273–299.
26. Marino, G.; Xu, H.K. Convergence of generalized proximal point algorithms. Commun. Pure Appl. Anal. 2004, 3, 791–808.
27. Xu, H.K. Iterative algorithms for nonlinear operators. J. Lond. Math. Soc. 2002, 66, 240–256.
28. Xu, H.K. Error sensitivity for strongly convergent modifications of the proximal point algorithm. J. Optim. Theory Appl. 2015, 168, 901–916.
Figure 1. The number of iterations with Algorithm 1 (MPGAS), algorithm (42) (MPGAB) and algorithm (21) (MPGA).
Table 1. Results for Example 1.

Methods |            x_0 = (0, 0)^T             |           x_0 = 2 rand(2, 1)
        | Iter | x_n                    | Obj   | Iter | x_n                    | Obj
MPGAS   |  9   | [0.0000, 0.6000]^T     | 1.600 |  9   | [0.0000, 0.6000]^T     | 1.600
MPGAB   |  9   | [0.0000, 0.6000]^T     | 1.600 |  9   | [0.0000, 0.6001]^T     | 1.600
MPGA    | 47   | [0.0000, 0.599044]^T   | 1.600 | 47   | [0.0000, 0.599044]^T   | 1.600
Table 2. Results for Example 2 with x_0 = 2 rand(200, 1).

Methods |         Err < 10^-4          |         Err < 10^-6
        | Iter | CPU(s)    | Obj       | Iter | CPU(s)    | Obj
MPGAS   |  367 | 4.265625  | 27.127    | 1652 | 3.343750  | 22.790
MPGAB   | 1154 | 5.812500  | 28.563    | 2354 | 7.593750  | 22.793
MPGA    | 2084 | 9.781250  | 32.386    | 2354 | 8.046875  | 22.793
