Article

Proximal Gradient Method for Solving Bilevel Optimization Problems

by Seifu Endris Yimer 1,2, Poom Kumam 1,3,* and Anteneh Getachew Gebrie 2
1 KMUTT Fixed Point Research Laboratory, SCL 802 Fixed Point Laboratory & Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi (KMUTT), 126 Pracha-Uthit Road, Bang Mod, Thrung Khru, Bangkok 10140, Thailand
2 Department of Mathematics, College of Computational and Natural Science, Debre Berhan University, P.O. Box 445, Debre Berhan, Ethiopia
3 Center of Excellence in Theoretical and Computational Science (TaCS-CoE), Science Laboratory Building, Faculty of Science, King Mongkut’s University of Technology Thonburi (KMUTT), 126 Pracha-Uthit Road, Bang Mod, Thrung Khru, Bangkok 10140, Thailand
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2020, 25(4), 66; https://doi.org/10.3390/mca25040066
Submission received: 1 June 2020 / Revised: 16 August 2020 / Accepted: 17 August 2020 / Published: 4 October 2020
(This article belongs to the Special Issue Numerical and Evolutionary Optimization 2019)

Abstract

In this paper, we consider a bilevel optimization problem: the task of finding the optimum of an upper-level problem over the solution set of a split feasibility problem involving fixed point problems and optimization problems. Based on proximal and gradient methods, we propose a strongly convergent iterative algorithm with an inertial effect for solving the bilevel optimization problem under consideration. Furthermore, we present a numerical example to illustrate the applicability of our algorithm.

1. Introduction

Let H be a real Hilbert space and consider the constrained minimization problem:
\[ \min\ h \quad \text{s.t.}\quad x \in C, \tag{1} \]
where C is a nonempty closed convex subset of H and h : H R is a convex and continuously differentiable function. The gradient–projection algorithm (GPA, for short) is usually applied to solve the minimization problem (1) and has been studied extensively by many authors; see, for instance, [1,2,3] and references therein. This algorithm generates a sequence { x n } through the recursion:
\[ x_{n+1} = P_C\big(x_n - \gamma \nabla h(x_n)\big), \tag{2} \]
where $\nabla h$ is the gradient of h, $x_0$ is the initial guess chosen arbitrarily from C, $\gamma$ is a stepsize which may be chosen in different ways, and $P_C$ is the metric projection from H onto C. By the optimality condition of problem (1), it follows that
\[ \bar{x} \in C\ \text{solves (1)} \iff \langle \nabla h(\bar{x}), y - \bar{x} \rangle \ge 0, \quad \forall y \in C. \]
If $\nabla h$ is Lipschitz continuous and strongly monotone, i.e., there exist $L_h > 0$ and $\sigma > 0$ such that for all $x, y \in H$,
\[ \|\nabla h(x) - \nabla h(y)\| \le L_h \|x - y\| \quad \text{and} \quad \langle \nabla h(x) - \nabla h(y), x - y \rangle \ge \sigma \|x - y\|^2, \]
then the operator $T_\gamma = P_C(I - \gamma \nabla h)$ is a contraction provided that $0 < \gamma < \frac{2\sigma}{L_h^2}$. Therefore, for $0 < \gamma < \frac{2\sigma}{L_h^2}$, we can apply Banach’s contraction principle to conclude that the sequence $\{x_n\}$ defined by (2) converges strongly to the unique fixed point of $T_\gamma$ (equivalently, the unique solution of the minimization problem (1)). Moreover, if we set $C = H$ in (1), then we have an unconstrained optimization problem, and hence the gradient algorithm
\[ x_{n+1} = x_n - \gamma \nabla h(x_n) \]
generates a sequence $\{x_n\}$ strongly convergent to the global minimizer of h.
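As an illustration (ours, not from the paper), the following minimal NumPy sketch runs the gradient–projection recursion (2) for the quadratic $h(x) = \frac12\|x - b\|^2$ over the Euclidean unit ball C, for which $P_C$ has a closed form; the function names and test data are hypothetical.

    import numpy as np

    def proj_unit_ball(x):
        # Metric projection onto C = {x : ||x|| <= 1}.
        nrm = np.linalg.norm(x)
        return x if nrm <= 1.0 else x / nrm

    def gradient_projection(b, gamma=0.5, iters=200):
        # GPA (2) for h(x) = 0.5*||x - b||^2, so grad h(x) = x - b,
        # which is 1-Lipschitz and 1-strongly monotone (sigma = L_h = 1).
        x = np.zeros_like(b)
        for _ in range(iters):
            x = proj_unit_ball(x - gamma * (x - b))
        return x

    b = np.array([2.0, 0.0])
    print(gradient_projection(b))   # approximately [1, 0] = P_C(b)

Since $\sigma = L_h = 1$ here, any $\gamma \in (0, 2)$ makes $T_\gamma$ a contraction, and the iterates converge to $P_C(b)$.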
Consider another well-known problem, the unconstrained minimization problem:
\[ \min\ g \quad \text{s.t.}\quad x \in H, \tag{3} \]
where H is a real Hilbert space and $g : H \to \mathbb{R} \cup \{+\infty\}$ is a proper, convex, lower semicontinuous function. An analogous method for solving (3) with better properties is based on the notion of the proximal mapping introduced by Moreau [4]: the proximal operator of the function g with scaling parameter $\lambda > 0$ is the mapping $\operatorname{prox}_{\lambda g} : H \to H$ given by
\[ \operatorname{prox}_{\lambda g}(x) = \arg\min_{y \in H}\Big\{ g(y) + \frac{1}{2\lambda}\|x - y\|^2 \Big\}. \]
Proximal operators are firmly nonexpansive and the optimality condition of (3) is
\[ \bar{x} \in H\ \text{solves (3)} \iff \operatorname{prox}_{\lambda g}(\bar{x}) = \bar{x}. \]
Many properties of the proximal operator can be found in [5] and the references therein. The so-called proximal point algorithm, i.e., $x_{n+1} = \operatorname{prox}_{\lambda g}(x_n)$, is the most popular method for solving the optimization problem (3); it was introduced by Martinet [6,7] and later studied by Rockafellar [8].
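For concreteness (our own sketch, not taken from the paper), here is the proximal point algorithm $x_{n+1} = \operatorname{prox}_{\lambda g}(x_n)$ for $g(x) = |x|$, whose proximal operator is the soft-thresholding map; the iterates reach the minimizer 0 of g after a few steps.

    import numpy as np

    def prox_abs(x, lam):
        # prox of lam*|.| : the exact minimizer of |y| + (1/(2*lam))*(x - y)^2 (soft-thresholding).
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    x, lam = 5.0, 1.0
    for n in range(10):
        x = prox_abs(x, lam)   # proximal point iteration
        print(n, x)            # 4.0, 3.0, 2.0, 1.0, 0.0, 0.0, ...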
The split inverse problem (SIP) [9] links problems posed in two different spaces X and Y connected by a linear transformation: SIP is the problem of finding a point in the space X that solves a problem IP1 posed in X and whose image under the linear transformation solves a problem IP2 posed in another space Y. Step-size choices that depend on the operator norm are not recommended in iterative methods for solving SIPs, as it is not always easy to estimate the norm of an operator; see, for example, the theorem of Hendrickx and Olshevsky in [10]. For example, in early iterative methods for solving the split feasibility problem [11,12,13], the determination of the step size depends on the operator norm (or at least an estimate of it), and this is not an easy task. To overcome this difficulty, López et al. [14] introduced a new way of selecting the step sizes for which no information about the operator norm is needed when solving the split feasibility problem (SFP):
\[ \text{find}\ \bar{x} \in C\ \text{such that}\ A\bar{x} \in Q, \]
where C and Q are closed convex subsets of real Hilbert spaces $H_1$ and $H_2$, respectively. To be precise, López et al. [14] introduced an iterative algorithm that generates a sequence $\{x_n\}$ by
\[ x_{n+1} = P_C\big(I - \gamma_n A^*(I - P_Q)A\big)x_n. \tag{4} \]
The parameter $\gamma_n$ in (4) is given by $\gamma_n = \dfrac{\rho_n\, l(x_n)}{\|\nabla l(x_n)\|^2}$, $n \ge 1$, where $\rho_n \in (0, 4)$, $l(x_n) = \frac{1}{2}\|(I - P_Q)Ax_n\|^2$ and $\nabla l(x_n) = A^*(I - P_Q)Ax_n$.
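A minimal sketch (ours; the problem data are hypothetical) of one iteration of (4) with the self-adaptive step size $\gamma_n = \rho_n l(x_n)/\|\nabla l(x_n)\|^2$, for an SFP in which C is the nonnegative orthant and Q is a box, so that no estimate of $\|A\|$ is needed:

    import numpy as np

    def sfp_step(x, A, proj_C, proj_Q, rho=1.0):
        # One Lopez-type iteration: x+ = P_C(x - gamma * A^T (I - P_Q) A x),
        # with gamma = rho * l(x) / ||grad l(x)||^2, l(x) = 0.5*||(I - P_Q) A x||^2.
        r = A @ x - proj_Q(A @ x)          # (I - P_Q) A x
        grad = A.T @ r                     # grad l(x)
        if np.dot(grad, grad) == 0.0:      # x already solves the SFP
            return x
        gamma = rho * 0.5 * np.dot(r, r) / np.dot(grad, grad)
        return proj_C(x - gamma * grad)

    A = np.array([[1.0, 2.0], [0.0, 1.0]])
    proj_C = lambda x: np.maximum(x, 0.0)      # C = nonnegative orthant
    proj_Q = lambda y: np.clip(y, -1.0, 1.0)   # Q = box [-1, 1]^2
    x = np.array([3.0, 2.0])
    for _ in range(200):
        x = sfp_step(x, A, proj_C, proj_Q)
    print(x, A @ x)   # x in C and A x (approximately) in Q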
A bilevel problem is a two-level hierarchical problem in which the solution set of the lower-level problem determines the feasible set of the upper-level problem. In general, Yimer et al. [15] presented the bilevel problem as an archetypal model given by
\[ \text{find}\ \bar{x} \in S \subseteq X\ \text{that solves a problem } P_1 \text{ posed in a space } X, \tag{5} \]
where S is the solution set of the problem
\[ \text{find}\ x^* \in Y \subseteq X\ \text{that solves a problem } P_2 \text{ posed in the space } X. \tag{6} \]
According to [16], the bilevel problem (problems (5) and (6)) is a hierarchical game of two players, decision makers who make their decisions according to a hierarchical order. It is also called the leader's and follower's problem, where problem (5) is the leader's problem and (6) is the follower's problem; that is, the first player (the leader) makes his selection first and communicates it to the second player (the follower). There are many studies of several types of bilevel problems; see, for example, [15,17,18,19,20,21,22,23,24]. A bilevel optimization problem is a bilevel problem whose hierarchical structure involves optimization problems. Bilevel optimization problems have become an increasingly important class of optimization problems during the last few decades due to their vast applicability to real-life problems, for example, the toll-setting problem [25], chemical engineering [26], electricity markets [27], and supply chain problems [28].
Motivated by the above theoretical results and inspired by the applicability of the bilevel problem, we consider the following bilevel optimization problem:
\[ \min\ h \quad \text{s.t.}\quad x \in \bigcap_{i=1}^{N}\operatorname{Fix} U_i, \quad A(x) \in \bigcap_{j=1}^{M}\arg\min g_j, \tag{7} \]
where $A : H_1 \to H_2$ is a linear transformation, $h : H_1 \to \mathbb{R}$ is a convex function, $g_j : H_2 \to \mathbb{R} \cup \{+\infty\}$ is a convex nonsmooth function with $\arg\min g_j = \{\bar{z} \in H_2 : g_j(\bar{z}) \le g_j(z),\ \forall z \in H_2\}$ for $j \in \{1, \dots, M\}$, $U_i : H_1 \to H_1$ is a demimetric mapping with $\operatorname{Fix} U_i = \{x \in H_1 : U_i(x) = x\}$ for $i \in \{1, \dots, N\}$, and $H_1$ and $H_2$ are two real Hilbert spaces.
For a real Hilbert space H, a mapping $U : H \to H$ with $\operatorname{Fix} U \neq \emptyset$ is called $\omega$-demimetric if $\omega \in (-\infty, 1)$ and
\[ \langle x - \bar{x}, x - Ux \rangle \ge \frac{1 - \omega}{2}\|x - Ux\|^2, \quad \forall x \in H,\ \bar{x} \in \operatorname{Fix} U. \tag{8} \]
Demimetric mappings were introduced by Takahashi [29] in a smooth, strictly convex and reflexive Banach space. For a real Hilbert space H, (8) is equivalent to the following:
\[ \|Ux - \bar{x}\|^2 \le \|x - \bar{x}\|^2 + \omega\|x - Ux\|^2, \quad \forall x \in H,\ \bar{x} \in \operatorname{Fix} U, \]
and Fix U is a closed and convex subset of H [29]. The class of demimetric mappings contains the classes of strict pseudocontractions, firmly quasi-nonexpansive mappings, and quasi-nonexpansive mappings, see [29,30] and the references therein.
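For instance (a short verification of ours, not taken from the paper), every quasi-nonexpansive mapping U, i.e., one with $\|Ux - \bar{x}\| \le \|x - \bar{x}\|$ for all $x \in H$ and $\bar{x} \in \operatorname{Fix} U$, is 0-demimetric:
\[ \langle x - \bar{x}, x - Ux \rangle = \tfrac{1}{2}\|x - \bar{x}\|^2 + \tfrac{1}{2}\|x - Ux\|^2 - \tfrac{1}{2}\|Ux - \bar{x}\|^2 \ge \tfrac{1}{2}\|x - Ux\|^2 = \tfrac{1 - 0}{2}\|x - Ux\|^2, \]
where the identity $\langle a, b \rangle = \tfrac12\|a\|^2 + \tfrac12\|b\|^2 - \tfrac12\|a - b\|^2$ is applied with $a = x - \bar{x}$ and $b = x - Ux$.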
Assume that Ω is the solution set of the lower-level problem of the bilevel optimization problem (7), that is,
\[ \Omega = \Big\{ x \in \bigcap_{i=1}^{N}\operatorname{Fix} U_i : A(x) \in \bigcap_{j=1}^{M}\arg\min g_j \Big\}. \tag{9} \]
Therefore, the bilevel optimization problem (7) is simply
\[ \min\ h \quad \text{s.t.}\quad x \in \Omega, \]
where Ω is given by (9). If $H_1 = H_2 = H$, $A = I$ (the identity operator), and $g = g_j$ for all $j \in \{1, \dots, M\}$, problem (7) reduces to the bilevel optimization problem:
\[ \min\ h \quad \text{s.t.}\quad x \in \arg\min g. \tag{10} \]
Bilevel problems like (10) have already been considered in the literature, for example, [23,31,32] for the case H = R p .
Note that, to the best of our knowledge, the bilevel optimization problem (7), whose lower level is a finite intersection of fixed point sets of a very broad class of nonlinear mappings together with a finite intersection of minimizer sets of nonsmooth functions, has not been addressed before.
An inertial term is a two-step iterative device: the next iterate is defined by making use of the previous two iterates. It was first introduced by Polyak [33] as an acceleration process for solving smooth convex minimization problems. It is well known that combining an algorithm with an inertial term speeds up the convergence of the sequence it generates. In this paper, we introduce a proximal gradient inertial algorithm with a strong convergence result for solving the bilevel optimization problem (7); our algorithm is designed so that the selection of step sizes does not require any prior information about the operator norm.

2. Preliminary

Let C be a nonempty closed convex subset of a real Hilbert space H. The metric projection on C is a mapping P C : H C defined by
\[ P_C(x) = \arg\min\{\|y - x\| : y \in C\}, \quad x \in H. \]
For $x \in H$ and $z \in C$, we have $z = P_C(x)$ if and only if $\langle x - z, y - z \rangle \le 0$ for all $y \in C$.
Let $T : H \to H$. Then,
(a) 
T is L-Lipschitz if there exists L > 0 such that
\[ \|Tx - Ty\| \le L\|x - y\|, \quad \forall x, y \in H. \]
If L ( 0 , 1 ) , then we call T a contraction with constant L. If L = 1 , then T is called a nonexpansive mapping.
(b) 
T is strongly monotone if there exists σ > 0 such that
\[ \langle Tx - Ty, x - y \rangle \ge \sigma\|x - y\|^2, \quad \forall x, y \in H. \]
In this case, T is called σ -strongly monotone.
(c) 
T is firmly nonexpansive if
\[ \|Tx - Ty\|^2 \le \|x - y\|^2 - \|(I - T)x - (I - T)y\|^2, \quad \forall x, y \in H, \]
which is equivalent to
\[ \|Tx - Ty\|^2 \le \langle Tx - Ty, x - y \rangle, \quad \forall x, y \in H. \]
If T is firmly nonexpansive, $I - T$ is also firmly nonexpansive.
Let H be a real Hilbert space. If $G : H \to 2^H$ is a maximal monotone set-valued mapping, then we define the resolvent operator $J_\lambda^G$ associated with G and $\lambda > 0$ as follows:
\[ J_\lambda^G(x) = (I + \lambda G)^{-1}(x), \quad x \in H. \]
It is well known that $J_\lambda^G$ is single-valued, nonexpansive, and 1-inverse strongly monotone (firmly nonexpansive). Moreover, $0 \in G(\bar{x})$ if and only if $\bar{x}$ is a fixed point of $J_\lambda^G$ for all $\lambda > 0$; see [34] for more about maximal monotone operators, their associated resolvent operators, and examples.
The subdifferential of a convex function $f : H \to \mathbb{R} \cup \{+\infty\}$ at $x \in H$, denoted by $\partial f(x)$, is defined by
\[ \partial f(x) = \{\xi \in H : f(z) \ge f(x) + \langle \xi, z - x \rangle,\ \forall z \in H\}. \]
If $\partial f(x) \neq \emptyset$, f is said to be subdifferentiable at x. If the function f is continuously differentiable, then $\partial f(x) = \{\nabla f(x)\}$, where $\nabla f$ is the gradient of f. If f is a proper, convex, lower semicontinuous function, the subdifferential operator $\partial f$ is a maximal monotone operator, and the proximal operator is the resolvent of the subdifferential operator (see, for example, [5]), i.e.,
\[ \operatorname{prox}_{\lambda f} = J_\lambda^{\partial f} = (I + \lambda\,\partial f)^{-1}. \]
Thus, proximal operators are firmly nonexpansive, and a point $\bar{x}$ minimizes f if and only if $\operatorname{prox}_{\lambda f}(\bar{x}) = \bar{x}$.
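As a quick numerical illustration (ours; the function names are hypothetical), the sketch below implements $\operatorname{prox}_{\lambda f}$ for $f(x) = |x|$ and checks, on sample points, the two properties just stated: firm nonexpansiveness and the fixed-point characterization of the minimizer $\bar{x} = 0$.

    import numpy as np

    def prox_abs(x, lam=1.0):
        # Resolvent (I + lam * subdifferential(|.|))^{-1}, i.e., soft-thresholding.
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    x, y = 3.0, -0.4
    px, py = prox_abs(x), prox_abs(y)
    # Firm nonexpansiveness: |px - py|^2 <= (px - py) * (x - y)
    print((px - py) ** 2 <= (px - py) * (x - y))   # True
    # Fixed-point characterization: prox_abs(0) == 0, the unique minimizer of |x|
    print(prox_abs(0.0) == 0.0)                    # True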
Definition 1.
Let H be a real Hilbert space. A mapping $U : H \to H$ is called demiclosed if, for any sequence $\{x_n\}$ in H such that $\{x_n\}$ converges weakly to $\bar{x}$ and $\lim_{n\to\infty}\|x_n - Ux_n\| = 0$, we have $U\bar{x} = \bar{x}$.
Lemma 1.
For a real Hilbert space H, we have
(i) 
$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\langle x, y \rangle, \quad \forall x, y \in H$;
(ii) 
$\|x + y\|^2 \le \|x\|^2 + 2\langle y, x + y \rangle, \quad \forall x, y \in H$;
(iii) 
$\langle x, y \rangle = \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 - \frac{1}{2}\|x - y\|^2, \quad \forall x, y \in H$.
Lemma 2.
[35] Let $\{c_n\}$ and $\{\gamma_n\}$ be sequences of nonnegative real numbers and $\{\beta_n\}$ be a sequence of real numbers such that
\[ c_{n+1} \le (1 - \alpha_n)c_n + \beta_n + \gamma_n, \quad n \ge 1, \]
where $0 < \alpha_n < 1$ and $\sum_{n=1}^{\infty}\gamma_n < \infty$.
(i) 
If $\beta_n \le \alpha_n M$ for some $M \ge 0$, then $\{c_n\}$ is a bounded sequence.
(ii) 
If $\sum_{n=1}^{\infty}\alpha_n = \infty$ and $\limsup_{n\to\infty}\frac{\beta_n}{\alpha_n} \le 0$, then $c_n \to 0$ as $n \to \infty$.
Definition 2.
Let $\{\Gamma_n\}$ be a real sequence. Then, $\{\Gamma_n\}$ decreases at infinity if there exists $n_0 \in \mathbb{N}$ such that $\Gamma_{n+1} \le \Gamma_n$ for $n \ge n_0$. In other words, the sequence $\{\Gamma_n\}$ does not decrease at infinity if there exists a subsequence $\{\Gamma_{n_t}\}_{t \ge 1}$ of $\{\Gamma_n\}$ such that $\Gamma_{n_t} < \Gamma_{n_t + 1}$ for all $t \ge 1$.
Lemma 3.
[36] Let $\{\Gamma_n\}$ be a sequence of real numbers that does not decrease at infinity. In addition, consider the sequence of integers $\{\varphi(n)\}_{n \ge n_0}$ defined by
\[ \varphi(n) = \max\{k \in \mathbb{N} : k \le n,\ \Gamma_k \le \Gamma_{k+1}\}. \]
Then, $\{\varphi(n)\}_{n \ge n_0}$ is a nondecreasing sequence verifying $\lim_{n\to\infty}\varphi(n) = \infty$, and, for all $n \ge n_0$, the following two estimates hold:
\[ \Gamma_{\varphi(n)} \le \Gamma_{\varphi(n)+1} \quad \text{and} \quad \Gamma_n \le \Gamma_{\varphi(n)+1}. \]
Let D be a closed, convex subset of a real Hilbert space H and $g : D \times D \to \mathbb{R}$ be a bifunction. Then, we say that g satisfies condition CO on D if the following four assumptions are satisfied:
(a) 
$g(u, u) = 0$ for all $u \in D$;
(b) 
g is monotone on D, i.e., $g(u, v) + g(v, u) \le 0$ for all $u, v \in D$;
(c) 
for each $u, v, w \in D$, $\limsup_{\alpha \downarrow 0} g(\alpha w + (1 - \alpha)u, v) \le g(u, v)$;
(d) 
$g(u, \cdot)$ is convex and lower semicontinuous on D for each $u \in D$.
Lemma 4.
[37] (Lemma 2.12) Let g satisfy condition CO on D. Then, for each $r > 0$ and $u \in H_2$, define a mapping (called the resolvent of g) by
\[ T_r^g(u) = \Big\{ w \in D : g(w, v) + \frac{1}{r}\langle v - w, w - u \rangle \ge 0,\ \forall v \in D \Big\}. \tag{11} \]
Then, the following holds:
(i) 
T r g is single-valued;
(ii) 
$T_r^g$ is firmly nonexpansive, i.e., for all $u, v \in H_2$,
\[ \|T_r^g(u) - T_r^g(v)\|^2 \le \langle T_r^g(u) - T_r^g(v), u - v \rangle; \]
(iii) 
$\operatorname{Fix}(T_r^g) = \mathrm{SEP}(g, D) = \{\bar{x} \in D : g(\bar{x}, y) \ge 0,\ \forall y \in D\}$, where $\operatorname{Fix}(T_r^g)$ is the fixed point set of $T_r^g$;
(iv) 
SEP ( g , D ) is closed and convex.

3. Main Results

Our approach here is based on taking existing algorithms for (1), (3), and the fixed point problem of nonlinear mappings, and determining how they can be used in the setting of the bilevel optimization problem (7) considered in this paper. We present a self-adaptive proximal gradient algorithm with an inertial effect generating a sequence that converges to the unique solution of the bilevel optimization problem (7) under the following basic assumptions.
Assumption 1.
Assume that A, h, $g_j$ ($j \in \{1, \dots, M\}$) and $U_i$ ($i \in \{1, \dots, N\}$) in the bilevel optimization problem (7) satisfy:
A1. 
A is a nonzero bounded linear operator;
A2. 
h is proper, convex, and continuously differentiable, and the gradient $\nabla h$ is σ-strongly monotone and $L_h$-Lipschitz continuous;
A3. 
Each $U_i$ is an $\omega_i$-demimetric and demiclosed mapping for all $i \in \{1, \dots, N\}$;
A4. 
Each $g_j$ is a proper, convex, lower semicontinuous function for all $j \in \{1, \dots, M\}$.
Assumption 2.
Let $\theta \in [0, 1)$ and γ be a real number, and let the real sequences $\{\zeta_n^{(i)}\}$ ($i \in \{1, \dots, N\}$), $\{\delta_n^{(j)}\}$ ($j \in \{1, \dots, M\}$), $\{\alpha_n\}$, $\{\beta_n\}$, $\{\rho_n\}$, $\{\varepsilon_n\}$ satisfy the following conditions:
(C1) 
$\gamma \in \left(0, \frac{2\sigma}{L_h^2}\right)$.
(C2) 
$0 < \liminf_{n\to\infty}\zeta_n^{(i)} \le \limsup_{n\to\infty}\zeta_n^{(i)} < 1$ for all $i \in \{1, \dots, N\}$, and $\sum_{i=1}^{N}\zeta_n^{(i)} = 1$.
(C3) 
$0 < \liminf_{n\to\infty}\delta_n^{(j)} \le \limsup_{n\to\infty}\delta_n^{(j)} < 1$ for all $j \in \{1, \dots, M\}$, and $\sum_{j=1}^{M}\delta_n^{(j)} = 1$.
(C4) 
$0 < \liminf_{n\to\infty}\beta_n \le \limsup_{n\to\infty}\beta_n < \beta = \min\{1 - \omega_1, \dots, 1 - \omega_N\}$.
(C5) 
$0 < \alpha_n < 1$, $\lim_{n\to\infty}\alpha_n = 0$ and $\sum_{n=1}^{\infty}\alpha_n = \infty$.
(C6) 
$0 < \rho_n < 4$ and $\liminf_{n\to\infty}\rho_n(4 - \rho_n) > 0$.
(C7) 
$\varepsilon_n > 0$ and $\varepsilon_n = o(\alpha_n)$.
Assuming that Assumption 1 is satisfied and that the solution set Ω of the lower-level problem of (7) is nonempty, for each $j \in \{1, \dots, M\}$, define $l^{(j)}$ by
\[ l^{(j)}(x) = \frac{1}{2}\big\|(I - \operatorname{prox}_{\lambda g_j})A(x)\big\|^2. \]
Note that, from Aubin [38], if $g_j$ is an indicator function, then $l^{(j)}$ is convex, weakly lower semicontinuous and differentiable for each $j \in \{1, \dots, M\}$, and $\nabla l^{(j)}$ is given by
\[ \nabla l^{(j)}(x) = A^*(I - \operatorname{prox}_{\lambda g_j})A(x). \]
Next, we present and analyze the strong convergence of Algorithm 1 using $l^{(j)}$ and $\nabla l^{(j)}$, assuming that $l^{(j)}$ is differentiable.
Algorithm 1: Self-adaptive proximal gradient algorithm with inertial effect.
Initialization: Let the real number γ and the real sequences $\{\zeta_n^{(i)}\}$ ($i \in \{1, \dots, N\}$), $\{\delta_n^{(j)}\}$ ($j \in \{1, \dots, M\}$), $\{\alpha_n\}$, $\{\beta_n\}$, $\{\rho_n\}$ and $\{\varepsilon_n\}$ satisfy conditions (C1)–(C7) of Assumption 2.
Choose $x_0, x_1 \in H_1$ arbitrarily and proceed with the following computations:
Step 1.
Given the iterates $x_{n-1}$ and $x_n$ ($n \ge 1$), choose $\theta_n$ such that $0 \le \theta_n \le \bar{\theta}_n$, where
\[ \bar{\theta}_n := \begin{cases} \min\Big\{\theta,\ \dfrac{\varepsilon_n}{\|x_{n-1} - x_n\|}\Big\}, & \text{if } x_{n-1} \neq x_n, \\ \theta, & \text{otherwise.} \end{cases} \]
Step 2.
Evaluate y n = x n + θ n ( x n x n 1 ) .
Step 3.
Evaluate
\[ s_n = \sum_{i=1}^{N}\zeta_n^{(i)}\big((1 - \beta_n)I + \beta_n U_i\big)y_n. \]
Step 4.
Find
\[ z_n = s_n - \sum_{j=1}^{M}\delta_n^{(j)}\tau_n^{(j)}\nabla l^{(j)}(s_n), \]
where $\tau_n^{(j)} = \dfrac{\rho_n\, l^{(j)}(s_n)}{(\eta_n^{(j)})^2}$ with $\eta_n^{(j)} = \max\{1, \|\nabla l^{(j)}(s_n)\|\}$.
Step 5.
Find $x_{n+1} = \alpha_n\big(y_n - \gamma\nabla h(y_n)\big) + (1 - \alpha_n)z_n$.
Step 6.
Set n : = n + 1 and go to Step 1.
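To make the steps concrete, the following NumPy sketch (ours, under simplifying assumptions: a single mapping $U_1$, a single function $g_1$ with a known proximal operator, and the constant parameter choices shown in the comments) runs one realization of Algorithm 1. It uses the gradient formula $\nabla l^{(j)} = A^*(I - \operatorname{prox}_{\lambda g_j})A$ stated above; it is an illustration of the recursion, not the authors' reference implementation.

    import numpy as np

    def prox_l1(z, lam=1.0):
        # Soft-thresholding; plays the role of prox_{lambda g_1} for g_1 = ||.||_1.
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def algorithm1(A, p, iters=300, theta=0.5, gamma=1.0, beta=0.5, rho=1.0, seed=0):
        # Simplified Algorithm 1 with N = M = 1:
        #   h(x) = 0.5*||x||^2  (grad h(x) = x, sigma = L_h = 1, so gamma in (0, 2)),
        #   U_1(x) = 0.5*x      (0-demimetric and demiclosed, Fix U_1 = {0}),
        #   g_1 = ||.||_1, alpha_n = 1/(n+1), eps_n = 1/(n+1)^2, zeta = delta = 1.
        rng = np.random.default_rng(seed)
        x_prev, x = rng.standard_normal(p), rng.standard_normal(p)
        for n in range(1, iters + 1):
            alpha, eps = 1.0 / (n + 1), 1.0 / (n + 1) ** 2
            # Steps 1-2: inertial extrapolation
            diff = np.linalg.norm(x - x_prev)
            theta_n = theta if diff == 0 else min(theta, eps / diff)
            y = x + theta_n * (x - x_prev)
            # Step 3: convex combination with the demimetric mapping U_1
            s = (1 - beta) * y + beta * (0.5 * y)
            # Step 4: self-adaptive proximal-gradient step on l(s) = 0.5*||(I - prox)(A s)||^2
            r = A @ s - prox_l1(A @ s)
            grad_l, l_val = A.T @ r, 0.5 * np.dot(r, r)
            eta = max(1.0, np.linalg.norm(grad_l))
            z = s - (rho * l_val / eta ** 2) * grad_l
            # Step 5: combine with the upper-level gradient step y - gamma*grad h(y)
            x_prev, x = x, alpha * (y - gamma * y) + (1 - alpha) * z
        return x

    A = np.eye(4)
    print(np.linalg.norm(algorithm1(A, p=4)))   # close to the unique bilevel solution x_bar = 0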
Remark 1.
From Condition (C7) and Step 1 of Algorithm 1, we have that
\[ \frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| \to 0, \quad n \to \infty. \]
Since $\{\alpha_n\}$ is bounded, we also have $\theta_n\|x_n - x_{n-1}\| \to 0$ as $n \to \infty$. Note that Step 1 of Algorithm 1 is easily implemented in numerical computation, since the value of $\|x_n - x_{n-1}\|$ is known before choosing $\theta_n$.
Note that: Let $V_\gamma = I - \gamma\nabla h$, where $\gamma \in \big(0, \frac{2\sigma}{L_h^2}\big)$. Then, we have
\[ \begin{aligned} \|V_\gamma(x) - V_\gamma(y)\|^2 &= \|x - y\|^2 + \gamma^2\|\nabla h(x) - \nabla h(y)\|^2 - 2\gamma\langle \nabla h(x) - \nabla h(y), x - y \rangle \\ &\le \|x - y\|^2 + \gamma^2 L_h^2\|x - y\|^2 - 2\gamma\sigma\|x - y\|^2 = \mu^2\|x - y\|^2, \quad \forall x, y \in H_1, \end{aligned} \]
where $\mu = \sqrt{1 - \gamma(2\sigma - \gamma L_h^2)}$. Therefore, for $\gamma \in \big(0, \frac{2\sigma}{L_h^2}\big)$, the mapping $V_\gamma$ is a contraction with constant μ. Consequently, the mapping $P_\Omega V_\gamma$ is also a contraction with constant μ, i.e., $\|P_\Omega V_\gamma(x) - P_\Omega V_\gamma(y)\| \le \|V_\gamma(x) - V_\gamma(y)\| \le \mu\|x - y\|$ for all $x, y \in H_1$. Hence, by the Banach contraction principle, there exists a unique element $\bar{x} \in H_1$ such that $\bar{x} = P_\Omega V_\gamma(\bar{x})$. Clearly, $\bar{x} \in \Omega$ and we have
\[ \bar{x} = P_\Omega V_\gamma(\bar{x}) \iff \langle \nabla h(\bar{x}), y - \bar{x} \rangle \ge 0, \quad \forall y \in \Omega. \]
Lemma 5.
For the sequences { s n } , { y n } and { z n } generated by Algorithm 1 and for x ¯ Ω , we have
(i) 
\[ \|z_n - \bar{x}\|^2 \le \|y_n - \bar{x}\|^2 - \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2 - \rho_n(4 - \rho_n)\sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2}. \]
(ii) 
$\|z_n - \bar{x}\| \le \|s_n - \bar{x}\| \le \|y_n - \bar{x}\|$.
Proof. 
Let $\bar{x} \in \Omega$. Now, since $I - \operatorname{prox}_{\lambda g_j}$ is firmly nonexpansive and $A(\bar{x})$ is a minimizer of each $g_j$, we have, for all $x \in H_1$,
\[ \begin{aligned} \langle \nabla l^{(j)}(x), x - \bar{x} \rangle &= \langle A^*(I - \operatorname{prox}_{\lambda g_j})A(x), x - \bar{x} \rangle = \langle (I - \operatorname{prox}_{\lambda g_j})A(x), A(x) - A(\bar{x}) \rangle \\ &\ge \|(I - \operatorname{prox}_{\lambda g_j})A(x)\|^2 = 2\, l^{(j)}(x). \end{aligned} \tag{13} \]
By the definition of $s_n$, we get
\[ \begin{aligned} \|s_n - \bar{x}\|^2 &= \Big\|\sum_{i=1}^{N}\zeta_n^{(i)}\big((1 - \beta_n)I + \beta_n U_i\big)y_n - \bar{x}\Big\|^2 \le \sum_{i=1}^{N}\zeta_n^{(i)}\big\|\big((1 - \beta_n)I + \beta_n U_i\big)y_n - \bar{x}\big\|^2 \\ &= \sum_{i=1}^{N}\zeta_n^{(i)}\Big(\|y_n - \bar{x}\|^2 + \beta_n^2\|(I - U_i)y_n\|^2 - 2\beta_n\langle y_n - \bar{x}, (I - U_i)y_n\rangle\Big) \\ &\le \sum_{i=1}^{N}\zeta_n^{(i)}\Big(\|y_n - \bar{x}\|^2 + \beta_n^2\|(I - U_i)y_n\|^2 - \beta_n(1 - \omega_i)\|(I - U_i)y_n\|^2\Big) \\ &= \|y_n - \bar{x}\|^2 - \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2. \end{aligned} \tag{14} \]
Using the definition of $z_n$, Lemma 1 (ii), and (13), we have
\[ \begin{aligned} \|z_n - \bar{x}\|^2 &= \Big\|s_n - \sum_{j=1}^{M}\delta_n^{(j)}\tau_n^{(j)}\nabla l^{(j)}(s_n) - \bar{x}\Big\|^2 \le \sum_{j=1}^{M}\delta_n^{(j)}\big\|s_n - \tau_n^{(j)}\nabla l^{(j)}(s_n) - \bar{x}\big\|^2 \\ &= \sum_{j=1}^{M}\delta_n^{(j)}\Big(\|s_n - \bar{x}\|^2 + \big(\tau_n^{(j)}\big)^2\|\nabla l^{(j)}(s_n)\|^2 - 2\tau_n^{(j)}\langle\nabla l^{(j)}(s_n), s_n - \bar{x}\rangle\Big) \\ &\le \sum_{j=1}^{M}\delta_n^{(j)}\Big(\|s_n - \bar{x}\|^2 + \big(\tau_n^{(j)}\|\nabla l^{(j)}(s_n)\|\big)^2 - 4\tau_n^{(j)}\, l^{(j)}(s_n)\Big) \\ &\le \sum_{j=1}^{M}\delta_n^{(j)}\Big(\|s_n - \bar{x}\|^2 + \big(\tau_n^{(j)}\eta_n^{(j)}\big)^2 - 4\tau_n^{(j)}\, l^{(j)}(s_n)\Big) \\ &= \sum_{j=1}^{M}\delta_n^{(j)}\Bigg(\|s_n - \bar{x}\|^2 + \bigg(\frac{\rho_n\, l^{(j)}(s_n)}{(\eta_n^{(j)})^2}\,\eta_n^{(j)}\bigg)^2 - \frac{4\rho_n\, l^{(j)}(s_n)}{(\eta_n^{(j)})^2}\, l^{(j)}(s_n)\Bigg) \\ &= \|s_n - \bar{x}\|^2 - \rho_n(4 - \rho_n)\sum_{j=1}^{M}\delta_n^{(j)}\frac{\big(l^{(j)}(s_n)\big)^2}{(\eta_n^{(j)})^2}. \end{aligned} \tag{15} \]
The result (i) follows from (14) and (15), and, in view of (C2)–(C6), the result (ii) follows from (14) and (15). □
Theorem 1.
The sequence { x n } generated by Algorithm 1 converges strongly to the solution of problem (7).
Proof. 
Claim 1: The sequences { x n } , { y n } and { z n } are bounded.
Let x ¯ Ω . Now, from the definition of y n , we get
\[ \|y_n - \bar{x}\| = \|x_n + \theta_n(x_n - x_{n-1}) - \bar{x}\| \le \|x_n - \bar{x}\| + \theta_n\|x_n - x_{n-1}\|. \tag{16} \]
Using (16) and the definition of x n + 1 , we get
\[ \begin{aligned} \|x_{n+1} - \bar{x}\| &= \big\|(1 - \alpha_n)(z_n - \bar{x}) + \alpha_n(V_\gamma(y_n) - V_\gamma(\bar{x})) + \alpha_n(V_\gamma(\bar{x}) - \bar{x})\big\| \\ &\le (1 - \alpha_n)\|z_n - \bar{x}\| + \alpha_n\|V_\gamma(y_n) - V_\gamma(\bar{x})\| + \alpha_n\|V_\gamma(\bar{x}) - \bar{x}\| \\ &\le (1 - \alpha_n)\|z_n - \bar{x}\| + \alpha_n\mu\|y_n - \bar{x}\| + \alpha_n\|V_\gamma(\bar{x}) - \bar{x}\| \\ &\le (1 - \alpha_n(1 - \mu))\|y_n - \bar{x}\| + \alpha_n\|V_\gamma(\bar{x}) - \bar{x}\| \\ &\le (1 - \alpha_n(1 - \mu))\|x_n - \bar{x}\| + (1 - \alpha_n(1 - \mu))\theta_n\|x_n - x_{n-1}\| + \alpha_n\|V_\gamma(\bar{x}) - \bar{x}\| \\ &= (1 - \alpha_n(1 - \mu))\|x_n - \bar{x}\| + \alpha_n(1 - \mu)\bigg(\frac{(1 - \alpha_n(1 - \mu))}{1 - \mu}\,\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| + \frac{\|V_\gamma(\bar{x}) - \bar{x}\|}{1 - \mu}\bigg). \end{aligned} \tag{17} \]
Observe that, by (C6) and Remark 1, we see that
\[ \lim_{n\to\infty}\frac{(1 - \alpha_n(1 - \mu))}{1 - \mu}\,\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| = 0. \]
Let $\hat{L} = 2\max\Big\{\dfrac{\|V_\gamma(\bar{x}) - \bar{x}\|}{1 - \mu},\ \sup_{n \ge 1}\dfrac{(1 - \alpha_n(1 - \mu))}{1 - \mu}\,\dfrac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\|\Big\}$.
Then, (17) becomes
\[ \|x_{n+1} - \bar{x}\| \le (1 - \alpha_n(1 - \mu))\|x_n - \bar{x}\| + \alpha_n(1 - \mu)\hat{L}. \]
Thus, by Lemma 2, the sequence $\{x_n\}$ is bounded. As a consequence, $\{y_n\}$, $\{z_n\}$, $\{V_\gamma(y_n)\}$ and $\{s_n\}$ are also bounded.
Claim 2: The sequence { x n } converges strongly to x ¯ Ω , where x ¯ = P Ω V γ ( x ¯ ) .
Now,
\[ \|y_n - \bar{x}\|^2 = \|x_n + \theta_n(x_n - x_{n-1}) - \bar{x}\|^2 = \|x_n - \bar{x}\|^2 + \theta_n^2\|x_n - x_{n-1}\|^2 + 2\theta_n\langle x_n - \bar{x}, x_n - x_{n-1}\rangle. \tag{18} \]
From Lemma 1 ( i i i ) , we have
\[ \langle x_n - \bar{x}, x_n - x_{n-1}\rangle = \tfrac{1}{2}\|x_n - \bar{x}\|^2 - \tfrac{1}{2}\|x_{n-1} - \bar{x}\|^2 + \tfrac{1}{2}\|x_n - x_{n-1}\|^2. \tag{19} \]
From (18) and (19) and since 0 θ n < 1 , we get
\[ \begin{aligned} \|y_n - \bar{x}\|^2 &= \|x_n - \bar{x}\|^2 + \theta_n^2\|x_n - x_{n-1}\|^2 + \theta_n\big(\|x_n - \bar{x}\|^2 - \|x_{n-1} - \bar{x}\|^2 + \|x_n - x_{n-1}\|^2\big) \\ &\le \|x_n - \bar{x}\|^2 + 2\theta_n\|x_n - x_{n-1}\|^2 + \theta_n\big(\|x_n - \bar{x}\|^2 - \|x_{n-1} - \bar{x}\|^2\big). \end{aligned} \tag{20} \]
Using the definition of x n + 1 and Lemma 1 ( i i ) , we have
\[ \begin{aligned} \|x_{n+1} - \bar{x}\|^2 &= \|\alpha_n V_\gamma(y_n) + (1 - \alpha_n)z_n - \bar{x}\|^2 = \|\alpha_n(V_\gamma(y_n) - \bar{x}) + (1 - \alpha_n)(z_n - \bar{x})\|^2 \\ &\le (1 - \alpha_n)^2\|z_n - \bar{x}\|^2 + 2\alpha_n\langle V_\gamma(y_n) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\le \|z_n - \bar{x}\|^2 + 2\alpha_n\langle V_\gamma(y_n) - \bar{x}, x_{n+1} - \bar{x}\rangle. \end{aligned} \tag{21} \]
Lemma 5 ( i ) together with (20) and (21) give
\[ \begin{aligned} \|x_{n+1} - \bar{x}\|^2 &\le \|z_n - \bar{x}\|^2 + 2\alpha_n\langle V_\gamma(y_n) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\le \|y_n - \bar{x}\|^2 + 2\alpha_n\langle V_\gamma(y_n) - \bar{x}, x_{n+1} - \bar{x}\rangle - \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2 - \rho_n(4 - \rho_n)\sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2} \\ &\le \|x_n - \bar{x}\|^2 + 2\theta_n\|x_n - x_{n-1}\|^2 + \theta_n\big(\|x_n - \bar{x}\|^2 - \|x_{n-1} - \bar{x}\|^2\big) + 2\alpha_n\langle V_\gamma(y_n) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\quad - \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2 - \rho_n(4 - \rho_n)\sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2}. \end{aligned} \tag{22} \]
Since the sequences $\{x_n\}$ and $\{V_\gamma(y_n)\}$ are bounded, there exists $M_1$ such that $2\langle V_\gamma(y_n) - \bar{x}, x_{n+1} - \bar{x}\rangle \le M_1$ for all $n \ge 1$. Thus, from (22), we obtain
\[ \begin{aligned} \|x_{n+1} - \bar{x}\|^2 &\le \|x_n - \bar{x}\|^2 + 2\theta_n\|x_n - x_{n-1}\|^2 + \theta_n\big(\|x_n - \bar{x}\|^2 - \|x_{n-1} - \bar{x}\|^2\big) + \alpha_n M_1 \\ &\quad - \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2 - \rho_n(4 - \rho_n)\sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2}. \end{aligned} \tag{23} \]
Let us distinguish the following two cases related to the behavior of the sequence $\{\Gamma_n\}$, where $\Gamma_n = \|x_n - \bar{x}\|^2$.
Case 1. Suppose the sequence $\{\Gamma_n\}$ decreases at infinity. Thus, there exists $n_0 \in \mathbb{N}$ such that $\Gamma_{n+1} \le \Gamma_n$ for $n \ge n_0$. Then, $\{\Gamma_n\}$ converges and $\Gamma_n - \Gamma_{n+1} \to 0$ as $n \to \infty$.
From (23), we have
\[ \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2 \le (\Gamma_n - \Gamma_{n+1}) + \alpha_n M_1 + \theta_n(\Gamma_n - \Gamma_{n-1}) + 2\theta_n\|x_n - x_{n-1}\|^2, \tag{24} \]
and
\[ \rho_n(4 - \rho_n)\sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2} \le (\Gamma_n - \Gamma_{n+1}) + \alpha_n M_1 + \theta_n(\Gamma_n - \Gamma_{n-1}) + 2\theta_n\|x_n - x_{n-1}\|^2. \tag{25} \]
Since $\Gamma_n - \Gamma_{n+1} \to 0$ (and hence $\Gamma_{n-1} - \Gamma_n \to 0$), using (C5), (C6), and Remark 1 (noting that $\alpha_n \to 0$, $\theta_n\|x_n - x_{n-1}\| \to 0$, $\{x_n\}$ is bounded and $\liminf_{n\to\infty}\rho_n(4 - \rho_n) > 0$), we have, from (24) and (25),
\[ \sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2} \to 0 \quad \text{and} \quad \sum_{i=1}^{N}\zeta_n^{(i)}\beta_n(1 - \omega_i - \beta_n)\|(I - U_i)y_n\|^2 \to 0. \tag{26} \]
In view of (26) and conditions (C2)–(C6), we have
\[ \frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2} \to 0 \quad \text{and} \quad \|(I - U_i)y_n\| \to 0, \quad n \to \infty, \tag{27} \]
for all $j \in \{1, \dots, M\}$ and for all $i \in \{1, \dots, N\}$.
Using (26), we have
\[ \|s_n - z_n\|^2 \le 16\sum_{j=1}^{M}\delta_n^{(j)}\frac{(l^{(j)}(s_n))^2}{(\eta_n^{(j)})^2} \to 0, \quad n \to \infty. \tag{28} \]
Similarly, from (26), we have
\[ \|y_n - s_n\|^2 \le \beta^2\sum_{i=1}^{N}\zeta_n^{(i)}\|(I - U_i)y_n\|^2 \to 0, \quad n \to \infty. \tag{29} \]
Using the definition of y n and Remark 1, we have
\[ \|x_n - y_n\| = \|x_n - x_n - \theta_n(x_n - x_{n-1})\| = \theta_n\|x_n - x_{n-1}\| \to 0, \quad n \to \infty. \tag{30} \]
Moreover, using the definition of $x_{n+1}$ and the boundedness of $\{V_\gamma(y_n)\}$ and $\{z_n\}$ together with condition (C5), we have
\[ \|x_{n+1} - z_n\| = \alpha_n\|V_\gamma(y_n) - z_n\| \to 0, \quad n \to \infty. \tag{31} \]
Therefore, from (28)–(31), we have
\[ \|x_{n+1} - x_n\| \le \|x_{n+1} - z_n\| + \|z_n - s_n\| + \|s_n - y_n\| + \|x_n - y_n\| \to 0, \quad n \to \infty. \tag{32} \]
For each $j \in \{1, \dots, M\}$, $\nabla l^{(j)}(\cdot)$ is Lipschitz continuous with constant $\|A\|^2$. Therefore, the sequence $\{(\eta_n^{(j)})^2\}_{n=1}^{\infty}$ is bounded for each $j \in \{1, \dots, M\}$, and hence, using (27), we have $\lim_{n\to\infty} l^{(j)}(s_n) = 0$ for all $j \in \{1, \dots, M\}$.
Let p be a weak cluster point of $\{x_n\}$; there exists a subsequence $\{x_{n_l}\}$ of $\{x_n\}$ such that $x_{n_l} \rightharpoonup p$ as $l \to \infty$. Since $\|x_n - y_n\| \to 0$ as $n \to \infty$ (from (30)), we have $y_{n_l} \rightharpoonup p$ as $l \to \infty$. Hence, using $y_{n_l} \rightharpoonup p$, (27) and the demiclosedness of $U_i$, we have $p \in \operatorname{Fix} U_i$ for all $i \in \{1, \dots, N\}$.
Moreover, since $\|x_n - s_n\| \to 0$ as $n \to \infty$ (from (29) and (30)), we have $s_{n_l} \rightharpoonup p$ as $l \to \infty$. Hence, the weak lower semicontinuity of $l^{(j)}(\cdot)$ implies that
\[ 0 \le l^{(j)}(p) \le \liminf_{l\to\infty} l^{(j)}(s_{n_l}) = \lim_{n\to\infty} l^{(j)}(s_n) = 0 \]
for all $j \in \{1, \dots, M\}$. That is, $l^{(j)}(p) = \frac{1}{2}\|(I - \operatorname{prox}_{\lambda g_j})A(p)\|^2 = 0$, i.e., $A(p) \in \arg\min g_j$, for all $j \in \{1, \dots, M\}$. Thus, $p \in \Omega$.
We now show that $\limsup_{n\to\infty}\langle (I - V_\gamma)\bar{x}, \bar{x} - x_n\rangle \le 0$. Indeed, since $\bar{x} = P_\Omega V_\gamma(\bar{x})$ and, from above, p is a weak cluster point of $\{x_n\}$, i.e., $x_{n_l} \rightharpoonup p$, with $p \in \Omega$, we obtain that
\[ \limsup_{n\to\infty}\langle (I - V_\gamma)\bar{x}, \bar{x} - x_n\rangle = \lim_{l\to\infty}\langle (I - V_\gamma)\bar{x}, \bar{x} - x_{n_l}\rangle = \langle (I - V_\gamma)\bar{x}, \bar{x} - p\rangle \le 0. \tag{33} \]
Since $\|x_{n+1} - x_n\| \to 0$ by (32), from (33) we obtain
\[ \limsup_{n\to\infty}\langle (I - V_\gamma)\bar{x}, \bar{x} - x_{n+1}\rangle \le 0. \tag{34} \]
Now, using Lemma 5 (ii), we get
\[ \begin{aligned} \|x_{n+1} - \bar{x}\|^2 &= \langle \alpha_n V_\gamma(y_n) + (1 - \alpha_n)z_n - \bar{x},\ x_{n+1} - \bar{x}\rangle \\ &= \alpha_n\langle V_\gamma(y_n) - V_\gamma(\bar{x}), x_{n+1} - \bar{x}\rangle + (1 - \alpha_n)\langle z_n - \bar{x}, x_{n+1} - \bar{x}\rangle + \alpha_n\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\le \mu\alpha_n\|y_n - \bar{x}\|\,\|x_{n+1} - \bar{x}\| + (1 - \alpha_n)\|z_n - \bar{x}\|\,\|x_{n+1} - \bar{x}\| + \alpha_n\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\le (1 - \alpha_n(1 - \mu))\|y_n - \bar{x}\|\,\|x_{n+1} - \bar{x}\| + \alpha_n\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\le (1 - \alpha_n(1 - \mu))\bigg(\frac{\|y_n - \bar{x}\|^2}{2} + \frac{\|x_{n+1} - \bar{x}\|^2}{2}\bigg) + \alpha_n\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle. \end{aligned} \tag{35} \]
Therefore, from (35), we have
\[ \|x_{n+1} - \bar{x}\|^2 \le \frac{1 - \alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}\|y_n - \bar{x}\|^2 + \frac{2\alpha_n}{1 + \alpha_n(1 - \mu)}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle = \bigg(1 - \frac{2\alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}\bigg)\|y_n - \bar{x}\|^2 + \frac{2\alpha_n}{1 + \alpha_n(1 - \mu)}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle. \tag{36} \]
Combining (36) and
\[ \|y_n - \bar{x}\| = \|x_n + \theta_n(x_n - x_{n-1}) - \bar{x}\| \le \|x_n - \bar{x}\| + \theta_n\|x_n - x_{n-1}\|, \]
it holds that
\[ \begin{aligned} \|x_{n+1} - \bar{x}\|^2 &\le \bigg(1 - \frac{2\alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}\bigg)\big(\|x_n - \bar{x}\| + \theta_n\|x_n - x_{n-1}\|\big)^2 + \frac{2\alpha_n}{1 + \alpha_n(1 - \mu)}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &= \bigg(1 - \frac{2\alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}\bigg)\big(\|x_n - \bar{x}\|^2 + \theta_n^2\|x_n - x_{n-1}\|^2 + 2\theta_n\|x_n - \bar{x}\|\,\|x_n - x_{n-1}\|\big) + \frac{2\alpha_n}{1 + \alpha_n(1 - \mu)}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &\le \bigg(1 - \frac{2\alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}\bigg)\|x_n - \bar{x}\|^2 + \theta_n^2\|x_n - x_{n-1}\|^2 + 2\theta_n\|x_n - \bar{x}\|\,\|x_n - x_{n-1}\| + \frac{2\alpha_n}{1 + \alpha_n(1 - \mu)}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle. \end{aligned} \tag{37} \]
Since $\{x_n\}$ is bounded, there exists $M_2 > 0$ such that $\|x_n - \bar{x}\| \le M_2$ for all $n \ge 1$. Thus, in view of (37), we have
\[ \begin{aligned} \Gamma_{n+1} &\le \bigg(1 - \frac{2\alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}\bigg)\Gamma_n + \theta_n\|x_n - x_{n-1}\|\big(\theta_n\|x_n - x_{n-1}\| + 2M_2\big) + \frac{2\alpha_n}{1 + \alpha_n(1 - \mu)}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle \\ &= (1 - a_n)\Gamma_n + a_n\vartheta_n, \end{aligned} \tag{38} \]
where $a_n = \dfrac{2\alpha_n(1 - \mu)}{1 + \alpha_n(1 - \mu)}$ and
\[ \vartheta_n = \frac{1 + \alpha_n(1 - \mu)}{2(1 - \mu)}\,\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\|\big(\theta_n\|x_n - x_{n-1}\| + 2M_2\big) + \frac{1}{1 - \mu}\langle V_\gamma(\bar{x}) - \bar{x}, x_{n+1} - \bar{x}\rangle. \]
From (C5), Remark 1 and (34), we have $\sum_{n=1}^{\infty} a_n = \infty$ and $\limsup_{n\to\infty}\vartheta_n \le 0$. Thus, using Lemma 2 and (38), we get $\Gamma_n \to 0$ as $n \to \infty$. Hence, $x_n \to \bar{x}$ as $n \to \infty$.
Case 2. Assume that $\{\Gamma_n\}$ does not decrease at infinity. Let $\varphi : \mathbb{N} \to \mathbb{N}$ be the mapping defined for all $n \ge n_0$ (for some $n_0$ large enough) by
\[ \varphi(n) = \max\{k \in \mathbb{N} : k \le n,\ \Gamma_k \le \Gamma_{k+1}\}. \]
By Lemma 3, $\{\varphi(n)\}_{n \ge n_0}$ is a nondecreasing sequence, $\varphi(n) \to \infty$ as $n \to \infty$ and
\[ \Gamma_{\varphi(n)} \le \Gamma_{\varphi(n)+1} \quad \text{and} \quad \Gamma_n \le \Gamma_{\varphi(n)+1}, \quad \forall n \ge n_0. \tag{39} \]
In view of $\|x_{\varphi(n)} - \bar{x}\|^2 - \|x_{\varphi(n)+1} - \bar{x}\|^2 = \Gamma_{\varphi(n)} - \Gamma_{\varphi(n)+1} \le 0$ for all $n \ge n_0$ and (23), we have, for all $n \ge n_0$,
\[ \begin{aligned} \rho_{\varphi(n)}(4 - \rho_{\varphi(n)})\sum_{j=1}^{M}\delta_{\varphi(n)}^{(j)}\frac{(l^{(j)}(s_{\varphi(n)}))^2}{(\eta_{\varphi(n)}^{(j)})^2} &\le (\Gamma_{\varphi(n)} - \Gamma_{\varphi(n)+1}) + \alpha_{\varphi(n)}M_1 + \theta_{\varphi(n)}(\Gamma_{\varphi(n)} - \Gamma_{\varphi(n)-1}) + 2\theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|^2 \\ &\le \alpha_{\varphi(n)}M_1 + \theta_{\varphi(n)}(\Gamma_{\varphi(n)} - \Gamma_{\varphi(n)-1}) + 2\theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|^2 \\ &\le \alpha_{\varphi(n)}M_1 + \theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|\big(\|x_{\varphi(n)} - \bar{x}\| + \|x_{\varphi(n)-1} - \bar{x}\|\big) + 2\theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|^2. \end{aligned} \tag{40} \]
Similarly, from (23), we have for all n n 0
\[ \sum_{i=1}^{N}\zeta_{\varphi(n)}^{(i)}\beta_{\varphi(n)}(1 - \omega_i - \beta_{\varphi(n)})\|(I - U_i)y_{\varphi(n)}\|^2 \le \alpha_{\varphi(n)}M_1 + \theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|\big(\|x_{\varphi(n)} - \bar{x}\| + \|x_{\varphi(n)-1} - \bar{x}\|\big) + 2\theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|^2. \tag{41} \]
Thus, from (40) and (41), together with (C3)–(C6) and Remark 1, we have, for each $j \in \{1, \dots, M\}$ and $i \in \{1, \dots, N\}$,
\[ \frac{(l^{(j)}(s_{\varphi(n)}))^2}{(\eta_{\varphi(n)}^{(j)})^2} \to 0 \quad \text{and} \quad \|(I - U_i)y_{\varphi(n)}\| \to 0, \quad n \to \infty. \]
Using a similar procedure as above in Case 1, we have
\[ \lim_{n\to\infty}\|x_{\varphi(n)} - s_{\varphi(n)}\| = \lim_{n\to\infty}\|x_{\varphi(n)} - y_{\varphi(n)}\| = \lim_{n\to\infty}\|x_{\varphi(n)+1} - x_{\varphi(n)}\| = 0. \]
By a similar argument as in Case 1, since $\{x_{\varphi(n)}\}$ is bounded, there exists a subsequence of $\{x_{\varphi(n)}\}$ which converges weakly to $p \in \Omega$, and this gives $\limsup_{n\to\infty}\langle (I - V_\gamma)\bar{x}, \bar{x} - x_{\varphi(n)+1}\rangle \le 0$. Thus, from (38), we have
\[ \Gamma_{\varphi(n)+1} \le (1 - a_{\varphi(n)})\Gamma_{\varphi(n)} + a_{\varphi(n)}\vartheta_{\varphi(n)}, \]
where $a_{\varphi(n)} = \dfrac{2\alpha_{\varphi(n)}(1 - \mu)}{1 + \alpha_{\varphi(n)}(1 - \mu)}$ and
\[ \vartheta_{\varphi(n)} = \frac{1 + \alpha_{\varphi(n)}(1 - \mu)}{2(1 - \mu)}\,\frac{\theta_{\varphi(n)}}{\alpha_{\varphi(n)}}\|x_{\varphi(n)} - x_{\varphi(n)-1}\|\big(\theta_{\varphi(n)}\|x_{\varphi(n)} - x_{\varphi(n)-1}\| + 2M_2\big) + \frac{1}{1 - \mu}\langle V_\gamma(\bar{x}) - \bar{x}, x_{\varphi(n)+1} - \bar{x}\rangle. \]
Using $\Gamma_{\varphi(n)} - \Gamma_{\varphi(n)+1} \le 0$ for all $n \ge n_0$ and $\vartheta_{\varphi(n)} > 0$, the last inequality gives
\[ 0 \le -a_{\varphi(n)}\Gamma_{\varphi(n)} + a_{\varphi(n)}\vartheta_{\varphi(n)}. \]
Since $a_{\varphi(n)} > 0$, we obtain $\|x_{\varphi(n)} - \bar{x}\|^2 = \Gamma_{\varphi(n)} \le \vartheta_{\varphi(n)}$. Moreover, since $\limsup_{n\to\infty}\vartheta_{\varphi(n)} \le 0$, we have $\lim_{n\to\infty}\|x_{\varphi(n)} - \bar{x}\| = 0$. Thus, $\lim_{n\to\infty}\|x_{\varphi(n)} - \bar{x}\| = 0$ together with $\lim_{n\to\infty}\|x_{\varphi(n)+1} - x_{\varphi(n)}\| = 0$ gives $\lim_{n\to\infty}\Gamma_{\varphi(n)+1} = 0$. Therefore, from (39), we obtain $\lim_{n\to\infty}\Gamma_n = 0$, that is, $x_n \to \bar{x}$ as $n \to \infty$. □
For $l(x) = \frac{1}{2}\|(I - \operatorname{prox}_{\lambda g})(x)\|^2$ and $\nabla l(x) = (I - \operatorname{prox}_{\lambda g})(x)$, we have the following result for solving the bilevel problem (10):
Corollary 1.
If $\gamma \in \big(0, \frac{2\sigma}{L_h^2}\big)$, the sequence $\{x_n\}$ generated by
\[ \begin{cases} y_n = x_n + \theta_n(x_n - x_{n-1}), \\ \tau_n = \dfrac{\rho_n\, l(y_n)}{\eta_n^2}, \quad \eta_n = \max\{1, \|\nabla l(y_n)\|\}, \\ z_n = y_n - \tau_n\nabla l(y_n), \\ x_{n+1} = \alpha_n\big(y_n - \gamma\nabla h(y_n)\big) + (1 - \alpha_n)z_n, \end{cases} \]
converges strongly to the solution of the bilevel problem (10) if $\{\alpha_n\}$, $\{\rho_n\}$ and $\{\theta_n\}$ are real sequences satisfying the following conditions (a small sketch of this simplified iteration is given after the list):
(C1) 
$0 < \alpha_n < 1$, $\lim_{n\to\infty}\alpha_n = 0$ and $\sum_{n=1}^{\infty}\alpha_n = \infty$.
(C2) 
$0 < \rho_n < 4$ and $\liminf_{n\to\infty}\rho_n(4 - \rho_n) > 0$.
(C3) 
$\lim_{n\to\infty}\frac{\theta_n}{\alpha_n}\|x_n - x_{n-1}\| = 0$, where $\theta_n \in [0, \theta)$ for $\theta \in [0, 1)$.
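A minimal sketch (ours) of the Corollary 1 iteration for the simple bilevel problem (10) with $H = \mathbb{R}$, $h(x) = \frac12(x - 5)^2$ and $g(x) = |x|$, so that $\arg\min g = \{0\}$ and the bilevel solution is $\bar{x} = 0$; the parameter choices below are illustrative only.

    import numpy as np

    def prox_abs(x, lam=1.0):
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    def corollary1(x0=5.0, x1=5.0, iters=5000, gamma=1.0, theta=0.5):
        # Bilevel problem (10): min h(x) = 0.5*(x - 5)^2 subject to x in argmin |x| = {0}.
        # Here l(x) = 0.5*((I - prox)(x))^2 and grad l(x) = (I - prox)(x).
        x_prev, x = x0, x1
        for n in range(1, iters + 1):
            alpha, eps, rho = 1.0 / (n + 1), 1.0 / (n + 1) ** 2, 1.0
            diff = abs(x - x_prev)
            theta_n = theta if diff == 0 else min(theta, eps / diff)   # satisfies (C3)
            y = x + theta_n * (x - x_prev)
            r = y - prox_abs(y)                  # grad l(y)
            eta = max(1.0, abs(r))
            z = y - (rho * 0.5 * r ** 2 / eta ** 2) * r
            x_prev, x = x, alpha * (y - gamma * (y - 5.0)) + (1 - alpha) * z
        return x

    # The iterate drifts toward the bilevel solution x_bar = 0; the pull toward the
    # unconstrained minimizer 5 of h fades only as alpha_n -> 0, so convergence is slow.
    print(corollary1())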

4. Applications

4.1. Application to the Bilevel Variational Inequality Problem

Let $H_1$ and $H_2$ be two real Hilbert spaces. Assume that $F : H_1 \to H_1$ is $L_h$-Lipschitz continuous and σ-strongly monotone on $H_1$, $A : H_1 \to H_2$ is a bounded linear operator, $g_j : H_2 \to \mathbb{R} \cup \{+\infty\}$ is a proper, convex, lower semicontinuous function for all $j \in \{1, \dots, M\}$, and $U_i : H_1 \to H_1$ is an $\omega_i$-demimetric and demiclosed mapping for all $i \in \{1, \dots, N\}$. Then, replacing $\nabla h$ by F in Algorithm 1, we obtain strong convergence to a solution of the bilevel variational inequality problem
\[ \text{find}\ \bar{x} \in \Omega\ \text{such that}\ \langle F(\bar{x}), x - \bar{x}\rangle \ge 0, \quad \forall x \in \Omega, \]
where Ω is the solution set of
\[ \text{find}\ x \in \bigcap_{i=1}^{N}\operatorname{Fix} U_i\ \text{such that}\ A(x) \in \bigcap_{j=1}^{M}\arg\min g_j. \]

4.2. Application to a Bilevel Optimization Problem with a Feasibility Set Constraint, Inclusion Constraint, and Equilibrium Constraint

Let $H_1$ and $H_2$ be two real Hilbert spaces, let $A : H_1 \to H_2$ be a linear transformation, and let $h : H_1 \to \mathbb{R}$ be proper, convex, and continuously differentiable, with the gradient $\nabla h$ σ-strongly monotone and $L_h$-Lipschitz continuous.
Now, consider the bilevel optimization problem with a feasibility set constraint
\[ \min\ h \quad \text{s.t.}\quad A(x) \in \bigcap_{j=1}^{M} Q_j, \tag{44} \]
where each $Q_j$ is a closed convex subset of $H_2$ for $j \in \{1, \dots, M\}$. Taking $U_i = I$ for all $i \in \{1, \dots, N\}$ and replacing $\operatorname{prox}_{\lambda g_j}$ by the projection mapping $P_{Q_j}$ in Algorithm 1, we obtain strong convergence to a solution of the bilevel problem (44); a projection sketch is given below.
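For instance (our illustration, with hypothetical names), when $Q_j$ is the closed ball of radius r centered at c, the projection $P_{Q_j}$ that replaces $\operatorname{prox}_{\lambda g_j}$ in Algorithm 1 has the closed form sketched below; it is exactly the proximal operator of the indicator function of $Q_j$.

    import numpy as np

    def proj_ball(z, center, radius):
        # Metric projection onto Q = {z : ||z - center|| <= radius},
        # i.e., the proximal operator of the indicator function of Q.
        d = z - center
        nrm = np.linalg.norm(d)
        return z if nrm <= radius else center + (radius / nrm) * d

    print(proj_ball(np.array([3.0, 4.0]), center=np.zeros(2), radius=1.0))  # [0.6, 0.8]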
Consider the bilevel optimization problem with inclusion constraint
\[ \min\ h \quad \text{s.t.}\quad 0 \in \bigcap_{j=1}^{M} G_j(A(x)), \tag{45} \]
where $G_j : H_2 \to 2^{H_2}$ is a maximal monotone mapping for $j \in \{1, \dots, M\}$. Setting $U_i = I$ for all $i \in \{1, \dots, N\}$, replacing the proximal mappings $\operatorname{prox}_{\lambda g_j}$ in Algorithm 1 by the resolvent operators $J_\lambda^{G_j} = (I + \lambda G_j)^{-1}$ (for $\lambda > 0$), and following the method of proof of the theorems, we obtain a strong convergence result for approximating a solution of the bilevel problem (45).
Consider the bilevel optimization problem with equilibrium constraint
\[ \min\ h \quad \text{s.t.}\quad A(x) \in \bigcap_{j=1}^{M}\mathrm{SEP}(g_j, H_2), \tag{46} \]
where $g_j : H_2 \times H_2 \to \mathbb{R}$ is a bifunction and each $g_j$ satisfies condition CO on $H_2$. We obtain strong convergence results for (46) by setting $U_i = I$ for all $i \in \{1, \dots, N\}$ and replacing the proximal mappings by the resolvent operators $T_r^{g_j}$ in Algorithm 1 (see (11) and its properties in Lemma 4 (i)–(iv)).

5. Numerical Example

Consider the bilevel optimization problem (7) with $H_1 = \mathbb{R}^p$, $H_2 = \mathbb{R}^q$, and the linear transformation $A : \mathbb{R}^p \to \mathbb{R}^q$ given by $A(x) = G_{q \times p}\, x$, where $G_{q \times p}$ is a $q \times p$ matrix. For $x \in H_1 = \mathbb{R}^p$ and $z \in H_2 = \mathbb{R}^q$, we take
\[ h(x) = \frac{1}{2}x^T D x + \frac{1}{2}\|x\|_p^2, \qquad U_i(x) = \epsilon_i x,\ i \in \{1, \dots, N\}, \qquad g_1(z) = \frac{1}{2}z^T B z, \quad g_2(z) = \|z\|_q, \quad g_3(z) = \sum_{t=1}^{q}\Phi(z_t), \]
where D and B are invertible symmetric positive semidefinite $p \times p$ and $q \times q$ matrices, respectively, $\epsilon_i \le 1$ for $i \in \{1, \dots, N\}$, $z = (z_1, \dots, z_q) \in \mathbb{R}^q$, $\|\cdot\|_p$ is the Euclidean norm in $\mathbb{R}^p$, $\|\cdot\|_q$ is the Euclidean norm in $\mathbb{R}^q$, and $\Phi(z_t) = \max\{|z_t| - 1, 0\}$ for $t = 1, 2, \dots, q$.
Here, $h(x) = f(x) + \frac{1}{2}\|x\|_p^2$, where $f(x) = \frac{1}{2}x^T D x$, and hence the gradient $\nabla f$ is $\|D\|$-Lipschitz. Thus, the gradient $\nabla h$ is 1-strongly monotone and $(\|D\| + 1)$-Lipschitz. We choose $\gamma = \frac{1}{(\|D\| + 1)^2}$.
Now, for λ = 1, the proximal operators of $g_1$, $g_2$ and $g_3$ are given by
\[ \operatorname{prox}_{\lambda g_1}(z) = (I + B)^{-1}(z), \]
\[ \operatorname{prox}_{\lambda g_2}(z) = \begin{cases} \Big(1 - \dfrac{1}{\|z\|_q}\Big)z, & \text{if } \|z\|_q \ge 1, \\ 0, & \text{otherwise,} \end{cases} \]
and $\operatorname{prox}_{\lambda g_3}(z) = \big(\operatorname{prox}_{\lambda\Phi}(z_1), \dots, \operatorname{prox}_{\lambda\Phi}(z_q)\big)$, where
\[ \operatorname{prox}_{\lambda\Phi}(z_t) = \begin{cases} z_t, & \text{if } |z_t| < 1, \\ \operatorname{sign}(z_t), & \text{if } 1 \le |z_t| \le 2, \\ \operatorname{sign}(z_t)\,(|z_t| - 1), & \text{if } |z_t| > 2. \end{cases} \]
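A small sketch (ours; the names are illustrative) of these three proximal operators for λ = 1:

    import numpy as np

    def prox_g1(z, B):
        # prox of g1(z) = 0.5 * z^T B z: solve (I + B) y = z.
        return np.linalg.solve(np.eye(len(z)) + B, z)

    def prox_g2(z):
        # prox of g2(z) = ||z||: block soft-thresholding.
        nrm = np.linalg.norm(z)
        return (1.0 - 1.0 / nrm) * z if nrm >= 1.0 else np.zeros_like(z)

    def prox_g3(z):
        # prox of g3(z) = sum_t max(|z_t| - 1, 0), applied componentwise.
        out = z.copy()
        mid = (np.abs(z) >= 1.0) & (np.abs(z) <= 2.0)
        big = np.abs(z) > 2.0
        out[mid] = np.sign(z[mid])
        out[big] = np.sign(z[big]) * (np.abs(z[big]) - 1.0)
        return out

    z = np.array([0.3, 1.5, -3.0])
    B = np.diag([1.0, 2.0, 3.0])
    print(prox_g1(z, B), prox_g2(z), prox_g3(z))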
We consider $p = q$, $\epsilon_i = \frac{1}{i+1}$ for $i \in \{1, \dots, N\}$ and $G_{q \times p} = I_{p \times p}$, where $I_{p \times p}$ is the $p \times p$ identity matrix. The parameters are chosen as $\zeta_n^{(i)} = \frac{i}{1 + 2 + \cdots + N}$ for $i \in \{1, \dots, N\}$, $\delta_n^{(j)} = \frac{j}{6}$ for $j \in \{1, 2, 3\}$, $\alpha_n = \frac{1}{n+1}$, $\varepsilon_n = \frac{1}{(n+1)^2}$, $\rho_n = 1$ and $\theta_n = \bar{\theta}_n$.
For the purpose of testing our algorithm, we took the following data:
  • D and B are randomly generated invertible symmetric positive semidefinite $p \times p$ and $q \times q$ matrices, respectively.
  • x 0 and x 1 are randomly generated starting points.
  • The stopping criterion is $\dfrac{\|x_{n+1} - x_n\|}{\|x_2 - x_1\|} < \mathrm{TOL}$.
Table 1, Table 2 and Figure 1 illustrate the numerical results of our algorithm for this example under the parameters and data given above and for θ = 0.5. The number of iterations (Iter(n)), the CPU time in seconds (CPU(s)), and the error $\mathrm{err}(n) = \|x_n - \bar{x}\|$, where $\bar{x}$ is the solution of the bilevel optimization problem ($\bar{x} = 0$ in this example), are reported in Table 1.
We now compare our algorithm for different $\theta_n$, i.e., for the non-inertial case ($\theta_n = 0$) and for the inertial case ($\theta_n \neq 0$). For the non-inertial case, we simply take θ = 0, and, for the inertial case, we take a small θ with θ ∈ (0, 1) so that $\theta_n = \bar{\theta}_n = \theta$. Numerical comparisons of the proposed algorithm in its inertial version ($\theta_n \neq 0$) and its non-inertial version ($\theta_n = 0$) are presented in Table 3.
Remark 2.
Table 1 and Table 2 show that the CPU time and the number of iterations of the algorithm grow with the size and complexity of the problem (the dimensions p and q, the number of mappings N, and the number of functions M). From Table 3, we can see that our algorithm performs better for the step-size choice $\theta_n \neq 0$; that is, the inertial version of our algorithm converges faster.

6. Conclusions

In this paper, we have considered the problem of minimizing a convex function over the solution set of the split feasibility problem of fixed point problems of demimetric mappings and constrained minimization problems of nonsmooth convex functions. We have shown that this problem can be solved by proximal and gradient methods, where the gradient method handles the upper-level problem and the proximal method handles the lower-level problem. Most of the standard bilevel problems are particular cases of our framework.

Author Contributions

S.E.Y., P.K. and A.G.G. contributed equally to this research paper, in particular to the conceptualization, methodology, validation, formal analysis, resources, and writing and preparation of the original draft of the manuscript; P.K. also played a major role in supervision and funding acquisition. Moreover, A.G.G. wrote the code and ran the algorithm in MATLAB. All authors have read and agreed to the published version of the manuscript.

Funding

Funding was received from the Petchra Pra Jom Klao Ph.D. Research Scholarship from King Mongkut’s University of Technology Thonburi (KMUTT) and the Theoretical and Computational Science (TaCS) Center. Moreover, Poom Kumam was supported by the Thailand Research Fund and King Mongkut’s University of Technology Thonburi under TRF Research Scholar Grant No. RSA6080047, and by the Rajamangala University of Technology Thanyaburi (RMUTTT) (Grant No. NSF62D0604).

Acknowledgments

The authors acknowledge the financial support provided by the Center of Excellence in Theoretical and Computational Science (TaCS-CoE), KMUTT. Seifu Endris Yimer is supported by the Petchra Pra Jom Klao Ph.D. Research Scholarship from King Mongkut’s University of Technology Thonburi (Grant No.9/2561).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Su, M.; Xu, H.K. Remarks on the gradient-projection algorithm. J. Nonlinear Anal. Optim. 2010, 1, 35–43.
  2. Xu, H.K. Averaged mappings and the gradient-projection algorithm. J. Optim. Theory Appl. 2011, 150, 360–378.
  3. Ceng, L.C.; Ansari, Q.H.; Yao, J.C. Some iterative methods for finding fixed points and for solving constrained convex minimization problems. Nonlinear Anal. Theor. 2011, 74, 5286–5302.
  4. Rockafellar, R.T.; Wets, R.J. Variational Analysis; Springer Science & Business Media: Berlin, Germany, 2009.
  5. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011.
  6. Martinet, B. Brève communication. Régularisation d’inéquations variationnelles par approximations successives. Revue Française d’Informatique et de Recherche Opérationnelle Série Rouge 1970, 4, 154–158.
  7. Martinet, B. Détermination approchée d’un point fixe d’une application pseudo-contractante. CR Acad. Sci. Paris 1972, 274, 163–165.
  8. Rockafellar, R.T. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 1976, 14, 877–898.
  9. Censor, Y.; Gibali, A.; Reich, S. Algorithms for the split variational inequality problem. Numer. Algorithms 2012, 59, 301–323.
  10. Hendrickx, J.M.; Olshevsky, A. Matrix p-Norms Are NP-Hard to Approximate If p ≠ 1,2,∞. SIAM J. Matrix Anal. A 2010, 31, 2802–2812.
  11. Censor, Y.; Elfving, T. A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 1994, 8, 221–239.
  12. Qu, B.; Xiu, N. A note on the CQ algorithm for the split feasibility problem. Inverse Probl. 2005, 21, 1655.
  13. Byrne, C. Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Probl. 2002, 18, 441.
  14. López, G.; Martín-Márquez, V.; Wang, F.; Xu, H.K. Solving the split feasibility problem without prior knowledge of matrix norms. Inverse Probl. 2012, 28, 085004.
  15. Yimer, S.E.; Kumam, P.; Gebrie, A.G.; Wangkeeree, R. Inertial Method for Bilevel Variational Inequality Problems with Fixed Point and Minimizer Point Constraints. Mathematics 2019, 7, 841.
  16. Dempe, S.; Kalashnikov, V.; Pérez-Valdés, G.A.; Kalashnykova, N. Bilevel programming problems. In Energy Systems; Springer: Berlin, Germany, 2015.
  17. Anh, T.V. Linesearch methods for bilevel split pseudomonotone variational inequality problems. Numer. Algorithms 2019, 81, 1067–1087.
  18. Anh, T.V.; Muu, L.D. A projection-fixed point method for a class of bilevel variational inequalities with split fixed point constraints. Optimization 2016, 65, 1229–1243.
  19. Yuying, T.; Van Dinh, B.; Plubtieng, S. Extragradient subgradient methods for solving bilevel equilibrium problems. J. Inequal. Appl. 2018.
  20. Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Method Softw. 2019.
  21. Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optimiz. 2017, 27, 640–660.
  22. Boţ, R.I.; Csetnek, E.R.; Nimana, N. An inertial proximal-gradient penalization scheme for constrained convex optimization problems. Vietnam J. Math. 2018, 46, 53–71.
  23. Solodov, M. An explicit descent method for bilevel convex optimization. J. Convex Anal. 2007, 14, 227.
  24. Chang, S.S.; Quan, J.; Liu, J. Feasible iterative algorithms and strong convergence theorems for bi-level fixed point problems. J. Nonlinear Sci. Appl. 2016, 9, 1515–1528.
  25. Didi-Biha, M.; Marcotte, P.; Savard, G. Path-based formulations of a bilevel toll setting problem. In Optimization with Multivalued Mappings; Springer: Boston, MA, USA, 2006; pp. 29–50.
  26. Mohideen, M.J.; Perkins, J.D.; Pistikopoulos, E.N. Optimal design of dynamic systems under uncertainty. AIChE J. 1996, 42, 2251–2272.
  27. Fampa, M.; Barroso, L.A.; Candal, D.; Simonetti, L. Bilevel optimization applied to strategic pricing in competitive electricity markets. Comput. Optim. Appl. 2008, 39, 121–142.
  28. Calvete, H.I.; Galé, C.; Oliveros, M.J. Bilevel model for production–distribution planning solved by using ant colony optimization. Comput. Oper. Res. 2011, 38, 320–327.
  29. Takahashi, S.; Takahashi, W. The split common null point problem and the shrinking projection method in Banach spaces. Optimization 2016, 65, 281–287.
  30. Song, Y. Iterative methods for fixed point problems and generalized split feasibility problems in Banach spaces. J. Nonlinear Sci. Appl. 2018, 11, 198–217.
  31. Cabot, A. Proximal point algorithm controlled by a slowly vanishing term: Applications to hierarchical minimization. SIAM J. Optim. 2005, 15, 555–572.
  32. Helou, E.S.; Simões, L.E. ϵ-subgradient algorithms for bilevel convex optimization. Inverse Probl. 2017, 33, 5.
  33. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–7.
  34. Lemaire, B. Which fixed point does the iteration method select? In Recent Advances in Optimization; Springer: Berlin/Heidelberg, Germany, 1997; pp. 154–167.
  35. Xu, H.K. Iterative algorithms for nonlinear operators. J. Lond. Math. Soc. 2002, 66, 240–256.
  36. Maingé, P.E. Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set Valued Anal. 2008, 16, 899–912.
  37. Combettes, P.L.; Hirstoaga, S.A. Equilibrium programming in Hilbert spaces. J. Nonlinear Convex Anal. 2005, 6, 117–136.
  38. Aubin, J.P. Optima and Equilibria: An Introduction to Nonlinear Analysis; Springer Science & Business Media: Berlin, Germany, 2013.
Figure 1. Algorithm 1 for different N and different dimensions p = q.
Table 1. Performance of Algorithm 1 for different N and different dimensions p = q with $\mathrm{TOL} = 10^{-3}$.

                    Iter(n)   CPU(s)    err(n)
  N = 3    p = 3       11     0.0600    0.3995
           p = 8       16     0.0854    0.4194
           p = 16      24     0.0801    0.3935
  N = 6    p = 40      82     0.08920   0.5332
           p = 80     131     0.16109   0.8043
           p = 150    250     0.21723   0.9099
  N = 10   p = 50      99     0.1295    0.6543
           p = 100    137     0.1463    0.7004
           p = 200    263     0.2969    0.7841
Table 2. Performance of Algorithm 1 for N = 8 and different p = q with $\mathrm{TOL} = 10^{-4}$.

                    p = 5                      p = 15
  Iter(n)     CPU(s)     err(n)         CPU(s)     err(n)
  1             –       114.1709          –       372.5614
  2             –       112.5227          –       367.2210
  3             –       110.6359          –       361.0786
  4             –       108.5059          –       354.1380
  5             –       106.1671          –       346.5137
  100           –         0.0428          –         3.2360
  101           –         0.0416          –         3.0748
  181           –         0.0385          –         2.8765
  182         0.3702      0.0375          –         2.6385
  207           –           –             –         0.0756
  208           –           –           0.5311      0.0746
Table 3. Performance of Algorithm 1 for different $\theta_n$, with N = 3 and $\mathrm{TOL} = 10^{-3}$.

                        p = 4                            p = 20
              Iter(n)   CPU(s)   err(n)        Iter(n)   CPU(s)   err(n)
  θ = 0.1        12     0.0811   0.3295           19     0.1005   0.4401
  θ = 0.01       11     0.0844   0.3112           17     0.0968   0.3224
  θ = 0          16     0.0960   0.5255           24     0.1203   0.4362
