1. Introduction
Let H be a real Hilbert space and consider the constrained minimization problem
$$\min_{x\in C} h(x),\tag{1}$$
where C is a nonempty closed convex subset of H and $h:H\to\mathbb{R}$ is a convex and continuously differentiable function. The gradient–projection algorithm (GPA, for short) is usually applied to solve the minimization problem (1) and has been studied extensively by many authors; see, for instance, [1,2,3] and the references therein. This algorithm generates a sequence $(x_n)$ through the recursion
$$x_{n+1}=P_C\bigl(x_n-\gamma_n\nabla h(x_n)\bigr),\quad n\ge 0,\tag{2}$$
where $\nabla h$ is the gradient of h, $x_0$ is the initial guess chosen arbitrarily from C, $\gamma_n>0$ is a stepsize which may be chosen in different ways, and $P_C$ is the metric projection from H onto C. By the optimality condition for problem (1), it follows that $x^*$ solves (1) if and only if $x^*=P_C\bigl(x^*-\gamma\nabla h(x^*)\bigr)$ for any $\gamma>0$.
If $\nabla h$ is Lipschitz continuous and strongly monotone, i.e., there exist constants $L>0$ and $\sigma>0$ such that for all $x,y\in H$,
$$\|\nabla h(x)-\nabla h(y)\|\le L\|x-y\|\quad\text{and}\quad\langle\nabla h(x)-\nabla h(y),\,x-y\rangle\ge\sigma\|x-y\|^{2},$$
then the operator $I-\gamma\nabla h$ is a contraction provided that $0<\gamma<\frac{2\sigma}{L^{2}}$. Therefore, for $\gamma\in\bigl(0,\frac{2\sigma}{L^{2}}\bigr)$, we can apply Banach's contraction principle to conclude that the sequence $(x_n)$ defined by (2) with $\gamma_n\equiv\gamma$ converges strongly to the unique fixed point of $P_C(I-\gamma\nabla h)$ (or, equivalently, to the unique solution of the minimization problem (1)). Moreover, if we set $C=H$ in (1), then we have an unconstrained optimization problem, and hence the gradient algorithm
$$x_{n+1}=x_n-\gamma\nabla h(x_n)$$
generates a sequence $(x_n)$ strongly convergent to the global minimizer of h.
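To make the scheme concrete, the following minimal Python sketch (our own illustration, not taken from the paper; the quadratic objective, the box constraint, and the constant stepsize rule are assumptions) runs the recursion (2) for a strongly convex quadratic over a box, with $\gamma\in(0,2\sigma/L^{2})$.

```python
import numpy as np

# Illustrative data (assumed): h(x) = 0.5*x^T Q x - b^T x on C = [0, 1]^d.
rng = np.random.default_rng(0)
d = 5
M = rng.standard_normal((d, d))
Q = M.T @ M + np.eye(d)                  # symmetric positive definite -> h is strongly convex
b = rng.standard_normal(d)

grad_h = lambda x: Q @ x - b             # gradient of h
proj_C = lambda x: np.clip(x, 0.0, 1.0)  # metric projection onto the box C

L = np.linalg.eigvalsh(Q).max()          # Lipschitz constant of grad h
sigma = np.linalg.eigvalsh(Q).min()      # strong monotonicity constant
gamma = sigma / L**2                     # constant stepsize in (0, 2*sigma/L^2)

x = proj_C(rng.standard_normal(d))       # arbitrary initial guess in C
for _ in range(500):
    x = proj_C(x - gamma * grad_h(x))    # recursion (2)

print("approximate constrained minimizer:", x)
```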
Consider another well-known problem, the unconstrained minimization problem
$$\min_{x\in H} g(x),\tag{3}$$
where H is a real Hilbert space and $g:H\to(-\infty,+\infty]$ is a proper, convex, lower semicontinuous function. An analogous method for solving (3) with better properties is based on the notion of proximal mapping introduced by Moreau [4]; that is, the proximal operator of the function g with scaling parameter $\lambda>0$ is the mapping $\operatorname{prox}_{\lambda g}:H\to H$ given by
$$\operatorname{prox}_{\lambda g}(x)=\operatorname*{arg\,min}_{y\in H}\Bigl\{g(y)+\frac{1}{2\lambda}\|y-x\|^{2}\Bigr\}.$$
Proximal operators are firmly nonexpansive, and the optimality condition for (3) is that $x^{*}$ minimizes g if and only if $x^{*}=\operatorname{prox}_{\lambda g}(x^{*})$ for every $\lambda>0$. Many properties of the proximal operator can be found in [5] and the references therein. The so-called proximal point algorithm,
$$x_{n+1}=\operatorname{prox}_{\lambda_n g}(x_n),$$
is the most popular method for solving the optimization problem (3); it was introduced by Martinet [6,7] and later studied by Rockafellar [8].
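As a small illustration (our own example, not from the paper), the proximal operator of $g(t)=\mu|t|$ on $\mathbb{R}$ has the closed form of soft-thresholding, and iterating the proximal point algorithm drives the iterate to the minimizer $t^{*}=0$.

```python
import numpy as np

def prox_abs(x, lam, mu=1.0):
    """Proximal operator of g(t) = mu*|t| with scaling parameter lam:
    argmin_y { mu*|y| + (1/(2*lam)) * (y - x)^2 }  (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam * mu, 0.0)

# Proximal point algorithm x_{n+1} = prox_{lam_n g}(x_n) for g(t) = |t|.
x = 5.0
for n in range(20):
    lam_n = 1.0                      # any positive scaling parameter works here
    x = prox_abs(x, lam_n)
print("PPA iterate after 20 steps:", x)   # converges to the minimizer 0 of |t|
```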
The split inverse problem (SIP) [9] is formulated by linking problems posed in two different spaces X and Y connected by a linear transformation; that is, the SIP is the problem of finding a point in the space X that solves a problem IP1 posed in X and whose image under the linear transformation solves a problem IP2 posed in the other space Y. Step-size choices that depend on the operator norm are not recommended in iterative methods for solving SIPs, since it is not always easy to estimate the norm of an operator; see, for example, the theorem of Hendrickx and Olshevsky in [10]. For instance, in the early studies of iterative methods for solving the split feasibility problem [11,12,13], the determination of the step size depends on the operator norm (or at least on an estimate of it), and this is not an easy task. To overcome this difficulty, Lopez et al. [14] introduced a new way of selecting the step sizes for which knowledge of the operator norm is not necessary for solving the split feasibility problem (SFP): find $x\in C$ such that $Ax\in Q$, where C and Q are closed convex subsets of real Hilbert spaces $H_1$ and $H_2$, respectively, and $A:H_1\to H_2$ is a bounded linear operator. To be precise, Lopez et al. [14] introduced an iterative algorithm that generates a sequence $(x_n)$ by
$$x_{n+1}=P_C\bigl(x_n-\tau_n A^{*}(I-P_Q)Ax_n\bigr).\tag{4}$$
The parameter $\tau_n$ appearing in (4) is given by
$$\tau_n=\frac{\rho_n f(x_n)}{\|\nabla f(x_n)\|^{2}},$$
where $f(x)=\frac{1}{2}\|(I-P_Q)Ax\|^{2}$, $\nabla f(x)=A^{*}(I-P_Q)Ax$, and $\rho_n\in(0,4)$.
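The following Python sketch (our own toy instance; the sets C and Q, the matrix A, and the choice $\rho_n\equiv 1$ are assumptions made only for illustration) implements iteration (4) with the operator-norm-free stepsize described above for a small split feasibility problem.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))                 # bounded linear operator H1 -> H2
proj_C = lambda x: np.clip(x, -1.0, 1.0)        # C: a box in H1

def proj_Q(y):                                  # Q: closed ball of radius 2 in H2
    nrm = np.linalg.norm(y)
    return y if nrm <= 2.0 else (2.0 / nrm) * y

def f(x):                                       # f(x) = 0.5 * ||(I - P_Q) A x||^2
    r = A @ x - proj_Q(A @ x)
    return 0.5 * r @ r

def grad_f(x):                                  # grad f(x) = A^T (I - P_Q) A x
    return A.T @ (A @ x - proj_Q(A @ x))

x = rng.standard_normal(6)
for _ in range(200):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-12:               # already (numerically) feasible
        break
    tau = 1.0 * f(x) / (g @ g)                  # self-adaptive stepsize, rho_n = 1 in (0, 4)
    x = proj_C(x - tau * g)                     # iteration (4): no need to know ||A||

print("residual f(x) =", f(x))
```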
A bilevel problem is a two-level hierarchical problem in which the solution set of the lower-level problem determines the feasible set of the upper-level problem. In general, Yimer et al. [15] presented a bilevel problem as an archetypal model, problem (5), in which the constraint set S of (5) is the solution set of a lower-level problem (6). According to [16], the bilevel problem (problems (5) and (6)) is a hierarchical game of two players, the decision makers, who make their decisions according to a hierarchical order. The problem is also called the leader's and follower's problem, where problem (5) is called the leader's problem and (6) is called the follower's problem; that is, the first player (the leader) makes a selection first and communicates it to the second player (the so-called follower). There are many studies of several types of bilevel problems; see, for example, [15,17,18,19,20,21,22,23,24]. A bilevel optimization problem is a bilevel problem whose hierarchical structure involves optimization problems. Bilevel optimization problems have become an increasingly important class of optimization problems during the last few decades due to their vast applicability to real-life problems, for example, in toll-setting problems [25], in chemical engineering [26], in electricity markets [27], and in supply chain problems [28].
Motivated by the above theoretical results and inspired by the applicability of the bilevel problem, we consider the bilevel optimization problem (7), in which A is a linear transformation, h is a convex function, each of the lower-level functions is a convex nonsmooth function, each of the lower-level mappings is a demimetric mapping, and $H_1$ and $H_2$ are two real Hilbert spaces.
For a real Hilbert space H, a mapping $T:H\to H$ with $\operatorname{Fix}(T)\neq\emptyset$ is called $k$-demimetric, for $k\in(-\infty,1)$, if
$$\langle x-q,\,x-Tx\rangle\ge\frac{1-k}{2}\|x-Tx\|^{2}\quad\text{for all }x\in H\text{ and }q\in\operatorname{Fix}(T).\tag{8}$$
The demimetric mapping was introduced by Takahashi [29] in a smooth, strictly convex, and reflexive Banach space. For a real Hilbert space H, (8) is equivalent to the following:
$$\|Tx-q\|^{2}\le\|x-q\|^{2}+k\|x-Tx\|^{2}\quad\text{for all }x\in H\text{ and }q\in\operatorname{Fix}(T),$$
and $\operatorname{Fix}(T)$ is a closed and convex subset of H [29]. The class of demimetric mappings contains the classes of strict pseudocontractions, firmly quasi-nonexpansive mappings, and quasi-nonexpansive mappings; see [29,30] and the references therein.
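As a quick illustration of this inclusion (a standard verification under the Hilbert-space characterization above, not reproduced from [29]): if $T$ is a $k$-strict pseudocontraction with a fixed point, taking the second argument equal to a fixed point in its defining inequality gives exactly the demimetric inequality.

```latex
% T is a k-strict pseudocontraction: for all x, y in H,
%   ||Tx - Ty||^2 <= ||x - y||^2 + k ||(I - T)x - (I - T)y||^2,   k in [0, 1).
% Take y = q with Tq = q; then (I - T)q = 0, and the inequality becomes
\|Tx - q\|^{2} \le \|x - q\|^{2} + k\,\|x - Tx\|^{2}
\quad\text{for all } x \in H,\ q \in \operatorname{Fix}(T),
% which is precisely the equivalent form of (8), so T is k-demimetric.
```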
Assume that Ω is the set of solutions of the lower-level problem of the bilevel optimization problem (7), as specified in (9). Therefore, the bilevel optimization problem (7) simply amounts to minimizing h over the set Ω given by (9). If A is the identity operator and the lower-level mappings are taken to be the identity for all indices, then problem (7) reduces to the bilevel optimization problem (10). Bilevel problems like (10) have already been considered in the literature, for example, in [23,31,32] for a particular case.
Note that, to the best of our knowledge, the bilevel optimization problem (7), with a finite intersection of fixed point sets of the broadest class of nonlinear mappings and a finite intersection of sets of minimizers of nonsmooth functions as the lower level, has not been addressed before.
An inertial algorithm is a two-step iterative method in which the next iterate is defined using the previous two iterates. It was first introduced by Polyak [33] as an acceleration process for solving a smooth convex minimization problem. It is well known that combining an algorithm with an inertial term speeds up, or accelerates, the rate of convergence of the sequence generated by the algorithm. In this paper, we introduce a proximal gradient inertial algorithm with a strong convergence result for approximating the solution of the bilevel optimization problem (7), where our algorithm is designed so that the step sizes are selected in a way whose implementation does not need any prior information about the operator norm.
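For intuition only, the following Python sketch (our own toy example in the spirit of Polyak's inertial acceleration; the quadratic objective, the stepsize, and the inertial weight are ad hoc assumptions and this is not the algorithm proposed in this paper) shows the typical inertial step: first extrapolate using the previous two iterates, then apply the gradient step at the extrapolated point.

```python
import numpy as np

# Minimize h(x) = 0.5 * x^T Q x on R^2 with an inertial gradient step.
Q = np.array([[3.0, 0.0], [0.0, 1.0]])
grad_h = lambda x: Q @ x

gamma, theta = 0.25, 0.5                  # ad hoc stepsize and inertial weight (assumptions)
x_prev = x = np.array([4.0, -2.0])
for _ in range(100):
    y = x + theta * (x - x_prev)          # inertial extrapolation from two previous iterates
    x_prev, x = x, y - gamma * grad_h(y)  # gradient step at the extrapolated point

print("iterate after 100 inertial steps:", x)
```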
2. Preliminary
Let C be a nonempty closed convex subset of a real Hilbert space H. The metric projection onto C is the mapping $P_C:H\to C$ defined by
$$P_C(x)=\operatorname*{arg\,min}_{y\in C}\|x-y\|,\quad x\in H.$$
For $x\in H$ and $z\in C$, we have $z=P_C(x)$ if and only if
$$\langle x-z,\,y-z\rangle\le 0\quad\text{for all }y\in C.$$
Let $T:H\to H$. Then,
- (a) T is L-Lipschitz if there exists $L>0$ such that
$$\|Tx-Ty\|\le L\|x-y\|\quad\text{for all }x,y\in H.$$
If $L\in(0,1)$, then we call T a contraction with constant L. If $L=1$, then T is called a nonexpansive mapping.
- (b) T is strongly monotone if there exists $\sigma>0$ such that
$$\langle Tx-Ty,\,x-y\rangle\ge\sigma\|x-y\|^{2}\quad\text{for all }x,y\in H.$$
In this case, T is called $\sigma$-strongly monotone.
- (c) T is firmly nonexpansive if
$$\|Tx-Ty\|^{2}\le\langle Tx-Ty,\,x-y\rangle\quad\text{for all }x,y\in H,$$
which is equivalent to
$$\|Tx-Ty\|^{2}\le\|x-y\|^{2}-\|(I-T)x-(I-T)y\|^{2}\quad\text{for all }x,y\in H.$$
If T is firmly nonexpansive, then $I-T$ is also firmly nonexpansive.
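A small numerical illustration of the projection characterization and of property (c) (our own example; the closed ball and the random test points are assumptions): the projection onto a closed ball has a simple closed form, and both the variational inequality and the firm nonexpansiveness inequality can be checked on random points.

```python
import numpy as np

def proj_ball(x, r=1.0):
    """Metric projection onto the closed ball C = {y : ||y|| <= r}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

rng = np.random.default_rng(2)
x, y = rng.standard_normal(3) * 3, rng.standard_normal(3) * 3
px, py = proj_ball(x), proj_ball(y)

# Variational characterization: <x - P_C x, z - P_C x> <= 0 for every z in C.
z = proj_ball(rng.standard_normal(3))          # an arbitrary point of C
print((x - px) @ (z - px) <= 1e-12)

# Firm nonexpansiveness: ||P_C x - P_C y||^2 <= <P_C x - P_C y, x - y>.
print(np.dot(px - py, px - py) <= (px - py) @ (x - y) + 1e-12)
```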
Let H be a real Hilbert space. If $G:H\to 2^{H}$ is a maximal monotone set-valued mapping, then we define the resolvent operator $J_{\lambda}^{G}$ associated with G and $\lambda>0$ as follows:
$$J_{\lambda}^{G}(x)=(I+\lambda G)^{-1}(x),\quad x\in H.$$
It is well known that $J_{\lambda}^{G}$ is single-valued, nonexpansive, and 1-inverse strongly monotone (firmly nonexpansive). Moreover, $0\in G(x^{*})$ if and only if $x^{*}$ is a fixed point of $J_{\lambda}^{G}$ for all $\lambda>0$; see more about maximal monotone operators, their associated resolvent operators, and examples of maximal monotone operators in [34].
The subdifferential of a convex function $f:H\to(-\infty,+\infty]$ at $x\in H$, denoted by $\partial f(x)$, is defined by
$$\partial f(x)=\{u\in H:\ f(y)\ge f(x)+\langle u,\,y-x\rangle\ \text{for all }y\in H\}.$$
If $\partial f(x)\neq\emptyset$, f is said to be subdifferentiable at x. If the function f is continuously differentiable, then $\partial f(x)=\{\nabla f(x)\}$; this is the gradient of f. If f is a proper, convex, lower semicontinuous function, the subdifferential operator $\partial f$ is a maximal monotone operator, and the proximal operator is the resolvent of the subdifferential operator (see, for example, [5]), i.e.,
$$\operatorname{prox}_{\lambda f}=(I+\lambda\,\partial f)^{-1},\quad\lambda>0.$$
Thus, proximal operators are firmly nonexpansive, and a point $x^{*}\in H$ minimizes f if and only if
$$0\in\partial f(x^{*}),\ \text{equivalently, }x^{*}=\operatorname{prox}_{\lambda f}(x^{*})\ \text{for all }\lambda>0.$$
Definition 1. Let H be a real Hilbert space. A mapping $T:H\to H$ is called demiclosed if, for every sequence $(x_n)$ in H such that $(x_n)$ converges weakly to $p\in H$ and $\|x_n-Tx_n\|\to 0$, it holds that $Tp=p$.
Lemma 1. For a real Hilbert space H, we have, for all $x,y\in H$ and $t\in[0,1]$,
- (i) $\|x+y\|^{2}=\|x\|^{2}+2\langle x,y\rangle+\|y\|^{2}$;
- (ii) $\|x+y\|^{2}\le\|x\|^{2}+2\langle y,\,x+y\rangle$;
- (iii) $\|tx+(1-t)y\|^{2}=t\|x\|^{2}+(1-t)\|y\|^{2}-t(1-t)\|x-y\|^{2}$.
Lemma 2 ([35]). Let $(a_n)$ and $(c_n)$ be sequences of nonnegative real numbers and $(b_n)$ be a sequence of real numbers such that
$$a_{n+1}\le(1-\alpha_n)a_n+\alpha_n b_n+c_n,$$
where $(\alpha_n)\subset(0,1)$ and $\sum_{n=0}^{\infty}c_n<\infty$.
- (i) If $b_n\le M$ for some $M\ge 0$, then $(a_n)$ is a bounded sequence.
- (ii) If $\sum_{n=0}^{\infty}\alpha_n=\infty$ and $\limsup_{n\to\infty}b_n\le 0$, then $a_n\to 0$ as $n\to\infty$.
Definition 2. Let $(a_n)$ be a real sequence. Then, $(a_n)$ decreases at infinity if there exists $n_0\in\mathbb{N}$ such that $a_{n+1}\le a_n$ for $n\ge n_0$. In other words, the sequence $(a_n)$ does not decrease at infinity if there exists a subsequence $(a_{n_k})$ of $(a_n)$ such that $a_{n_k}<a_{n_k+1}$ for all $k\in\mathbb{N}$.
Lemma 3 ([36]). Let $(a_n)$ be a sequence of real numbers that does not decrease at infinity. In addition, consider the sequence of integers $(\tau(n))_{n\ge n_0}$ defined by
$$\tau(n)=\max\{k\le n:\ a_k<a_{k+1}\}.$$
Then, $(\tau(n))_{n\ge n_0}$ is a nondecreasing sequence verifying $\lim_{n\to\infty}\tau(n)=\infty$, and, for all $n\ge n_0$, the following two estimates hold:
$$a_{\tau(n)}\le a_{\tau(n)+1}\quad\text{and}\quad a_n\le a_{\tau(n)+1}.$$
Let D be a closed, convex subset of a real Hilbert space H and $g:D\times D\to\mathbb{R}$ be a bifunction. Then, we say that g satisfies condition CO on D if the following four assumptions are satisfied:
- (a) $g(x,x)=0$ for all $x\in D$;
- (b) g is monotone on D, i.e., $g(x,y)+g(y,x)\le 0$ for all $x,y\in D$;
- (c) for each $x,y,z\in D$,
$$\limsup_{t\downarrow 0}g\bigl(tz+(1-t)x,\,y\bigr)\le g(x,y);$$
- (d) $g(x,\cdot)$ is convex and lower semicontinuous on D for each $x\in D$.
Lemma 4 ([37], Lemma 2.12). Let g satisfy condition CO on D. Then, for each $r>0$ and $x\in H$, define a mapping $T_r^{g}$ (called the resolvent of g) given by
$$T_r^{g}(x)=\Bigl\{z\in D:\ g(z,y)+\frac{1}{r}\langle y-z,\,z-x\rangle\ge 0\ \text{for all }y\in D\Bigr\}.$$
Then, the following holds:
- (i) $T_r^{g}$ is single-valued;
- (ii) $T_r^{g}$ is firmly nonexpansive, i.e., for all $x,y\in H$,
$$\|T_r^{g}x-T_r^{g}y\|^{2}\le\langle T_r^{g}x-T_r^{g}y,\,x-y\rangle;$$
- (iii) $\operatorname{Fix}(T_r^{g})=\mathrm{SEP}$, where $\operatorname{Fix}(T_r^{g})$ is the fixed point set of $T_r^{g}$;
- (iv) SEP is closed and convex.
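As a concrete illustration (our own example, not from [37]): for the affine bifunction $g(z,y)=\langle Mz,\,y-z\rangle$ on $D=\mathbb{R}^{n}$ with $M$ positive semidefinite, the defining inequality of the resolvent reduces to a linear equation, so $T_r^{g}=(I+rM)^{-1}$, which can be checked numerically.

```python
import numpy as np

# Resolvent of g(z, y) = <M z, y - z> on D = R^n, with M positive semidefinite:
# the inequality g(z, y) + (1/r) <y - z, z - x> >= 0 for all y in R^n forces
# M z + (1/r)(z - x) = 0, i.e., T_r^g(x) = (I + r M)^{-1} x.
rng = np.random.default_rng(4)
n, r = 4, 0.7
S = rng.standard_normal((n, n))
M = S.T @ S                                   # positive semidefinite, so g is monotone

def resolvent(x, r):
    return np.linalg.solve(np.eye(n) + r * M, x)

x = rng.standard_normal(n)
z = resolvent(x, r)
# Sanity check of the first-order condition M z + (1/r)(z - x) = 0:
print(np.linalg.norm(M @ z + (z - x) / r))    # ~ 0 up to rounding
```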
3. Main Results
Our approach here is based on taking existing algorithms for (1), (3), and the fixed point problem of a nonlinear mapping, and determining how they can be used in the setting of the bilevel optimization problem (7) considered in this paper. We present a self-adaptive proximal gradient algorithm with an inertial effect for generating a sequence that converges to the unique solution of the bilevel optimization problem (7) under the following basic assumptions.
Assumption 1. Assume that A, h, the mappings, and the functions appearing in the bilevel optimization problem (7) satisfy the following:
- A1. Each A is a nonzero bounded linear operator;
- A2. h is proper, convex, and continuously differentiable, and its gradient $\nabla h$ is a σ-strongly monotone and L-Lipschitz continuous operator;
- A3. Each mapping is a demimetric and demiclosed mapping;
- A4. Each function is a proper, convex, lower semicontinuous function.
Assumption 2. Let γ be a real number, and let the real sequences (), (), , , satisfy the following conditions:
- (C1)
- (C2)
, and .
- (C3)
, and .
- (C4)
- (C5)
, and .
- (C6)
and .
- (C7)
and .
Assuming that Assumption 1 is satisfied, the solution set Ω of the lower-level problem of (7) is nonempty and, for each index, we define an auxiliary function by the corresponding displayed formula. Note that, from Aubin [38], if the underlying function is an indicator function, then this auxiliary function is convex, weakly lower semicontinuous, and differentiable, and its gradient is given by the corresponding displayed expression. Next, we present and analyze the strong convergence of Algorithm 1 using these auxiliary functions and their gradients, assuming that each auxiliary function is differentiable.
Algorithm 1: Self-adaptive proximal gradient algorithm with inertial effect.
Initialization: Let the real number γ and the real sequences (), (), , , and satisfy the conditions (C1)–(C7) in Assumption 2. Choose the starting points arbitrarily and proceed with the following computations:
- Step 1. Given the iterates and ( ), choose such that , where
- Step 2. Evaluate
- Step 3.
- Step 4. Find
where for
- Step 5. Find
- Step 6. Set n := n + 1 and go to Step 1.
Remark 1. From condition (C7) and Step 1 of Algorithm 1, we have that the inertial correction term vanishes in the limit; since the generated sequence is bounded, the corresponding weighted term vanishes as well. Note that Step 1 of Algorithm 1 is easily implemented in numerical computations, since the relevant quantity is known a priori before the inertial parameter is chosen. Note the following:
Let $\lambda\in\bigl(0,\frac{2\sigma}{L^{2}}\bigr)$. Then, by Assumption A2, the mapping $I-\lambda\nabla h$ is a contraction. Consequently, the mapping $P_\Omega(I-\lambda\nabla h)$ is also a contraction with the same constant, since $P_\Omega$ is nonexpansive. Hence, by the Banach contraction principle, there exists a unique element $x^{*}\in\Omega$ such that $x^{*}=P_\Omega\bigl(x^{*}-\lambda\nabla h(x^{*})\bigr)$. Clearly, $x^{*}$ is the unique solution of the bilevel optimization problem (7), and we have
$$\langle\nabla h(x^{*}),\,x-x^{*}\rangle\ge 0\quad\text{for all }x\in\Omega.$$
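For completeness, here is a short verification of the contraction claim (a standard computation under Assumption A2; the constant can be stated in several equivalent forms):

```latex
\|(I-\lambda\nabla h)x-(I-\lambda\nabla h)y\|^{2}
  = \|x-y\|^{2}-2\lambda\langle\nabla h(x)-\nabla h(y),\,x-y\rangle
    +\lambda^{2}\|\nabla h(x)-\nabla h(y)\|^{2}
  \le \bigl(1-2\lambda\sigma+\lambda^{2}L^{2}\bigr)\|x-y\|^{2},
% so I - \lambda\nabla h is a contraction with constant
% \sqrt{1-\lambda(2\sigma-\lambda L^{2})} < 1 whenever 0 < \lambda < 2\sigma/L^{2}.
```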
Lemma 5. For the sequences , and generated by Algorithm 1 and for , we have
- (i)
- (ii)
Proof. Let
. Now, since
are firmly nonexpansive, and since
is the minimizer of each
, we have for all
By the definition of
, we get
Using the definition of
, Lemma 1
, and (
13), we have
The result (i) follows from (
14) and (
15), and, in view of (C2)–(C6), the result (ii) follows from (
14) and (
15). □
Theorem 1. The sequence generated by Algorithm 1 converges strongly to the solution of problem (7). Proof. Claim 1: The sequences , and are bounded.
Let
. Now, from the definition of
, we get
Using (
16) and the definition of
, we get
Observe that, by (C6) and Remark 1, we see that
Let
Then, (
17) becomes
Thus, by Lemma 2, the sequence
is bounded. As a consequence,
,
and
are also bounded.
Claim 2: The sequence converges strongly to , where .
Now,
From Lemma 1
, we have
From (
18) and (
19) and since
, we get
Using the definition of
and Lemma 1
, we have
Lemma 5
together with (
20) and (
21) give
Since the sequence
and
are bounded, there exists
such that
for all
. Thus, from (
22), we obtain
Let us distinguish the following two cases related to the behavior of the sequence
where
.
Case 1. Suppose the sequence decreases at infinity. Thus, there exists such that for . Then, converges and as .
From (
23), we have
and
Since
(
) and using (C5), (C6), and Remark 1 (noting
,
,
is bounded and
); we have, from (
24) and (
25),
In view of (
26) and conditions (C2)–(C6), we have
for all
and for all
.
Using (
26), we have
Similarly, from (
26), we have
Using the definition of
and Remark 1, we have
Moreover, using the definition of
and boundedness of
and
together with condition (C5), we have
Therefore, from (
28)–(
31), we have
For each
,
are Lipschitz continuous with constant
. Therefore, the sequence
is bounded sequence for each
, and hence, using (
27), we have
for all
.
Let
p be a weak cluster point of
; there exists a subsequence
of
such that
as
. Since
as
(from (
30)), we have
as
. Hence, using
, (
27) and demiclosedness of
, we have
for all
.
Moreover, since
as
(from (
29) and (
30)), we have
as
. Hence, the weak lower-semicontinuity of
implies that
for all
. That is,
for all
. Thus,
.
We now show that
. Indeed, since
and, from above,
p is a weak cluster point of
, i.e.,
, and
, we obtain that
Since
from (
32), from (
33), we obtain
Now, using Lemma 5
(ii), we get
Therefore, from (
35), we have
Combining (
36) and
it holds that
Since
is bounded, there exists
such that
for all
. Thus, in view of (
37), we have
where
and
From (C5), Remark 1 and (
34), we have
and
. Thus, using Lemma 2 and (
38), we get
as
. Hence,
as
.
Case 2. Assume that
does not decrease at infinity. Let
be a mapping for all
(for some
large enough) defined by
By Lemma 3,
is a nondecreasing sequence,
as
and
In view of
for all
and (
23), we have for all
Similarly, from (
23), we have for all
Thus, from (
40) and (
41) together with (C3)–(C6) and Remark 1, we have for each
and
,
Using a similar procedure as above in Case 1, we have
By a similar argument as above in Case 1, since
is bounded, there exists a subsequence of
which converges weakly to
and this gives
. Thus, from (
38), we have
where
and
Using
for all
and
, the last inequality gives
Since
, we obtain
Moreover, since
, we have
Thus,
together with
, gives
. Therefore, from (
39), we obtain
, that is,
as
. □
For this special case, we have the following result for solving the bilevel problem (10):
Corollary 1. The sequence generated by the corresponding iteration converges strongly to the solution of the bilevel problem (10), provided the real sequences involved satisfy:
- (C1)
, and .
- (C2)
and .
- (C3)
where for .
5. Numerical Example
Taking the bilevel optimization problem (
7) for
,
, the linear transformations
are given by
, where
is
matrix, and for
,
, we have
where
D and
B are invertible symmetric positive semidefinite
and
matrices, respectively,
,
,
is the Euclidean norm in
,
is the Euclidean norm in
, and
for
.
Here, where and hence the gradient is -Lipschitz. Thus, the gradient is 1-strongly monotone and ()-Lipschitz. We choose .
Now, for
, the proximal
,
and
is given by
and
where
We consider for
,
for
and
, where
is the identity matrix. The parameters are chosen as
for
,
for
,
,
,
and
.
For the purpose of testing our algorithm, we took the following data:
D and B are randomly generated invertible symmetric positive semidefinite matrices.
The starting points are randomly generated.
The stopping criterion is .
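A possible way to generate such test data in Python (our own sketch; the dimensions, the construction of the matrices, and the tolerance $10^{-6}$ are assumptions, since the exact experimental values are only partially specified above):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 20, 10                                   # assumed problem dimensions

def random_spd(n):
    """Randomly generated invertible symmetric positive semidefinite (in fact definite) matrix."""
    M = rng.standard_normal((n, n))
    return M.T @ M + np.eye(n)                  # M^T M is PSD; adding I makes it invertible

D, B = random_spd(p), random_spd(q)
x_prev = rng.standard_normal(p)                 # randomly generated starting point
x = rng.standard_normal(p)

# Stopping criterion of the form ||x_{n+1} - x_n|| < eps (tolerance assumed).
eps = 1e-6
stop = np.linalg.norm(x - x_prev) < eps
print("D positive definite:", np.all(np.linalg.eigvalsh(D) > 0), "| stop:", stop)
```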
Table 1 and
Table 2 and
Figure 1 illustrate the numerical results of our algorithms for this example under the parameters and data given above and for
. The number of iterations (Iter(
n)), CPU time in seconds (CPU(s)), and the error
, where
is the solution set of the bilevel optimization problem (
here in this example), are reported in
Table 1.
We now compare our algorithm for different
, i.e., for non-inertial accelerated case (
) and for inertial accelerated case (
). For the non-inertial (non-accelerated) case, we simply take
, and, for the inertial accelerated case, we take a very small
with
so that
. Numerical comparisons of our proposed algorithm with inertial version (
) and its non-inertial version (
) are presented in
Table 3.
Remark 2. Table 1 and Table 2 show that the CPU time and the number of iterations of the algorithm increase linearly with the size or complexity of the problem (with the dimensions p and q, the numbers of mappings R and N, and the number of functions M). From Table 3, we can see that our algorithm has better performance for the inertial parameter choice. This indicates that the inertial version of our algorithm has better convergence behavior.