1. Introduction
In this paper, we consider the following structured convex optimization problem
$$\min_{x\in\mathbb{R}^n} \; F(x) := f(x) + h(x), \qquad\qquad (1)$$
where $f$ and $h$ are closed proper convex functions, but not necessarily differentiable, and $\operatorname{dom} h := \{x\in\mathbb{R}^n : h(x) < +\infty\}$ is the effective domain of h. Problems of this type frequently arise in practice, for example in compressed sensing [1], image reconstruction [2], machine learning [3], optimal control [4] and power systems [5,6,7,8]. The following are three representative examples.
Example 1 ($\ell_1$ minimization in compressed sensing). The signal recovery problems in compressed sensing [1] usually take the following form
$$\min_{x\in\mathbb{R}^n} \; \tfrac{1}{2}\|Ax-b\|^2 + \mu\|x\|_1, \qquad\qquad (2)$$
where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$ and $\mu>0$, which aims to get a sparse solution x of the linear system $Ax=b$. Note that by defining $f(x)=\tfrac{1}{2}\|Ax-b\|^2$ and $h(x)=\mu\|x\|_1$, (2) is of the form of (1).

Example 2 (Regularized risk minimization). At the core of many machine learning problems is to minimize a regularized risk function [9,10]
$$\min_{x\in\mathbb{R}^n} \; \frac{1}{N}\sum_{i=1}^{N} l(u_i,v_i;x) + r(x), \qquad\qquad (3)$$
where the first term is the empirical risk, $\{(u_i,v_i)\}_{i=1}^{N}$ is a training set, and l is a convex loss function measuring the gap between $v_i$ and the predicted value generated by using x. In general, the empirical risk is a nondifferentiable and computationally expensive convex function, whereas the regularization term $r$ is a simple convex function, say $\|x\|_1$ or $\|x\|_2^2$. By defining f to be the empirical risk and h to be the regularization term, (3) is also of the form of (1).

Example 3 (Unconstrained transformation of a constrained problem). Consider a constrained problem
$$\min \; \{\, f(x) : x\in X \,\}, \qquad\qquad (4)$$
where f is a convex function and $X\subseteq\mathbb{R}^n$ is a convex set. By introducing the indicator function $i_X$ of X, that is, $i_X$ equals 0 on X and infinity elsewhere, problem (4) can be written equivalently as
$$\min_{x\in\mathbb{R}^n} \; f(x) + i_X(x). \qquad\qquad (5)$$
Clearly, by setting $h=i_X$, (5) becomes the form of (1). We note that such a transformation can be very efficient in practice if the set X has some special structure [11,12].
The design of methods for solving problems of the form (1) has attracted the attention of many researchers. We mention here four classes of these methods: operator splitting methods [13,14,15], alternating direction methods of multipliers [5,16,17,18,19], alternating linearization methods [20,21], and the alternating linearization bundle method [22]. They all fall within the well-known class of first-order black-box methods; that is, it is assumed that there is an oracle that can return the (approximate) function value and one arbitrary (approximate) subgradient at any given point. Regarding the above methods, we are concerned about the following three points:
However, for some practical problems, the functions may be nondifferentiable (nonsmooth), not easy to handle, and computationally expensive, in the sense that the exact first-order information may be impossible to calculate, or be computable but difficult to obtain. For example, if f has the form $f(x)=\sup_{u\in U} f_u(x)$, where U is an infinite set and $f_u(\cdot)$ is convex for any $u\in U$, then it is often difficult to calculate the exact function value $f(x)$. But for any tolerance $\varepsilon>0$, we may usually find a lower approximation $f_{u_\varepsilon}$ in finite time such that $f_{u_\varepsilon}(\cdot)\le f(\cdot)$ and $f(x)-f_{u_\varepsilon}(x)\le\varepsilon$ for some $u_\varepsilon\in U$. Then we can take a subgradient of $f_{u_\varepsilon}$ at x as an approximate subgradient of f at x. Another example is two-stage stochastic programming (see, e.g., References [23,24]), in which the function value is generated after solving a series of linear programs (details will be given in the section on numerical experiments); therefore its accuracy is determined by the tolerance of the linear programming solver. Some other practical examples, such as Lagrangian relaxation, chance-constrained programs and convex composite functions, can be found in Reference [25].
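The following short sketch illustrates how such an approximate oracle can arise for a sup-type function; the concrete family $f_u$ below (affine functions indexed by an angle, so that the true function is $f(x)=\|x\|-1$) is an invented toy example, not one of the applications discussed above:

```python
import numpy as np

# Illustrative inexact oracle for a sup-type function f(x) = sup_{u in U} f_u(x)
# with an infinite index set U: only finitely many pieces are evaluated, which
# yields a lower estimate of f(x) together with a subgradient of the best piece
# found.  The family f_u(x) = <(cos u, sin u), x> - 1 (so that f(x) = ||x|| - 1)
# is a made-up example, not taken from the paper.
def inexact_oracle(x, n_samples=100, seed=1):
    rng = np.random.default_rng(seed)
    us = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)   # finite sample of U
    a = np.stack([np.cos(us), np.sin(us)], axis=1)        # slopes of the pieces
    vals = a @ x - 1.0
    j = int(np.argmax(vals))
    return float(vals[j]), a[j]   # lower estimate of f(x), subgradient of piece j

x = np.array([2.0, 1.0])
f_low, g = inexact_oracle(x)
print(f_low, np.linalg.norm(x) - 1.0)   # the estimate never exceeds the true value
```

The value returned is always a lower estimate of f(x), and the returned vector is an exact subgradient of the selected piece, hence an approximate subgradient of f.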
Based on the above observation, in this paper we focus particularly on the case where the function f is general, possibly nonsmooth, and its exact function values and subgradients may be difficult to obtain, whereas the function h is assumed to be relatively simple. Our main goal is to provide an efficient method, namely the improved alternating linearization bundle method, for this kind of structured convex optimization problem. The basic tool we use here to handle nonsmoothness and inexactness is the bundle technique, since in the nonsmooth optimization community, bundle methods [26,27,28,29] are regarded as the most robust and reliable methods, whose variants have been well studied for handling inexact oracles [23,25,30,31,32,33].
Roughly speaking, our method generalizes the alternating linearization bundle method of Kiwiel [22] from exact and inexact oracles to various oracles, including exact, inexact, partially inexact, asymptotically exact and partially asymptotically exact oracles. These oracles are covered by the so-called on-demand accuracy oracles proposed by de Oliveira and Sagastizábal [23], whose accuracy is controlled by two parameters: a descent target and an error bound. More precisely, the accuracy is bounded by the error bound whenever the function estimate reaches a certain descent target. The main advantage of oracles with on-demand accuracy is that the function and subgradient estimates can be rough, without any accuracy control, at some "non-critical" iterates, so computational effort can be saved.
At each iteration, the proposed algorithm alternately solves two interrelated subproblems: one finds the proximal point of the polyhedral model of f plus the linearization of h; the other finds the proximal point of the linearization of f plus h. We establish global convergence of the algorithm under different types of inexactness. Finally, some preliminary numerical results on a set of two-stage stochastic linear programming problems show that our method is encouraging.
This paper is organized as follows. In Section 2, we recall the condition of the inexact first-order oracles. In Section 3, we present an improved alternating linearization bundle method for structured convex optimization with inexact first-order oracles and show some properties of the algorithm. In Section 4, we establish global convergence of the algorithm under different types of inexactness. In Section 5, we provide some numerical experiments on two-stage stochastic linear programming problems. The notation is standard: the Euclidean inner product in $\mathbb{R}^n$ is denoted by $\langle\cdot,\cdot\rangle$, and the associated norm by $\|\cdot\|$.
2. Preliminaries
For a given constant $\varepsilon\ge 0$, the $\varepsilon$-subdifferential of a convex function f at x is defined by (see Reference [34])
$$\partial_\varepsilon f(x) := \{\, g\in\mathbb{R}^n : f(y) \ge f(x) + \langle g, y-x\rangle - \varepsilon, \ \forall\, y\in\mathbb{R}^n \,\},$$
with $\partial f(x) := \partial_0 f(x)$ being the usual subdifferential in convex analysis [35]. Each element $g\in\partial_\varepsilon f(x)$ is called an $\varepsilon$-subgradient. For simplicity, we use the following notations:
$f_x$: the approximate f value at x, that is, an estimate of $f(x)$;
$g_x$: an approximate subgradient of f at x;
$F_x$: the approximate F value at x, that is, $F_x := f_x + h(x)$.
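As a simple illustration of this definition (an example of ours, not tied to problem (1)), take $f(x)=|x|$ on the real line and the point $x=1$; checking the defining inequality $|y| \ge 1 + g(y-1) - \varepsilon$ for all y directly gives
$$\partial_\varepsilon |\cdot|\,(1) \;=\; [\,1-\varepsilon,\; 1\,], \qquad 0\le\varepsilon\le 2,$$
so an $\varepsilon$-subgradient may violate the exact subgradient inequality by at most $\varepsilon$, which is exactly the kind of slack the inexact oracles below exploit.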
Aiming at the special structure of problem (1), we present a slight variant of the oracles with on-demand accuracy proposed in Reference [23] as follows: for a given point x, a descent target and an error bound, the approximate values $f_x$, $g_x$ and $F_x$ returned by the oracle satisfy condition (6). From the relations in (6), we see that although the error of the estimate is unknown, it has to be restricted within the error bound whenever the descent target is reached. This ensures that the exact and inexact function values satisfy (7).
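In code, such an oracle can be organized schematically as follows; here rough_eval and refine are hypothetical user-supplied routines, the convention that "reaching the descent target" means the rough estimate falls below the target is an assumption of this sketch, and the sketch is not a literal transcription of condition (6):

```python
# Schematic oracle with on-demand accuracy: compute cheap estimates first and
# refine them only when the estimate reaches the descent target, so that the
# final error stays within the prescribed error bound.  `rough_eval(x)` and
# `refine(x, error_bound)` are hypothetical user-supplied routines.
def on_demand_oracle(x, descent_target, error_bound, rough_eval, refine):
    fx, gx = rough_eval(x)                # cheap estimate, uncontrolled accuracy
    if fx <= descent_target:              # target reached: accuracy now matters
        fx, gx = refine(x, error_bound)   # recompute with error <= error_bound
    return fx, gx                         # estimate of f(x) and an approximate subgradient
```

For "non-critical" points whose estimates stay above the target, the expensive refinement is skipped entirely.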
The advantages of oracle (6) are the following: (1) if the descent target is not reached, the calculation of the oracle information can be rough, without any accuracy control, which can potentially reduce the computational cost; (2) by properly choosing the two parameters (the descent target and the error bound), the oracle (6) covers various existing oracles:
Exact (Ex) [12,21]: the error bound is zero and the descent target is always reached, so exact values are returned at every point;
Partially Inexact (PI) [24]: exact values are required only when the descent target is reached;
Inexact (IE) [11,25,32,36,37]: a fixed error bound (possibly unknown) is imposed at every point;
Asymptotically Exact (AE) [38,39]: the error bounds are driven to zero along the iterative process;
Partially Asymptotically Exact (PAE) [23]: the error bounds are driven to zero, and are enforced only when the descent target is reached.
3. The Generalized Alternating Linearization Bundle Method
In this section, we present our generalized alternating linearization bundle method with inexact first-order oracles for solving (1).
Let k be the current iteration index, let $x^j$, $j\in J_k$, be the points generated by previous iterations, and let the corresponding approximate values $f_{x^j}$ and $g_{x^j}$ be produced by the oracle (6). For notational convenience, we denote $f_j := f_{x^j}$ and $g_j := g_{x^j}$. The approximate linearizations of f at $x^j$ are given by
$$\bar f_j(\cdot) := f_j + \langle g_j, \cdot - x^j\rangle, \qquad j\in J_k.$$
From the second relation in (6), we have $\bar f_j(\cdot)\le f(\cdot)$, which implies that $\bar f_j$ is a lower approximation to f. Next, it is natural to define the polyhedral inexact cutting-plane model of f by
$$\check f_k(\cdot) := \max_{j\in J_k} \bar f_j(\cdot), \qquad\qquad (8)$$
which is obviously a lower polyhedral model for f, that is, $\check f_k(\cdot)\le f(\cdot)$.
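A direct way to evaluate this model in code is to keep the bundle as a list of triples (point, approximate value, approximate subgradient) and take the pointwise maximum of the cuts; the tiny example below (with an exactly evaluated quadratic, purely for illustration) confirms the lower-bound property:

```python
import numpy as np

# Evaluate the polyhedral cutting-plane model check_f(x) = max_j { f_j + <g_j, x - x_j> }
# from a bundle of (point, value, subgradient) triples, as in (8); the concrete
# container used here is only an illustration.
def cutting_plane_model(x, bundle):
    """bundle: list of (x_j, f_j, g_j) produced by the oracle."""
    return max(f_j + g_j @ (x - x_j) for (x_j, f_j, g_j) in bundle)

# Tiny usage example with f(x) = 0.5*||x||^2 evaluated exactly at two points.
pts = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
bundle = [(p, 0.5 * p @ p, p) for p in pts]          # exact value and gradient of f
x = np.array([0.5, 0.5])
print(cutting_plane_model(x, bundle), 0.5 * x @ x)   # model value <= true value
```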
Let $\hat x^k$ (called the stability center) be the "best" point obtained so far, which satisfies $\hat x^k = x^{j(k)}$ for some $j(k)\le k$. Frequently, it holds that $F_{\hat x^k} = \min_{j\le k} F_{x^j}$. Thus, from (7), we have the corresponding accuracy estimate (9) at the stability center.
By applying the bundle idea to the "complex" function f, and keeping the simple function h unchanged, similar to traditional proximal bundle methods (see, e.g., Reference [28]), we may solve the following subproblem to obtain a new iterate:
$$\min_{x\in\mathbb{R}^n} \; \check f_k(x) + h(x) + \frac{1}{2t_k}\|x-\hat x^k\|^2, \qquad\qquad (10)$$
where $t_k>0$ is a proximal parameter. However, subproblem (10) is generally not easy to solve, so by making use of the alternating linearization idea of Kiwiel [22], we solve two easier subproblems instead of (10). These two subproblems are interrelated: one is to find the proximal point of the polyhedral model $\check f_k$ plus a linearization of h, aiming at generating an aggregate linear model of f for use in the second subproblem; the other is to find the proximal point of the aggregate linear model of f plus h, aiming at obtaining a new trial point.
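Schematically, following the alternating linearization scheme of Kiwiel [22] and using the notation introduced above (the symbols for the intermediate and trial points and for the linearizations $\bar h_k$, $\bar f_k$ are ours; the precise statements are subproblems (11) and (13) in Algorithm 1), the two subproblems read
$$z^{k+1} \in \arg\min_{x}\Big\{\, \check f_k(x) + \bar h_k(x) + \tfrac{1}{2 t_k}\|x-\hat x^k\|^2 \,\Big\},
\qquad
y^{k+1} \in \arg\min_{x}\Big\{\, \bar f_k(x) + h(x) + \tfrac{1}{2 t_k}\|x-\hat x^k\|^2 \,\Big\},$$
where the aggregate linearization $\bar f_k$ is built from the optimality condition of the first subproblem, which is what ties the two subproblems together.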
Now, we are ready to present the details of our algorithm, which generalizes the work of Kiwiel [22]. We note that the choice of the model function in the algorithm may be different from the form of (8), since the subgradient aggregation strategy [40] is used to compress the bundle. The algorithm generates three sequences of iterates:
- the sequence of intermediate points, at which the aggregate linear models of f are generated;
- the sequence of trial points;
- the sequence of stability centers.
Algorithm 1 Generalized alternating linearization bundle method
Step 0 (Initialization). Select an initial point, the algorithm constants and an initial stepsize. Call the oracle (6) at the initial point to compute the corresponding approximate values. Choose an initial error bound and a descent target, and initialize the stability center, the initial linearizations and the iteration counter.
Step 1 (Model selection). Choose closed convex model functions satisfying the required lower-approximation conditions.
Step 2 (Solve f-subproblem). Compute the intermediate point by solving subproblem (11), and form the associated aggregate quantities (12).
Step 3 (Solve h-subproblem). Compute the trial point by solving subproblem (13), and form the associated quantities (14).
Step 4 (Stopping criterion). Compute the predicted descent, the aggregate linearization error (15) and the optimality measure. If the stopping test is satisfied, stop.
Step 5 (Noise attenuation). If the test detects that the inexactness is too large, increase the stepsize, update the related quantities, and go back to Step 2.
Step 6 (Call oracle). Select a new error bound and a new descent target. Call the oracle (6) at the trial point to compute the new approximate values.
Step 7 (Descent test). If the descent condition (16) holds, declare a descent step and move the stability center to the trial point; otherwise, declare a null step and keep the stability center unchanged.
Step 8 (Stepsize updating). For a descent step, select the next stepsize. For a null step, either keep the current stepsize or update it when the corresponding condition holds.
Step 9 (Loop). Increase k by one and go to Step 1.
We make some comments about Algorithm 1 as follows.
Remark 1. (i) Theoretically speaking, the model function can take a very simple form, but in order to maintain numerical stability, it may additionally include some active linearizations.
(ii) Alternately solving subproblems (11) and (13) can be regarded as applying the proximal alternating linearization method (e.g., Reference [21]) to the sum of the model function and h.
(iii) If the model function is polyhedral, then subproblem (11) is equivalent to a convex quadratic program and thus can be solved efficiently. In addition, if h is simple, subproblem (13) can also be solved easily, or may even have a closed-form solution.
(iv) The role of Step 5 is to reduce the impact of inexactness. The algorithm loops between Steps 2 and 5, increasing the stepsize, until the test at Step 5 is passed.
(v) The stability center, the descent target and the error bound remain unchanged in the loop between Steps 2 and 5.
(vi) In order to establish global convergence of the algorithm, the descent target and error bound at Step 6 should be suitably updated. Some detailed rules are presented in the next section.
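To illustrate Remark 1 (iii), the sketch below assumes the subproblems take the prox forms sketched after (10): the f-subproblem is written in epigraph form and handed to a general-purpose solver (any QP solver would do), while the h-subproblem is solved in closed form for $h=\mu\|\cdot\|_1$ by soft thresholding. The linearization of h is represented only by its slope h_lin, since the constant does not affect the minimizer; all names and the solver choice are ours.

```python
import numpy as np
from scipy.optimize import minimize

def solve_f_subproblem(bundle, h_lin, x_hat, t):
    """min_x  max_j(f_j + <g_j, x - x_j>) + <h_lin, x> + (1/(2t))*||x - x_hat||^2,
    written in epigraph form with variables (x, r) and solved by SLSQP."""
    n = x_hat.size

    def obj(z):
        x, r = z[:n], z[n]
        return r + h_lin @ x + (x - x_hat) @ (x - x_hat) / (2.0 * t)

    cons = [{"type": "ineq",
             "fun": lambda z, xj=xj, fj=fj, gj=gj:
                 z[n] - (fj + gj @ (z[:n] - xj))}          # r >= each cut
            for (xj, fj, gj) in bundle]
    r0 = max(fj + gj @ (x_hat - xj) for (xj, fj, gj) in bundle)
    res = minimize(obj, np.concatenate([x_hat, [r0]]),
                   constraints=cons, method="SLSQP")
    return res.x[:n]

def solve_h_subproblem(f_lin, x_hat, t, mu):
    """min_x  <f_lin, x> + mu*||x||_1 + (1/(2t))*||x - x_hat||^2
    = soft-thresholding of (x_hat - t*f_lin) at level t*mu."""
    v = x_hat - t * f_lin
    return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

# Tiny usage example: a one-cut bundle around x_hat = 0 and h = 0.5*||.||_1.
x_hat = np.zeros(2)
bundle = [(x_hat, 0.0, np.array([1.0, -1.0]))]
print(solve_f_subproblem(bundle, h_lin=np.zeros(2), x_hat=x_hat, t=1.0))
print(solve_h_subproblem(f_lin=np.array([1.0, -1.0]), x_hat=x_hat, t=1.0, mu=0.5))
```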
The following lemma summarizes some fundamental properties of Algorithm 1, whose proof is a slight modification of that in Reference [22] (Lemma 2.2).
Lemma 1.
(i) The vectors defined in (12) and (14) satisfy the subgradient relations (17), and the corresponding linearizations satisfy the stated inequalities.
(ii) The aggregate subgradient defined in (15) and the above aggregate linearization can be expressed as stated.
(iii) The predicted descent and the aggregate linearization error of (15) satisfy the stated relations.
(iv) The aggregate linearization can also be expressed in the stated alternative form.
(v) Denote the optimality measure by (22); it satisfies (23) and (24), and the relations (25)–(27) hold. Moreover, if the corresponding condition holds, then (28) follows as well.
Proof. (i) From the optimality condition of subproblem (11), we obtain the stated expression, which implies the first subgradient relation; the associated inequality then follows from the definitions in (12). Similarly, by the optimality condition of (14), we obtain the second expression, which shows the second subgradient relation and the corresponding inequality. Finally, the remaining inequalities in the claim follow by combining these facts.
(ii) Utilizing the linearity of the linearizations, together with (12) and (19), we derive the stated expressions.
(iii) The first relation follows directly from (15); combining (15) and (ii), we obtain the second relation.
(iv) By the definitions above, the aggregate linearization satisfies the stated alternative expression.
(v) Using the Cauchy–Schwarz inequality in the definition (22) gives (23) and (24).
(vi) By (iii), it is easy to get (25). Next, by (18), (20) and (9), we obtain the required bound under the corresponding condition. Relation (26) then follows from the preceding facts, and relation (27) follows from (23) together with the definitions involved. Finally, under the remaining condition the aggregate linearization error is controlled, and therefore (28) holds. □
Remark 2. Relation (17) shows that the vector produced by the f-subproblem is a subgradient of the model function at the intermediate point, and that the vector produced by the h-subproblem is a subgradient of h at the trial point. The quantity defined by (22) can be viewed as an optimality measure of the iterates, which will be proved to converge to zero in the next section. Relation (24) is also a test for optimality, in that the current stability center is an approximately optimal solution to problem (1) whenever the optimality measure is sufficiently small.

4. Global Convergence
This section aims to establish the global convergence of Algorithm 1 for various oracles. These oracles are controlled by two parameters: the error bound and the descent target. In Table 1, we present the choices of these two parameters for the different types of instances described in Section 2, namely the Exact (Ex), Partially Inexact (PI), Inexact (IE), Asymptotically Exact (AE) and Partially Asymptotically Exact (PAE) oracles, where the constants involved are selected as indicated there.
The following lemma is crucial to guarantee the global convergence of Algorithm 1.
Lemma 2. The descent target is always reached at the stability centers, that is, the target inequality in (6) holds at every stability center.
Proof. For instances Ex, IE and AE, the descent target is chosen so that it is always reached, hence the claim holds immediately. For instances PI and PAE, the descent target is finite. For the initial stability center, the claim follows directly from the initialization in Step 0. In addition, for the subsequent stability centers, once the descent test (16) is satisfied at some iteration, the new approximate value reaches the corresponding descent target. □
The following lemma shows that an (approximate) optimal solution can be obtained whenever the algorithm terminates finitely or loops infinitely between Steps 2 and 5.
Lemma 3. If either Algorithm 1 terminates at the kth iteration at Step 4, or loops between Steps 2 and 5 infinitely, then
- (i) the current stability center is an optimal solution to problem (1) for instances Ex and PI;
- (ii) the current stability center is ε-optimal for instance IE;
- (iii) the current stability center is optimal up to the corresponding accuracy for instances AE and PAE.
Proof. First, suppose that Algorithm 1 terminates at Step 4 at iteration k. Then, by the stopping test and (23), the optimality measure vanishes. This together with (24) shows that (29) holds. Thus, from (7), we can conclude that the stability center is optimal for instances Ex and PI, ε-optimal for instance IE, and optimal up to the corresponding accuracy for instances AE and PAE. Second, suppose that Algorithm 1 loops between Steps 2 and 5 infinitely. Then, from Lemma 2 and the condition at Step 5, it follows that (28) holds and the stepsize tends to infinity. Thus, the optimality measure again tends to zero, which along with (24) implies (29), and therefore the claims follow by repeating the corresponding arguments of the first case. □
From the above lemma, in what follows we may assume that Algorithm 1 neither terminates finitely nor loops infinitely between Steps 2 and 5. In addition, as in Reference [22], we assume that the model subgradients in (17) remain bounded whenever the sequence of generated points is bounded.
Algorithm 1 must fall into exactly one of the following two cases:
- (i) the algorithm generates finitely many descent steps;
- (ii) the algorithm generates infinitely many descent steps.
We first consider case (i), in which two subcases may occur, according to whether the stepsize tends to infinity or remains bounded. The first subcase is analyzed in the following lemma.
Lemma 4. Suppose that Algorithm 1 generates finitely many descent steps, that is, there exists an index after which only null steps occur, and that the stepsize tends to infinity. Then the optimality measure tends to zero along these iterations.
Proof. Consider the last time the stepsize increases before Step 5 at such an iteration. The resulting bound, combined with the unboundedness of the stepsize, shows the lemma. □
The following lemma analyzes the second subcase, in which the stepsize remains bounded.
Lemma 5. Suppose that there exists an index after which the stepsize remains fixed and only null steps occur. If the descent criterion (16) fails for all subsequent iterations, then the optimality measure tends to zero.
Proof. In view of the stated assumptions, we know that only null steps occur and the stepsize does not increase at Step 5. By Taylor's expansion, the Cauchy–Schwarz inequality, and the properties of subproblems (11) and (13), we can conclude that the predicted descent tends to zero, so the conclusion holds from (27). For more details, one can refer to Reference [11] (Lemma 3.2). □
By combining Lemmas 4 and 5, we have the following lemma.
Lemma 6. Suppose that there exists an index after which only null steps occur. Then, along the corresponding subsequence determined by the two subcases above, the optimality measure tends to zero.
Now, we can present the main convergence result for the case where the algorithm generates finitely many descent steps.
Theorem 1. Suppose that Algorithm 1 generates finitely many descent steps, and consider the last stability center. Then the last stability center is an optimal solution to problem (1) for instances Ex and PI, an ε-optimal solution for IE, and an approximately optimal solution (up to the corresponding accuracy) for AE and PAE.
Proof. Under the stated assumption, the stability center and its approximate value remain unchanged for all sufficiently large iterations. This, together with (24) and Lemma 6, shows that (29) holds at the last stability center. Hence, similarly to the proof of Lemma 3, we obtain the results of the theorem. □
Next, we consider the second case where the algorithm generates infinitely many descent steps.
Lemma 7. Suppose that Algorithm 1 generates infinitely many descent steps and that the optimal value of (1) is finite. Let K be the index set of descent steps. Then the predicted descents are summable over K and tend to zero, and the aggregate linearization errors tend to zero along K. Moreover, if the sequence of stability centers is bounded, then the optimality measure tends to zero along K.
Proof. From the descent test condition (16), we may first prove that the sum of the predicted descents over K is finite, and therefore they tend to zero, using (26) and the fact that the optimal value is finite. It can further be proved that the aggregate linearization errors tend to zero along K, which follows from the definition of the descent test. Moreover, under the condition that the sequence of stability centers is bounded, the optimality measure tends to zero along K by (v) of Lemma 1. For more details, one can refer to Reference [11] (Lemma 3.4). □
Finally, we present the convergence results for the second case.
Theorem 2. Suppose that Algorithm 1 generates infinitely many descent steps, that the optimal value of (1) is finite, and that the index set K is defined as in Lemma 7. Then
- (i) the approximate optimal values at the stability centers converge, along K, to within ε of the optimal value for instance IE in Table 1;
- (ii) they converge to the optimal value for the remaining instances in Table 1;
- (iii) the optimality measure tends to zero along K, and the corresponding optimality property in (24) holds in the limit.
Proof. (i) For instance IE, the error bound equals ε and the descent target is reached at the stability centers for all indices in K. Then, from (7), the exact and approximate values at the stability centers differ by at most ε, which together with (30) shows part (i).
(ii) Next, the other four instances in Table 1 are considered separately. For instance Ex, the oracle values are exact at every point, which immediately gives the claim. For instance PI, the descent target is reached at the stability centers for all indices in K, so the corresponding oracle values are exact, and the claim follows. For instance AE, the error bounds tend to zero along the iterative process, which together with Lemma 7 shows the claim. For instance PAE, the descent target is reached at the stability centers and the error bounds tend to zero along K; the claim then again follows from Lemma 7. Summarizing the above analysis and noticing (30), we complete the proof of part (ii).
(iii) From Lemma 7, the optimality measure tends to zero along K. This together with (24) shows part (iii). □
5. Numerical Experiments
In this section, we aim to test the numerical efficiency of the proposed algorithm. In the fields of production and transportation, finance and insurance, the power industry, and telecommunications, decision makers usually need to solve problems with uncertain information. As an effective tool for solving such problems, stochastic programming (SP) has attracted more and more attention and research on its practical instances and theories; see, for example, References [41,42]. We consider a class of two-stage SP problems with fixed recourse, whose discretization of the uncertainty into N scenarios has the form (31) (see, e.g., References [23,43]), where x is the first-stage decision variable, constrained to a polyhedral first-stage feasible set with a linear first-stage cost. In addition, the recourse function corresponding to the ith scenario $\xi_i$, which occurs with probability $p_i$, $i=1,\dots,N$, is defined through a second-stage linear program in the second-stage decision variable.
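For concreteness, a standard discretized two-stage stochastic linear program with fixed recourse can be written as follows (the notation $W$, $T_i$, $h_i$, $q_i$, $p_i$ is the generic one used, e.g., in References [23,43], and is introduced here only for illustration):
$$\min_{x \in X}\; c^{\top}x + \sum_{i=1}^{N} p_i\, Q(x,\xi_i),
\qquad
Q(x,\xi_i) \;=\; \min_{y \ge 0}\;\big\{\, q_i^{\top} y \;:\; W y = h_i - T_i x \,\big\},$$
where W is the fixed recourse matrix and y is the second-stage decision for scenario $\xi_i$.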
Clearly, by introducing the indicator function of the first-stage feasible set, problem (31) can be written in the form of (5), and then becomes the form of (1) by setting h to be this indicator function.
The above recourse function can be written in its dual form as a linear program over the corresponding dual feasible set. By solving these linear programs to return solutions with precision up to a given tolerance, one can establish an inexact oracle of the form (6); see Reference [23] for a more detailed description.
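A sketch of such an oracle for the generic form displayed above is given below: each scenario recourse value is obtained from the dual of the second-stage LP, and the dual solution yields a subgradient. The data layout and the use of scipy's linprog (standing in for the MOSEK solver used in the paper, with its feasibility tolerances playing the role of the oracle accuracy) are illustrative assumptions; relatively complete recourse is assumed so that every scenario LP is solvable.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative inexact oracle for the generic two-stage form above:
#   Q(x, xi_i) = max_pi { (h_i - T_i x)^T pi : W^T pi <= q_i }   (dual of the scenario LP),
# and -T_i^T pi_i is a subgradient of Q(., xi_i) at x.  The LP solver's tolerance
# controls the accuracy of the returned values, which is what makes the oracle inexact.
def recourse_oracle(x, c, scenarios, tol=1e-6):
    """scenarios: list of (p_i, q_i, W, h_i, T_i).  Returns (F_x, g_x): an estimate
    of c^T x + sum_i p_i Q(x, xi_i) and an approximate subgradient of it at x."""
    val, g = float(c @ x), np.array(c, dtype=float)
    for (p, q, W, h, T) in scenarios:
        rhs = h - T @ x
        res = linprog(-rhs, A_ub=W.T, b_ub=q, bounds=(None, None), method="highs",
                      options={"primal_feasibility_tolerance": tol,
                               "dual_feasibility_tolerance": tol})
        pi = res.x                     # approximate dual optimal solution
        val += p * float(rhs @ pi)     # scenario contribution to the objective value
        g -= p * (T.T @ pi)            # scenario contribution to the subgradient
    return val, g
```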
Four instances are tested, namely SSN(50), SSN(100), 20-term(50) and 20-term(100), where the integers in brackets denote the number of scenarios N. The SSN instances come from telecommunications and have been studied by Sen, Doverspike, and Cosares [44], while the 20-term instances come from the motor freight carrier's problem and have been studied by Mak, Morton, and Wood [45]. The dimensions of these instances are listed in Table 2.
The algorithm parameters are fixed for all runs, and the maximum bundle size is set to 35. All tests were performed in MATLAB (R2014a) on a PC with an Intel(R) Core(TM) i7-4790 CPU (3.60 GHz) and 4 GB RAM. The quadratic programming and linear programming subproblems were solved by the MOSEK 8 toolbox for MATLAB; see http://www.mosek.com.
We first compare our algorithm (denoted by GALBM) with the accelerated prox-level method (APL) of Reference [43], where the tolerances of the linear programming solver in MOSEK are set to their default values. The results are listed in Table 3, in which the number of iterations (NI), the consumed CPU time in seconds (Time), and the returned minimum values are compared. Note that we use the MATLAB commands tic and toc to measure the consumed CPU time. For each instance, we run 10 times and report the average CPU time. From Table 3, we see that, when similar solution quality is achieved, GALBM significantly outperforms APL in terms of both the number of iterations and the CPU time.
In what follows, we are interested in evaluating the impact of inexact oracles on GALBM. In more detail, we carry out two groups of tests. The first group adopts fixed tolerances for the linear programming solver, and the corresponding results are reported in Table 4, Table 5, Table 6 and Table 7. The second group adopts dynamic tolerances with a safeguard parameter, and the corresponding results are reported in Table 8, Table 9, Table 10 and Table 11. The symbol "-" in the following tables means that the number of iterations for the corresponding instance is greater than 500.
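For the second group, one simple way to realize a dynamic tolerance with a safeguard is to tighten the LP tolerance geometrically while never letting it fall below a safeguard value; the concrete rule below is only an assumed illustration and is not the rule used to produce Tables 8-11:

```python
# Hypothetical dynamic-tolerance schedule with a safeguard: the tolerance is
# tightened geometrically along the iterations but never drops below tol_min.
def dynamic_tolerance(k, tol0=1e-2, kappa=0.5, tol_min=1e-8):
    return max(tol0 * kappa ** k, tol_min)

print([dynamic_tolerance(k) for k in range(6)])
```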