1. Introduction
SAA was first introduced by Robbins and Monro in the 1950s [1] and is also called the RM algorithm in the literature. Since the 1950s, SAA has found successful applications in systems and control, statistics, machine learning, stochastic arithmetic and the CESTAC method for numerical calculation [2,3], the energy internet [4], and so forth, and there are plenty of studies on various theoretical properties of SAA, including sample path convergence [1,5,6,7], weak convergence [8], convergence of time-varying root-searching functions [9,10,11,12], robustness of SAA [13], stable and unstable limit points [14], convergence rate and asymptotic normality [7,15,16,17,18,19,20], and so forth. In this paper, due to space limitations, we will not give an overview of all aspects of SAA, but focus on the sample path analysis of this kind of algorithm, and in particular introduce the centralized and distributed SAAWET developed by Han-Fu Chen and his colleagues during the past 35 years.
At the early stage of SAA, to guarantee the convergence of the algorithm, restrictive conditions were usually imposed on the observation noise and on the root-searching function, for example, the noise being a martingale difference sequence and the function being bounded by a linear function. Then, by tools from probability theory, such as the martingale theory, the almost sure convergence of SAA can be established, and such an analysis method is called the probabilistic method (PM) [1,20]. However, in many situations, the noise may not be a martingale difference sequence, and it may depend on the past estimates generated by SAA itself, which we call state-dependent noise; in such cases, PM fails. Towards relaxing the restrictive conditions of PM, in the 1970s a so-called ordinary differential equation (ODE) method was introduced [7,21], which transforms the convergence analysis of SAA into the stability analysis of the equilibrium point of an associated ODE and forms another representative direction for the theoretical analysis of SAA. The ODE method, however, a priori assumes that the estimate sequence generated by the algorithm is bounded, which is not easy to verify in practice. For many theoretical problems, due to the complexity of the system noise and the scarcity of a priori information on the estimate sequence, neither PM nor the ODE method can be applied.
In order to essentially relax the assumptions required by PM and the ODE method, in the past 35 years an expanding truncation technique was introduced into SAA by Han-Fu Chen and his colleagues, and a so-called stochastic approximation algorithm with expanding truncations (SAAWET) has been developed [22,23]. The key idea of the expanding truncation technique is that if the magnitude of the estimate exceeds the truncation bound, then the estimate is reset to the initial value but with a decreased stepsize and, at the same time, an enlarged truncation bound is used for the next recursion, while if the magnitude of the estimate is smaller than the truncation bound, then SAAWET evolves as the classical SAA in the next recursion and the truncation bound remains unchanged. The expanding truncation technique introduces a way of adaptively choosing the stepsizes of the algorithm and essentially relaxes the conditions on the noise and the root-searching function. Furthermore, to guarantee the convergence of SAAWET, one only needs to verify the noise condition along the indices of any convergent subsequence of the estimates rather than along the whole sequence, which greatly facilitates the application of such an algorithm. The convergence analysis method of SAAWET is accordingly named the trajectory-subsequence (TS) method. In the past 35 years, SAAWET has been successfully applied to recursive identification of multivariate ARMAX systems [24], recursive identification of nonlinear systems [25,26], adaptive regulation of nonlinear systems [27], consensus control of multiagent systems [28], convergence of a distributed randomized PageRank algorithm [29], and so forth. In this paper, we first review the probabilistic method (PM) and the ordinary differential equation (ODE) method for SAA and the trajectory-subsequence method for SAAWET. Then, we give an application example of SAAWET to recursive principal component analysis.
All the above algorithms and methods are applied in a centralized situation. In recent years, distributed algorithms have been extensively investigated, for example, the consensus problem [30,31], distributed estimation [32,33], and so on. Distributed algorithms work in a network situation, where there is no central agent collecting and processing the data of the whole network, but the agents cooperatively accomplish a global objective by using their local observations and the information obtained from communication with their adjacent neighbours [34,35,36,37,38]. Recently, many distributed problems have been solved by SAA-based algorithms, such as distributed estimation [32,39] and distributed convex optimization [34,35]. As a result, the distributed algorithm for SAA has naturally drawn much attention from researchers. Distributed SAA (DSAA) was proposed in [40,41,42,43], among others, in which agents cooperatively find the roots of a function that is the sum of local functions associated with agents connected in a network. It is noticed that almost all the above DSAA algorithms require restrictive conditions to guarantee convergence; for example, [41] requires the global Lipschitz continuity of the local functions and the observation noise to be a martingale difference sequence. However, these assumptions may not hold for many problems, including distributed principal component analysis, distributed gradient-free optimization, and so forth. Aiming at solving the distributed root-seeking problem under weaker conditions than those used in [41], a distributed version of SAAWET (DSAAWET) was recently developed [44]. In this paper, we will introduce DSAAWET and its convergence properties and also give some application examples of DSAAWET.
The rest of the paper is arranged as follows. In Section 2, the centralized SAA and the PM and ODE methods for its convergence analysis are reviewed. Then, SAAWET is introduced, with the conditions for its convergence analysis and its general convergence theorem given. In this paper, SAAWET is also applied to solve the recursive principal component analysis problem. In Section 3, DSAAWET and its convergence properties are introduced, and some application examples are given as well. In Section 4, some concluding remarks are given. The notations used in this paper are listed in Table 1.
2. Centralized Stochastic Approximation Algorithm
2.1. Probabilistic Martingale Method
We first give the detailed problem formulation of SAA.
Assume that $f(\cdot)$ is an unknown function with root $x^0$, that is, $f(x^0) = 0$. At time $k$, assume that $x_k$ is an estimate of $x^0$ and the measurement of $f(\cdot)$ at $x_k$ is $y_{k+1}$, that is,
$$y_{k+1} = f(x_k) + \varepsilon_{k+1}, \tag{1}$$
where $\varepsilon_{k+1}$ is the observation noise.
With an arbitrary initial value $x_0$, in [1] a recursive algorithm for searching the root $x^0$ of $f(\cdot)$ is introduced:
$$x_{k+1} = x_k + a_k y_{k+1}, \tag{2}$$
where $\{a_k\}_{k \ge 0}$ is the stepsize sequence, which satisfies
$$a_k > 0, \qquad a_k \xrightarrow[k\to\infty]{} 0, \qquad \sum_{k=0}^{\infty} a_k = \infty. \tag{3}$$
Assumption (3) requires that the stepsize sequence tends to zero, but the sum of the stepsizes goes to infinity. This indicates that the noise effect can be asymptotically reduced while the algorithm retains the ability to search in the whole feasible domain.
Example 1. Assume that at time $k$ the estimate $x_k$ is already very close to the root $x^0$. Then $f(x_k)$ is small, and by (2) the update is dominated by the noise term $a_k \varepsilon_{k+1}$. Hence, if the stepsize sequence has a positive lower bound, then for unbounded noises, such as Gaussian variables, the estimation error will not converge to zero. Thus, for the stepsize sequence $\{a_k\}$, $a_k \xrightarrow[k\to\infty]{} 0$ is a necessary condition for the convergence of SAA.

Example 2. Consider the noise-free case. If $\sum_{k=0}^{\infty} a_k < \infty$ and $f(\cdot)$ is bounded, then by (2) the total displacement satisfies $\|x_{k+1} - x_0\| \le \sup_x \|f(x)\| \sum_{i=0}^{k} a_i$. If the initial value $x_0$ is chosen far enough from $x^0$, then $\{x_k\}$ cannot reach, and hence will not converge to, $x^0$. Thus, $\sum_{k=0}^{\infty} a_k = \infty$ is another necessary condition for the convergence of SAA.

We make the following assumptions.
- A2.1 The stepsize sequence $\{a_k\}_{k \ge 0}$ satisfies $a_k > 0$, $\sum_{k=0}^{\infty} a_k = \infty$, and $\sum_{k=0}^{\infty} a_k^2 < \infty$.
- A2.2 There exists a twice-differentiable function $v(\cdot)$ such that:
  - (i) The Hessian matrix of $v(\cdot)$ is bounded.
  - (ii) $v(x) > v(x^0)$ for any $x \ne x^0$ and $v(x) \to \infty$ as $\|x\| \to \infty$.
  - (iii) For any $\epsilon > 0$, there exists an $\alpha_\epsilon > 0$ such that $\sup_{\|x - x^0\| \ge \epsilon} f^{T}(x)\, v_x(x) \le -\alpha_\epsilon$.
- A2.3 $\{\varepsilon_{k+1}\}_{k \ge 0}$ is an m.d.s., that is, $E[\varepsilon_{k+1} \mid \mathcal{F}_k] = 0$ a.s., and further, $\sup_k E[\|\varepsilon_{k+1}\|^2 \mid \mathcal{F}_k] < \infty$ a.s., where $\mathcal{F}_k \triangleq \sigma\{x_0, \varepsilon_1, \dots, \varepsilon_k\}$.
- A2.4 There exists a constant $c > 0$ such that $\|f(x)\|^2 \le c\,(1 + v(x))$ for any $x$.
Theorem 1. ([20,45,46]) Assume that A2.1–A2.4 hold. Then, with an arbitrary initial value $x_0$, the estimates generated from (2) converge to the root $x^0$ of $f(\cdot)$ almost surely, that is,
$$x_k \xrightarrow[k\to\infty]{} x^0 \quad \text{a.s.} \tag{11}$$
Proof. Here, we list the sketch of the proof.
By using the function $v(\cdot)$ given in A2.2, a nonnegative supermartingale is constructed from $\{v(x_k)\}$. By the convergence theorem for nonnegative supermartingales, this sequence converges almost surely, and, noting A2.1 and A2.4, we further obtain that $\{v(x_k)\}$ itself converges almost surely.

With an arbitrary $\epsilon > 0$, consider the excursions of $\{x_k\}$ outside the $\epsilon$-neighborhood of $x^0$ and the corresponding stopping times. By the drift condition A2.2(iii) and the supermartingale property, it can be shown that $\sum_{k} a_k I_{[\|x_k - x^0\| \ge \epsilon]} < \infty$ almost surely; since $\sum_k a_k = \infty$, there exists a subsequence $\{x_{k_j}\}$ with $\|x_{k_j} - x^0\| < \epsilon$. Since $\epsilon$ can be arbitrarily small, there exists a subsequence of $\{x_k\}$ which converges to $x^0$. By the convergence of $\{v(x_k)\}$, we can finally prove (11). □
The above proof is based on the convergence of the nonnegative supermartingale, and the analysis method is accordingly named the probabilistic method (PM).
Remark 1. With A2.2 modified accordingly, the almost sure convergence of the corresponding variant of the SAA (2) can be established in the same way.

Remark 2. From A2.3 we can see that the observation noise is required to be an m.d.s. In many problems, the observation noise may contain complicated state-dependent uncertainties, for example, noises depending on the current estimate $x_k$. On the other hand, if we choose $v(x) = \|x - x^0\|^2$, then from A2.4 we have $\|f(x)\| \le \sqrt{c}\,(1 + \|x - x^0\|)$, which indicates that $f(\cdot)$ is bounded by a linear function. These are the restrictions of PM.

2.2. Ordinary Differential Equation Method
We first introduce two technical lemmas.
Lemma 1. (Arzelà-Ascoli, [47]) Let $\{g_k(\cdot)\}_{k \ge 1}$ be a function class satisfying the conditions of uniform boundedness and equi-continuity. Then there exist a continuous function $g(\cdot)$ and a subsequence $\{g_{k_j}(\cdot)\}$ such that $g_{k_j}(\cdot)$ converges to $g(\cdot)$ uniformly on any fixed bounded interval in $\mathbb{R}$.

Remark 3. For the function class $\{g_k(\cdot)\}_{k \ge 1}$, equi-continuity means that for any $\epsilon > 0$, there exists $\delta > 0$ such that $|g_k(t) - g_k(s)| < \epsilon$ for any $k$ and any $t, s$ with $|t - s| < \delta$.

The following result is well known for the Lyapunov stability of an ODE.
Lemma 2. Consider the ODE
$$\dot{x}(t) = f(x(t)) \tag{22}$$
with an equilibrium point $x^0$, that is, $f(x^0) = 0$. If there exists a continuously differentiable function $v(\cdot)$ such that
- (i) $v(x^0) = 0$ and $v(x) > 0$ for any $x \ne x^0$;
- (ii) $v(x) \to \infty$ as $\|x\| \to \infty$;
- (iii) $v_x^{T}(x) f(x) < 0$ for any $x \ne x^0$,
then $x^0$ is the globally stable solution of (22).

Before introducing the conditions and the results for the ODE method, we first introduce a crucial notation to be used in this paper. For any $T > 0$ and any integer $n \ge 0$, define
$$m(n, T) \triangleq \max\Big\{m : \sum_{i=n}^{m} a_i \le T\Big\}. \tag{23}$$
From the definition in (23) we can see that $m(n, T)$ is the maximal number of steps starting from $n$ with the sum of stepsizes not exceeding $T$. Noting that $a_k \to 0$ as $k \to \infty$, we have that $m(n, T) - n \to \infty$ as $n \to \infty$.
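As a small illustration of this notation, the following Python snippet computes $m(n, T)$ for a given stepsize sequence (the harmonic stepsizes below are only an example).

```python
def m(n, T, a):
    """Largest index m >= n such that a[n] + ... + a[m] <= T, as in (23)."""
    total, k = 0.0, n
    while k < len(a) and total + a[k] <= T:
        total += a[k]
        k += 1
    return k - 1                     # assumes a[n] <= T so that at least one step fits

a = [1.0 / (i + 1) for i in range(10000)]    # illustrative stepsizes a_k = 1/(k+1)
print(m(100, 1.0, a))                        # the largest m with a_100 + ... + a_m <= 1
```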
We make the following assumptions.
- A3.1 $a_k > 0$, $a_k \xrightarrow[k\to\infty]{} 0$, and $\sum_{k=0}^{\infty} a_k = \infty$;
- A3.2 There exists a twice-differentiable function $v(\cdot)$ such that
  - (i) $v(x^0) = 0$;
  - (ii) $v(x) > 0$ for any $x \ne x^0$ and $v(x) \to \infty$ as $\|x\| \to \infty$;
  - (iii) $v_x^{T}(x) f(x) < 0$ for any $x \ne x^0$;
- A3.3 The noise $\{\varepsilon_{k+1}\}$ satisfies, for any $T > 0$,
$$\lim_{n\to\infty}\ \sup_{0 \le t \le T}\ \Big\| \sum_{i=n}^{m(n, t)} a_i\, \varepsilon_{i+1} \Big\| = 0; \tag{24}$$
- A3.4 The function $f(\cdot)$ is continuous.
Remark 4. If the stepsizes further satisfy $\sum_{k} a_k^2 < \infty$ and the noise $\{\varepsilon_{k+1}\}$ is an m.d.s. with $\sup_k E[\|\varepsilon_{k+1}\|^2 \mid \mathcal{F}_k] < \infty$, then it can be directly verified that $\sum_{k} a_k \varepsilon_{k+1}$ converges almost surely, and hence A3.3 holds. If the noise is not stochastic but satisfies $\varepsilon_k \to 0$ as $k \to \infty$, then A3.3 also holds. In this regard, (24) is more general compared with A2.3. We give the following result for the ODE method.
Theorem 2. ([6,7,21]) Consider the sample path ω where A3.3 takes place. If A3.1, A3.2, and A3.4 hold and the estimate sequence $\{x_k\}$ is bounded on ω, then $x_k \xrightarrow[k\to\infty]{} x^0$ on this sample path.

Proof. We sketch the proof.
Define $t_0 \triangleq 0$ and $t_n \triangleq \sum_{i=0}^{n-1} a_i$. By linearly interpolating the estimates generated from (2) at the time instants $\{t_n\}$ and shifting the time origin, we obtain a sequence of continuous functions, which can be proved to be uniformly bounded by the boundedness of $\{x_k\}$ and equi-continuous by A3.3. Then, by Lemma 1, we can prove that there exists a subsequence converging to a continuous function $\bar{x}(\cdot)$ satisfying the following ODE: $\dot{\bar{x}}(t) = f(\bar{x}(t))$. By A3.2 and Lemma 2, we can further prove that the equilibrium point $x^0$ is globally stable and that the estimates converge to $x^0$ on the sample path ω. □
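To illustrate how the interpolated iterates track the associated ODE, the following noise-free sketch uses the hypothetical linear function $f(x) = -x$ (so that the ODE solution is known in closed form) and records the iterates on the time scale $t_n = \sum_{i<n} a_i$.

```python
import numpy as np

def f(x):
    # Hypothetical function used only to illustrate the ODE method; root x0 = 0.
    return -x

x, t = 5.0, 0.0
ts, xs = [0.0], [5.0]
for k in range(1, 5001):
    a_k = 1.0 / (k + 10)            # stepsizes satisfying (3)
    x += a_k * f(x)                 # noise-free SA iteration (2)
    t += a_k                        # ODE time scale t_n = a_0 + ... + a_{n-1}
    ts.append(t)
    xs.append(x)

ode = 5.0 * np.exp(-np.array(ts))   # exact solution of dx/dt = -x with x(0) = 5
print(np.max(np.abs(np.array(xs) - ode)))   # the iterates stay close to the ODE trajectory
```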
The essential idea of the above proof is to transform the convergence of a recursive algorithm to the stability of an associated ODE, and thus the analysis method is called the ODE method. From assumption A3.3 we can see that the ODE method has a wider application potential than PM. However, the boundedness assumption on the estimate sequence is still restrictive. Aiming at removing this condition and further relaxing the technical assumptions on the noise and the root-seeking function, in the next section we will introduce the SAA with expanding truncations (SAAWET) and its general convergence theorem.
2.3. SAAWET and Trajectory-Subsequence Method
We first give an example.
Example 3. ([5]) Consider root-seeking for a function $f(\cdot)$ whose growth rate is faster than linear, observed with a given noise sequence.
- (i) If the initial value and the stepsize are chosen in one way, it can be directly verified that the estimates generated from (2) diverge;
- (ii) If the initial value and the stepsize are chosen in another way, it can be directly verified that the estimates generated from (2) converge, that is, $x_k \xrightarrow[k\to\infty]{} x^0$.
For Example 3, neither the growth rate condition on $f(\cdot)$ used by PM nor the boundedness condition on the estimate sequence used by the ODE method holds. On the other hand, if we choose the stepsizes in an adaptive manner, the estimates generated from SAA may still converge. The core idea of SAAWET consists in adaptively choosing the stepsizes. Let us describe it.
Denote by $J$ the root set of an unknown function $f(\cdot)$, that is, $J \triangleq \{x : f(x) = 0\}$. Choose $\{M_k\}_{k \ge 0}$, a positive sequence increasingly diverging to infinity, $M_k \nearrow \infty$. With an arbitrary initial value $x_0$, the estimate sequence $\{x_k\}_{k \ge 0}$ is generated by the following SAAWET:
$$x_{k+1} = \big(x_k + a_k y_{k+1}\big)\, I_{[\|x_k + a_k y_{k+1}\| \le M_{\sigma_k}]} + x^{*}\, I_{[\|x_k + a_k y_{k+1}\| > M_{\sigma_k}]}, \tag{29}$$
$$\sigma_k = \sum_{j=0}^{k-1} I_{[\|x_j + a_j y_{j+1}\| > M_{\sigma_j}]}, \qquad \sigma_0 = 0, \tag{30}$$
where $y_{k+1} = f(x_k) + \varepsilon_{k+1}$ is the observation of $f(\cdot)$ at $x_k$, $\varepsilon_{k+1}$ is the observation noise, $a_k$ is the stepsize, $x^{*}$ is a fixed point to which the estimate is reset after a truncation, $\sigma_k$ is the number of truncations up to time $k$, and $x_k$ is the estimate for the root set of $f(\cdot)$ at time $k$.
From (29) and (30), we can see that if the magnitude of $x_k + a_k y_{k+1}$ stays within the truncation bound, then the algorithm evolves as the classical SAA (2), while if its magnitude exceeds the truncation bound, then the estimate is pulled back to $x^{*}$, with the truncation bound being enlarged and the stepsize being reduced for the next recursion.
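A minimal Python sketch of (29) and (30) is given below; the root-searching function, the Gaussian noise, and the truncation bounds are hypothetical choices made only to illustrate the mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Hypothetical root-searching function with root x0 = 1.0; it grows faster than linearly.
    return (1.0 - x) * (1.0 + (1.0 - x) ** 2)

x_star = 0.0                        # reset point x* used in (29)
M = lambda s: 2.0 ** (s + 1)        # truncation bounds M_s increasing to infinity
x, sigma = x_star, 0                # initial value and truncation counter sigma_0 = 0
for k in range(1, 200001):
    a_k = 1.0 / k                   # stepsize satisfying A4.1
    y = f(x) + rng.normal()         # noisy observation y_{k+1} of f at x_k
    cand = x + a_k * y              # tentative classical SA step (2)
    if abs(cand) <= M(sigma):       # within the current truncation bound:
        x = cand                    #   accept the step and keep the bound
    else:                           # bound exceeded:
        x = x_star                  #   reset to x* and enlarge the bound
        sigma += 1
print(x, sigma)                     # x is close to the root 1.0; sigma counts the truncations
```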
We first list conditions for theoretical analysis.
- A4.1 $a_k > 0$, $a_k \xrightarrow[k\to\infty]{} 0$, and $\sum_{k=0}^{\infty} a_k = \infty$;
- A4.2 There exists a continuously differentiable function $v(\cdot)$ such that
  - (i) $\sup_{\delta \le d(x, J) \le \Delta} f^{T}(x)\, v_x(x) < 0$ for any $\Delta > \delta > 0$, where $d(x, J) \triangleq \inf_{y \in J} \|x - y\|$,
  - (ii) The set $v(J) \triangleq \{v(x) : x \in J\}$ is nowhere dense,
  - (iii) For $x^{*}$ in (29), there is a constant $c_0 > 0$ such that $\|x^{*}\| < c_0$ and $v(x^{*}) < \inf_{\|x\| = c_0} v(x)$;
- A4.3 On the sample path ω, for any convergent subsequence $\{x_{n_k}\}$ generated from (29) and (30), it holds that
$$\lim_{T\to 0}\ \limsup_{k\to\infty}\ \frac{1}{T}\, \Big\| \sum_{i=n_k}^{m(n_k, T_k)} a_i\, \varepsilon_{i+1} \Big\| = 0 \tag{32}$$
with $T_k \in [0, T]$, where $m(\cdot,\cdot)$ is defined by (23);
- A4.4 The function $f(\cdot)$ is measurable and locally bounded.
Remark 5. A set being nowhere dense means that the interior of its closure is empty. It is clear that a set consisting of a single point is nowhere dense. Note that the noise condition A4.3 only requires verifying (32) on a fixed sample path and along the indices of a convergent subsequence, which, compared with (24), is much easier to verify in practice. In this regard, the analysis method for SAAWET is called the trajectory-subsequence (TS) method.

Theorem 3. ([5]) For SAAWET (29) and (30), if A4.1, A4.2, and A4.4 hold, then on the sample path ω where A4.3 takes place, it holds that $d(x_k, J) \xrightarrow[k\to\infty]{} 0$, where $d(x, J) \triangleq \inf_{y \in J} \|x - y\|$. Denote by $\bar{J}$ the closure of $J$ and by $\bar{X}$ the set of the limit points of $\{x_k\}$. It holds that $\bar{X} \subset \bar{J}$, and $\bar{X}$ is a closed connected subset of $\bar{J}$.

It is direct to verify that if the root set of $f(\cdot)$ is a singleton $\{x^0\}$, then under the conditions of Theorem 3 we have that $\bar{X} = \{x^0\}$ and $x_k \xrightarrow[k\to\infty]{} x^0$.
Proof. We outline the proof.
Denote by $\{x_{n_k}\}$ a convergent subsequence generated by (29) and (30).
- Step 1. By A4.3, prove that for all $k$ large enough and $T > 0$ small enough, there is no truncation for the estimates $x_{n_k+1}, \dots, x_{m(n_k, T)+1}$, that is, starting from $x_{n_k}$ the algorithm evolves as the classical SAA for at least $m(n_k, T) - n_k$ steps.
- Step 2. By the stability condition A4.2 and the assertions established in Step 1, prove that for SAAWET the number of truncations is finite, that is, algorithms (29) and (30) evolve as the classical SAA after a finite number of steps.
- Step 3. Establish the convergence of $\{x_k\}$. □
Remark 6. If it is a priori known that the estimates generated from (29) and (30) stay in a subspace $V$, then for the convergence of $\{x_k\}$ we only need to verify the stability condition on $V$, and in such a case assumption A4.2 is formulated as follows:
- A4.2' There exists a continuously differentiable function $v(\cdot)$ such that
  - (i) $\sup_{\delta \le d(x, J) \le \Delta,\ x \in V} f^{T}(x)\, v_x(x) < 0$ for any $\Delta > \delta > 0$,
  - (ii) The set $v(J \cap V)$ is nowhere dense,
  - (iii) For $x^{*}$ in (29), there exists a constant $c_0 > 0$ such that $\|x^{*}\| < c_0$ and $v(x^{*}) < \inf\{v(x) : \|x\| = c_0,\ x \in V\}$.
Example 4. Consider the root-seeking problem in Example 3, and assume that the observation noise consists of a state-dependent term together with a sequence of i.i.d. Gaussian variables. With the reset point and the truncation bounds in (29) and (30) suitably chosen, a direct calculation verifies that the convergence conditions are satisfied.

From the above sections, we can see that SAAWET requires neither that the observation noise is purely stochastic nor that the estimate sequence is bounded. In fact, it can be shown that the conditions for convergence of SAAWET are the weakest possible in a certain sense [5]. In the next section, we will apply SAAWET to give a recursive algorithm for principal component analysis.
2.4. Recursive Principal Component Analysis Algorithm
Before introducing the recursive algorithm, we first make the following assumption.
- A5.1 $A$ and $\{A_k\}_{k \ge 1}$ are symmetric matrices. $A$ is unknown, and $\{A_k\}_{k \ge 1}$ is the observation sequence of $A$, whose observation noise $\{A_k - A\}$ satisfies a noise condition of the type A4.3.
The principal component analysis algorithm considered in this paper aims at estimating the eigenvalues and eigenvectors of $A$ based on the observation sequence $\{A_k\}_{k \ge 1}$. Since $A$ is unknown, performing an SVD or QR decomposition for each $A_k$ would be rather time-consuming. In the following, we introduce a SAAWET-based recursive algorithm for solving the problem.
The eigenvectors of matrix $A$ are recursively estimated as follows. Choose the stepsize $\{a_k\}$ and an initial value $u_0^{1}$ with unit norm, and define the recursion (39): if the vector obtained by one SA step of $u_k^{1}$ along the observation $A_{k+1}$ is nonzero, then $u_{k+1}^{1}$ is defined by normalizing this vector to unit norm; otherwise, $u_{k+1}^{1}$ is reset as another vector with unit norm.

Assume that $u_k^{1}, \dots, u_k^{j-1}$ have been well defined. Next, we inductively define the estimation algorithm for $u_k^{j}$. Define the deflation matrix (40) from the observation $A_{k+1}$ and $u_k^{1}, \dots, u_k^{j-1}$, where the pseudo-inverse of the matrix formed by the previously obtained estimates is used. With an arbitrary initial vector $u_0^{j}$ with unit norm, define the recursion (41): if the vector obtained by one SA step of $u_k^{j}$ along the deflated observation is nonzero, then $u_{k+1}^{j}$ is defined by normalizing this vector; otherwise, $u_{k+1}^{j}$ is reset as another unit-norm vector, still denoted by $u_{k+1}^{j}$.

The vectors $u_k^{1}, u_k^{2}, \dots$ serve as the estimates for the eigenvectors of matrix $A$.
Based on $\{u_k^{j}\}$, we introduce the recursive algorithms for estimating the eigenvalues of matrix $A$. Choose a positive sequence $\{M_k\}_{k \ge 0}$ increasingly diverging to infinity, $M_k \nearrow \infty$. With arbitrary initial values $\lambda_0^{j}$, the eigenvalue estimates $\{\lambda_k^{j}\}$ are generated by the SAAWET-type algorithms (42), where $\lambda_k^{j}$ is the estimate for the eigenvalue corresponding to the eigenvector which $u_k^{j}$ estimates at time $k$.
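As a simplified illustration of the overall scheme (not the exact algorithms (39)–(42); in particular, the deflation step for the non-principal eigenvectors is omitted), the following sketch estimates the principal eigenvector and its eigenvalue of a hypothetical 2x2 symmetric matrix from noisy observations.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # unknown symmetric matrix (illustration only)

def observe():
    # Hypothetical observation A_k = A + N_k with symmetric noise averaging to zero.
    N = rng.normal(scale=1.0, size=(2, 2))
    return A + (N + N.T) / 2.0

u = np.array([1.0, 0.0])                 # initial unit-norm eigenvector estimate
lam, sigma = 0.0, 0                      # eigenvalue estimate and truncation counter
M = lambda s: 10.0 * (s + 1)             # expanding truncation bounds for the eigenvalue
for k in range(1, 50001):
    a_k = 1.0 / k
    Ak = observe()
    v = u + a_k * (Ak @ u)               # one SA step along the observed matrix
    if np.linalg.norm(v) > 1e-8:
        u = v / np.linalg.norm(v)        # renormalize to keep a unit-norm estimate
    else:
        u = np.array([0.0, 1.0])         # degenerate step: reset to another unit vector
    cand = lam + a_k * (u @ Ak @ u - lam)   # SAAWET-type step for the eigenvalue
    if abs(cand) <= M(sigma):
        lam = cand
    else:
        lam, sigma = 0.0, sigma + 1      # reset and enlarge the truncation bound
print(u, lam)                            # close to the principal eigenvector/eigenvalue of A
```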
Denote by $S$ the unit sphere of the underlying space and by $J$ the set of all unit-norm eigenvectors of matrix $A$. Denote the set of all eigenvalues of matrix $A$ by $\Lambda$, that is, $\Lambda \triangleq \{\lambda : \lambda \text{ is an eigenvalue of } A\}$.
For algorithms (39)–(42), the following result takes place.
Theorem 4. Assume that A5.1 holds.
- (i) For each $j$, there exists a closed subset of $J$ to which the estimates $\{u_k^{j}\}$ converge;
- (ii) For each $j$, there exists an eigenvalue of $A$ such that every limit point of $\{u_k^{j}\}$ is a unit-norm eigenvector associated with that eigenvalue;
- (iii) $\lambda_k^{j}$ converges to an eigenvalue of matrix $A$, and the limit set of the eigenvalue estimates coincides with $\Lambda$.
Proof. Noting that the estimates $u_k^{j}$ have unit norm, algorithms (39)–(42) are SAAWET. The proof can be obtained by applying Theorem 3 and Remark 5. Due to space limitations, the detailed proof is omitted. □
3. Distributed Stochastic Approximation Algorithm
In this section, we will introduce the distributed version of SAAWET (DSAAWET). The key difference from the existing distributed SAA, see, for example, [40], lies in the expanding truncation mechanism that adaptively enlarges the truncation bounds. The theoretical results of DSAAWET guarantee that the estimates generated by DSAAWET at all agents converge almost surely to a consensus set, which is contained in the root set of the sum function.
We first introduce the problem formulation.
Consider the case where all the agents in a network cooperatively search for the root of the sum function given by
$$f(x) \triangleq \sum_{i=1}^{N} f_i(x),$$
where $f_i(\cdot)$ is the local objective function which can only be observed by agent $i$. Denote the root set of $f(\cdot)$ by $J \triangleq \{x : f(x) = 0\}$.
For any $i \in \{1, \dots, N\}$, denote by $x_k^{i}$ the estimate for the root of $f(\cdot)$ given by agent $i$ at time $k$. At time $k$, agent $i$ only has its local noisy observation
$$y_{k+1}^{i} = f_i(x_k^{i}) + \varepsilon_{k+1}^{i}, \tag{45}$$
where $\varepsilon_{k+1}^{i}$ is the observation noise. Hence, to estimate the root of $f(\cdot)$, all agents need to exchange information with their adjacent agents via the topology of the network.
The topology of the network at time $k$ is described by a digraph $\mathcal{G}_k = (\mathcal{V}, \mathcal{E}_k)$, where $\mathcal{V} = \{1, \dots, N\}$ is the index set of all agents and $\mathcal{E}_k$ is the edge set, with $(j, i) \in \mathcal{E}_k$ representing the information flow from agent $j$ to agent $i$ at time $k$. Denote the adjacency matrix of the network at time $k$ by $W_k = [w_{ij}(k)]_{i,j=1}^{N}$, where $w_{ij}(k) > 0$ if, and only if, $(j, i) \in \mathcal{E}_k$, and $w_{ij}(k) = 0$ otherwise. Denote by $N_i(k) \triangleq \{j : (j, i) \in \mathcal{E}_k\}$ the set of neighbors of agent $i$ at time $k$.
We apply the idea of expanding truncations to the distributed estimation problem. Denote by $x_k^{i}$ the estimate of agent $i$ at time $k$ and by $\sigma_k^{i}$ the number of truncations of agent $i$ up to time $k$. DSAAWET is given by (46)–(49) with arbitrary initial values, where $y_{k+1}^{i}$ is defined by (45), $a_k$ is the stepsize, $x^{*}$ is a vector known to all agents, and $\{M_k\}_{k \ge 0}$ is a positive sequence increasingly diverging to infinity with $\|x^{*}\| < M_0$.
Denote by $\hat{\sigma}_k \triangleq \max_{1 \le i \le N} \sigma_k^{i}$ the largest truncation number among all agents at time $k$. If the network contains a single agent, then, with the agent superscript omitted, (46)–(49) become (29) and (30), which is the SAAWET in the centralized setting.
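A simplified two-agent illustration of the distributed scheme is sketched below with hypothetical local functions and a fixed averaging matrix; unlike (46)–(49), it does not exchange truncation counters among the agents and keeps only the consensus step, the local SA step, and a per-agent expanding truncation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical local functions; their sum f(x) = (1 - x) + (2 - x) has root 1.5.
local_f = [lambda x: 1.0 - x, lambda x: 2.0 - x]
W = np.array([[0.5, 0.5], [0.5, 0.5]])   # fixed doubly stochastic weight matrix

x_star = 0.0                             # reset vector known to all agents
M = lambda s: 5.0 * 2.0 ** s             # expanding truncation bounds
x = np.zeros(2)                          # per-agent estimates x_k^i
sigma = np.zeros(2, dtype=int)           # per-agent truncation counters sigma_k^i
for k in range(1, 100001):
    a_k = 1.0 / k
    y = np.array([local_f[i](x[i]) + rng.normal(scale=0.5) for i in range(2)])
    cand = W @ x + a_k * y               # consensus step plus local SA step
    for i in range(2):
        if abs(cand[i]) <= M(sigma[i]):  # within agent i's truncation bound
            x[i] = cand[i]
        else:                            # otherwise reset and enlarge the bound
            x[i] = x_star
            sigma[i] += 1
print(x)                                 # both agents close to the root 1.5 of the sum function
```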
We first introduce assumptions to be used.
- A5.1 $a_k > 0$, $a_k \xrightarrow[k\to\infty]{} 0$, and $\sum_{k=0}^{\infty} a_k = \infty$.
- A5.2 (a) There exists a continuously differentiable function $v(\cdot)$ such that $\sup_{\delta \le d(x, J) \le \Delta} f^{T}(x)\, \nabla v(x) < 0$ for any $\Delta > \delta > 0$, where $\nabla v(\cdot)$ denotes the gradient of $v(\cdot)$ and $d(x, J) \triangleq \inf_{y \in J} \|x - y\|$;
(b) $v(J) \triangleq \{v(x) : x \in J\}$ is nowhere dense;
(c) $\|x^{*}\| < c_0$ and $v(x^{*}) < \inf_{\|x\| = c_0} v(x)$ for some positive constant $c_0$, where $x^{*}$ is used in (47) and (48).
- A5.3 The local functions $f_i(\cdot)$, $i = 1, \dots, N$, are continuous.
- A5.4 (a) $\{W_k\}_{k \ge 0}$ are doubly stochastic matrices (a matrix is said to be doubly stochastic if all its entries are nonnegative and the entries in each row and in each column sum to 1);
(b) There exists a constant $0 < \eta < 1$ such that $w_{ij}(k) \ge \eta$ whenever $j \in N_i(k)$;
(c) The digraph $\mathcal{G}_{\infty} = (\mathcal{V}, \mathcal{E}_{\infty})$ is strongly connected, where $\mathcal{E}_{\infty}$ is the set of edges $(j, i)$ such that $j$ is a neighbor of $i$ which communicates with $i$ infinitely often, that is, $\mathcal{E}_{\infty} \triangleq \{(j, i) : (j, i) \in \mathcal{E}_k \text{ for infinitely many } k\}$;
(d) There exists a positive integer $B$ such that, for every $(j, i) \in \mathcal{E}_{\infty}$, agent $j$ sends information to the neighbor $i$ at least once every $B$ consecutive time slots, that is, $(j, i) \in \mathcal{E}_k \cup \mathcal{E}_{k+1} \cup \cdots \cup \mathcal{E}_{k+B-1}$ for all $(j, i) \in \mathcal{E}_{\infty}$ and any $k \ge 0$.
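A5.4(a) asks for doubly stochastic weight matrices. One common way to obtain them on an undirected graph (a standard construction, not specific to [44]) is the Metropolis rule, sketched below.

```python
import numpy as np

def metropolis_weights(adjacency):
    """Build a doubly stochastic weight matrix from an undirected 0/1 adjacency matrix."""
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()       # put the remaining mass on the diagonal
    return W

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path graph over three agents
W = metropolis_weights(adj)
print(W.sum(axis=0), W.sum(axis=1))                 # each row and each column sums to 1
```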
Remark 7. Assumptions A5.1–A5.3 are similar to the conditions used in the centralized SAAWET. Since DSAAWET works in a distributed setting, assumption A5.4 is a commonly used condition describing the topology of the network [34]. Next, we introduce the condition on the observation noises for each agent.
- A5.5 For the sample path under consideration and for each agent $i$:
(a) $a_k\, \varepsilon_{k+1}^{i} \xrightarrow[k\to\infty]{} 0$, and
(b) $\lim_{T\to 0}\ \limsup_{k\to\infty}\ \frac{1}{T}\, \big\| \sum_{j=n_k}^{m(n_k, T)} a_j\, \varepsilon_{j+1}^{i} \big\| = 0$ along indices $\{n_k\}$ whenever $\{x_{n_k}^{i}\}$ converges.
We now give the main results of DSAAWET.
Define the stacked vector $X_k \triangleq \mathrm{col}\{x_k^{1}, \dots, x_k^{N}\}$. Denote by $X_{\perp,k}$ the disagreement vector of $X_k$, that is, the projection of $X_k$ onto the subspace orthogonal to the consensus subspace, and by $\bar{x}_k \triangleq \frac{1}{N}\sum_{i=1}^{N} x_k^{i}$ the average of the estimates derived at all agents at time $k$.
Theorem 5. ([44]) Let $\{x_k^{i}\}$, $i = 1, \dots, N$, be produced by (46)–(49) with arbitrary initial values. Assume that A5.1–A5.4 hold. Then, on the sample path ω where A5.5 holds, the following assertions take place:
- (i) $\{x_k^{i}\}$ is bounded for each $i$, and there exists a positive integer $k_0$, possibly depending on ω, such that for all $k \ge k_0$ no truncation occurs; that is, after $k_0$ each agent updates its estimate by the weighted average of its neighbors' estimates plus the local SA correction term (equivalently, in compact form, the stacked estimate vector is updated without truncations);
- (ii) The disagreement vector vanishes, $X_{\perp,k} \xrightarrow[k\to\infty]{} 0$, that is, all agents asymptotically reach consensus;
- (iii) There exists a connected subset $J^{*}$ of the root set $J$ such that $d(x_k^{i}, J^{*}) \xrightarrow[k\to\infty]{} 0$ for all $i = 1, \dots, N$.
Proof. The proof can be obtained by first proving the finiteness of the number of truncations, then the consensus of the estimates, and finally the convergence of the estimates. The detailed proof can be found in [44]. □
DSAAWET provides a promising tool for solving distributed problems over networks. In fact, DSAAWET has been successfully applied to the following problems:
Distributed identification of linear systems [48],
Distributed blind identification of communication channels [49],
Output consensus of networked Hammerstein and Wiener systems [38].