1. Introduction
Consider the linear matrix equation
$$AXB = C, \qquad (1)$$
where $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, and $X \in \mathbb{R}^{p \times q}$ is the unknown matrix. Such problems arise in many practical applications such as surface fitting in computer-aided geometric design (CAGD), signal and image processing, photogrammetry, etc.; see, for example, [1,2,3,4] and the large body of literature therein. If (1) is consistent, $X_\star = A^\dagger C B^\dagger$ is the minimum Frobenius norm solution. If (1) is inconsistent, $X_\star = A^\dagger C B^\dagger$ is the minimum Frobenius norm least-squares solution.
When the matrices A and B are small and dense, direct methods based on QR factorizations are attractive [5,6]. However, for large A and B, iterative methods have attracted a lot of attention [7,8,9,10,11]. Recently, Du et al. [12] proposed the randomized block coordinate descent (RBCD) method for solving the matrix least-squares problem $\min_X \|C - AXB\|_F$ without a strong convexity assumption. This method requires the matrix B to have full row rank. Wu et al. [13] introduced two kinds of Kaczmarz-type methods to solve the consistent matrix equation $AXB = C$: the relaxed greedy randomized Kaczmarz (ME-RGRK) method and the maximal weighted residual Kaczmarz (ME-MWRK) method. Although the row and column index selection strategy is time-consuming, the ideas of these two methods are suitable for solving large-scale consistent matrix equations.
In this paper, the randomized Kaczmarz method [14] and the randomized extended Kaczmarz method [15] are used to solve the consistent and the inconsistent matrix equation (1), respectively, using only matrix–vector products. All the results in this paper hold over the complex field, but for the sake of simplicity, we only discuss them in terms of the real number field.
In this paper, we denote $A^T$, $A^\dagger$, $\operatorname{rank}(A)$, $\mathcal{R}(A)$, $\|A\|_F$ and $\langle A, B \rangle = \operatorname{trace}(A^T B)$ as the transpose, the Moore–Penrose generalized inverse, the rank of A, the column space of A, the Frobenius norm of A and the inner product of two matrices A and B, respectively. For an integer $m \geq 1$, let $[m] = \{1, 2, \ldots, m\}$. We use I to denote the identity matrix whose order is clear from the context. In addition, for a given matrix $G \in \mathbb{R}^{m \times n}$, $G_{i,:}$, $G_{:,j}$, $\sigma_{\max}(G)$ and $\sigma_{\min}(G)$ are used to denote the ith row, the jth column, the maximum singular value and the smallest nonzero singular value of G, respectively. Let $\mathbb{E}_k$ denote the expected value conditional on the first k iterations, that is, $\mathbb{E}_k[\cdot] = \mathbb{E}[\cdot \mid i_0, j_0, i_1, j_1, \ldots, i_{k-1}, j_{k-1}]$, where $i_s$ and $j_s$ are the row and the column chosen at the sth iteration. Let the conditional expectation with respect to the random row index be $\mathbb{E}_k^i[\cdot] = \mathbb{E}[\cdot \mid i_0, j_0, \ldots, i_{k-1}, j_{k-1}, j_k]$ and that with respect to the random column index be $\mathbb{E}_k^j[\cdot] = \mathbb{E}[\cdot \mid i_0, j_0, \ldots, i_{k-1}, j_{k-1}, i_k]$. By the law of total expectation, it holds that $\mathbb{E}_k[\cdot] = \mathbb{E}_k^j\big[\mathbb{E}_k^i[\cdot]\big]$.
The organization of this paper is as follows. In Section 2, we discuss the randomized block Kaczmarz method (ME-RBK) for finding the minimal F-norm solution ($A^\dagger C B^\dagger$) of the consistent matrix equation (1). In Section 3, we discuss the randomized extended block Kaczmarz method (ME-REBK) for finding the minimal F-norm least-squares solution of the matrix equation (1). In Section 4, some numerical examples are provided to illustrate the effectiveness of our new methods. Finally, some brief concluding remarks are given in Section 5.
2. The Randomized Block Kaczmarz Method for the Consistent Equation
At the kth iteration, the Kaczmarz method randomly selects a row $A_{i,:}$ of A and performs an orthogonal projection of the current estimate matrix $X^{(k)}$ onto the corresponding hyperplane $\{X \mid A_{i,:} X B = C_{i,:}\}$, that is,
$$X^{(k+1)} = \arg\min_{X \in \mathbb{R}^{p \times q}} \|X - X^{(k)}\|_F^2 \quad \text{s.t.} \quad A_{i,:} X B = C_{i,:}. \qquad (2)$$
The Lagrangian function of the constrained optimization problem (2) is
$$L(X, \lambda) = \|X - X^{(k)}\|_F^2 + \big\langle \lambda,\, A_{i,:} X B - C_{i,:} \big\rangle, \qquad (3)$$
where $\lambda \in \mathbb{R}^{1 \times n}$ is a Lagrangian multiplier. Via matrix differentiation, we obtain the gradient of $L$ and set it to zero to find the stationary matrix:
$$\begin{cases} \nabla_X L = 2\big(X - X^{(k)}\big) + A_{i,:}^T\,\lambda\,B^T = 0, \\ A_{i,:} X B = C_{i,:}. \end{cases} \qquad (4)$$
Using the first equation of (4), we have $X = X^{(k)} - \frac{1}{2} A_{i,:}^T \lambda B^T$. Substituting this into the second equation of (4), we can obtain $\lambda = \frac{2}{\|A_{i,:}\|_2^2}\big(A_{i,:} X^{(k)} B - C_{i,:}\big)\big(B^T B\big)^\dagger$. So, using $(B^T B)^\dagger B^T = B^\dagger$, the projected randomized block Kaczmarz (ME-PRBK) method for solving $AXB = C$ iterates as
$$X^{(k+1)} = X^{(k)} + \frac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\big(C_{i,:} - A_{i,:} X^{(k)} B\big) B^\dagger. \qquad (5)$$
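In code, one ME-PRBK step can be sketched as follows, following iteration (5) as reconstructed above and assuming the pseudoinverse $B^\dagger$ has been formed once in advance (the function name and signature are ours):

```python
import numpy as np

def me_prbk_step(X, A, B, B_pinv, C, i):
    """One ME-PRBK update, iteration (5): project the current iterate onto
    {X : A[i, :] X B = C[i, :]}; B_pinv = np.linalg.pinv(B) is reused."""
    Ai = A[i, :]
    residual = C[i, :] - Ai @ X @ B          # 1 x n row residual
    return X + np.outer(Ai, residual @ B_pinv) / (Ai @ Ai)
```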
However, in practice, it is very expensive to calculate the pseudoinverse of large-scale matrices. Next, we generalize the average block Kaczmarz method [16] for solving linear systems to matrix equations.
At the kth step, we obtain the approximate solution $X^{(k+1)}$ by projecting the current estimate $X^{(k)}$ onto the hyperplane $\{X \mid A_{i,:} X B_{:,j} = C_{i,j}\}$. Using the Lagrangian multiplier method, we can obtain the following Kaczmarz method for $A_{i,:} X B_{:,j} = C_{i,j}$:
$$X^{(k+1)} = X^{(k)} + \frac{A_{i,:}^T \big(C_{i,j} - A_{i,:} X^{(k)} B_{:,j}\big) B_{:,j}^T}{\|A_{i,:}\|_2^2\, \|B_{:,j}\|_2^2}.$$
Inspired by the idea of the average block Kaczmarz algorithm for $Ax = b$, we consider the average block Kaczmarz method for $AXB = C$ with respect to B:
$$X^{(k+1)} = X^{(k)} + \alpha\, \frac{A_{i,:}^T}{\|A_{i,:}\|_2^2} \sum_{j=1}^{n} \omega_j\, \frac{\big(C_{i,j} - A_{i,:} X^{(k)} B_{:,j}\big) B_{:,j}^T}{\|B_{:,j}\|_2^2},$$
where $\alpha > 0$ is the stepsize and $\omega_j$ are the weights that satisfy $\omega_j \geq 0$ and $\sum_{j=1}^{n} \omega_j = 1$. If $\omega_j = \|B_{:,j}\|_2^2 / \|B\|_F^2$, then
$$\sum_{j=1}^{n} \omega_j\, \frac{\big(C_{i,j} - A_{i,:} X^{(k)} B_{:,j}\big) B_{:,j}^T}{\|B_{:,j}\|_2^2} = \frac{\big(C_{i,:} - A_{i,:} X^{(k)} B\big) B^T}{\|B\|_F^2}.$$
Setting the probability of selecting the ith row to be proportional to its squared norm, we obtain the following randomized block Kaczmarz iteration:
$$X^{(k+1)} = X^{(k)} + \alpha\, \frac{A_{i,:}^T \big(C_{i,:} - A_{i,:} X^{(k)} B\big) B^T}{\|A_{i,:}\|_2^2\, \|B\|_F^2}, \qquad (6)$$
where i is selected with probability $p_i = \|A_{i,:}\|_2^2 / \|A\|_F^2$. We describe this method as Algorithm 1, which is called the ME-RBK algorithm.
Algorithm 1 Randomized Block Kaczmarz Method for $AXB = C$ (ME-RBK)
- Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $\alpha \in (0, 2)$, $X^{(0)} \in \mathbb{R}^{p \times q}$
- 1: for $k = 0, 1, 2, \ldots$ do
- 2: Pick i with probability $p_i = \|A_{i,:}\|_2^2 / \|A\|_F^2$
- 3: Compute $X^{(k+1)} = X^{(k)} + \alpha\, \frac{A_{i,:}^T (C_{i,:} - A_{i,:} X^{(k)} B) B^T}{\|A_{i,:}\|_2^2 \|B\|_F^2}$
- 4: end for
We arrange the computational process of calculating $X^{(k+1)}$ in Table 1; each iteration only costs $O(q(p + n))$ floating-point operations (flops) if the square of the row norm of A has been calculated in advance.
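A compact NumPy sketch of Algorithm 1, under the sampling and update rules reconstructed above (the fixed iteration budget and the zero initial matrix are our simplifications):

```python
import numpy as np

def me_rbk(A, B, C, alpha=1.0, iters=50_000, seed=0):
    """Sketch of Algorithm 1 (ME-RBK) with iteration (6)."""
    rng = np.random.default_rng(seed)
    m, p = A.shape
    q = B.shape[0]
    X = np.zeros((p, q))                      # X^(0) = 0 satisfies the range conditions
    row_sq = np.einsum('ij,ij->i', A, A)      # ||A_{i,:}||_2^2, precomputed once
    probs = row_sq / row_sq.sum()             # p_i = ||A_{i,:}||_2^2 / ||A||_F^2
    b_fro2 = np.linalg.norm(B, 'fro') ** 2
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        Ai = A[i, :]
        residual = C[i, :] - Ai @ X @ B       # 1 x n row residual
        X += alpha * np.outer(Ai, residual @ B.T) / (row_sq[i] * b_fro2)
    return X
```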
Remark 1. Note that the problem of finding a solution of $AXB = C$ can be posed as the following linear least-squares problem:
$$\min_{X \in \mathbb{R}^{p \times q}} \frac{1}{2\,\|A\|_F^2\,\|B\|_F^2}\,\|C - AXB\|_F^2. \qquad (7)$$
Define the component function
$$f_i(X) = \frac{\|C_{i,:} - A_{i,:} X B\|_2^2}{2\,\|A_{i,:}\|_2^2\,\|B\|_F^2},$$
then differentiate with respect to X to obtain its gradient
$$\nabla f_i(X) = \frac{A_{i,:}^T \big(A_{i,:} X B - C_{i,:}\big) B^T}{\|A_{i,:}\|_2^2\,\|B\|_F^2}.$$
Since sampling i with probability $p_i = \|A_{i,:}\|_2^2 / \|A\|_F^2$ gives $\mathbb{E}[f_i(X)]$ equal to the objective in (7), the randomized block Kaczmarz method (6) is equivalent to one step of the stochastic gradient descent method [17] applied to (7) with stepsize α.

First, we give the following lemma, whose proof can be found in [12].
Lemma 1 ([12]). Let $A \in \mathbb{R}^{m \times p}$ and $B \in \mathbb{R}^{q \times n}$ be any nonzero matrices. Let
$$S = \big\{X \in \mathbb{R}^{p \times q} \;\big|\; X = A^T Y B^T,\; Y \in \mathbb{R}^{m \times n}\big\}.$$
For any matrix $X \in S$, it holds that
$$\|AXB\|_F^2 \geq \sigma_{\min}^2(A)\,\sigma_{\min}^2(B)\,\|X\|_F^2.$$
Remark 2. $X \in S$ means that $X_{:,j} \in \mathcal{R}(A^T)$ and $(X_{i,:})^T \in \mathcal{R}(B)$. In fact, the bound is well defined because $A \neq 0$ and $B \neq 0$, so $\sigma_{\min}(A) > 0$ and $\sigma_{\min}(B) > 0$.
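A quick numerical sanity check of the inequality in Lemma 1 for matrices of the form $X = A^T Y B^T$ (our own experiment, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
B = rng.standard_normal((4, 7))
X = A.T @ rng.standard_normal((8, 7)) @ B.T    # X in S = {A^T Y B^T}

def sigma_min(M):
    """Smallest nonzero singular value."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[s > 1e-12].min()

lhs = np.linalg.norm(A @ X @ B, 'fro') ** 2
rhs = sigma_min(A) ** 2 * sigma_min(B) ** 2 * np.linalg.norm(X, 'fro') ** 2
print(lhs >= rhs)                               # expected: True
```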
In the following theorem, with the idea of the RK method [14], we will prove that $X^{(k)}$ generated by Algorithm 1 converges to the least F-norm solution of $AXB = C$.
Theorem 1. Assume $0 < \alpha < 2$. If the matrix equation (1) is consistent, the sequence $\{X^{(k)}\}$ generated by the ME-RBK method starting from the initial matrix $X^{(0)}$, in which $X^{(0)}_{:,j} \in \mathcal{R}(A^T)$ and $(X^{(0)}_{i,:})^T \in \mathcal{R}(B)$, converges linearly to $X_\star = A^\dagger C B^\dagger$ in mean square form. Moreover, the solution error in expectation for the iteration sequence $\{X^{(k)}\}$ obeys
$$\mathbb{E}\,\|X^{(k)} - X_\star\|_F^2 \;\leq\; \rho^k\,\|X^{(0)} - X_\star\|_F^2, \qquad (8)$$
where $\rho = 1 - \frac{\alpha(2-\alpha)\,\sigma_{\min}^2(A)\,\sigma_{\min}^2(B)}{\|A\|_F^2\,\|B\|_F^2}$ and i is picked with probability $p_i = \|A_{i,:}\|_2^2 / \|A\|_F^2$.

Proof. For $k = 0, 1, 2, \ldots$, by (6) and $C_{i,:} = A_{i,:} X_\star B$ (consistency), we have
$$X^{(k+1)} - X_\star = \big(X^{(k)} - X_\star\big) - \alpha\, \frac{A_{i,:}^T A_{i,:} \big(X^{(k)} - X_\star\big) B B^T}{\|A_{i,:}\|_2^2\, \|B\|_F^2},$$
then, using $\|A_{i,:}^T A_{i,:} (X^{(k)} - X_\star) B B^T\|_F \leq \|A_{i,:}\|_2\, \|B\|_F\, \|A_{i,:} (X^{(k)} - X_\star) B\|_2$,
$$\|X^{(k+1)} - X_\star\|_F^2 \;\leq\; \|X^{(k)} - X_\star\|_F^2 - \alpha(2-\alpha)\, \frac{\|A_{i,:} (X^{(k)} - X_\star) B\|_2^2}{\|A_{i,:}\|_2^2\, \|B\|_F^2}.$$
By taking the conditional expectation, we have
$$\mathbb{E}_k \|X^{(k+1)} - X_\star\|_F^2 \;\leq\; \|X^{(k)} - X_\star\|_F^2 - \alpha(2-\alpha)\, \frac{\|A (X^{(k)} - X_\star) B\|_F^2}{\|A\|_F^2\, \|B\|_F^2}.$$
From $X^{(0)}_{:,j} \in \mathcal{R}(A^T)$ and $(X^{(0)}_{i,:})^T \in \mathcal{R}(B)$, we have $X^{(0)} - X_\star \in S$. Noting that $X_\star \in S$ and every update increment in (6) lies in S, it is easy to show that $X^{(k)} - X_\star \in S$ through induction. Then, from Lemma 1 and $0 < \alpha < 2$, we can obtain
$$\mathbb{E}_k \|X^{(k+1)} - X_\star\|_F^2 \;\leq\; \left(1 - \frac{\alpha(2-\alpha)\,\sigma_{\min}^2(A)\,\sigma_{\min}^2(B)}{\|A\|_F^2\,\|B\|_F^2}\right) \|X^{(k)} - X_\star\|_F^2 = \rho\, \|X^{(k)} - X_\star\|_F^2. \qquad (9)$$
Finally, from (9), the law of total expectation and induction on the iteration index k, we obtain the estimate (8). □
Remark 3. Using a similar approach to that used in the proof of Theorem 1, we can prove that the iterate $X^{(k)}$ generated by ME-PRBK (5) satisfies the following estimate:
$$\mathbb{E}\,\|X^{(k)} - X_\star\|_F^2 \;\leq\; \rho_1^k\,\|X^{(0)} - X_\star\|_F^2, \qquad (10)$$
where $\rho_1 = 1 - \frac{\sigma_{\min}^2(A)}{\|A\|_F^2}$. Let $\rho_2$ denote the convergence factor of GRK in [18]. It is obvious that $\rho_1 \leq \rho$ for any $\alpha \in (0, 2)$, and $\rho \leq \rho_2$ when α is properly selected. This means that the convergence factor of ME-PRBK is the smallest and the factor of ME-RBK can be smaller than that of GRK when α is properly selected.
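To make the comparison concrete, the following snippet tabulates the factors numerically for a random instance, using ρ from Theorem 1 and $\rho_1$ as stated above; it simply evaluates these reconstructed formulas, so it inherits their assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((60, 20))
B = rng.standard_normal((15, 40))

def sigma_min(M):
    """Smallest nonzero singular value."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[s > 1e-12].min()

fa2 = np.linalg.norm(A, 'fro') ** 2
fb2 = np.linalg.norm(B, 'fro') ** 2
for alpha in (0.5, 1.0, 1.5):
    rho = 1 - alpha * (2 - alpha) * sigma_min(A)**2 * sigma_min(B)**2 / (fa2 * fb2)
    print(f"alpha={alpha}: rho={rho:.6f}")        # alpha = 1 minimizes rho
print(f"rho_1={1 - sigma_min(A)**2 / fa2:.6f}")   # ME-PRBK factor from Remark 3
```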
3. The Randomized Extended Block Kaczmarz Method for the Inconsistent Equation
In [15,19,20], the authors proved that the Kaczmarz method does not converge to the least-squares solution of $Ax = b$ when the linear system is inconsistent. Analogously, if the matrix equation (1) is inconsistent, the above ME-PRBK method does not converge to $X_\star = A^\dagger C B^\dagger$. The following theorem gives the error bound for the inconsistent matrix equation.
Theorem 2. Assume that the consistent equation $AXB = \widehat{C}$, where $\widehat{C} = A A^\dagger C B^\dagger B$, has the solution $X_\star = A^\dagger C B^\dagger$. Let $X^{(k)}$ denote the kth iterate of the ME-PRBK method applied to the inconsistent equation $AXB = C$ for any $C \in \mathbb{R}^{m \times n}$, starting from the initial matrix $X^{(0)}$, in which $X^{(0)}_{:,j} \in \mathcal{R}(A^T)$ and $(X^{(0)}_{i,:})^T \in \mathcal{R}(B)$. In exact arithmetic, it follows that
$$\mathbb{E}\,\|X^{(k)} - X_\star\|_F^2 \;\leq\; \rho_1^k\,\|X^{(0)} - X_\star\|_F^2 + \frac{1 - \rho_1^k}{1 - \rho_1} \cdot \frac{\|\widetilde{C} B^\dagger\|_F^2}{\|A\|_F^2},$$
where $\widetilde{C} = C - \widehat{C}$ and $\rho_1$ is the factor from Remark 3.

Proof. Set $\widehat{C} = A A^\dagger C B^\dagger B$, $\widetilde{C} = C - \widehat{C}$. Let $Y^{(k+1)}$ denote the iterate of the PRBK method applied to the consistent equation $AXB = \widehat{C}$ at the kth step, that is,
$$Y^{(k+1)} = X^{(k)} + \frac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\big(\widehat{C}_{i,:} - A_{i,:} X^{(k)} B\big) B^\dagger, \qquad \text{so that} \qquad X^{(k+1)} = Y^{(k+1)} + \frac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\,\widetilde{C}_{i,:} B^\dagger. \qquad (11)$$
Since $A_{i,:} Y^{(k+1)} = A_{i,:} X_\star = \big(A A^\dagger C B^\dagger\big)_{i,:}$, the cross term vanishes and
$$\|X^{(k+1)} - X_\star\|_F^2 = \|Y^{(k+1)} - X_\star\|_F^2 + \frac{\|\widetilde{C}_{i,:} B^\dagger\|_2^2}{\|A_{i,:}\|_2^2}.$$
By taking the conditional expectation on both sides of (11), we can obtain
$$\mathbb{E}_k\|X^{(k+1)} - X_\star\|_F^2 \;\leq\; \rho_1\,\|X^{(k)} - X_\star\|_F^2 + \frac{\|\widetilde{C} B^\dagger\|_F^2}{\|A\|_F^2}.$$
The inequality is obtained using Remark 3. Applying this recursive relation iteratively, we have
$$\mathbb{E}\,\|X^{(k)} - X_\star\|_F^2 \;\leq\; \rho_1^k\,\|X^{(0)} - X_\star\|_F^2 + \sum_{l=0}^{k-1}\rho_1^{\,l}\,\frac{\|\widetilde{C} B^\dagger\|_F^2}{\|A\|_F^2} \;=\; \rho_1^k\,\|X^{(0)} - X_\star\|_F^2 + \frac{1 - \rho_1^k}{1 - \rho_1} \cdot \frac{\|\widetilde{C} B^\dagger\|_F^2}{\|A\|_F^2}.$$
This completes the proof. □
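The error floor predicted by Theorem 2 is easy to observe numerically. The following experiment (ours, not the paper's numerics) runs ME-PRBK on a generic inconsistent instance:

```python
import numpy as np

rng = np.random.default_rng(3)
m, p, q, n = 60, 20, 15, 40
A = rng.standard_normal((m, p))
B = rng.standard_normal((q, n))
C = rng.standard_normal((m, n))              # generic C: AXB = C is inconsistent

B_pinv = np.linalg.pinv(B)
X_star = np.linalg.pinv(A) @ C @ B_pinv      # least-squares solution A^+ C B^+
row_sq = np.einsum('ij,ij->i', A, A)
probs = row_sq / row_sq.sum()

X = np.zeros((p, q))
for _ in range(20_000):
    i = rng.choice(m, p=probs)
    Ai = A[i, :]
    X += np.outer(Ai, (C[i, :] - Ai @ X @ B) @ B_pinv) / row_sq[i]

print(np.linalg.norm(X - X_star, 'fro'))     # stays bounded away from zero
```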
Next, we use the idea of the randomized extended Kaczmarz method (see [20,21,22] for details) to compute the least-squares solution of the inconsistent Equation (1). At each iteration, $Z^{(k)}$ is the kth iterate of ME-RBK applied to $A^T Z = 0$ with the initial guess $Z^{(0)} = C$, and $X^{(k+1)}$ is the one-step ME-RBK update for $AXB = C - Z^{(k+1)}$. We can obtain the following randomized extended block Kaczmarz iteration:
$$Z^{(k+1)} = Z^{(k)} - \alpha\,\frac{A_{:,j} A_{:,j}^T}{\|A_{:,j}\|_2^2}\,Z^{(k)}, \qquad X^{(k+1)} = X^{(k)} + \alpha\,\frac{A_{i,:}^T \big(\big(C - Z^{(k+1)}\big)_{i,:} - A_{i,:} X^{(k)} B\big) B^T}{\|A_{i,:}\|_2^2\,\|B\|_F^2}, \qquad (12)$$
where $\alpha$ is the step size, and i and j are selected with probability $p_i = \|A_{i,:}\|_2^2 / \|A\|_F^2$ and $\hat{p}_j = \|A_{:,j}\|_2^2 / \|A\|_F^2$, respectively. The cost of each iteration of this method is $O(q(p + n))$ flops for updating $X^{(k)}$ and $O(mn)$ flops for updating $Z^{(k)}$ if the squares of the row norms and the column norms of A have been calculated in advance. We describe this method as Algorithm 2, which is called the ME-REBK algorithm.
Algorithm 2 Randomized Extended Block Kaczmarz Method for $AXB = C$ (ME-REBK)
- Input: $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{q \times n}$, $C \in \mathbb{R}^{m \times n}$, $\alpha \in (0, 2)$, $X^{(0)} \in \mathbb{R}^{p \times q}$, $Z^{(0)} = C$
- 1: for $k = 0, 1, 2, \ldots$ do
- 2: Pick j with probability $\hat{p}_j = \|A_{:,j}\|_2^2 / \|A\|_F^2$
- 3: Compute $Z^{(k+1)} = Z^{(k)} - \alpha\,\frac{A_{:,j} A_{:,j}^T}{\|A_{:,j}\|_2^2} Z^{(k)}$
- 4: Pick i with probability $p_i = \|A_{i,:}\|_2^2 / \|A\|_F^2$
- 5: Compute $X^{(k+1)} = X^{(k)} + \alpha\,\frac{A_{i,:}^T ((C - Z^{(k+1)})_{i,:} - A_{i,:} X^{(k)} B) B^T}{\|A_{i,:}\|_2^2 \|B\|_F^2}$
- 6: end for
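A NumPy sketch of Algorithm 2 under iteration (12) as reconstructed above; the fixed iteration budget and variable names are ours:

```python
import numpy as np

def me_rebk(A, B, C, alpha=1.0, iters=50_000, seed=0):
    """Sketch of Algorithm 2 (ME-REBK) with iteration (12)."""
    rng = np.random.default_rng(seed)
    m, pdim = A.shape
    q = B.shape[0]
    X = np.zeros((pdim, q))
    Z = C.copy()                              # Z^(0) = C
    row_sq = np.einsum('ij,ij->i', A, A)      # squared row norms of A
    col_sq = np.einsum('ij,ij->j', A, A)      # squared column norms of A
    p_row = row_sq / row_sq.sum()
    p_col = col_sq / col_sq.sum()
    b_fro2 = np.linalg.norm(B, 'fro') ** 2
    for _ in range(iters):
        j = rng.choice(pdim, p=p_col)         # column step: drive Z toward (I - AA^+)C
        Aj = A[:, j]
        Z -= alpha * np.outer(Aj, Aj @ Z) / col_sq[j]
        i = rng.choice(m, p=p_row)            # row step: ME-RBK update with C - Z^(k+1)
        Ai = A[i, :]
        residual = (C[i, :] - Z[i, :]) - Ai @ X @ B
        X += alpha * np.outer(Ai, residual @ B.T) / (row_sq[i] * b_fro2)
    return X
```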
Theorem 3. Assume $0 < \alpha < 2$. Let $Z^{(k)}$ denote the kth iteration of ME-RBK applied to $A^T Z = 0$ starting from the initial matrix $Z^{(0)} = C$. Then, $Z^{(k)}$ converges linearly to $\widetilde{C} = (I - A A^\dagger) C$ in mean square form, and the solution error in expectation for the iteration sequence $Z^{(k)}$ obeys
$$\mathbb{E}\,\|Z^{(k)} - \widetilde{C}\|_F^2 \;\leq\; \widetilde{\rho}^{\,k}\,\|Z^{(0)} - \widetilde{C}\|_F^2, \qquad (13)$$
where $\widetilde{\rho} = 1 - \frac{\alpha(2-\alpha)\,\sigma_{\min}^2(A)}{\|A\|_F^2}$ and the jth column of A is selected with probability $\hat{p}_j = \|A_{:,j}\|_2^2 / \|A\|_F^2$. Proof. In Theorem 1, replacing A with $A^T$, B with I and C with 0, we can prove Theorem 3 based on the result of Theorem 1. For the sake of conciseness, we omit the proof process. □
Theorem 4. Assume $0 < \alpha < 2$. The sequence $\{X^{(k)}\}$ is generated using the ME-REBK method for $AXB = C$, starting from the initial matrices $X^{(0)}$ and $Z^{(0)} = C$, where $X^{(0)}_{:,j} \in \mathcal{R}(A^T)$ and $(X^{(0)}_{i,:})^T \in \mathcal{R}(B)$. For any $\varepsilon > 0$, it holds that
$$\mathbb{E}\,\|X^{(k)} - X_\star\|_F^2 \;\leq\; (1+\varepsilon)^k \rho^k\,\|X^{(0)} - X_\star\|_F^2 + \frac{(1+\varepsilon^{-1})\,\alpha^2\,\sigma_{\max}^2(B)}{\|A\|_F^2\,\|B\|_F^4}\,\|A A^\dagger C\|_F^2 \sum_{l=1}^{k} \big((1+\varepsilon)\rho\big)^{k-l}\,\widetilde{\rho}^{\,l},$$
where $\rho$ and $\widetilde{\rho}$ are the factors of Theorems 1 and 3, and i, j are picked with probability $p_i = \|A_{i,:}\|_2^2/\|A\|_F^2$ and $\hat{p}_j = \|A_{:,j}\|_2^2/\|A\|_F^2$, respectively.

Proof. Let $X^{(k)}$ denote the kth iteration of the ME-REBK method for $AXB = C$, and let $\widetilde{X}^{(k+1)}$ be the one-step Kaczmarz update for the matrix equation $AXB = \widehat{C}$ from $X^{(k)}$, i.e.,
$$\widetilde{X}^{(k+1)} = X^{(k)} + \alpha\,\frac{A_{i,:}^T \big(\widehat{C}_{i,:} - A_{i,:} X^{(k)} B\big) B^T}{\|A_{i,:}\|_2^2\,\|B\|_F^2}. \qquad (14)$$
For any $\varepsilon > 0$, via the triangle inequality and Young's inequality, we can obtain
$$\|X^{(k+1)} - X_\star\|_F^2 \;\leq\; (1+\varepsilon)\,\|\widetilde{X}^{(k+1)} - X_\star\|_F^2 + (1+\varepsilon^{-1})\,\|X^{(k+1)} - \widetilde{X}^{(k+1)}\|_F^2. \qquad (15)$$
By taking the conditional expectation on both sides of (15), we have
$$\mathbb{E}_k\|X^{(k+1)} - X_\star\|_F^2 \;\leq\; (1+\varepsilon)\,\mathbb{E}_k\|\widetilde{X}^{(k+1)} - X_\star\|_F^2 + (1+\varepsilon^{-1})\,\mathbb{E}_k\|X^{(k+1)} - \widetilde{X}^{(k+1)}\|_F^2. \qquad (16)$$
From Theorem 3, we have
$$\mathbb{E}\,\|Z^{(k+1)} - \widetilde{C}\|_F^2 \;\leq\; \widetilde{\rho}^{\,k+1}\,\|A A^\dagger C\|_F^2, \qquad (17)$$
where $\widetilde{C} = (I - A A^\dagger) C$; hence, from (12) and (14), and noting that $(\widehat{C} - C + \widetilde{C}) B^T = 0$, it follows that
$$\mathbb{E}\,\|X^{(k+1)} - \widetilde{X}^{(k+1)}\|_F^2 \;\leq\; \frac{\alpha^2\,\sigma_{\max}^2(B)}{\|A\|_F^2\,\|B\|_F^4}\,\widetilde{\rho}^{\,k+1}\,\|A A^\dagger C\|_F^2.$$
Then, by using Theorem 1 applied to the consistent equation $AXB = \widehat{C}$, we can obtain
$$\mathbb{E}_k\|\widetilde{X}^{(k+1)} - X_\star\|_F^2 \;\leq\; \rho\,\|X^{(k)} - X_\star\|_F^2, \qquad (18)$$
then combining (16)–(18) and applying the resulting recursion iteratively yields the claimed estimate. This completes the proof. □
Remark 4. Replacing $\alpha B^T / \|B\|_F^2$ in the X-update of (12) with $B^\dagger$, we obtain the following projection-based randomized extended block Kaczmarz method (ME-PREBK) iteration:
$$X^{(k+1)} = X^{(k)} + \frac{A_{i,:}^T}{\|A_{i,:}\|_2^2}\Big(\big(C - Z^{(k+1)}\big)_{i,:} - A_{i,:} X^{(k)} B\Big) B^\dagger.$$