Article

Solving Least-Squares Problems via a Double-Optimal Algorithm and a Variant of the Karush–Kuhn–Tucker Equation for Over-Determined Systems

1 Center of Excellence for Ocean Engineering, National Taiwan Ocean University, Keelung 20224, Taiwan
2 Department of Mechanical Engineering, National United University, Miaoli 36063, Taiwan
* Author to whom correspondence should be addressed.
Algorithms 2024, 17(5), 211; https://doi.org/10.3390/a17050211
Submission received: 1 April 2024 / Revised: 2 May 2024 / Accepted: 7 May 2024 / Published: 14 May 2024
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract

A double optimal solution (DOS) of a least-squares problem A x = b, A ∈ ℝ^{q×n} with q ≠ n, is derived in an (m+1)-dimensional varying affine Krylov subspace (VAKS); two minimization techniques exactly determine the m+1 expansion coefficients of the solution x in the VAKS. The minimal-norm solution can be obtained automatically regardless of whether the linear system is consistent or inconsistent. A new double optimal algorithm (DOA) is created; it is sufficiently time saving, inverting only an m × m positive definite matrix at each iteration step, where m ≪ min(n, q). The properties of the DOA are investigated and an estimation of the residual error is provided. The residual norms are proven to be strictly decreasing in the iterations; hence, the DOA is absolutely convergent. Numerical tests reveal the efficiency of the DOA for solving least-squares problems. The DOA is applicable to least-squares problems regardless of whether q < n or q > n. The Moore–Penrose inverse matrix is also addressed by adopting the DOA; the accuracy and efficiency of the proposed method are verified. The (m+1)-dimensional VAKS is different from the traditional m-dimensional affine Krylov subspace used in the conjugate gradient (CG)-type iterative algorithms CGNR (or CGLS) and CGNE (or the Craig method) for solving least-squares problems with q > n. We propose a variant of the Karush–Kuhn–Tucker equation, and then we apply the partial pivoting Gaussian elimination method to solve the variant, which is better than the original Karush–Kuhn–Tucker equation, the CGNR and the CGNE for solving over-determined linear systems. Our main contribution is the development of a double-optimization-based iterative algorithm in a varying affine Krylov subspace for effectively and accurately solving least-squares problems, even for a dense and ill-conditioned matrix A with q ≫ n or q ≪ n.

1. Introduction

One of the most important numerical methods for solving linear systems is the Krylov subspace method [1,2,3,4,5,6,7]; since the appearance of the pioneering works [8,9], it has been studied extensively. Many Krylov subspace methods have been reported in [10,11,12,13,14,15,16].
Many algorithms have been used to solve the least-squares problem with a minimum norm [17,18]:
min ‖x‖  subject to  x ∈ arg min_x ‖b − A x‖,
where ‖·‖ denotes the Euclidean norm, x ∈ ℝ^n is the unknown vector, and A ∈ ℝ^{q×n} and b ∈ ℝ^q are given.
We are concerned with a varying affine Krylov subspace (VAKS) solution and its corresponding iterative algorithm for solving
A x = b ,
which is an over-determined system when q > n and an under-determined system when q < n .
Most results for Equation (2) are directly extendable to Equation (1), whose solution is called the minimum-norm or pseudo-inverse solution and is given by x = A^† b, where A^† denotes the pseudo-inverse of A satisfying the following Penrose equations:
A A^† A = A,
A^† A A^† = A^†,
(A A^†)^T = A A^†,
(A^† A)^T = A^† A.
The pseudo-inverse has been investigated and applied to solve linear least-squares problems, and some computational methods have been developed to compute the Moore–Penrose inverse matrix [19,20,21,22,23,24]. In terms of the Moore–Penrose inverse matrix A^†, the general solution of the least-squares problem is given by
x = A^† b + (I_n − R_A) z,
R_A = A^† A,
where z is arbitrary. Because A^† b and (I_n − R_A) z are orthogonal, it follows that x = A^† b is the unique solution of Equation (1).
QR factorization is a good method to solve Equation (2). The normalized Gram–Schmidt process is a modification of the original Gram–Schmidt orthogonalization that normalizes each vector after orthogonalization. Let Q denote the q × n matrix obtained by applying the normalized Gram–Schmidt process to the column vectors of A:
Q := [q_1, …, q_n],
which possesses the following orthonormal property:
Q^T Q = I_n.
Hence, writing A = Q R so that Q R x = b, it is easy to solve for x by applying the backward substitution method to
R x = Q^T b.
However, the QR method is only applicable to an over-determined system with q > n. For an under-determined system with q < n, the QR method is not applicable because R is no longer an upper triangular matrix.
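The following Python sketch (assuming NumPy and random illustrative data) shows the QR procedure of Equations (9)–(11) for an over-determined system; the reduced QR factorization plays the role of the normalized Gram–Schmidt process.

import numpy as np

# A minimal sketch of the QR approach in Equations (9)-(11), assuming q > n and
# A of full column rank; the example data are random and purely illustrative.
rng = np.random.default_rng(0)
q, n = 8, 5
A = rng.standard_normal((q, n))
x_true = rng.standard_normal(n)
b = A @ x_true

Q, R = np.linalg.qr(A, mode='reduced')    # Q is q x n with Q^T Q = I_n
x = np.linalg.solve(R, Q.T @ b)           # solve R x = Q^T b (a dense solver stands in for back substitution)

print(np.linalg.norm(Q.T @ Q - np.eye(n)))   # orthonormality check, Equation (10)
print(np.linalg.norm(A @ x - b))             # residual of the consistent system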
Recently, many algorithms have been proposed to tackle least-squares problems, such as relaxed greedy randomized coordinate descent methods [25], the parallel approximate pseudo-inverse matrix technique in conjunction with the explicit preconditioned conjugate gradient least-squares scheme [26], the new randomized Gauss–Seidel method [27], the QR–Cholesky method, the implicit QR method, the bidiagonal implicit QR method [28], the randomized block coordinate descent algorithm [29], an iterative pre-conditioning technique [30], splitting-based randomized iterative methods [31], the projection method [32], the index-search-method-based inner–outer iterative algorithm [33], the greedy double subspace coordinate descent method [34], a distributed algorithm [35], and multivalued projections [36].
As just mentioned, there are many algorithms for solving linear least-squares problems. For small- and medium-sized problems, one can use QR-based methods, and, if the problem is rank-deficient or ill conditioned, singular value decomposition (SVD) can be adopted [37,38]. For large-sized least-squares problems, Krylov subspace methods are more efficient than direct methods if the matrix is sparse or otherwise structured. In the present paper, we plan to develop an efficient and accurate Krylov subspace method for large-sized least-squares problems with dense and maybe ill-conditioned matrices, not limited to sparse matrices. It is also interesting that the proposed iterative algorithm can solve least-squares problems with extremal cases of q n or q n . In fact, for under-determined systems, many methods are not applicable, for instance, QR and conjugate gradient (CG)-type methods, for which the basis vectors of the Krylov subspace are orthogonalized implicitly.
In addition to Equation (1), Liu [39] considered the following minimization
min_x f = (x^T A^T A x)/(b^T A x)²
for solving ill-posed linear systems of Equation (2) with q = n. Liu [40,41] employed the double optimization technique to iteratively solve inverse problems. Liu [42] replaced the Krylov subspace by the column space of A and derived an effective iterative algorithm based on the maximal projection in Equation (12). These results are quite promising. However, for a non-square system (2), there exists no similar method of double optimization in the literature.
It is well known that the least-squares solution of the minimization in Equation (1) is obtained from
A^T A x = A^T b.
Setting the derivative of f in Equation (12) with respect to x equal to zero, we obtain
A^T A x = [(x^T A^T A x)/(b^T A x)] A^T b.
Taking the transpose of Equation (2), x^T A^T = b^T, and inserting it into Equation (14), we immediately obtain
A^T A x = [(b^T A x)/(b^T A x)] A^T b,
which implies Equation (13) because b^T A x/(b^T A x) = 1. Many numerical methods for least-squares problems are based on Equation (1); no numerical methods for least-squares problems are based on Equation (12).
To solve the least-squares problem (2), the usual approach is the Karush–Kuhn–Tucker equation method, which recasts Equation (2) as a square linear system of a larger dimension q + n. Let
r = b − A x
be the residual vector. By enforcing the constraint
A^T r = 0,
which is equivalent to the normal form of Equation (2), Equations (16) and (17) constitute a (q+n)-dimensional linear system:
[ I_q   A ] [ r ]   [ b ]
[ A^T   0 ] [ x ] = [ 0 ].
The advantage of the so-called Karush–Kuhn–Tucker equation [15] is that one can employ a linear solver to solve Equation (18) to obtain the least-squares solution of Equation (2); however, a disadvantage is that the dimension is enlarged to q + n.
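A minimal sketch of Equation (18) follows, assuming NumPy and random data: the (q+n)-dimensional augmented system is assembled and passed to a dense solver with partial pivoting, and the extracted block x agrees with the least-squares solution.

import numpy as np

# A sketch of the Karush-Kuhn-Tucker formulation (18); the data are illustrative.
rng = np.random.default_rng(1)
q, n = 10, 4
A = rng.standard_normal((q, n))
b = rng.standard_normal(q)

K = np.block([[np.eye(q), A],
              [A.T, np.zeros((n, n))]])
rhs = np.concatenate([b, np.zeros(n)])

sol = np.linalg.solve(K, rhs)     # unknowns ordered as (r, x)
r, x = sol[:q], sol[q:]

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # matches the LS solution
print(np.linalg.norm(A.T @ r))                               # constraint (17): A^T r = 0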
A major novelty of the present paper is that we employ both Equations (1) and (12) to develop a novel numerical method in a varying affine Krylov subspace. The innovative contributions are as follows: this is the first time a new concept of varying affine Krylov subspaces has been introduced, and a double-optimal iterative algorithm has been derived based on the double optimization method to solve least-squares problems. A new variant of the Karush–Kuhn–Tucker equation can significantly improve the accuracy of solving least-squares problems by adopting the partial pivoting Gaussian elimination method.
In Section 2, the solution of Equation (2) is expanded in an m + 1 -dimensional varying affine Krylov subspace (VAKS), whose m + 1 unknown coefficients are optimized in Section 3 by two merit functions. The double optimal algorithm (DOA) is developed in Section 4; we prove a key property of absolute convergence, and the decreasing quantity of residual squares is derived explicitly. Examples of linear least-squares problems are given and compared in Section 5; in addition, Moore–Penrose inverses of non-square matrices are discussed. In Section 6, we discuss a variant of the Karush–Kuhn–Tucker equation for solving over-determined linear systems, which is compared to conjugate-gradient-type iterative methods. Finally, Section 7 describes the conclusions.

2. A Varying Affine Krylov Subspace Method

We consider an (m+1)-dimensional varying affine Krylov subspace (VAKS) defined by
K_m := span{A^T A u_0, …, (A^T A)^m u_0},  𝕂_m := span{u_0, K_m},
where
u_0 := A^T b ∈ ℝ^n,
and 𝕂_m is equivalent to K_{m+1}(A^T A, u_0).
Then, we expand the solution x of Equation (2) in 𝕂_m by
x = α_0 u_0 + Σ_{k=1}^{m} α_k u_k ∈ 𝕂_m,
where u_k := (A^T A)^k u_0, k = 1, …, m, and α_0 and α_k are constants to be determined. We take m ≪ min(n, q).
Let
U := [u_1, …, u_m] = [A^T A u_0, …, (A^T A)^m u_0]
be an n × m matrix. The Arnoldi process [15], as a normalized Gram–Schmidt process, is used to normalize and orthogonalize the Krylov vectors (A^T A)^j u_0, j = 1, …, m, such that the resultant vectors u_i, i = 1, …, m satisfy u_i^T u_j = δ_ij, i, j = 1, …, m, where δ_ij is the Kronecker delta symbol. Select m, give an initial v_0 = A^T A u_0, and compute u_1 = v_0/‖v_0‖:
Do j = 1 : m−1,
  w_j = A^T A u_j,
  Do i = 1 : j,
    h_ij = u_i^T w_j,
    w_j = w_j − h_ij u_i,
  End do of i,
  h_{j+1,j} = ‖w_j‖; if h_{j+1,j} = 0, stop,
  u_{j+1} = w_j/h_{j+1,j},
End do of j.
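The following Python sketch (assuming NumPy; the function name krylov_basis is ours) mirrors the Arnoldi-type process above: it orthonormalizes the Krylov vectors (A^T A)^j u_0, j = 1, …, m, to produce the n × m matrix U used in Equation (24).

import numpy as np

# A sketch of the normalized Gram-Schmidt (Arnoldi-type) process above.
def krylov_basis(A, u0, m):
    AtA = A.T @ A
    n = A.shape[1]
    U = np.zeros((n, m))
    v = AtA @ u0
    U[:, 0] = v / np.linalg.norm(v)
    for j in range(m - 1):
        w = AtA @ U[:, j]
        for i in range(j + 1):
            w -= (U[:, i] @ w) * U[:, i]      # remove component along u_i
        h = np.linalg.norm(w)
        if h == 0.0:                          # breakdown: Krylov subspace exhausted
            return U[:, :j + 1]
        U[:, j + 1] = w / h
    return U

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 6))
b = rng.standard_normal(12)
U = krylov_basis(A, A.T @ b, m=3)
print(np.linalg.norm(U.T @ U - np.eye(U.shape[1])))   # close to zero, Equation (25)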
Hence, Equation (21) can be written as
x = α_0 u_0 + U α,
where α := (α_1, …, α_m)^T and U satisfies the orthonormal property:
U^T U = I_m.
For a consistent system, the exact solution x* satisfies A x* = b exactly. For an inconsistent system, the equality A x* = b no longer holds, and the exact solution x* is taken to be the solution of Equation (1), given by x* = A^† b.
On this occasion, we briefly recall the m-dimensional conjugate gradient (CG)-type iterative methods for approximating the solution of Equation (2) with q > n. The CGNR (or CGLS) minimizes the energy function of the error [15,43]:
f(x) = (A^T A (x* − x), (x* − x)),
where (a, b) denotes the inner product of the vectors a and b, and x* is the exact solution of Equation (2). It seeks x in the affine Krylov subspace (AKS)
x_0 + K_m(A^T A, A^T r_0) := x_0 + span{A^T r_0, A^T A A^T r_0, …, (A^T A)^{m−1} A^T r_0},
where r_0 = b − A x_0. There is some extension of the CGNR (or CGLS) to bounded perturbation resilient iterative methods [44].
Craig's idea was to seek the solution of Equation (2) with q > n by
x = arg min_{z ∈ ℝ^n} ‖z‖  subject to  A z = b.
By introducing x = A^T y, we seek y in the AKS
y_0 + K_m(A A^T, r_0)
to minimize
f(y) = (A A^T (y* − y), (y* − y)),
where y* is the exact solution of A A^T y = b. The above statement for x* is also valid for y*, with A x* = b replaced by A A^T y* = b. The resulting CG iterative algorithm is known as the CGNE (or the Craig method), which minimizes the error function [15,43].
Notice that the affine Krylov subspaces in Equations (27) and (29) are fixed AKSs once x_0 and y_0 are set, while in Equation (21) the AKS is varying, denoted a VAKS, owing to the translation vector α_0 u_0 not being fixed to u_0:
𝕂_m = K_{m+1}(A^T A, A^T b) = span{A^T b, A^T A A^T b, …, (A^T A)^m A^T b}.
Upon comparing Equation (21), whose coefficient α_0 before u_0 is an unknown constant, to Equations (27) and (29), whose coefficients before x_0 and y_0 are fixed to one, the affine Krylov subspace in Equation (21) is a varying affine Krylov subspace (VAKS). The dimension of the VAKS is one more than that of the AKS.

3. A Double Optimal Solution

In this section, we adopt the new concept of the VAKS to derive a double optimal solution of Equation (2). The starting point is Equation (21), rather than x = u_0 + Σ_{k=1}^{m} α_k u_k ∈ u_0 + K_m.

3.1. Two Minimized Functions

Let
y := A x,
such that the approximation to b in Equation (2) is measured by the error vector
e := b − (b^T y/‖y‖)(y/‖y‖);
the squared norm of e reads as
‖e‖² = ‖b‖² − (b^T y)²/‖y‖² ≥ 0.
We maximize
max_y (b^T y)²/‖y‖²
as the maximal projection of b on the direction y/‖y‖, such that ‖e‖² is minimized. Equation (35) is equivalent to the minimization of
min_y f := ‖y‖²/(b^T y)² ≥ 1/‖b‖².
Let J be
J := A U ∈ ℝ^{q×m}.
Then, with the aid of Equation (24), Equation (32) can be written as
y = y_0 + J α,
where
y_0 := α_0 A u_0 ∈ ℝ^q.
Inserting Equation (38) for y into Equation (36) yields
min_α f = ‖y‖²/(b^T y)² = ‖y_0 + J α‖²/(b^T y_0 + b^T J α)²,
which is used to find α. However, α_0 in y_0 = α_0 A u_0 is still an unknown value, to be determined by imposing the minimization of the squared norm of the residual, shortened to the residual square:
min_{α_0} {‖r‖² = ‖b − A x‖²}.

3.2. Mathematical Preliminaries

To determine the unknowns α_j, j = 0, 1, …, m, the following main theorem extends the result in [39] to the non-square system (2).
Theorem 1. 
Assume that m_0 = rank(A) ≤ min(n, q) and that U satisfies Equation (25) with m < m_0. Then x ∈ 𝕂_m, approximately satisfying Equation (2) and derived from the two minimizations (40) and (41), is a double optimal solution (DOS), given by
x = α_0 (u_0 − V A u_0 + λ_0 V b),
 where
u_0 = A^T b,  J = A U,  C = J^T J,  D = C^{−1},  V = U D J^T,  P = A V = J D J^T,
λ_0 = (‖A u_0‖² − u_0^T A^T P A u_0)/(b^T A u_0 − b^T P A u_0),
w = λ_0 P b + A u_0 − P A u_0,
α_0 = w^T b/‖w‖².
To guarantee the existence of D = C^{−1}, the dimension m of 𝕂_m must satisfy m < m_0.
Proof. 
In view of Equation (38), we can write
b^T y = b^T y_0 + b^T J α,
‖y‖² = ‖y_0‖² + 2 y_0^T J α + α^T C α
for f in Equation (40), where
C := J^T J
is an m × m positive definite matrix; hence, C^{−1} exists. For a rank-deficient matrix A, we suppose that its rank is m_0 = rank(A), where m_0 < min(n, q); to guarantee that C is an m × m positive definite matrix, m must be smaller than m_0.
The minimality condition of Equation (40) is
(b^T y)² ∇_α ‖y‖² − 2 b^T y ‖y‖² ∇_α (b^T y) = 0;
∇_α is the gradient with respect to α. Cancelling b^T y in Equation (49) yields
b^T y y_2 − 2 ‖y‖² y_1 = 0,
where
y_1 := ∇_α (b^T y) = J^T b,
y_2 := ∇_α ‖y‖² = 2 J^T y_0 + 2 C α.
From Equation (50), y_2 is observed to be proportional to y_1 in the m-dimensional space ℝ^m, which is supposed to be
y_2 = (2 ‖y‖²/(b^T y)) y_1 = 2 λ y_1,
where 2λ is a proportional factor determined below. The second equality leads to
‖y‖² = λ b^T y.
Then, by Equations (51)–(53), we have
α = λ D J^T b − D J^T y_0,
where
D := C^{−1} = (J^T J)^{−1} > 0
is an m × m matrix. It follows from Equations (46), (47) and (55) that
b^T y = b^T y_0 + λ b^T P b − b^T P y_0,
‖y‖² = λ² b^T P b + ‖y_0‖² − y_0^T P y_0,
where
P := J D J^T
is a q × q positive semi-definite matrix; P is indeed a projection operator.
From Equations (54), (57) and (58), λ is solved from
‖y_0‖² − y_0^T P y_0 = λ (b^T y_0 − b^T P y_0)
as follows:
λ = (‖y_0‖² − y_0^T P y_0)/(b^T y_0 − b^T P y_0).
Inserting this into Equation (55) generates
α = [(‖y_0‖² − y_0^T P y_0)/(b^T y_0 − b^T P y_0)] D J^T b − D J^T y_0.
By inserting the above α into Equation (24) and using Equation (39) for y_0, we can obtain
x = α_0 (u_0 − V A u_0 + λ_0 V b),
where
V := U D J^T,
λ_0 := (‖A u_0‖² − u_0^T A^T P A u_0)/(b^T A u_0 − b^T P A u_0).
Upon letting
v := u_0 − V A u_0 + λ_0 V b,
x in Equation (63) can be expressed as
x = α_0 v,
where α_0 is determined by inserting Equation (67) into Equation (41):
‖b − A x‖² = ‖α_0 A v − b‖² = α_0² ‖w‖² − 2 α_0 w^T b + ‖b‖²,
where
w := A v = A u_0 − A V A u_0 + λ_0 A V b.
In view of Equations (37) and (64), P in Equation (59) can be written as
P = A V,
such that w can be written as
w = λ_0 P b + A u_0 − P A u_0.
This is just w in Equation (44).
Setting the derivative of Equation (68) with respect to α_0 to zero, we obtain
α_0 = w^T b/‖w‖²;
hence, the proof of Equation (42) in Theorem 1 is complete. □
In the proof of Theorem 1, we have used the minimality condition of Equation (40) to determine α in Equation (62), and then the minimality condition of Equation (41) was used to determine α 0 in Equation (71). Two optimizations were used to derive x in Equation (42), which is thus named a double optimal solution (DOS).
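A short Python sketch of the closed-form DOS of Theorem 1 follows, assuming NumPy and a small random system; the orthonormal basis U of K_m is obtained here from a QR factorization of the Krylov vectors as a stand-in for the Arnoldi process of Section 2, and all variable names mirror the theorem.

import numpy as np

# A sketch of the double optimal solution of Theorem 1 on illustrative data.
rng = np.random.default_rng(3)
q, n, m = 20, 8, 4
A = rng.standard_normal((q, n))
b = rng.standard_normal(q)

u0 = A.T @ b
K = np.column_stack([np.linalg.matrix_power(A.T @ A, k) @ u0 for k in range(1, m + 1)])
U, _ = np.linalg.qr(K)                    # n x m, U^T U = I_m

J = A @ U
C = J.T @ J
D = np.linalg.inv(C)
V = U @ D @ J.T
P = A @ V                                 # projection operator, P^2 = P

Au0 = A @ u0
lam0 = (Au0 @ Au0 - Au0 @ P @ Au0) / (b @ Au0 - b @ P @ Au0)
w = lam0 * (P @ b) + Au0 - P @ Au0
alpha0 = (w @ b) / (w @ w)

x = alpha0 * (u0 - V @ Au0 + lam0 * (V @ b))
print(abs(alpha0 * lam0 - 1.0))           # Theorem 2: alpha0 * lam0 = 1
print(np.linalg.norm(b - A @ x))          # residual of the DOS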
Theorem 2. 
The two factors λ_0 in Equation (43) and α_0 in Equation (45) satisfy
α_0 λ_0 = 1.
Proof. 
It follows from Equations (59) and (56) that
P² = P;
hence, P is a projection operator.
In Equation (44), w can be written as
w = λ_0 P b + (I_q − P) A u_0.
Utilizing Equation (73), P^T = P, and (I_q − P)² = I_q − P, it follows that
‖w‖² = λ_0² b^T P b + u_0^T A^T A u_0 − u_0^T A^T P A u_0,
b^T w = λ_0 b^T P b + b^T A u_0 − b^T P A u_0.
With the aid of Equation (65), Equation (76) is further reduced to
b^T w = λ_0 b^T P b + (1/λ_0)(u_0^T A^T A u_0 − u_0^T A^T P A u_0).
Now, after inserting Equation (77) into Equation (71), we can obtain
α_0 λ_0 ‖w‖² = λ_0² b^T P b + u_0^T A^T A u_0 − u_0^T A^T P A u_0,
which is just equal to ‖w‖² by Equation (75). Hence, we obtain Equation (72) readily. The proof of α_0 λ_0 = 1 is complete. □
Remark 1. 
The DOS is derived in the VAKS 𝕂_m so as to stick to the expansion in Equation (21). If one takes α_0 = 1 in Equation (21), 𝕂_m reduces to the usual affine Krylov subspace u_0 + K_m. However, in this situation, Theorem 1 no longer holds. If one takes u_0 = 0 and works directly in the (m+1)-dimensional Krylov subspace K_{m+1} by expanding α in an (m+1)-dimensional subspace, this results in failure, because after inserting y_0 = A u_0 = 0 into Equation (62), α cannot be defined. We can conclude that the DOS directly benefits from the VAKS.
The following corollaries are proven to help us understand the properties of the DOS; they are based on the quantities appearing in Theorems 1 and 2 and the equations appearing in the proof of Theorem 1.
Corollary 1. 
The multiplier λ defined by Equation (61) satisfies
λ = 1,
 and the double optimal solution (42) can be written as
x = α_0 u_0 − α_0 V A u_0 + V b.
Proof. 
Inserting Equation (39) for y_0 into Equation (61), we have
λ = α_0 (‖A u_0‖² − u_0^T A^T P A u_0)/(b^T A u_0 − b^T P A u_0),
which, in view of Equation (65), can be written as
λ = α_0 λ_0.
Hence, λ = 1 follows immediately from Equation (72). Then, using Equations (72) and (42) proves Equation (80). □
Remark 2. 
Inserting Equation (20) for u_0 into Equation (80) yields
x = [V + α_0 A^T − α_0 V A A^T] b,
 which is a pseudo-inverse solution obtained from the DOS.
Corollary 2. 
If m = n < q and A^T A is non-singular, then the double optimal solution x of Equation (2) is the Penrose solution, given by
x = A^† b,
 where A^† is the Moore–Penrose pseudo-inverse of A.
Proof. 
If m = n, then J = A U defined by Equation (37) is a q × n matrix. Meanwhile, U defined by Equation (22) is a rank-n orthogonal matrix due to Equation (25):
U^T U = U U^T = I_n.
From Equations (56) and (84), it follows that
D = U^T (A^T A)^{−1} U.
Inserting J = A U, V = U D J^T and Equation (85) into Equation (80), we have
x = α_0 u_0 + U U^T (A^T A)^{−1} U U^T A^T b − α_0 U U^T (A^T A)^{−1} U U^T A^T A u_0.
By using Equation (84), we can derive
x = (A^T A)^{−1} A^T b,
which is a Penrose solution of Equation (2). □
Corollary 2 reveals that the DOS can give the pseudo-inverse solution. SVD can offer a pseudo-inverse solution as follows [37]. In SVD, the matrix A is decomposed as
A = U D V^T,
where U is a q × q orthonormal matrix, V is an n × n orthonormal matrix, and D is a q × n pseudo-diagonal matrix with the non-zero singular values on the diagonal and all other elements zero (here U, D and V denote the SVD factors, not the matrices defined earlier). Suppose that A^T A is non-singular. Then, Equation (87) is a pseudo-inverse solution, which in terms of the SVD can be derived as follows:
x = V (D^T D)^{−1} D^T U^T b = V D^† U^T b;
letting
A^† = V D^† U^T,
we can derive Equation (83) in Corollary 2. Mathematically, the DOS is equivalent to the SVD solution in Equation (89), even though they are different approaches to the least-squares problem.
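A minimal Python sketch of Equations (88)–(90), assuming NumPy and full column rank, illustrates the SVD route to the pseudo-inverse solution:

import numpy as np

# Pseudo-inverse solution via the reduced SVD, A = U D V^T, so A^+ = V D^+ U^T.
rng = np.random.default_rng(4)
q, n = 9, 5
A = rng.standard_normal((q, n))
b = rng.standard_normal(q)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T             # V D^+ U^T (full-rank case)
x = A_pinv @ b

print(np.allclose(A_pinv, np.linalg.pinv(A)))
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))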

3.3. Estimation of Residual Error

Equation (80) is written as
x = α_0 (u_0 − V A u_0) + V b,
where
α_0 = 1/λ_0 = (b^T A u_0 − b^T P A u_0)/(‖A u_0‖² − u_0^T A^T P A u_0),
by using Equations (65) and (72).
Let us investigate the residual square:
‖b − A x‖² = ‖y − b‖² = ‖y‖² − 2 b^T y + ‖b‖²,
where
y = A x = α_0 A u_0 − α_0 P A u_0 + P b
is obtained from Equation (91) by using Equation (70).
From Equation (94), it follows that
‖y‖² = α_0² (u_0^T A^T A u_0 − u_0^T A^T P A u_0) + b^T P b,
b^T y = α_0 b^T A u_0 − α_0 b^T P A u_0 + b^T P b.
In the first equation, we take Equation (73) into account. Inserting them into Equation (93) leads to
‖b − A x‖² = α_0² (u_0^T A^T A u_0 − u_0^T A^T P A u_0) + 2 α_0 (b^T P A u_0 − b^T A u_0) + ‖b‖² − b^T P b.
Finally, inserting Equation (92) for α_0 into the above equation yields
‖b − A x‖² = ‖b‖² − b^T P b − (b^T A u_0 − b^T P A u_0)²/(u_0^T A^T A u_0 − u_0^T A^T P A u_0).
If P → I_q, the value of ‖b − A x‖² in Equation (98) is made as small as possible:
‖b − A x‖² → 0.
Theorem 3. 
For the double optimal solution x ∈ 𝕂_m derived in Equation (42) of the least-squares problem (2), the squared norm of the error vector e,
‖e‖² = ‖b‖² − (b^T A x)²/‖A x‖²,
 and the squared norm of the residual vector r = b − A x are equal, i.e.,
‖e‖² = ‖b − A x‖².
 Moreover,
‖r‖ < ‖b‖.
Proof. 
Inserting Equation (92) for α_0 into Equations (95) and (96), we can obtain
‖y‖² = (b^T A u_0 − b^T P A u_0)²/(u_0^T A^T A u_0 − u_0^T A^T P A u_0) + b^T P b,
b^T y = (b^T A u_0 − b^T P A u_0)²/(u_0^T A^T A u_0 − u_0^T A^T P A u_0) + b^T P b.
Consequently, we have
‖y‖² = b^T y.
It follows from Equations (98) and (103) that
‖b − A x‖² = ‖b‖² − ‖y‖².
By Equations (105) and (34),
‖e‖² = ‖b‖² − ‖y‖².
Equation (101) was proven upon comparing Equations (106) and (107).
Using r = b − A x and Equations (101) and (107) leads to
‖r‖² = ‖e‖² = ‖b‖² − ‖y‖².
Due to ‖y‖² > 0, Equation (102) follows readily. □

4. A Numerical Algorithm

In this section, we develop an iterative algorithm to solve the least-squares problem, starting from an initial guess x_0. We assume that the value of x_k at the k-th step is already known, and x_{k+1} is computed at the next step via the iterative algorithm. According to the value x_k, the k-th step residual r_k = b − A x_k can be computed.
When the initial guess x_0 is given, the initial residual is
r_0 = b − A x_0.
Upon letting
z = x − x_0,
Equation (2) is equivalent to solving z from
A z = r_0,
which is deduced from
A z = A x − A x_0 = b − A x_0 = r_0.
For system (111), we seek z in the VAKS by
z = α_0 u_0 + Σ_{k=1}^{m} α_k u_k = α_0 u_0 + U α ∈ 𝕂_m,
where
u_0 = A^T r_0,
and u_k := (A^T A)^k u_0, k = 1, …, m. The constants α_k, k = 1, …, m and α_0 are determined by the following two minimizations:
min_z f = ‖A z‖²/(r_0^T A z)²,
min_z {‖r‖² = ‖r_0 − A z‖²}.
After inserting Equation (112) for z, the first minimization determines α, while the second minimization determines α_0.
Since two minimizations in Equations (114) and (115) are adopted to determine the descent vector z in Equation (112), the resulting iterative algorithm to solve the least-squares problem is labeled a double optimal algorithm (DOA).
To treat the rank-deficient least-squares problem, the dimension m of the VAKS must satisfy m < m_0 = rank(A), such that rank(J_k) = m and C_k = J_k^T J_k > 0 is an m × m invertible matrix. Consequently, the DOA is depicted by (i) giving A, b, m < m_0, an initial value x_0 and the convergence criterion ε, and (ii) doing, for k = 0, 1, …,
r_k = b − A x_k,
u_0^k = A^T r_k,  u_j^k = (A^T A)^j u_0^k,  j = 1, …, m,
U_k = [u_1^k, …, u_m^k]  (by the Arnoldi procedure),
J_k = A U_k,  C_k = J_k^T J_k,  D_k = C_k^{−1},  V_k = U_k D_k J_k^T,  P_k = A V_k,
λ_k = (‖A u_0^k‖² − (u_0^k)^T A^T P_k A u_0^k)/(r_k^T A u_0^k − r_k^T P_k A u_0^k),
z_k = V_k r_k + (1/λ_k)(u_0^k − V_k A u_0^k),
x_{k+1} = x_k + z_k,
until convergence with ‖x_{k+1} − x_k‖ < ε or ‖r_{k+1}‖ < ε. We call ‖x_{k+1} − x_k‖ the relative residual.
In the above, U_k is an n × m matrix, J_k is a q × m matrix, C_k and D_k are m × m matrices, V_k is an n × q matrix and P_k is a q × q matrix. The main computational cost is expended to compute D_k, which is however quite time saving because m is a small number. A^T A is an n × n fixed matrix computed once and used in the construction of U_k, which requires mn operations. J_k requires mnq operations; C_k requires m²q operations; D_k requires m²q operations; V_k requires nm²q operations; and P_k requires nq² operations. In each iteration, there are mn + mnq + 2m²q + nm²q + nq² operations. Denote the number of iterations by NI. The computational complexity is in total NI × [mn + mnq + 2m²q + nm²q + nq²].
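A compact Python sketch of one possible implementation of the steps listed above follows, assuming NumPy and random illustrative data; the Arnoldi loop is replaced here by a QR factorization of the Krylov vectors, which yields the same orthonormal basis U_k in exact arithmetic, and the function name doa is ours.

import numpy as np

# A sketch of the DOA iteration of Section 4 on illustrative data.
def doa(A, b, m=3, x0=None, eps=1e-12, max_iter=200):
    q, n = A.shape
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    AtA = A.T @ A
    for _ in range(max_iter):
        r = b - A @ x
        u0 = A.T @ r
        # Krylov vectors (A^T A)^j u0, j = 1,...,m, orthonormalized by QR
        K = np.empty((n, m))
        v = u0.copy()
        for j in range(m):
            v = AtA @ v
            K[:, j] = v
        U, _ = np.linalg.qr(K)
        J = A @ U
        D = np.linalg.inv(J.T @ J)
        V = U @ D @ J.T
        P = A @ V
        Au0 = A @ u0
        lam = (Au0 @ Au0 - Au0 @ P @ Au0) / (r @ Au0 - r @ P @ Au0)
        z = V @ r + (u0 - V @ Au0) / lam
        x += z
        if np.linalg.norm(z) < eps or np.linalg.norm(b - A @ x) < eps:
            break
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((60, 20))
b = rng.standard_normal(60)
x = doa(A, b, m=4)
print(np.linalg.norm(A.T @ (b - A @ x)))   # normal-equation residual, near zero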
It is known that m is a key parameter in the Krylov subspace. A proper choice of m can significantly enhance the convergence speed and accuracy. For ill-posed linear least-squares problems, there exists an optimal value of m, while for well-posed linear least-squares problems, small values of m may slow down convergence. When m is increased, both the convergence speed and accuracy increase. However, when m is increased, more computational power is required in the construction of the projection operator P and the inverse matrix C^{−1}.
We make some comments about the initial value x_0. For under-determined systems, there are many solutions; hence, different choices of x_0 generate different solutions. In general, as required in Equation (1), the minimal-norm solution can be obtained by setting x_0 = 0, whose norm is zero. For over-determined systems, we take x_0 = 0 such that the initial residual r_0 = b is a non-zero vector and the DOA, which is based on the residual r_k, is solvable. Indeed, the DOA is not sensitive to the initial value x_0; we will take x_0 = 0 for most problems unless otherwise specified.
The following corollaries help to clarify the DOA; they are based on the DOS applied to the residual Equation (111).
Corollary 3. 
r_{k+1} is orthogonal to A z_k in Equation (111) and to r_{k+1} − r_k, i.e.,
r_{k+1}^T (A z_k) = 0,
r_{k+1}^T (r_{k+1} − r_k) = 0.
Proof. 
By Equation (111), the next r_{k+1} is given by
r_{k+1} = r_k − y_k,
where y_k = A z_k. By Equation (116), we can also derive Equation (119) as follows:
b − A x_{k+1} = b − A x_k − A z_k.
For Equation (111), Equation (105) is written as
r_k^T y_k − ‖y_k‖² = 0;
taking the inner product of Equation (119) with y_k, we have
r_{k+1}^T y_k = r_{k+1}^T (A z_k) = r_k^T y_k − ‖y_k‖² = 0.
Inserting A z_k = r_k − r_{k+1} into Equation (117), we can prove Equation (118). The DOA is a good approximation of Equation (2) with a better descent direction z_k in the VAKS. □
Corollary 4. 
For the DOA, the convergence rate is
‖r_k‖/‖r_{k+1}‖ = 1/sin θ > 1,  θ ≠ π/2.
Proof. 
It follows from Equations (119) and (120) that
‖r_{k+1}‖² = ‖r_k‖² − ‖r_k‖ ‖y_k‖ cos θ,
where θ is the intersection angle between r_k and y_k = A z_k. With the help of Equation (120), we have
cos θ = ‖y_k‖/‖r_k‖.
Obviously, θ ≠ π/2 because ‖y_k‖ > 0. Then, Equation (123) can be further reduced to
‖r_{k+1}‖² = ‖r_k‖² [1 − cos²θ] = ‖r_k‖² sin²θ.
Dividing both sides by ‖r_{k+1}‖² and taking the square roots of both sides, we can obtain Equation (122). □
Corollary 5. 
The residual is decreased step by step:
‖r_{k+1}‖² = ‖r_k‖² − ‖A z_k‖²  ⇒  ‖r_{k+1}‖ < ‖r_k‖.
Proof. 
Taking the squared norm of Equation (119) and using Equation (120) generates
‖r_{k+1}‖² = ‖r_k‖² − ‖y_k‖²;
after inserting y_k = A z_k, Equation (126) is proven. Corollary 5 is easily deduced by noting ‖A z_k‖² > 0. □
The property in Equation (126) is very important; it guarantees that the DOA is absolutely convergent at each iteration.
Remark 3. 
At the very beginning, the DOS and DOA are developed in the VAKS 𝕂_m according to Equation (21). If one takes α_0 = 1 in Equation (21), 𝕂_m reduces to the usual affine Krylov subspace u_0 + K_m. However, in this situation, several good properties similar to those in Theorem 3 and Corollaries 3–5 are lost; convergence cannot be guaranteed in the resulting iterative algorithm. On the other hand, if we take u_0 = 0 and directly work in the (m+1)-dimensional Krylov subspace K_{m+1}, by extending α to an (m+1)-dimensional vector, this results in failure because, after inserting y_0 = A u_0 = 0 into Equation (62), α cannot be defined.
Corollary 6. 
In the iterative algorithm (116), we may encounter a slowly varying point,
x_{k+1} ≈ x_k,
 when x_0 is given such that r_0 = b − A x_0 satisfies A A^T r_0 = c_0 r_0 for some c_0 ≫ 1.
Proof. 
Inserting b = r_0 and
u_0 = A^T r_0
into Equation (92) yields
α_0 = 1/c_0.
When c_0 ≫ 1, α_0 ≈ 0, and x_{k+1} ≈ x_k by Equation (116). □

5. Numerical Examples

To demonstrate the efficiency and accuracy of the present iterative DOA, several examples are presented. All the numerical computations were carried out in Microsoft Developer Studio on a PC with an Intel Core i7-3770 CPU at 2.80 GHz and 8 GB of memory. The machine precision is 10^−16.

5.1. Example 1

In this example, we find a minimal-norm and least-squares solution of the following consistent equation:
A x = b,  with  A = [ 1  2  3   1 ;  3  2  1   1 ;  2  3  1  −1 ],  x = (x_1, x_2, x_3, x_4)^T,  b = (1, 1, 1)^T.
Although Example 1 is simple, we use it to test the DOS and DOA for finding the minimal-norm solution for a consistent system.
First, we apply the DOS with m = 2 to solve this problem, where we find a quite accurate solution of x_1 = 1.48176 × 10^−1, x_2 = 1.92567 × 10^−1, x_3 = 1.48141 × 10^−1, and x_4 = 2.22689 × 10^−2, for which the three residual errors are r_1 = 2.012 × 10^−6, r_2 = 7.125 × 10^−5 and r_3 = 7.51756 × 10^−5, respectively.
Although the DOS is an accurate solution with an error on the order of 10^−5, the accuracy can be improved by the DOA. In the application of the iterative DOA, the initial guesses are given by x_1^0 = x_2^0 = x_3^0 = x_4^0 = c_0 for some constant c_0. Then, the DOA with m = 1 solves this problem with c_0 = 0 and ε = 10^−12. As shown in Figure 1a, the DOA converges very fast in only seven steps. Furthermore, we can find very accurate solutions of x_1 = 1.48148 × 10^−1, x_2 = 1.925926 × 10^−1, x_3 = 1.48148 × 10^−1, and x_4 = 2.22222 × 10^−2; the norm of x is ‖x‖ = 0.28545. The three residual errors are r_1 = 5.63993 × 10^−14, r_2 = 5.61773 × 10^−14 and r_3 = 5.63993 × 10^−14, respectively.
On the other hand, if we take c_0 = 1, the residuals r_1 = 5.6 × 10^−13, r_2 = 5.59 × 10^−13 and r_3 = 8.47 × 10^−13 are obtained, which show that the solution (x_1, x_2, x_3, x_4) = (0.4815, −0.2741, 0.4815, −0.3778) is accurate. However, the norm ‖x‖ = 0.82552 is not the minimal one. Therefore, the correct choice for the minimal-norm solution is c_0 = 0. Unless otherwise specified, we take the initial guess x_0 = 0 for most problems computed below.

5.2. Example 2

A minimal-norm and least-squares solution is sought for the following inconsistent equation:
A x = b,  with  A = [ 1  1  0 ;  1  0  1 ;  1  0  0 ;  1  1  1 ],  x = (x_1, x_2, x_3)^T,  b = (0, 0, 1, −2)^T.
Although Example 2 is simple, we use it to test the DOS and DOA for finding the minimal-norm solution for an inconsistent system.
We first apply the DOS with m = 1, where we obtain a very accurate solution with x_1 = 1.25, x_2 = −1.5, and x_3 = −1.5. The residuals are (r_1, r_2, r_3, r_4) = (0.25, 0.25, −0.25, −0.25), with the residual norm being 0.5.
Then, the DOA with m = 1 and ε = 10^−12 solves this problem. Starting from the initial guesses x_1^0 = x_2^0 = x_3^0 = 0, as shown in Figure 1b, the DOA converges very fast in only two steps, where we obtain a very accurate solution with x_1 = 1.25, x_2 = −1.5, and x_3 = −1.5. The residuals are (r_1, r_2, r_3, r_4) = (0.25, 0.25, −0.25, −0.25), with the residual norm being 0.5. For the inconsistent system, the equality in Equation (130) no longer holds, such that the residual is not a zero vector.
The Moore–Penrose inverse of the coefficient matrix A in Equation (130) has an exact expression:
A^† = [  1/4   1/4   3/4  −1/4 ;  1/2  −1/2  −1/2   1/2 ;  −1/2   1/2  −1/2   1/2 ].
Then, we find that the exact solution of Equation (130) is given by x = A^† b, which is just (x_1, x_2, x_3) = (1.25, −1.5, −1.5), as mentioned above.
We apply the iterative method developed by Petkovic and Stanimirovic [20] to find the Moore–Penrose inverse, which has a simpler form:
X_{k+1} = (1 + β) X_k − β X_k A X_k,  X_0 = β A^T,
based on the Penrose Equations (4) and (6). Under the same convergence criterion, this iteration process is convergent in 130 steps, as shown in Figure 1b by a dashed line.
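A Python sketch of the iteration X_{k+1} = (1+β)X_k − βX_kAX_k with X_0 = βA^T follows, assuming NumPy; the step size β = min(1, 1/σ_max²) is our own safe choice for this illustration and is not a value prescribed in [20].

import numpy as np

# A sketch of the pseudo-inverse iteration above on illustrative data.
def pinv_iteration(A, tol=1e-12, max_iter=5000):
    smax = np.linalg.norm(A, 2)                  # largest singular value
    beta = min(1.0, 1.0 / smax**2)               # assumed step size (our choice)
    X = beta * A.T
    for k in range(max_iter):
        X_new = (1.0 + beta) * X - beta * X @ A @ X
        if np.linalg.norm(X_new - X) < tol:
            return X_new, k + 1
        X = X_new
    return X, max_iter

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
X, iters = pinv_iteration(A)
print(iters, np.linalg.norm(X - np.linalg.pinv(A)))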
Note that the DOA can also be used to find the Moore–Penrose inverse A^† by taking the right-hand side b to be the k-th unit vector (all elements zero except the k-th, which is one); the corresponding solution of x, denoted by x^k, constitutes the k-th column of A^†. Then, we can run the DOA q times, for k = 1, …, q, to obtain
A^† = [ x^1  ⋯  x^q ],  with entries (A^†)_{ik} = x_i^k,  i = 1, …, n,  k = 1, …, q.
The dimension of A^† is n × q. By applying the DOA with m = 1 under the same convergence criterion ε = 10^−12, we can find a very accurate solution of A^†, as shown in Equation (131), with the maximum error of all elements being 10^−14; this is achieved in a total of 26 steps and converges faster than the method developed by Petkovic and Stanimirovic [20].
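The column-by-column construction just described is sketched below in Python (assuming NumPy); a generic minimum-norm least-squares solver (np.linalg.lstsq) stands in for the DOA, and the data are illustrative.

import numpy as np

# For each k the right-hand side is the unit vector e_k, and the minimum-norm
# least-squares solution x^k forms the k-th column of A^+.
rng = np.random.default_rng(7)
q, n = 4, 3
A = rng.standard_normal((q, n))

cols = []
for k in range(q):
    e_k = np.zeros(q)
    e_k[k] = 1.0
    x_k, *_ = np.linalg.lstsq(A, e_k, rcond=None)   # minimum-norm LS solution
    cols.append(x_k)
A_pinv = np.column_stack(cols)                      # n x q matrix [x^1 ... x^q]

print(np.allclose(A_pinv, np.linalg.pinv(A)))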

5.3. Example 3

We consider an over-determined Hilbert linear problem:
A_ij = 1/(i + j − 1),  1 ≤ i ≤ q,  1 ≤ j ≤ n,
b_i = Σ_{j=1}^{n} x_j/(i + j − 1),  1 ≤ i ≤ q,
where we fix q = 6 and n = 5, and the exact solutions x_j = 1/j, 1 ≤ j ≤ n are given.
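The following Python sketch (assuming NumPy) builds this over-determined Hilbert system for q = 6 and n = 5; a dense least-squares solver is used here only to illustrate the setup and its ill-conditioning, not as the method of this paper.

import numpy as np

# Over-determined Hilbert system with b generated from the exact solution x_j = 1/j.
q, n = 6, 5
i = np.arange(1, q + 1)[:, None]
j = np.arange(1, n + 1)[None, :]
A = 1.0 / (i + j - 1.0)
x_exact = 1.0 / np.arange(1, n + 1)
b = A @ x_exact

print(np.linalg.cond(A))                    # severely ill-conditioned
x = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x - x_exact)))          # maximum error (ME)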
The DOA with m = 4 and ε = 10^−13 solves this problem very fast, starting from the initial guesses x_1^0 = … = x_5^0 = 0, in only four steps, as shown in Figure 2a. Furthermore, we can find a very accurate solution, as shown in Figure 2b, with the maximum error (ME) being 8.91 × 10^−12. If we consider an under-determined system with q = 5 and n = 6, the DOA with m = 3 is still workable, with ME = 2.22 × 10^−8.
For least-squares problems, QR is a famous method to directly solve Equation (2); this process is shown in Equations (9)–(11). However, we find that the ME = 1.05 × 10^−6 obtained by QR is less accurate than that obtained by the DOA for the Hilbert least-squares problem with q = 6 and n = 5. We have checked the accuracy of the orthogonality in Equation (10) by
‖Q^T Q − I_n‖,
which is on the order of 8.65 × 10^−12.
The Hilbert matrix is known as a notorious example of a highly ill-conditioned matrix. For this problem, the condition number is 2.5 × 10^5. Then, in floating point arithmetic, b has a rounding error on the order of 10^−16. The algorithm introduces further floating point errors of the same order of magnitude. Perturbation theory for the least-squares problem [38] shows that one can only expect to find a solution that is approximately within 2 × 10^{5−16} = 2 × 10^−11 of the exact solution. The error ‖Q^T Q − I_n‖ = 8.65 × 10^−12 is within this range; however, the ME = 1.05 × 10^−6 obtained by QR is not. The reason is that when Equation (11) is used to find the solution in the QR method, very small singular values appearing in the denominator enlarge the rounding error and the error from Q; for example, dividing sequentially by the last two singular values, 4.803 × 10^−4 and 1.734 × 10^−5, leads to ME = 1.05 × 10^−6, which is not within the range of 2 × 10^−11. In contrast, the proposed DOA, with an error of 8.91 × 10^−12, is within this range. For highly ill-conditioned least-squares problems, the QR method loses some accuracy, even though Q satisfies Equation (10) well within the range of 2 × 10^−11.
We raise q to 10, and Table 1 lists the computed ME for different values of n. We also list the error ‖Q^T Q − I_n‖ for the QR method, which becomes worse as n is increased. Based on Equations (88)–(90), this problem is also solved by SVD. The DOA is competitive with SVD, and for all cases, it is slightly more accurate than SVD.
To investigate the influence of m on the DOA, Table 2 compares the ME and the iteration number (IN). For the ill-posed Hilbert problem, there exists an optimal value m = 3.

5.4. Example 4

We find the Moore–Penrose inverse of the following rank deficient matrix [45]:
A = [ 1  0  1  2 ;  1  1  0  1 ;  0  1  1  3 ;  0  1  1  3 ;  1  1  0  1 ;  1  0  1  2 ].
We apply the iterative method of the DOA as demonstrated in Section 5.2 to find the Moore–Penrose inverse, which has an exact solution:
A^† = (1/102) [ 15  18   3   3  18  15 ;  8  13   5   5  13   8 ;  7   5   2   2   5   7 ;  6   3   9   9   3   6 ].
Under the same convergence criterion ε = 10^−9 as used by Xia et al. [45], the iteration process of the DOA with m = 1 converges very fast in a total of 12 steps, as shown in Figure 3. In Table 3, we compare the DOA with the other numerical methods specified by Xia et al. [45] and Petkovic and Stanimirovic [20] to assess the performance of the DOA, measured by the four numerical errors of the Penrose Equations (3)–(6) and the iteration number (IN).
In the above, Alg. is shorthand for algorithm, IP is shorthand for initial point and IN is shorthand for iteration number; 1 = (1, …, 1)^T. The first two algorithms were reported by Xia et al. [45]. It can be seen that the DOA converges much faster and is more accurate than the other algorithms.
In the computation of the pseudo-inverse, the last two singular values close to zero can be prevented from appearing in the matrix C ∈ ℝ^{m×m} if we take m small enough. For example, we take m = 1 in Table 3, such that D = C^{−1} can be computed accurately without a near-zero singular value appearing in the denominator and increasing the rounding error.

5.5. Example 5

In this example, we find the Moore–Penrose inverse of the Hilbert matrix in Equation (134), which is more difficult than the previous example. Here, we fix q = 3 , n = 50 , and q = 50 , n = 3 . The numerical errors of the Penrose Equations (3)–(6) are compared in Table 4 and Table 5, respectively.

5.6. Example 6

In this example, we consider a non-harmonic boundary value problem on an amoeba-like boundary:
Δu = u_xx + u_yy = 0,  (x, y) ∈ Ω,
u(x, y) = f(x, y) = x² y³,  (x, y) ∈ Γ,  Γ := {(ρ(θ) cos θ, ρ(θ) sin θ), 0 ≤ θ ≤ 2π},
ρ(θ) = exp(sin θ) sin²(2θ) + exp(cos θ) cos²(2θ),
where Δf(x, y) = 2y³ + 6x²y ≠ 0, and the contour is displayed in Figure 4a by a solid black line. Feng et al. [46] have pointed out that “non-harmonic boundary data” mean that the solution does not have a harmonic extension to the whole plane, and a solution is very difficult to achieve.
Let
v(x, y) = u(x, y) − f(x, y)
be a new variable; then we arrive at a Poisson equation under a homogeneous boundary condition:
Poisson equation:  Δv = F(x, y) = −Δf(x, y),  v(x, y)|_{(x,y) ∈ Γ} = 0,
where F(x, y) ≠ 0 because f(x, y) is a non-harmonic boundary function. When v(x, y) is solved, we can find u(x, y) = v(x, y) + f(x, y).
In order to obtain an accurate solution of v(x, y), we use the multiple-scale Pascal triangle polynomial expansion method developed by Liu and Kuo [47]:
v(x, y) = Σ_{i=1}^{m_0} Σ_{j=1}^{i} c_ij s_ij (x/R_0)^{i−j} (y/R_0)^{j−1}.
After collocating q points to satisfy the governing equation and boundary condition (141), we have a non-square linear system (2), where the scales s_ij are determined such that each column of the coefficient matrix A has the same norm.
By using the DOA, we take m = 4, q = 200, m_0 = 5, R_0 = 1000 and c_ij^0 = 0. Hence, the dimension of A is 200 × 15, for which system (2) is an over-determined system. The distribution of collocation points is shown in Figure 4a, while the convergence rate is shown in Figure 4b. It is remarkable that the DOA converges within nine steps, even under the stringent convergence criterion ε = 10^−10. In Figure 5a, we compare the recovered boundary function with the exact function at 1000 points along the boundary. They are almost coincident, and thus we plot the absolute error in Figure 5b, whose ME = 2.95 × 10^−12 and root mean square error (RMSE) = 1.2 × 10^−12. This result is much better than that computed by Feng et al. [46].

5.7. Example 7

The method of fundamental solutions (MFS) is adopted to solve Equation (137) by using
u(z) = Σ_{j=1}^{n} c_j ln ‖z − s_j‖,  z ∈ Ω,  s_j ∈ Ω̄^c,
where Ω̄^c is the complementary set of Ω̄, and
s_j = (R cos θ_j, R sin θ_j)^T,  θ_j = 2jπ/n
are the source points, in which R > 0 is the radius of the circle on which the source points are placed.
We consider an exact solution u(x, y) = x² − y² and specify the Dirichlet boundary condition on the boundary given by Equation (139). In Equation (144), we fix R = 5. By using the DOA, we take m = 11 and c_j^0 = 0.
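A Python sketch of assembling the MFS collocation system of Equations (143) and (144) on the amoeba-like boundary (139) is given below, assuming NumPy; the Dirichlet data are u = x² − y², the point counts are illustrative, and a generic least-squares solver stands in for the DOA.

import numpy as np

# Boundary points on the contour (139) supply the rows, logarithmic sources on a
# circle of radius R the columns; b holds the Dirichlet boundary data.
n, q, R = 40, 80, 5.0

theta_b = 2.0 * np.pi * np.arange(q) / q
rho = np.exp(np.sin(theta_b)) * np.sin(2 * theta_b) ** 2 \
    + np.exp(np.cos(theta_b)) * np.cos(2 * theta_b) ** 2
bx, by = rho * np.cos(theta_b), rho * np.sin(theta_b)

theta_s = 2.0 * np.pi * np.arange(1, n + 1) / n
sx, sy = R * np.cos(theta_s), R * np.sin(theta_s)

A = np.log(np.hypot(bx[:, None] - sx[None, :], by[:, None] - sy[None, :]))  # A_ij = ln||z_i - s_j||
b = bx ** 2 - by ** 2                                                       # u = x^2 - y^2 on the boundary

c = np.linalg.lstsq(A, b, rcond=None)[0]     # any LS solver can stand in for the DOA
print(np.max(np.abs(A @ c - b)))             # boundary residual of the fit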
Table 6 lists the computed ME for different values of ( n , q ) obtained by the DOA. It can be seen that the DOA is applicable to least-squares problems regardless of whether q < n or q > n .
When we apply the QR method to solve this problem with n = 100 and q = 150, ME = 42.81 is obtained. When we take n = 150 and q = 400, ME = 2.11 is obtained. Obviously, the QR method is not suitable for this problem.

5.8. Example 8

In this example, we consider Equation (2) with a cyclic matrix, whose first row is given by (1, …, n). The coefficient matrix can be generated by the following procedure:
Do i = 1 : n,
  Do j = 1 : n,
    If i = 1, then B_{i,j} = j; otherwise, B_{i,j} = B_{i−1,j} + 1,
    If B_{i,j} > n, then B_{i,j} = B_{i,j} − n,
  End do,
End do.
The exact solution is x_i = 1, i = 1, …, n. The non-square matrix A is obtained by taking the first q rows of B.
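A vectorized Python equivalent of the loop in Equation (145) is sketched below (assuming NumPy); row i of B is the cyclic shift (i, i+1, …, n, 1, …, i−1) of (1, …, n), and A keeps the first q rows.

import numpy as np

# Construct the cyclic matrix B of Equation (145) and the consistent right-hand side.
def cyclic_system(n, q):
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    B = (i + j) % n + 1          # B[0] = (1, ..., n), each later row shifted by one
    A = B[:q, :].astype(float)
    b = A @ np.ones(n)           # exact solution x_i = 1
    return A, b

A, b = cyclic_system(n=10, q=4)
print(A[0])   # [ 1.  2. ... 10.]
print(A[1])   # [ 2.  3. ... 10.  1.]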
Since the right-hand side vector b is obtained from Equation (2) after inserting A and x = 1 = (1, …, 1)^T, the non-square system is consistent. Therefore, we can compute the maximal error (ME) by
ME := max_{i=1,…,n} |x_i^e − x_i^n|,
where x_i^e = 1, i = 1, …, n are the exact solutions and x_i^n are the numerical solutions.
For a large matrix A with n = 2000 and q = 100, Table 7 lists the computed ME for different values of m and the INs under ε = 10^−5, with the initial guesses x_i^0 = 1 + 0.1i, i = 1, …, n. If we take the initial guesses to be x_i^0 = 0, i = 1, …, n, the accuracy is poor, with ME = 0.961, and the convergence is very slow. As shown in Corollary 6, for the zero initial value we have r_0 = b, and for the cyclic matrix we have r_0 = b = c_1 1, where c_1 = Σ_{j=1}^{n} j = 2001000. Thus, A A^T r_0 ≈ c_0 r_0, where c_0 involves c_1 and c_2 = Σ_{j=1}^{q} j = 5050; hence, c_0 is a huge value, which causes the failure of the DOA with a zero initial value for this problem. The same situation happens for the constant initial guesses x_i^0 = c, i = 1, …, n. It is apparent that the DOA is quite accurate and converges fast. It is interesting that a small m is enough to achieve accurate solutions; when m is smaller, the accuracy is better, but the convergence is slower. In contrast, when m is larger, the accuracy is worse but the convergence is faster.
Next, we construct a large-sized matrix B from Equation (145) with n replaced by q. The non-square matrix A is obtained by taking the first n columns of B. With n = 500 and q = 2000, Table 8 lists the computed ME for different values of m and the IN under ε = 10^−5, with the initial guesses x_i^0 = 1 + 0.1i, i = 1, …, n. It is apparent that the DOA is quite accurate and converges fast. When m is increased, the accuracy is better and the convergence is faster.
As mentioned, the QR method is not applicable to the least-squares problem with q < n; however, for q > n, the QR method is applicable. Table 9 lists the computed ME for different values of (q, n). In the DOA, we take m = 30, ε = 10^−12, and x_i^0 = 1 + 0.1i, i = 1, …, n. The DOA converges within 25 iterations for the first four cases, and within 79 iterations for the last case. The DOA can improve the accuracy by about three to four orders of magnitude compared to the QR method. We also list the error ‖Q^T Q − I_n‖ for the QR method, which is very accurate.
If we take α_0 = 1 in Equation (21), 𝕂_m reduces to the usual affine Krylov subspace u_0 + K_m, and Equation (80) becomes
x = u_0 + λ_0 V b − V A u_0.
As shown in Table 10, the DOA is more accurate than the DOA with α_0 = 1.

6. A Variant of the Karush–Kuhn–Tucker Equation

In many meshless collocation methods for solving linear partial differential equations, we may collocate at more points q than the number n of coefficients in the expansion of the solution, which results in an over-determined system in Equation (2) with q > n. The residual vector r = b − A x together with Equation (2) can be written as
[ I_q   A ] [ r ]   [ b ]
[ A^T   0 ] [ x ] = [ 0 ].
This is called the Karush–Kuhn–Tucker equation [15].
Instead of Equation (146), we may consider a permutation of the Karush–Kuhn–Tucker equation:
[ A   I_q ] [ x ]   [ b ]
[ 0   A^T ] [ r ] = [ 0 ],
which is a (q+n) × (q+n) system. Equation (147) is a variant of the Karush–Kuhn–Tucker Equation (146) obtained by re-ordering the unknown vector from (r, x) to (x, r). When the dimension q + n is a moderate value, we can apply the partial pivoting Gaussian elimination method (GEM) to solve Equations (146) and (147). We name the partial pivoting Gaussian elimination method applied to Equation (146) GEM1, and that applied to Equation (147) GEM2. As demonstrated by the medium-size least-squares problems shown below, the partial pivoting Gaussian elimination method can be an effective and accurate method. Moreover, the variant of the Karush–Kuhn–Tucker equation is more accurate than the original Karush–Kuhn–Tucker equation, even though they have the same condition number.
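A Python sketch of the two orderings follows, assuming NumPy and random illustrative data: GEM1 assembles Equation (146) with unknowns (r, x), GEM2 assembles the permuted variant (147) with unknowns (x, r), and both systems are handed to a dense solver with partial pivoting.

import numpy as np

# GEM1 versus GEM2 on illustrative data; the accuracy comparison is what Section 6 studies.
rng = np.random.default_rng(8)
q, n = 30, 10
A = rng.standard_normal((q, n))
b = rng.standard_normal(q)

K1 = np.block([[np.eye(q), A], [A.T, np.zeros((n, n))]])        # Equation (146)
K2 = np.block([[A, np.eye(q)], [np.zeros((n, n)), A.T]])        # Equation (147)
rhs = np.concatenate([b, np.zeros(n)])

x1 = np.linalg.solve(K1, rhs)[q:]      # (r, x) ordering
x2 = np.linalg.solve(K2, rhs)[:n]      # (x, r) ordering

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x1 - x_ref)), np.max(np.abs(x2 - x_ref)))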

6.1. Example 9

We consider Example 7 again, which is now solved by the above four methods. Table 11 lists the computed ME for different values of (n, q) obtained by GEM1 and GEM2. Here, CGNR is the conjugate gradient method applied to the normal equation with minimal residual, while CGNE is the conjugate gradient method applied to the normal equation with minimal error [15]; CGNR is also known as CGLS (least squares), and CGNE as Craig's method. In this test, we find that the partial pivoting Gaussian elimination method applied to Equation (147), i.e., GEM2, gives the best accuracy.

6.2. Example 10

We consider Example 3 again, which is now solved by the above four methods, the QR method and the DOA with ε = 10^−14. Based on Equations (88)–(90), this problem is also solved via SVD. As shown in Table 12, GEM2, the DOA and SVD are competitive and are more accurate than the other methods.
We also list the errors ‖Q^T Q − I_n‖ for the QR method in Table 12, which are very small. However, since quite small singular values are present for the highly ill-conditioned Hilbert matrix, the QR method gradually deteriorates as the condition number increases from 7.04 × 10^4 to 2.1 × 10^7.

7. Conclusions

Least-squares problems arise in many applications, and many iterative algorithms are already available to seek their solutions. In this paper, a new concept of a varying affine Krylov subspace was introduced, which is different from the fixed-type affine Krylov subspace used in the CG-type numerical solution of over-determined least-squares problems. In the (m+1)-dimensional varying affine Krylov subspace, a closed-form double optimal solution of the simple form in Equation (42) was derived and further reduced to Equation (80); it was obtained from the two minimizations in Equations (40) and (41). We analyzed a key equation (72) that links these two optimizations together. The double optimal solution is an excellent approximation, as verified for least-squares problems. The iterative DOA was developed, which leads to step-by-step absolute convergence. In each iterative step, merely an m × m matrix with a small value of m is inverted, so the computational cost of the DOA is very low, lower than that of other methods. The Moore–Penrose inverses of non-square matrices were also derived by using the varying affine Krylov subspace method. For over-determined least-squares problems, we proposed a variant of the Karush–Kuhn–Tucker equation, which is solved by the partial pivoting Gaussian elimination method and performs better than the original Karush–Kuhn–Tucker equation and the CG-type numerical methods CGNR (CGLS) and CGNE (Craig's method). It is very important that the DOA can be applied to rectangular systems with either q > n or q < n.
The novelties involved in this paper are as follows:
  • A double-optimization-based iterative algorithm for least-squares problems was developed in a varying affine Krylov subspace.
  • For dense large-size least-squares problems, the proposed iterative DOA is efficient and accurate.
  • The DOA can be applied to rectangular systems, even in the extremal cases of q ≫ n or q ≪ n.
  • A variant of the Karush–Kuhn–Tucker equation was presented, which is an improved version of the original Karush–Kuhn–Tucker equation.
The idea of the varying affine Krylov subspace is useful for finding a better solution of linear algebraic equations based on the mechanism of double optimization. In the future, we may extend the new methods of the DOS and the DOA to other linear matrix equations. The limitations are that several matrix multiplications are needed to construct the projection operator P in the Krylov subspace, and that computing the inverse C^{−1} of the m × m matrix C incurs an additional computational cost.

Author Contributions

Conceptualization, C.-S.L. and C.-W.C.; Methodology, C.-S.L. and C.-W.C.; Software, C.-S.L., C.-W.C. and C.-L.K.; Validation, C.-S.L., C.-W.C. and C.-L.K.; Formal analysis, C.-S.L. and C.-W.C.; Investigation, C.-S.L., C.-W.C. and C.-L.K.; Resources, C.-S.L. and C.-W.C.; Data curation, C.-S.L., C.-W.C. and C.-L.K.; Writing—original draft, C.-S.L.; Writing—review & editing, C.-W.C.; Visualization, C.-S.L., C.-W.C. and C.-L.K.; Supervision, C.-S.L. and C.-W.C.; Project administration, C.-W.C.; Funding acquisition, C.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Acknowledgments

The authors would like to express their thanks to the reviewers, who supplied valuable opinions to improve the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, C.S. An optimal multi-vector iterative algorithm in a Krylov subspace for solving the ill-posed linear inverse problems. Comput. Mater. Contin. 2013, 33, 175–198. [Google Scholar]
  2. Dongarra, J.; Sullivan, F. Guest editors’ introduction to the top 10 algorithms. Comput. Sci. Eng. 2000, 2, 22–23. [Google Scholar] [CrossRef]
  3. Simoncini, V.; Szyld, D.B. Recent computational developments in Krylov subspace methods for linear systems. Numer. Linear Alg. Appl. 2007, 14, 1–59. [Google Scholar] [CrossRef]
  4. Saad, Y.; Schultz, M.H. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 1986, 7, 856–869. [Google Scholar] [CrossRef]
  5. Saad, Y. Krylov subspace methods for solving large unsymmetric linear systems. Math. Comput. 1981, 37, 105–126. [Google Scholar] [CrossRef]
  6. Freund, R.W.; Nachtigal, N.M. QMR: A quasi-minimal residual method for non-Hermitian linear systems. Numer. Math. 1991, 60, 315–339. [Google Scholar] [CrossRef]
  7. van Den Eshof, J.; Sleijpen, G.L.G. Inexact Krylov subspace methods for linear systems. SIAM J. Matrix Ana. Appl. 2004, 26, 125–153. [Google Scholar] [CrossRef]
  8. Hestenes, M.R.; Stiefel, E.L. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
  9. Lanczos, C. Solution of systems of linear equations by minimized iterations. J. Res. Nat. Bur. Stand. 1952, 49, 33–53. [Google Scholar] [CrossRef]
10. Paige, C.C.; Saunders, M.A. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal. 1975, 12, 617–629.
11. Fletcher, R. Conjugate Gradient Methods for Indefinite Systems; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1976; Volume 506, pp. 73–89.
12. Sonneveld, P. CGS: A fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 1989, 10, 36–52.
13. van der Vorst, H.A. Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 1992, 13, 631–644.
14. Saad, Y.; van der Vorst, H.A. Iterative solution of linear systems in the 20th century. J. Comput. Appl. Math. 2000, 123, 1–33.
15. Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed.; SIAM: Philadelphia, PA, USA, 2003.
16. van der Vorst, H.A. Iterative Krylov Methods for Large Linear Systems; Cambridge University Press: New York, NY, USA, 2003.
17. Golub, G. Numerical methods for solving linear least squares problems. Numer. Math. 1965, 7, 206–216.
18. Choi, S.C.; Paige, C.C.; Saunders, M.A. MINRES-QLP: A Krylov subspace method for indefinite or singular symmetric systems. SIAM J. Sci. Comput. 2011, 33, 1810–1836.
19. Petkovic, M.D.; Stanimirovic, P.S. Iterative method for computing Moore-Penrose inverse based on Penrose equations. J. Comput. Appl. Math. 2011, 235, 1604–1613.
20. Petkovic, M.D.; Stanimirovic, P.S. Two improvements of the iterative method for computing Moore-Penrose inverse based on Penrose equations. J. Comput. Appl. Math. 2014, 267, 61–71.
21. Katsikis, V.N.; Pappas, D.; Petralias, A. An improved method for the computation of the Moore-Penrose inverse matrix. Appl. Math. Comput. 2011, 217, 9828–9834.
22. Stanimirovic, I.; Tasic, M. Computation of generalized inverse by using the LDL* decomposition. Appl. Math. Lett. 2012, 25, 526–531.
23. Sheng, X.; Wang, T. An iterative method to compute Moore-Penrose inverse based on gradient maximal convergence rate. Filomat 2013, 27, 1269–1276.
24. Toutounian, F.; Ataei, A. A new method for computing Moore-Penrose inverse matrices. J. Comput. Appl. Math. 2009, 228, 412–417.
25. Zhang, J.; Guo, J. On relaxed greedy randomized coordinate descent methods for solving large linear least-squares problems. Appl. Numer. Math. 2020, 157, 372–384.
26. Lipitakis, A.-D.; Filelis-Papadopoulos, C.K.; Gravvanis, G.A.; Anagnostopoulos, D. A note on parallel approximate pseudoinverse matrix techniques for solving linear least squares problems. J. Comput. Sci. 2020, 41, 101092.
27. Niu, Y.-Q.; Zhang, B. A new randomized Gauss–Seidel method for solving linear least-squares problems. Appl. Math. Lett. 2021, 116, 107057.
28. Bojanczyk, A.W. Algorithms for indefinite linear least squares problems. Linear Algebra Appl. 2021, 623, 104–127.
29. Du, K.; Ruan, C.-C.; Sun, X.-H. On the convergence of a randomized block coordinate descent algorithm for a matrix least squares problem. Appl. Math. Lett. 2022, 124, 107689.
30. Chakrabarti, K.; Gupta, N.; Chopra, N. Iterative pre-conditioning for expediting the distributed gradient-descent method: The case of linear least-squares problem. Automatica 2022, 137, 110095.
31. Zhang, Y.; Li, H. Splitting-based randomized iterative methods for solving indefinite least squares problem. Appl. Math. Comput. 2023, 446, 127892.
32. Pes, F.; Rodriguez, G. A projection method for general form linear least-squares problems. Appl. Math. Lett. 2023, 145, 108780.
33. Kuo, Y.-C.; Liu, C.-S. An index search method based inner-outer iterative algorithm for solving nonnegative least squares problems. J. Comput. Appl. Math. 2023, 424, 114954.
34. Jin, L.-L.; Li, H.B. Greedy double subspaces coordinate descent method for solving linear least-squares problems. J. Comput. Sci. 2023, 70, 102029.
35. Jahvani, M.; Guay, M. Solving least-squares problems in directed networks: A distributed approach. Comput. Chem. Eng. 2024, 185, 108654.
36. Laura Arias, M.; Contino, M.; Maestripieri, A.; Marcantognini, S. Matrix representations of multivalued projections and least squares problems. J. Math. Anal. Appl. 2024, 530, 127631.
37. Golub, G.H.; Reinsch, C. Singular value decomposition and least squares solutions. Numer. Math. 1970, 14, 403–420.
38. Björck, Å. Numerical Methods for Least Squares Problems; SIAM: Philadelphia, PA, USA, 1996.
39. Liu, C.S. A doubly optimized solution of linear equations system expressed in an affine Krylov subspace. J. Comput. Appl. Math. 2014, 260, 375–394.
40. Liu, C.S. Optimal algorithms in a Krylov subspace for solving linear inverse problems by MFS. Eng. Anal. Bound. Elem. 2014, 44, 64–75.
41. Liu, C.S. A double optimal descent algorithm for iteratively solving ill-posed linear inverse problems. Inv. Prob. Sci. Eng. 2015, 23, 38–66.
42. Liu, C.S. A maximal projection solution of ill-posed linear system in a column subspace, better than the least squares solution. Comput. Math. Appl. 2014, 67, 1998–2014.
43. Papez, J.; Tichy, P. Estimating error norms in CG-like algorithms for least-squares and least-norm problems. Numer. Algorithms 2024, 1–28.
44. Abbasi, M.; Nikazad, T. Bounded perturbations resilient iterative methods for linear systems and least squares problems: Operator-based approaches, analysis, and performance evaluation. BIT Numer. Math. 2024, 64, 15.
45. Xia, Y.; Chen, T.; Shan, J. A novel iterative method for computing generalized inverse. Neural Comput. 2014, 26, 449–465.
46. Feng, G.; Li, M.; Chen, C.S. On the ill-conditioning of the MFS for irregular boundary data with sufficient regularity. Eng. Anal. Bound. Elem. 2014, 41, 98–102.
47. Liu, C.S.; Kuo, C.L. A multiple-scale Pascal polynomial triangle solving elliptic equations and inverse Cauchy problems. Eng. Anal. Bound. Elem. 2016, 62, 35–43.
Figure 1. Examples 1 and 2, showing the relative residuals with respect to the number of iterations; the latter is compared with [20].
Figure 2. An over-determined Hilbert linear problem solved by the DOA: (a) relative residual, and (b) numerical error.
Figure 3. A computation of the Moore–Penrose inverse by the DOA, showing the relative residuals in the computation of each column of the Moore–Penrose inverse matrix.
Figure 4. An amoeba-like boundary under a non-harmonic boundary function solved by the DOA: (a) the contour and distribution of collocation points, and (b) the convergence rate.
Figure 5. An amoeba-like boundary under a non-harmonic boundary function solved by the DOA: (a) comparison of numerical and exact boundary conditions, and (b) the numerical error.
Table 1. Example 3: ME obtained by the DOA and QR for different n and a fixed q = 10.

n               | 2             | 3             | 4             | 5
‖Q^T Q - I_n‖   | 1.63 × 10^-15 | 2.48 × 10^-14 | 3.11 × 10^-13 | 5.24 × 10^-12
QR              | 8.33 × 10^-15 | 3.55 × 10^-12 | 6.77 × 10^-10 | 2.27 × 10^-7
SVD             | 5.55 × 10^-16 | 1.60 × 10^-14 | 1.89 × 10^-13 | 9.09 × 10^-13
DOA             | 1.11 × 10^-16 | 3.55 × 10^-15 | 9.09 × 10^-14 | 7.81 × 10^-13
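To make the quantities in Table 1 concrete, the short Python sketch below measures the same kinds of numbers for a small over-determined system: the orthogonality loss ‖Q^T Q - I_n‖ of the thin QR factor and the ME (presumably the largest componentwise error against a known exact solution) of the QR- and SVD-based least-squares solutions. The random test matrix and the all-ones exact solution are our own assumptions; the matrix of Example 3 is not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the test system of Example 3: a dense
# over-determined matrix with a known exact solution x_exact.
rng = np.random.default_rng(0)
q, n = 10, 5
A = rng.standard_normal((q, n))
x_exact = np.ones(n)
b = A @ x_exact

# Thin QR factorization; the row "‖Q^T Q - I_n‖" in Table 1 measures the
# loss of orthogonality of the computed factor Q.
Q, R = np.linalg.qr(A, mode="reduced")
orth_loss = np.linalg.norm(Q.T @ Q - np.eye(n))

# QR-based least-squares solution and its maximum componentwise error (ME).
x_qr = np.linalg.solve(R, Q.T @ b)
me_qr = np.max(np.abs(x_qr - x_exact))

# SVD-based (pseudo-inverse) solution for comparison.
x_svd = np.linalg.pinv(A) @ b
me_svd = np.max(np.abs(x_svd - x_exact))

print(orth_loss, me_qr, me_svd)
```

Reading the ‖Q^T Q - I_n‖ row next to the QR row gives a sense of how the conditioning of the factorization itself varies with n, which is the comparison the table invites.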
Table 2. Example 3: the ME and IN obtained by the DOA for different m, where (q, n) = (20, 8) and ε = 10^-8.

m   | 2            | 3            | 4            | 5            | 6
ME  | 4.94 × 10^-6 | 5.30 × 10^-8 | 1.34 × 10^-6 | 2.90 × 10^-6 | 6.28 × 10^-6
IN  | 14           | 4            | 4            | 4            | 5
Table 3. Computed results of a rank-deficient matrix in Example 4.

Alg. | IP              | ‖A A† A - A‖_2 | ‖A† A A† - A†‖_2 | ‖(A A†)^T - A A†‖_2 | ‖(A† A)^T - A† A‖_2 | IN
[45] | 10^-5 A^T       | 8.92 × 10^-16  | 3.36 × 10^-16    | 7.85 × 10^-17       | 1.97 × 10^-15       | 50
[45] | 0.1 A^T         | 9.17 × 10^-20  | 6.99 × 10^-16    | 7.85 × 10^-17       | 2.86 × 10^-15       | 60
[20] | 0.1 A^T         | 1.53 × 10^-14  | 9.21 × 10^-17    | 4.13 × 10^-31       | 1.17 × 10^-30       | 165
DOA  | x_0 = 10^-16 · 1 | 5.21 × 10^-27  | 2.57 × 10^-29    | 3.82 × 10^-27       | 1.61 × 10^-27       | 12
Table 4. Computed results of the Hilbert matrix q = 3, n = 50 in Example 5.

Alg. | IP        | ‖A A† A - A‖_2 | ‖A† A A† - A†‖_2 | ‖(A A†)^T - A A†‖_2 | ‖(A† A)^T - A† A‖_2 | IN
[20] | 0.8 A^T   | 1.4 × 10^-28   | 3.7 × 10^-21     | 4.1 × 10^-30        | 3.5 × 10^-27        | 34
DOA  | x_0 = 0   | 9.8 × 10^-28   | 2 × 10^-20       | 4.1 × 10^-24        | 9.3 × 10^-28        | 3
Table 5. Computed results of the Hilbert matrix q = 50, n = 3 in Example 5.

Alg. | IP        | ‖A A† A - A‖_2 | ‖A† A A† - A†‖_2 | ‖(A A†)^T - A A†‖_2 | ‖(A† A)^T - A† A‖_2 | IN
[20] | 0.8 A^T   | 1.2 × 10^-28   | 3.7 × 10^-21     | 3.3 × 10^-27        | 6.3 × 10^-28        | 34
DOA  | x_0 = 0   | 9 × 10^-29     | 2.6 × 10^-22     | 3.2 × 10^-23        | 5.5 × 10^-29        | 3
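The four error columns shared by Tables 3–5 are the residuals of the Penrose conditions A A† A = A, A† A A† = A†, (A A†)^T = A A† and (A† A)^T = A† A, which characterize the Moore–Penrose inverse. The sketch below (our illustration, not the authors' code) evaluates these norms for any candidate pseudo-inverse X; the 3 × 50 Hilbert-type matrix only mimics the setting of Table 4, and using numpy's pinv as the candidate is an assumption made for the example, not the DOA.

```python
import numpy as np

def penrose_residuals(A: np.ndarray, X: np.ndarray):
    """Spectral-norm residuals of the four Penrose conditions for a
    candidate pseudo-inverse X of A, matching the columns of Tables 3-5."""
    r1 = np.linalg.norm(A @ X @ A - A, 2)        # A X A = A
    r2 = np.linalg.norm(X @ A @ X - X, 2)        # X A X = X
    r3 = np.linalg.norm((A @ X).T - A @ X, 2)    # (A X)^T = A X
    r4 = np.linalg.norm((X @ A).T - X @ A, 2)    # (X A)^T = X A
    return r1, r2, r3, r4

# Illustrative 3 x 50 Hilbert-type matrix, as in the setting of Table 4.
q, n = 3, 50
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(q)])
X = np.linalg.pinv(A)  # candidate pseudo-inverse for the check
print(penrose_residuals(A, X))
```

The smaller these four residuals, the closer the computed X is to the true Moore–Penrose inverse, which is exactly what the tables use to rank the methods.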
Table 6. For the MFS on the 2D Laplace equation: ME with different (n, q) obtained by the DOA.

(n, q) | (100, 80)    | (100, 90)    | (100, 110)   | (100, 120)   | (100, 150)
ME     | 1.88 × 10^-8 | 7.25 × 10^-9 | 7.67 × 10^-8 | 2.13 × 10^-8 | 4.45 × 10^-8
Table 7. Example 8 with n = 2000 and q = 100: ME and IN obtained by the DOA for different m.

m   | 5            | 8            | 10           | 12           | 15
ME  | 7.24 × 10^-5 | 1.73 × 10^-4 | 2.95 × 10^-4 | 1.02 × 10^-3 | 4.99 × 10^-3
IN  | 174          | 42           | 23           | 16           | 10
Table 8. Example 8 with n = 500 and q = 2000: ME and IN obtained by the DOA for different m.

m   | 10           | 12           | 15           | 18           | 20
ME  | 1.39 × 10^-4 | 6.82 × 10^-5 | 3.20 × 10^-5 | 1.70 × 10^-5 | 1.26 × 10^-5
IN  | 123          | 72           | 41           | 25           | 19
Table 9. Example 8 with different (q, n), q > n: ME obtained by QR and the DOA.

(q, n)          | (1000, 500)   | (1500, 500)   | (1500, 1000)  | (2000, 500)   | (2500, 1000)
‖Q^T Q - I_n‖   | 1.81 × 10^-12 | 2.55 × 10^-12 | 8.99 × 10^-12 | 2.20 × 10^-12 | 1.14 × 10^-11
QR              | 3.18 × 10^-10 | 6.91 × 10^-10 | 1.73 × 10^-9  | 1.61 × 10^-9  | 4.56 × 10^-9
DOA             | 2.49 × 10^-13 | 2.66 × 10^-13 | 2.46 × 10^-13 | 1.77 × 10^-13 | 1.24 × 10^-13
Table 10. Example 8 with n = 500, q = 2000 and ε = 10^-12: ME1 and IN1 obtained by taking α_0 = 1 in the DOA, and ME2 and IN2 obtained by the DOA, for different m.

m    | 20            | 22            | 25            | 28            | 30            | 35
ME1  | 1.13 × 10^-12 | 8.50 × 10^-13 | 3.33 × 10^-13 | 2.0 × 10^-3   | 1.88 × 10^-13 | 2.15 × 10^-13
IN1  | 62            | 48            | 34            | 25            | 21            | 15
ME2  | 9.15 × 10^-13 | 8.59 × 10^-13 | 3.94 × 10^-13 | 2.71 × 10^-13 | 1.77 × 10^-13 | 1.41 × 10^-13
IN2  | 64            | 48            | 33            | 25            | 21            | 16
Table 11. The MFS on a 2D Laplace equation: ME with different (n, q) obtained by different methods.

(n, q) | (100, 120)    | (100, 150)    | (100, 170)    | (100, 200)    | (150, 180)
GEM1   | 7.76 × 10^-6  | 1.81 × 10^-6  | 6.12 × 10^-7  | 2.99 × 10^-6  | 5.73 × 10^-6
GEM2   | 2.17 × 10^-11 | 3.82 × 10^-14 | 2.24 × 10^-14 | 1.99 × 10^-14 | 3.52 × 10^-14
CGNR   | 8.33 × 10^-11 | 8.23 × 10^-11 | 7.18 × 10^-10 | 6.62 × 10^-9  | 7.98 × 10^-10
CGNE   | 1.84 × 10^-11 | 4.78 × 10^-7  | 5.44 × 10^-6  | 6.46 × 10^-10 | 2.58 × 10^-6
Table 12. Example 10: ME with different (n, q) obtained by GEM1, GEM2, CGNR, CGNE, the DOA, SVD and QR.

(n, q)          | (5, 6)        | (5, 8)        | (5, 10)       | (7, 25)       | (8, 25)
GEM1            | 1.80 × 10^-6  | 5.18 × 10^-7  | 4.57 × 10^-7  | 2.20 × 10^-3  | 1.55 × 10^-3
GEM2            | 3.15 × 10^-12 | 2.27 × 10^-13 | 1.02 × 10^-12 | 1.29 × 10^-10 | 4.70 × 10^-9
CGNR            | 4.98 × 10^-8  | 1.62 × 10^-7  | 3.05 × 10^-7  | 4.83 × 10^-8  | 7.64 × 10^-8
CGNE            | 3.52 × 10^-12 | 1.64 × 10^-12 | 1.88 × 10^-12 | 4.83 × 10^-8  | 7.64 × 10^-8
DOA             | 1.88 × 10^-13 | 2.55 × 10^-12 | 7.81 × 10^-13 | 2.72 × 10^-9  | 1.48 × 10^-9
SVD             | 4.85 × 10^-12 | 1.52 × 10^-12 | 9.09 × 10^-13 | 3.08 × 10^-10 | 1.52 × 10^-8
‖Q^T Q - I_n‖   | 8.65 × 10^-12 | 4.57 × 10^-12 | 5.24 × 10^-12 | 5.66 × 10^-10 | 1.89 × 10^-8
QR              | 1.05 × 10^-6  | 3.50 × 10^-7  | 2.27 × 10^-7  | 2.46 × 10^-3  | 5.68 × 10^-1
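Tables 11 and 12 compare GEM1 and GEM2 (partial-pivoting Gaussian elimination applied, presumably, to the original Karush–Kuhn–Tucker equation and to the proposed variant, respectively) with the conjugate-gradient baselines CGNR and CGNE. For readers unfamiliar with those baselines, the sketch below is a generic textbook CGNR (CGLS) iteration, which applies conjugate gradients to the normal equations A^T A x = A^T b; it is not the implementation used to produce these tables.

```python
import numpy as np

def cgnr(A, b, x0=None, tol=1e-12, max_iter=1000):
    """Generic CGNR (CGLS) sketch: conjugate gradients on the normal
    equations A^T A x = A^T b for the least-squares problem min ||b - A x||."""
    q, n = A.shape
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x            # residual of the original system
    s = A.T @ r              # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    for _ in range(max_iter):
        Ap = A @ p
        alpha = gamma / (Ap @ Ap)
        x += alpha * p
        r -= alpha * Ap
        s = A.T @ r
        gamma_new = s @ s
        if np.sqrt(gamma_new) < tol:  # stop when the normal-equation residual is small
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```

CGNE (Craig's method) instead applies conjugate gradients to A A^T y = b and sets x = A^T y, which is why the two baselines can behave quite differently on the same over-determined system.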