1. Introduction
One of the most important classes of numerical methods for solving linear systems is the Krylov subspace method [1,2,3,4,5,6,7]; since the appearance of the pioneering works [8,9], it has been studied extensively. Many Krylov subspace methods have been reported in [10,11,12,13,14,15,16].
Many algorithms have been used to solve the least-squares problem with a minimum norm [17,18]:
$$\min_{\mathbf{x}\in\mathbb{R}^{n}}\ \|\mathbf{x}\| \quad \text{subject to} \quad \|\mathbf{b}-\mathbf{A}\mathbf{x}\|=\min, \qquad (1)$$
where $\|\cdot\|$ denotes the Euclidean norm, $\mathbf{x}\in\mathbb{R}^{n}$ is unknown, and $\mathbf{A}\in\mathbb{R}^{q\times n}$ and $\mathbf{b}\in\mathbb{R}^{q}$ are given.
We are concerned with a varying affine Krylov subspace (VAKS) solution and its corresponding iterative algorithm for solving
$$\mathbf{A}\mathbf{x}=\mathbf{b}, \qquad (2)$$
which is an over-determined system when $q>n$ and an under-determined system when $q<n$.
Most results for Equation (2) are directly extendable to Equation (1), whose solution is called the minimum-norm or pseudo-inverse solution and is given by $\mathbf{x}=\mathbf{A}^{\dagger}\mathbf{b}$, where $\mathbf{A}^{\dagger}$ denotes the pseudo-inverse of $\mathbf{A}$
satisfying the following Penrose equations:
$$\mathbf{A}\mathbf{A}^{\dagger}\mathbf{A}=\mathbf{A},\qquad \mathbf{A}^{\dagger}\mathbf{A}\mathbf{A}^{\dagger}=\mathbf{A}^{\dagger},\qquad (\mathbf{A}\mathbf{A}^{\dagger})^{\mathrm{T}}=\mathbf{A}\mathbf{A}^{\dagger},\qquad (\mathbf{A}^{\dagger}\mathbf{A})^{\mathrm{T}}=\mathbf{A}^{\dagger}\mathbf{A}. \qquad (3)\text{--}(6)$$
The pseudo-inverse has been investigated and applied to solve linear least-squares problems, and computational methods have been developed to compute the Moore–Penrose inverse matrix [19,20,21,22,23,24]. In terms of the Moore–Penrose inverse matrix $\mathbf{A}^{\dagger}$, the general solution of the least-squares problem is given by $\mathbf{x}=\mathbf{A}^{\dagger}\mathbf{b}+(\mathbf{I}_{n}-\mathbf{A}^{\dagger}\mathbf{A})\mathbf{z}$, where $\mathbf{z}$ is arbitrary. Because $\mathbf{A}^{\dagger}\mathbf{b}$ and $(\mathbf{I}_{n}-\mathbf{A}^{\dagger}\mathbf{A})\mathbf{z}$ are orthogonal, it follows that $\mathbf{x}=\mathbf{A}^{\dagger}\mathbf{b}$ is the unique minimum-norm solution of Equation (1).
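As a concrete illustration of these statements, the following Python sketch (with illustrative sizes, not values from the paper) computes the pseudo-inverse solution and verifies numerically that it satisfies the four Penrose equations and coincides with the minimum-norm least-squares solution.

```python
import numpy as np

# Illustrative sizes only; q > n gives an over-determined system.
q, n = 8, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((q, n))
b = rng.standard_normal(q)

A_pinv = np.linalg.pinv(A)          # Moore-Penrose inverse A^+
x = A_pinv @ b                      # minimum-norm least-squares solution x = A^+ b

# Verify the four Penrose equations numerically.
print(np.allclose(A @ A_pinv @ A, A))              # A A^+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))    # A^+ A A^+ = A^+
print(np.allclose((A @ A_pinv).T, A @ A_pinv))     # (A A^+)^T = A A^+
print(np.allclose((A_pinv @ A).T, A_pinv @ A))     # (A^+ A)^T = A^+ A

# x agrees with the minimum-norm solution returned by lstsq.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))
```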
QR factorization is a good method to solve Equation (2). The normalized Gram–Schmidt process modifies the original Gram–Schmidt orthogonalization by normalizing each vector after orthogonalization. Let $\mathbf{Q}$ denote a $q\times n$ matrix obtained by applying the normalized Gram–Schmidt process to the column vectors of $\mathbf{A}$:
$$\mathbf{A}=\mathbf{Q}\mathbf{R}, \qquad (9)$$
which possesses the following orthonormal property:
$$\mathbf{Q}^{\mathrm{T}}\mathbf{Q}=\mathbf{I}_{n}. \qquad (10)$$
Hence, by Equation (10), it is easy to solve Equation (2) by applying the backward substitution method to
$$\mathbf{R}\mathbf{x}=\mathbf{Q}^{\mathrm{T}}\mathbf{b}. \qquad (11)$$
However, the QR method is only applicable to an over-determined system with $q\ge n$. For an under-determined system with $q<n$, the QR method is not applicable because $\mathbf{R}$ is no longer an upper triangular matrix.
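A minimal sketch of this QR route, assuming an over-determined system with full column rank; NumPy's reduced QR plays the role of the normalized Gram–Schmidt factor, and the back substitution of Equation (11) is delegated to a triangular solver.

```python
import numpy as np
from scipy.linalg import solve_triangular

def qr_least_squares(A, b):
    """Solve min ||b - A x|| for q >= n via the reduced QR factorization."""
    Q, R = np.linalg.qr(A, mode='reduced')   # A = Q R with Q^T Q = I_n
    return solve_triangular(R, Q.T @ b)      # backward substitution on R x = Q^T b

# Example with illustrative sizes (q > n).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)
x = qr_least_squares(A, b)
print(np.allclose(A.T @ (b - A @ x), 0, atol=1e-10))  # normal-equation residual vanishes
```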
Recently, many scholars have proposed algorithms to tackle least-squares problems, such as relaxed greedy randomized coordinate descent methods [25], the parallel approximate pseudo-inverse matrix technique in conjunction with the explicit preconditioned conjugate gradient least-squares scheme [26], the new randomized Gauss–Seidel method [27], the QR–Cholesky method, the implicit QR method, and the bidiagonal implicit QR method [28], the randomized block coordinate descent algorithm [29], an iterative pre-conditioning technique [30], splitting-based randomized iterative methods [31], the projection method [32], the index-search-method-based inner–outer iterative algorithm [33], the greedy double subspace coordinate descent method [34], a distributed algorithm [35], and multivalued projections [36].
As just mentioned, there are many algorithms for solving linear least-squares problems. For small- and medium-sized problems, one can use QR-based methods, and, if the problem is rank-deficient or ill-conditioned, the singular value decomposition (SVD) can be adopted [37,38]. For large least-squares problems, Krylov subspace methods are more efficient than direct methods if the matrix is sparse or otherwise structured. In the present paper, we develop an efficient and accurate Krylov subspace method for large least-squares problems with dense and possibly ill-conditioned matrices, not limited to sparse ones. It is also of interest that the proposed iterative algorithm can solve least-squares problems in the extremal cases $q\gg n$ or $q\ll n$. In fact, for under-determined systems, many methods are not applicable, for instance, QR and conjugate gradient (CG)-type methods, in which the basis vectors of the Krylov subspace are orthogonalized implicitly.
In addition to Equation (1), Liu [39] considered the minimization in Equation (12) for solving ill-posed linear systems of Equation (2) with $q=n$. Liu [40,41] employed the double optimization technique to iteratively solve inverse problems. Liu [42] replaced the Krylov subspace by the column space of $\mathbf{A}$ and derived an effective iterative algorithm based on the maximal projection in Equation (12). These results are quite promising. However, for a non-square system (2), no similar double-optimization method exists in the literature.
It is well known that the least-squares solution of the minimization in Equation (1) is obtained from the normal equation
$$\mathbf{A}^{\mathrm{T}}\mathbf{A}\mathbf{x}=\mathbf{A}^{\mathrm{T}}\mathbf{b}. \qquad (13)$$
When the derivative of $f$ in Equation (12) with respect to the unknown is set equal to zero, we obtain Equation (14). Taking the transpose of Equation (2) and inserting it into Equation (14), we immediately obtain Equation (15), which implies Equation (13). Many numerical methods for least-squares problems are based on Equation (1); none are based on Equation (12).
To solve the least-squares problem (2), the usual approach is the Karush–Kuhn–Tucker equation method, which recasts Equation (2) as a square linear system of a larger dimension $q+n$. Let
$$\mathbf{r}=\mathbf{b}-\mathbf{A}\mathbf{x} \qquad (16)$$
be the residual vector. By considering the enforcing constraint
$$\mathbf{A}^{\mathrm{T}}\mathbf{r}=\mathbf{0}, \qquad (17)$$
which is equivalent to the normal form of Equation (2), Equations (16) and (17) constitute a $(q+n)$-dimensional linear system:
$$\begin{bmatrix}\mathbf{I}_{q} & \mathbf{A}\\ \mathbf{A}^{\mathrm{T}} & \mathbf{0}\end{bmatrix}\begin{bmatrix}\mathbf{r}\\ \mathbf{x}\end{bmatrix}=\begin{bmatrix}\mathbf{b}\\ \mathbf{0}\end{bmatrix}. \qquad (18)$$
The advantage of the so-called Karush–Kuhn–Tucker equation [15] is that one can employ a linear solver to solve Equation (18) and obtain the least-squares solution of Equation (2); however, a disadvantage is that the dimension is enlarged to $q+n$.
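A minimal sketch of this augmented-system route, assuming the block arrangement shown in Equation (18); the dense partial-pivoting solve of numpy.linalg.solve stands in for a generic linear solver.

```python
import numpy as np

def kkt_least_squares(A, b):
    """Solve the least-squares problem through the (q+n)-dimensional augmented system."""
    q, n = A.shape
    K = np.block([[np.eye(q), A],
                  [A.T, np.zeros((n, n))]])
    rhs = np.concatenate([b, np.zeros(n)])
    sol = np.linalg.solve(K, rhs)        # dense solve with partial pivoting
    r, x = sol[:q], sol[q:]
    return x, r

rng = np.random.default_rng(3)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)
x, r = kkt_least_squares(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
print(np.allclose(r, b - A @ x))
```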
A major novelty of the present paper is that we employ both Equations (
1) and (
12) to develop a novel numerical method in a varying affine Krylov subspace. The innovative contributions are as follows: this is the first time a new concept of varying affine Krylov subspaces has been introduced, and a double-optimal iterative algorithm has been derived based on the double optimization method to solve least-squares problems. A new variant of the Karush–Kuhn–Tucker equation can significantly improve the accuracy of solving least-squares problems by adopting the partial pivoting Gaussian elimination method.
In Section 2, the solution of Equation (2) is expanded in a varying affine Krylov subspace (VAKS), whose unknown coefficients are optimized in Section 3 by two merit functions. The double optimal algorithm (DOA) is developed in Section 4, where we prove a key property of absolute convergence and derive explicitly the decrease of the squared residual. Examples of linear least-squares problems are given and compared in Section 5; in addition, Moore–Penrose inverses of non-square matrices are discussed. In Section 6, we discuss a variant of the Karush–Kuhn–Tucker equation for solving over-determined linear systems, which is compared to conjugate-gradient-type iterative methods. Finally, Section 7 presents the conclusions.
2. A Varying Affine Krylov Subspace Method
We consider a varying affine Krylov subspace (VAKS) spanned by a translation vector together with an $m$-dimensional Krylov subspace. Then, we expand the solution $\mathbf{x}$ of Equation (2) in this VAKS by Equation (21), where the coefficient multiplying the translation vector and the $m$ coefficients of the Krylov basis are constants to be determined.
The Arnoldi process [15], as a normalized Gram–Schmidt process, is used to normalize and orthogonalize the Krylov vectors, such that the resulting vectors are mutually orthonormal; that is, their pairwise inner products equal the Kronecker delta symbol. One selects $m$, gives an initial vector, and computes the orthonormal basis accordingly. Hence, Equation (21) can be rewritten in terms of the matrix whose columns are these orthonormal basis vectors, and this matrix satisfies the orthonormal property that its transpose multiplied by itself equals the $m\times m$ identity matrix.
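The orthonormalization step can be sketched as follows, assuming, for illustration only, that the Krylov vectors are generated by repeatedly applying $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ to the seed $\mathbf{A}^{\mathrm{T}}\mathbf{r}_{0}$ (a common choice for least-squares Krylov methods); the actual generator used in Equation (21) may differ. A modified Gram–Schmidt (Arnoldi-type) sweep enforces the orthonormal property.

```python
import numpy as np

def orthonormal_krylov_basis(A, r0, m):
    """Return U with orthonormal columns spanning an m-dimensional Krylov subspace.

    Illustrative generator: v_{j+1} = A^T A v_j with seed v_0 = A^T r0;
    the subspace used in the paper may be generated differently.
    """
    n = A.shape[1]
    U = np.zeros((n, m))
    v = A.T @ r0
    for j in range(m):
        for i in range(j):                   # modified Gram-Schmidt sweep
            v = v - (U[:, i] @ v) * U[:, i]
        U[:, j] = v / np.linalg.norm(v)
        v = A.T @ (A @ U[:, j])              # next Krylov vector
    return U

rng = np.random.default_rng(4)
A = rng.standard_normal((15, 7))
r0 = rng.standard_normal(15)
U = orthonormal_krylov_basis(A, r0, 4)
print(np.allclose(U.T @ U, np.eye(4)))       # orthonormal property U^T U = I_m
```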
For a consistent system, the exact solution satisfies Equation (2) exactly. For an inconsistent system, the exact solution exactly satisfies Equation (1) and is given by $\mathbf{x}=\mathbf{A}^{\dagger}\mathbf{b}$, because the equality in Equation (2) no longer holds.
On this occasion, we briefly recall the $m$-dimensional conjugate gradient (CG)-type iterative method for approximately solving Equation (2) with $q>n$. The CGNR (or CGLS) method minimizes an energy function of the error [15,43], measured in a suitable inner product with respect to the exact solution of Equation (2). Overall, the CGNR approximation is sought in an affine Krylov subspace (AKS), as written in Equation (27). There are some extensions of the CGNR (or CGLS) to bounded perturbation resilient iterative methods [44].
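For comparison, a minimal CGLS (CGNR) sketch is given below; it is the textbook form of the algorithm [15,43], applying CG to the normal equations without ever forming $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ explicitly, and it is not the paper's DOA.

```python
import numpy as np

def cgls(A, b, x0=None, maxit=200, tol=1e-12):
    """CGLS / CGNR: conjugate gradients on A^T A x = A^T b, formed implicitly."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    r = b - A @ x
    s = A.T @ r
    p = s.copy()
    gamma = s @ s
    for _ in range(maxit):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if np.sqrt(gamma_new) < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
print(np.allclose(cgls(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))
```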
Craig’s idea was to seek the solution of Equation (2) with $q<n$ in the form $\mathbf{x}=\mathbf{A}^{\mathrm{T}}\mathbf{y}$. By introducing $\mathbf{y}$, we seek $\mathbf{y}$ in the AKS so as to minimize the error function, where the exact solution of $\mathbf{A}\mathbf{A}^{\mathrm{T}}\mathbf{y}=\mathbf{b}$ serves as the reference. The above statement also holds in the other case, with the corresponding quantities replaced accordingly. The resulting CG iterative algorithm is known as CGNE (or the Craig method), which minimizes the error function [15,43].
Notice that the affine Krylov subspaces in Equations (27) and (29) are fixed AKSs, with their translation vectors held fixed, while in Equation (21) the AKS is varying, denoted as a VAKS, owing to the translation vector not being fixed. Upon comparing Equation (21), whose coefficient before the translation vector is an unknown constant, to Equations (27) and (29), whose coefficients before the translation vectors are fixed to one, the affine Krylov subspace in Equation (21) is a varying affine Krylov subspace (VAKS). The dimension of the VAKS is one more than that of the AKS.
4. A Numerical Algorithm
In this section, we develop an iterative algorithm to solve the least-squares problem, starting from an initial guess. We assume that the value at the kth step is already known, and the value at the next step is computed via the iterative algorithm. From the current value, the kth step residual can be computed.
When the initial guess is given, an initial residual is written accordingly. Upon introducing the correction of the solution relative to the initial guess, Equation (2) is equivalent to solving the residual Equation (111) for this correction, which is deduced by direct substitution. For system (111), we seek the correction in the VAKS by Equation (112), in which the expansion constants are determined by the two minimizations in Equations (114) and (115). After inserting Equation (112), the first minimization determines one group of the constants, while the second minimization determines the other.
Since two minimizations in Equations (114) and (115) are adopted to determine the descent vector in Equation (112), the resulting iterative algorithm to solve the least-squares problem is labeled a double optimal algorithm (DOA).
To treat the rank-deficient least-squares problem, the dimension $m$ of the VAKS must be small enough that the matrix to be inverted remains invertible. Consequently, the DOA is summarized as follows: (i) give the problem data, the dimension $m$, an initial value, and the convergence criterion; and (ii) iterate the update steps until convergence, i.e., until the relative residual falls below the convergence criterion or the maximum number of iterations is reached. We call the normalized residual norm the relative residual.
In the above, the matrices involved have dimensions determined by $q$, $n$, and $m$. The extra computational cost lies in constructing the projection operator, which is nevertheless quite time-saving because $m$ is a small number; the fixed matrices are computed once and reused at every iteration, and the per-iteration operation counts are modest. Denoting the number of iterations by NI, the total computational complexity is NI times the per-iteration cost.
It is known that m is a key parameter in the Krylov subspace. A proper choice of m can significantly enhance the convergence speed and accuracy. For ill-posed linear least-squares problems, there exists a best value of m, whereas for well-posed linear least-squares problems, small values of m may slow down convergence; when m is increased, both the convergence speed and the accuracy improve. However, increasing m requires more computational effort in the construction of the projection operator and in the inversion of the $m\times m$ matrix.
We make some comments about the initial value. For under-determined systems, there are many solutions; hence, different choices of the initial value generate different solutions. In general, as required in Equation (1), the minimal-norm solution can be obtained by setting the initial value to the zero vector, whose norm is zero. For over-determined systems, we take an initial value such that the initial residual is a non-zero vector, so that the DOA based on the residual equation is solvable. Indeed, the DOA is not sensitive to the initial value; we will take the same choice for most problems unless otherwise specified.
The following corollaries help to clarify the DOA and are based on the DOS used in the residual Equation (111).
Corollary 3. In Equation (111), the next residual is orthogonal to the applied correction; that is, Equations (117) and (118) hold.
Proof. By Equation (111), the next residual is given by Equation (119). By Equation (116), we can also derive Equation (119) as follows. For Equation (111), Equation (105) is written accordingly, and taking the inner product with Equation (119), we obtain the stated relation. Inserting it into Equation (117), we can prove Equation (118). The DOA is a good approximation of Equation (2) with a better descent direction in the VAKS. □
Corollary 4. For the DOA, the convergence rate is given by Equation (122).
Proof. It follows from Equations (119) and (120) that Equation (123) holds, in which the angle involved is the intersection angle between the residual and the descent direction. With help from Equation (120), Equation (123) can be further reduced. Dividing both sides by the squared residual norm and taking the square roots of both sides, we obtain Equation (122). □
Corollary 5. The residual is decreased step by step, as expressed in Equation (126).
Proof. Taking the squared norms of Equation (119) and using Equation (120), and then inserting the appropriate quantity, Equation (126) is proven. Corollary 5 is easily deduced by noting that the decrement is non-negative. □
The property in Equation (126) is very important because it guarantees that the DOA is absolutely convergent at each iteration.
Remark 3. At the very beginning, the DOS and DOA are developed in the VAKS, as in Equation (21). If one fixes the coefficient before the translation vector in Equation (21) to one, the VAKS reduces to the usual affine Krylov subspace. However, in this situation, several good properties similar to those in Theorem 3 and Corollaries 3–5 will be lost, and convergence cannot be guaranteed in the resulting iterative algorithm. On the other hand, if we omit the translation term and work directly in the $m$-dimensional Krylov subspace, extending the coefficient vector accordingly, this would result in failure because, after inserting into Equation (62), the required quantity cannot be defined.
Corollary 6. In the iterative algorithm (116), we may encounter a slowly varying point when the initial value is given such that the stated condition is satisfied for some index.
Proof. Inserting the stated quantities into Equation (92) yields the asserted relation, by Equation (116). □
5. Numerical Examples
To demonstrate the efficiency and accuracy of the present iterative DOA, several examples are presented. All the numerical computations were carried out in Microsoft Developer Studio on an Intel Core i7-3770 CPU at 2.80 GHz with 8 GB of memory, using double-precision floating-point arithmetic.
5.1. Example 1
In this example, we find a minimal-norm and least-squares solution of the following consistent equation:
Although Example 1 is simple, we use it to test the DOS and DOA for finding the minimal-norm solution for a consistent system.
First, we apply the DOS to solve this problem, where we can find a quite accurate solution whose three residual errors are all very small.
Although the DOS already yields an accurate solution, the accuracy can be improved by the DOA. In the application of the iterative DOA, the initial guesses are given in terms of a constant. Then, the DOA solves this problem under the stated parameters. As shown in Figure 1a, the DOA converges very fast in only seven steps. Furthermore, we can find very accurate solution values with a small norm, and the three residual errors are all very small.
On the other hand, if we take a different initial guess, the obtained solution is still accurate; however, its norm is not the minimal one. Therefore, the correct choice for the minimal-norm solution is the one used above. Unless otherwise specified, we will take the same form of initial guess for most problems computed below.
5.2. Example 2
A minimal-norm and least-squares solution is sought for the following inconsistent equation:
Although Example 2 is simple, we use it to test the DOS and DOA for finding the minimal-norm solution for an inconsistent system.
We first apply the DOS, where we obtain a very accurate solution whose residuals and residual norm are very small.
Then, the DOA solves this problem. Starting from the given initial guesses, as shown in Figure 1b, the DOA converges very fast in only two steps, where we obtain a very accurate solution with small residuals and a small residual norm. For the inconsistent system, the equality in Equation (130) no longer holds, such that the residual is not a zero vector.
The Moore–Penrose inverse of the coefficient matrix in Equation (130) has an exact expression, given in Equation (131). Then, the exact solution of Equation (130) is obtained from this pseudo-inverse, as just mentioned above.
We apply the iterative method developed by Petkovic and Stanimirovic [20] to find the Moore–Penrose inverse; it has a simple form based on the Penrose Equations (4) and (6). Under the same convergence criterion, this iteration process converges in 130 steps, as shown in Figure 1b by a dashed line.
Note that the DOA can also be used to find the Moore–Penrose inverse, by taking the right-hand side to be a zero vector whose $k$th element is set to one; the corresponding solution then constitutes the $k$th column of the pseudo-inverse. Then, we can run the DOA $q$ times, for $k=1,\ldots,q$, to obtain the full pseudo-inverse, whose dimension is $n\times q$. By applying the DOA under the same convergence criterion, we can find a very accurate approximation of the exact pseudo-inverse shown in Equation (131), with a very small maximum error over all elements; this is achieved in a total of 26 steps and converges faster than the method developed by Petkovic and Stanimirovic [20].
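The column-by-column construction just described can be sketched as follows; any minimum-norm least-squares solver can play the role of the DOA, and NumPy's lstsq is used here purely as a stand-in.

```python
import numpy as np

def pseudo_inverse_by_columns(A, solver=None):
    """Build A^+ (of size n x q) column by column: the k-th column is the
    minimum-norm least-squares solution of A x = e_k."""
    if solver is None:
        solver = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]  # stand-in for the DOA
    q, n = A.shape
    X = np.zeros((n, q))
    for k in range(q):
        e_k = np.zeros(q)
        e_k[k] = 1.0
        X[:, k] = solver(A, e_k)
    return X

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 9))
print(np.allclose(pseudo_inverse_by_columns(A), np.linalg.pinv(A)))
```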
5.3. Example 3
We consider an over-determined Hilbert linear problem, Equation (134), where we fix $q$ and $n$ and the exact solutions are given.
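A sketch of such an over-determined Hilbert test problem with illustrative sizes (the actual $q$, $n$, and exact solution of the paper are not repeated here); the right-hand side is manufactured from an assumed exact solution so that the maximum error can be measured.

```python
import numpy as np
from scipy.linalg import hilbert

# Illustrative dimensions; the paper fixes its own q and n.
q, n = 20, 8
H = hilbert(q)[:, :n]                 # q x n section of the Hilbert matrix
x_exact = np.ones(n)                  # assumed exact solution for this test
b = H @ x_exact

x = np.linalg.lstsq(H, b, rcond=None)[0]
print("condition number:", np.linalg.cond(H))
print("maximum error ME:", np.max(np.abs(x - x_exact)))
```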
The DOA can solve this problem very fast, starting from the given initial guesses, in only four steps, as shown in Figure 2a. Furthermore, we can find a very accurate solution, as shown in Figure 2b, with a very small maximum error (ME). If we consider an under-determined system instead, the DOA is still workable with a small ME.
For least-squares problems, QR is a famous method to directly solve Equation (2); this process is shown in Equations (9)–(11). However, we find that the ME obtained by QR is less accurate than that obtained via the DOA for the Hilbert least-squares problem. We have checked the accuracy of the orthogonality in Equation (10), which is satisfied to within a very small tolerance.
The Hilbert matrix is a notorious example of a highly ill-conditioned matrix, and for this problem the condition number is huge. Then, in floating-point arithmetic, the data carry a correspondingly amplified rounding error, and the algorithm introduces further floating-point errors of the same order of magnitude. Perturbation theory for the least-squares problem [38] shows that one can only expect to find a solution that lies approximately within this level of the exact solution. The error of the DOA is within this range; however, the ME obtained by QR is not. The reason is that, when Equation (11) is used to find the solution in the QR method, very small singular values appearing in the denominator enlarge the rounding error and the error inherited from the data; for example, dividing sequentially by the last two very small singular values leads to an ME that is outside the admissible range. In contrast, the error of the proposed DOA is within this range. For highly ill-conditioned least-squares problems, the QR method loses some accuracy, even though Equation (10) is well satisfied.
We raise $q$ to 10, and Table 1 lists the computed ME for different values of $n$. We also list the error for the QR method, which worsens as $n$ is increased. Based on Equations (88)–(90), this problem is also solved by the SVD. The DOA is competitive with the SVD and, for all cases, slightly more accurate.
To investigate the influence of $m$ on the DOA, Table 2 compares the ME and the iteration number (IN). For ill-posed Hilbert problems, there exists a best value of $m$.
5.4. Example 4
We find the Moore–Penrose inverse of the following rank-deficient matrix [45]:
We apply the iterative method of the DOA, as demonstrated in Section 5.2, to find the Moore–Penrose inverse, which has an exact solution. Under the same convergence criterion as used by Xia et al. [45], the iteration process of the DOA converges very fast in a total of 12 steps, as shown in Figure 3. In Table 3, we compare the DOA with other numerical methods specified by Xia et al. [45] and Petkovic and Stanimirovic [20] to assess the performance of the DOA, measured by the four numerical errors of the Penrose Equations (3)–(6) and the iteration number (IN).
In the above, Alg. is shorthand for algorithm, IP for initial point, and IN for iteration number. The first two algorithms were reported by Xia et al. [45]. It can be seen that the DOA converges much faster and is more accurate than the other algorithms.
In the computation of the pseudo-inverse, the last two singular values, which are close to zero, can be prevented from appearing in the matrix to be inverted if we take $m$ small enough. For example, with the value of $m$ used in Table 3, the inverse can be computed accurately without a near-zero singular value appearing in the denominator and increasing the rounding error.
5.5. Example 5
In this example, we find the Moore–Penrose inverse of the Hilbert matrix in Equation (134), which is more difficult than the previous example. Here, we fix the dimensions of the matrix and the parameters of the DOA. The numerical errors of the Penrose Equations (3)–(6) are compared in Table 4 and Table 5, respectively.
5.6. Example 6
In this example, we consider a non-harmonic boundary value problem on an amoeba-like boundary, whose contour is displayed in Figure 4a by a solid black line. Feng et al. [46] have pointed out that “non-harmonic boundary data” mean that the solution does not have a harmonic extension to the whole plane, and a solution is very difficult to achieve.
By introducing a new variable, we arrive at a Poisson equation under a homogeneous boundary condition with a non-vanishing source term, because the boundary data are non-harmonic. When the new variable is solved, we can recover the original solution.
In order to obtain an accurate solution, we use the multiple-scale Pascal triangle polynomial expansion method developed by Liu and Kuo [47]. After collocating $q$ points to satisfy the governing equation and the boundary condition (141), we have a non-square linear system (2), where the scales are determined such that each column of the coefficient matrix has the same norm.
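The column-equilibration step mentioned above (choosing the scales so that every column of the collocation matrix has the same norm) can be sketched generically; the target norm of one is an illustrative choice.

```python
import numpy as np

def equilibrate_columns(A):
    """Rescale each column of A to unit Euclidean norm.

    Solving the scaled system (A D) y = b and setting x = D y recovers the
    original unknowns; D holds the multiple scales."""
    norms = np.linalg.norm(A, axis=0)
    D = 1.0 / norms                      # diagonal scales stored as a vector
    return A * D, D

rng = np.random.default_rng(7)
A = rng.standard_normal((50, 10)) * np.logspace(0, 6, 10)   # badly scaled columns
b = rng.standard_normal(50)

As, D = equilibrate_columns(A)
y = np.linalg.lstsq(As, b, rcond=None)[0]
x = D * y                                # undo the scaling
print(np.allclose(A @ x, As @ y))
```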
By using the DOA with the chosen parameters, the dimension of the coefficient matrix makes system (2) an over-determined system. The distribution of collocation points is shown in Figure 4a, while the convergence rate is shown in Figure 4b. It is remarkable that the DOA converges in nine steps even under a stringent convergence criterion. In Figure 5a, we compare the recovered boundary function with the exact function at 1000 points along the boundary. They are almost coincident, and thus we plot the absolute error in Figure 5b, together with its ME and root mean square error (RMSE). This result is much better than that computed by Feng et al. [46].
5.7. Example 7
The method of fundamental solutions (MFS) is adopted to solve Equation (137) by expanding the solution in fundamental solutions centered at source points, which are placed on a circle lying in the complementary set of the problem domain; the radius of this circle is a parameter.
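A minimal MFS sketch for a 2D Laplace problem, assuming the standard logarithmic fundamental solution and source points on an exterior circle; the domain, boundary data, and radius are illustrative stand-ins for those in Equations (137)–(144).

```python
import numpy as np

# Illustrative setup: unit-disk boundary, sources on a circle of radius R > 1.
n_src, n_col, R = 30, 60, 2.5
th_s = 2 * np.pi * np.arange(n_src) / n_src
th_c = 2 * np.pi * np.arange(n_col) / n_col
sources = R * np.column_stack([np.cos(th_s), np.sin(th_s)])
colloc = np.column_stack([np.cos(th_c), np.sin(th_c)])

def u_exact(p):                      # an assumed harmonic test solution
    return p[:, 0] ** 2 - p[:, 1] ** 2

# Collocation matrix of fundamental solutions ln(r) of the 2D Laplace equation.
diff = colloc[:, None, :] - sources[None, :, :]
A = np.log(np.linalg.norm(diff, axis=2))        # n_col x n_src (over-determined)
b = u_exact(colloc)

c = np.linalg.lstsq(A, b, rcond=None)[0]        # least-squares coefficients
print("boundary residual ME:", np.max(np.abs(A @ c - b)))
```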
We consider an exact solution and specify the Dirichlet boundary condition on the boundary given by Equation (139); in Equation (144), the circle parameter is fixed. Table 6 lists the computed ME for different numbers of source and collocation points obtained by the DOA. It can be seen that the DOA is applicable to least-squares problems regardless of whether $q>n$ or $q<n$.
When we apply the QR method to solve this problem with one choice of dimensions, ME = 42.81 is obtained; with another choice, ME = 2.11 is obtained. Obviously, QR is not suitable for this problem.
5.8. Example 8
In this example, we consider Equation (2) with a cyclic matrix whose first row is given. The coefficient matrix can be generated by the procedure in Equation (145). The exact solution is prescribed, and the non-square matrix is obtained by taking the first $q$ rows of the full cyclic matrix.
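Since the specific first row is given in Equation (145), the sketch below uses a hypothetical first row only to show the construction: each subsequent row is a cyclic shift of the previous one, and the non-square test matrix keeps the first $q$ rows.

```python
import numpy as np

def cyclic_matrix(first_row):
    """Cyclic (circulant) matrix: row i is the first row cyclically shifted by i."""
    n = len(first_row)
    return np.array([np.roll(first_row, i) for i in range(n)])

first_row = np.arange(1.0, 7.0)   # hypothetical first row, for illustration only
C = cyclic_matrix(first_row)
q = 4
A = C[:q, :]                      # non-square test matrix: the first q rows of C
b = A @ np.ones(C.shape[1])       # right-hand side from a prescribed solution -> consistent

x = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm solution of the q x n system
print(np.max(np.abs(A @ x - b)))           # consistency check: residual is (numerically) zero
```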
Since the right-hand side vector is obtained from Equation (2) after inserting the exact solution and the coefficient matrix, the non-square system is consistent. Therefore, we can compute the maximal error (ME) as the maximum absolute difference between the exact solutions and the numerical solutions.
For a large matrix with the stated dimensions, Table 7 lists the computed ME for different values of $m$ and the INs, using the initial guesses specified there. If we take the initial guesses to be constant vectors, the accuracy is poor, with ME = 0.961, and the convergence is very slow. As shown in Corollary 6, for the zero initial value and the cyclic matrix, the quantity involved becomes huge, which causes the failure of the DOA with a zero initial value for this problem; the same situation happens for constant initial guesses. It is apparent that the DOA is quite accurate and converges quickly. It is interesting that a small $m$ is enough to achieve accurate solutions; when $m$ is smaller, the accuracy is better, but the convergence is slower. In contrast, when $m$ is larger, the accuracy is worse, but the convergence is faster.
Next, we construct a large-sized matrix from Equation (145) with $n$ replaced by a larger value. The non-square matrix is obtained by taking the first $n$ columns from it. Table 8 lists the computed ME for different values of $m$ and the IN under the stated convergence criterion and initial guesses. It is apparent that the DOA is quite accurate and converges quickly; when $m$ is increased, the accuracy is better and the convergence is faster.
As mentioned, QR is not applicable to the least-squares problem with $q<n$; however, for $q\ge n$, QR is applicable. Table 9 lists the computed ME for different problem sizes. The DOA converges within 25 iterations for the first four cases and within 79 iterations for the last case, and it improves the accuracy by about three to four orders of magnitude compared to QR. We also list the corresponding error for the QR method, which is very small.
If we fix the coefficient before the translation vector in Equation (21) to one, the VAKS reduces to the usual affine Krylov subspace, and Equation (80) becomes the corresponding reduced form. As shown in Table 10, the DOA is more accurate than this reduced variant.
7. Conclusions
Least-squares problems arise in many applications, and many iterative algorithms are already available to seek their solutions. In this paper, a new concept of a varying affine Krylov subspace was introduced, which differs from the fixed-type affine Krylov subspace used in the CG-type numerical solution of over-determined least-squares problems. In the varying affine Krylov subspace, a closed-form double optimal solution in a simple form, Equation (42), was derived and further reduced to Equation (80); it was obtained from the two minimizations in Equations (40) and (41). We analyzed a key equation, (72), to link these two optimizations together. The double optimal solution is an excellent approximation, as verified for least-squares problems. The iterative DOA was developed, which leads to step-by-step absolute convergence. In each iterative step, merely an $m\times m$ matrix with a small value of $m$ needs to be inverted, so the computational cost of the DOA is very low, lower than that of other methods. The Moore–Penrose inverses of non-square matrices were also derived by using the varying affine Krylov subspace method. For over-determined least-squares problems, we proposed a variant of the Karush–Kuhn–Tucker equation, which uses the partial pivoting Gaussian elimination method and performs better than the original Karush–Kuhn–Tucker equation and the CG-type numerical methods CGNR (CGLS) and CGNE (Craig's method). It is very important that the DOA can be applied to rectangular systems with either $q>n$ or $q<n$.
The novelties involved in this paper are as follows:
A double-optimization-based iterative algorithm for least-squares problems was developed in a varying affine Krylov subspace.
For dense large-size least-squares problems, the proposed iterative DOA is efficient and accurate.
The DOA can be applied to rectangular systems in the extremal cases of $q\gg n$ or $q\ll n$.
A variant of the Karush–Kuhn–Tucker equation was presented, which is an improved version of the original Karush–Kuhn–Tucker equation.
The idea of a varying affine Krylov subspace is useful for finding a better solution of linear algebraic equations based on the mechanism of double optimization. In the future, we may extend the new methods of the DOS and the DOA to other linear matrix equations. The limitations are that several matrix multiplications are needed to construct the projection operator in the Krylov subspace, and that the inversion of the matrix of dimension $m$ entails an additional computational cost.