1. Introduction
Phase retrieval (PR) has many applications in science and engineering, which include X-ray crystallography [1], molecular imaging [2], biological imaging [3], and astronomy [4].
Mathematically, phase retrieval is the problem of finding a signal $\mathbf{x} \in \mathbb{R}^n$ or $\mathbb{C}^n$ satisfying the following phaseless constraints:
$$\psi_i = |\langle \mathbf{a}_i, \mathbf{x} \rangle| + \eta_i, \quad i = 1, \ldots, m, \qquad (1)$$
where $\psi_i$ is the observed measurement, $\mathbf{a}_i \in \mathbb{R}^n$ or $\mathbb{C}^n$ is the measuring vector, and $\eta_i$ denotes noise. In optics and quantum physics, $\mathbf{a}_i$ is related to the Fourier transform or the fractional Fourier transform [5]. To remove the theoretical barrier, recent works also focus on generic measurement vectors, namely $\mathbf{a}_i$ independently sampled from the standard Gaussian distribution.
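To make the measurement model concrete, the following sketch generates a synthetic instance of (1) under the real Gaussian model. The amplitude form of the measurements, the dimensions, and the noise level are illustrative assumptions, not settings prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 512                      # signal dimension and number of measurements

x = rng.standard_normal(n)          # unknown real signal
A = rng.standard_normal((m, n))     # rows a_i ~ N(0, I_n): generic Gaussian measuring vectors
noise = 0.01 * rng.standard_normal(m)

# Phaseless (amplitude) measurements psi_i = |<a_i, x>| + eta_i
psi = np.abs(A @ x) + noise

print(psi[:5])                      # only magnitudes are observed; the sign of <a_i, x> is lost
```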
Current methods for solving phase retrieval can be generally divided into two groups: convex and non-convex approaches. Convex methods, e.g., PhaseLift [6] and PhaseCut [7], utilize a semidefinite relaxation to transfer the original problem into a convex one. Nevertheless, the heavy computational cost of these methods hinders their application to practical scenarios. On the other hand, non-convex methods, including the Gerchberg–Saxton (GS) method [8,9], the (truncated) Wirtinger flow (WF/TWF) [10,11], and the (truncated) amplitude flow (AF/TAF), have a significant advantage in lower computational cost and sampling complexity (i.e., the minimal number of measurements $m$ required to achieve successful recovery) [12]. More recently, two variants of AF using the reweighting [13] and the smoothing technique [14,15] have been proposed to further lower the sampling complexity. However, for the mentioned non-convex methods, the rate of successful reconstruction relies upon elaborate initialization. Moreover, an initialization point near the true solution is also a necessity for the convergence guarantees in theoretical analysis.
1.1. Prior Art
Current initialization methods for the PR problem include the spectral method [16], the orthogonality promoting method [12], the null vector method [17], and the (weighted) maximal correlation method [13]. These methods first construct an $n \times n$ spectral matrix of the form $\mathbf{Y} = \sum_{i=1}^{m} w_i \mathbf{a}_i \mathbf{a}_i^*$, where each weight $w_i$ is a positive real number. The estimate is then given as the (scaled) eigenvector corresponding to the largest or smallest eigenvalue of $\mathbf{Y}$. The spectral method and the (weighted) maximal correlation method fall into one category, as they seek the leading eigenvector of $\mathbf{Y}$ as the estimate. We categorize these methods as correlation-focused methods, since their spectral matrices endow more weight to the measuring vectors that are more correlated with the original signal. Conversely, the null vector and orthogonality promoting methods can be regarded as orthogonality-focused methods: for a measuring vector $\mathbf{a}_i$ with less correlation with the original signal, the corresponding weight $w_i$ is larger in the spectral matrix.
These two types of initialization methods each have their own merits and demerits. The correlation-focused methods perform better when the oversampling rate $m/n$ is low, whereas when $m/n$ is large enough, the orthogonality-focused methods provide a more accurate estimate.
1.2. This Work
In this paper, we propose a method combining the advantages of the above two types of initialization methods. Our intuition is to construct a composite spectral matrix, which is a weighted sum of the spectral matrices of the correlation-focused and orthogonality-focused methods. Hence, the new method is termed the composite initialization method.
1.3. Article Organization and Notations
This paper adopts the following notations. A bold lower-case letter denotes a column vector, e.g., $\mathbf{x}$. Bold capital letters such as $\mathbf{A}$ denote matrices. $(\cdot)^T$ and $(\cdot)^*$ stand for the transpose and the conjugate transpose, respectively. Calligraphic letters such as $\mathcal{I}$ denote an index set, and $|\mathcal{I}|$ denotes the number of elements of $\mathcal{I}$. $\{c_i\}$ is a simple notation for a set of elements of the form $c_i$. $\|\mathbf{x}\|$ denotes the $\ell_2$-norm of $\mathbf{x}$. $\sum_i$ means $\sum_{i=1}^{m}$ for the sake of simplicity, if not specially specified.
The remainder of this article is organized as follows. The algorithm is given in Section 2. Section 3 provides the theoretical error analysis of the proposed initialization method, with some relevant technical error bounds placed in the Appendix. Section 4 illustrates the numerical performance and compares the proposed method with other initialization methods.
2. The Formulation of the Composite Initialization Method
This section provides the formulation of the proposed method. Without loss of generality, we assume that the measurements are arranged in ascending order, i.e., $\psi_1 \le \psi_2 \le \cdots \le \psi_m$. The measuring vectors $\{\mathbf{a}_i\}$ are assumed to be independently sampled centered Gaussian vectors with identity covariance for the convenience of theoretical analysis. Let $\mathbf{x}$ be the true solution of (1). For concreteness, we focus on the real-valued Gaussian model. We further assume that $\|\mathbf{x}\| = 1$, since this investigation focuses on the estimation of the original signal. In practice, $\|\mathbf{x}\|$ can be estimated by $\bigl(\frac{1}{m}\sum_i \psi_i^2\bigr)^{1/2}$. To begin with, we describe the null vector and the maximal correlation methods, which form the basis of our algorithm.
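As a quick check of the norm-estimation step, the sketch below compares $\|\mathbf{x}\|$ with the estimate $(\frac{1}{m}\sum_i \psi_i^2)^{1/2}$ on noiseless data; this particular estimator is an assumption consistent with $\mathbb{E}[\psi_i^2] = \|\mathbf{x}\|^2$ under the Gaussian model, not necessarily the paper's exact formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 2000

x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
psi = np.abs(A @ x)                      # noiseless amplitude measurements

norm_est = np.sqrt(np.mean(psi ** 2))    # (1/m * sum_i psi_i^2)^(1/2)
print(np.linalg.norm(x), norm_est)       # the two values should be close for large m
```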
The null vector method first picks out a subset of vectors $\{\mathbf{a}_i\}_{i \in \mathcal{I}_0}$, where $\mathcal{I}_0 \subset \{1, \ldots, m\}$ collects the indices of the smallest measurements. Then, the null vector method approximates the original signal $\mathbf{x}$ by a vector that is most orthogonal to this subset. Since $\{\psi_i\}$ is in ascending order, $\mathcal{I}_0$ can be directly written as $\mathcal{I}_0 = \{1, 2, \ldots, |\mathcal{I}_0|\}$, where $|\mathcal{I}_0|$ is the cardinality of $\mathcal{I}_0$. The intuition of the null vector method is as follows. Since $|\langle \mathbf{a}_i, \mathbf{x} \rangle|$ takes a very small value when $i \in \mathcal{I}_0$, one can construct the following minimization problem to estimate $\mathbf{x}$ in a way that matches this property:
$$\min_{\|\mathbf{z}\| = 1} \ \sum_{i \in \mathcal{I}_0} |\langle \mathbf{a}_i, \mathbf{z} \rangle|^2. \qquad (2)$$
Solving (2) is equivalent to finding the smallest eigenvalue and the corresponding eigenvector of $\mathbf{Y}_0 = \sum_{i \in \mathcal{I}_0} \mathbf{a}_i \mathbf{a}_i^*$. The so-called orthogonality promoting method proposed by Wang [12] is an approximately equivalent method that transforms the minimization problem (2) into a maximization problem.
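A minimal sketch of the null vector initializer described by (2) follows. The subset size (here $|\mathcal{I}_0| = m/2$) and the test dimensions are illustrative assumptions rather than the paper's recommended choices.

```python
import numpy as np

def null_vector_init(A, psi, k0):
    """Estimate x by the eigenvector of the smallest eigenvalue of
    Y0 = sum_{i in I0} a_i a_i^T, where I0 indexes the k0 smallest psi_i."""
    idx = np.argsort(psi)[:k0]          # I0: indices of the smallest measurements
    A0 = A[idx]
    Y0 = A0.T @ A0                      # sum of a_i a_i^T over I0
    w, V = np.linalg.eigh(Y0)           # eigenvalues in ascending order
    return V[:, 0]                      # unit-norm minimizer of (2)

rng = np.random.default_rng(2)
n, m = 64, 512
x = rng.standard_normal(n); x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
psi = np.abs(A @ x)

z = null_vector_init(A, psi, k0=m // 2)
print(min(np.linalg.norm(z - x), np.linalg.norm(z + x)))   # small when the estimate is accurate
```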
The maximal correlation method is based on the opposite intuition: it picks out the subset of vectors $\{\mathbf{a}_i\}_{i \in \mathcal{I}_1}$ corresponding to the largest $|\mathcal{I}_1|$ elements in $\{\psi_i\}$. As $\{\psi_i\}$ is in ascending order, $\mathcal{I}_1$ can also be written directly as $\mathcal{I}_1 = \{m - |\mathcal{I}_1| + 1, \ldots, m\}$. The core idea of the maximal correlation method is to search for the vector that is most correlated with the selected measuring vectors $\{\mathbf{a}_i\}_{i \in \mathcal{I}_1}$. This is achieved by solving the following maximization problem:
$$\max_{\|\mathbf{z}\| = 1} \ \sum_{i \in \mathcal{I}_1} |\langle \mathbf{a}_i, \mathbf{z} \rangle|^2. \qquad (3)$$
From (2) and (3), one can observe that these two methods only utilize a subset of vectors that are either most correlated with or most orthogonal to the original signal. To make use of as much information as possible, we solve a composite problem combining (2) and (3) together:
$$\min_{\|\mathbf{z}\| = 1} \ \sum_{i \in \mathcal{I}_0} |\langle \mathbf{a}_i, \mathbf{z} \rangle|^2 - \beta \sum_{i \in \mathcal{I}_1} |\langle \mathbf{a}_i, \mathbf{z} \rangle|^2, \qquad (4)$$
where $\beta > 0$ is a weight parameter.
We can observe that (4) has a symmetrical structure and can be interpreted as a modified version of (2), which adds the objective of (3) as a penalization term. The solution of (4) will be a more accurate estimate than that of (3), since it utilizes the information from (2).
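The following sketch builds the composite spectral matrix suggested by (4) as a weighted combination of the two subset matrices and takes its smallest eigenvector as the estimate. The names Y0 and Y1, the weight beta, and the subset sizes are illustrative assumptions, not the paper's exact parameter choices.

```python
import numpy as np

def composite_matrix(A, psi, k0, k1, beta):
    """G = Y0 - beta * Y1, where Y0 uses the k0 smallest and Y1 the k1 largest measurements."""
    order = np.argsort(psi)
    A0 = A[order[:k0]]                  # vectors nearly orthogonal to x
    A1 = A[order[-k1:]]                 # vectors most correlated with x
    Y0 = A0.T @ A0
    Y1 = A1.T @ A1
    return Y0 - beta * Y1

rng = np.random.default_rng(3)
n, m = 64, 512
x = rng.standard_normal(n); x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
psi = np.abs(A @ x)

G = composite_matrix(A, psi, k0=m // 2, k1=m // 6, beta=1.0)
w, V = np.linalg.eigh(G)
z = V[:, 0]                             # eigenvector of the smallest eigenvalue of G
print(min(np.linalg.norm(z - x), np.linalg.norm(z + x)))
```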
Let $\beta$ denote the weight parameter in (4); we set its value accordingly. The true signal $\mathbf{x}$ is roughly the solution of (2) and (3); thus, it is also an approximate solution of (4). In the next section, we will analyze the error of our proposed method.
The algorithm is presented in Algorithm 1. A shift term is added to ensure the positive definiteness of the iteration matrix $\mathbf{M}$. The iterate $\mathbf{z}^{(k+1)}$ in Step 4 is obtained by solving the following system of linear equations,
$$\mathbf{M}\,\mathbf{z}^{(k+1)} = \mathbf{z}^{(k)},$$
which can be solved by the conjugate gradient method, since $\mathbf{M}$ is positive definite and Hermitian, using $\mathbf{z}^{(k)}$ as the initializer.
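A sketch of the Step-4 linear solve via the conjugate gradient method is given below, using scipy's CG routine on a shifted symmetric positive definite matrix. The stand-in matrix, the shift value, and the warm start are assumptions made only so the system is well posed.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(4)
n = 64
B = rng.standard_normal((n, n))
M = B @ B.T + 0.1 * np.eye(n)          # symmetric positive definite stand-in for M

z_prev = rng.standard_normal(n)
z_prev /= np.linalg.norm(z_prev)

# One inverse-power step: solve M z_next = z_prev, warm-started at z_prev, then normalize.
z_next, info = cg(M, z_prev, x0=z_prev)
z_next /= np.linalg.norm(z_next)
print(info)                             # 0 indicates successful convergence of CG
```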
Since $\mathbf{x}$ and $-\mathbf{x}$ are indistinguishable for phase retrieval, we evaluate the estimate $\mathbf{z}$ using the following metric:
$$\mathrm{RMSE} = \frac{\min\{\|\mathbf{z} - \mathbf{x}\|,\ \|\mathbf{z} + \mathbf{x}\|\}}{\|\mathbf{x}\|}. \qquad (6)$$
The other initialization methods, e.g., the spectral method, the truncated spectral method, and the (weighted) maximal correlation method, are all based on the power iteration. The spectral method finds the leading eigenvector of the matrix $\frac{1}{m}\sum_{i=1}^{m} \psi_i^2\, \mathbf{a}_i \mathbf{a}_i^*$, which is also realized by the power approach. Chen proposed the truncated spectral method to improve the performance of the spectral method, which constructs another matrix via a threshold parameter [17]. The corresponding matrix for iteration keeps only the terms whose intensities do not exceed the threshold.
The numerical performance of these initialization methods will be compared later.
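For reference, the sketch below implements a truncated spectral initializer. The specific truncation rule (dropping terms whose intensities exceed a constant multiple of the average intensity) and the threshold value are assumptions following the common truncated-spectral construction, not necessarily the exact definition in [17].

```python
import numpy as np

def truncated_spectral_init(A, psi, alpha=3.0):
    """Leading eigenvector of (1/m) * sum_i psi_i^2 a_i a_i^T over indices whose
    intensities psi_i^2 do not exceed alpha^2 times the average intensity."""
    m = len(psi)
    y = psi ** 2
    keep = y <= alpha ** 2 * y.mean()
    Y = (A[keep] * y[keep, None]).T @ A[keep] / m
    w, V = np.linalg.eigh(Y)
    return V[:, -1]                    # eigenvector of the largest eigenvalue

rng = np.random.default_rng(5)
n, m = 64, 512
x = rng.standard_normal(n); x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
psi = np.abs(A @ x)

z = truncated_spectral_init(A, psi)
print(min(np.linalg.norm(z - x), np.linalg.norm(z + x)))
```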
Algorithm 1 The composite initialization method.
Input: Increasingly arranged $\{\psi_i\}$ and corresponding $\{\mathbf{a}_i\}$, truncated value, threshold value, a random initialization $\mathbf{z}^{(0)}$.
- 1: Construct the index sets $\mathcal{I}_0$ and $\mathcal{I}_1$ from the truncated value.
- 2: Form the composite matrix $\mathbf{M}$.
- 3: for $k = 0, 1, 2, \ldots$ do
- 4: Solve $\mathbf{M}\,\mathbf{z}^{(k+1)} = \mathbf{z}^{(k)}$ for $\mathbf{z}^{(k+1)}$ by the conjugate gradient method.
- 5: Normalize $\mathbf{z}^{(k+1)} \leftarrow \mathbf{z}^{(k+1)} / \|\mathbf{z}^{(k+1)}\|$.
- 6: if $\|\mathbf{z}^{(k+1)} - \mathbf{z}^{(k)}\|$ is below the threshold value, break.
- 7: end for
Output: the final normalized iterate $\tilde{\mathbf{z}}$.
3. Theoretical Analysis
In this section, we will present the error estimate of the proposed method under Gaussian assumptions in the real case.
Since the measuring vectors are assumed to be i.i.d. Gaussians, we can assume that $\mathbf{x} = \mathbf{e}_1$ without loss of generality, where $\mathbf{e}_1$ is the first standard basis vector of $\mathbb{R}^n$. Otherwise, if $\mathbf{x} \neq \mathbf{e}_1$, there exists an orthogonal matrix $\mathbf{U}$ satisfying $\mathbf{U}\mathbf{x} = \mathbf{e}_1$. Denote $\tilde{\mathbf{a}}_i = \mathbf{U}\mathbf{a}_i$; then $\langle \mathbf{a}_i, \mathbf{x} \rangle = \langle \tilde{\mathbf{a}}_i, \mathbf{e}_1 \rangle$. Let $\tilde{\mathbf{x}} = \mathbf{U}\mathbf{x} = \mathbf{e}_1$; then the original problem (1) is identical to the one posed with the measuring vectors $\{\tilde{\mathbf{a}}_i\}$, because $\{\tilde{\mathbf{a}}_i\}$ is composed of Gaussian vectors due to the invariance of the Gaussian distribution under orthogonal transforms.
The spectral matrix for the orthogonality promoting method is $\mathbf{Y}_0 = \sum_{i \in \mathcal{I}_0} \mathbf{a}_i \mathbf{a}_i^T$. By denoting the first entry of each $\mathbf{a}_i$ by $a_{i,1}$ and the remaining entries by $\mathbf{a}_{i,\perp}$, $\mathbf{Y}_0$ can be written in block form, where the blocks collect the terms $a_{i,1}^2$, $a_{i,1}\mathbf{a}_{i,\perp}$, and $\mathbf{a}_{i,\perp}\mathbf{a}_{i,\perp}^T$ over $i \in \mathcal{I}_0$. The matrix for the maximal correlation method is $\mathbf{Y}_1 = \sum_{i \in \mathcal{I}_1} \mathbf{a}_i \mathbf{a}_i^T$, with the analogous block decomposition over $i \in \mathcal{I}_1$. The constructed matrix for estimation is $\mathbf{G} = \mathbf{Y}_0 - \beta\, \mathbf{Y}_1$.
To proceed, we notice the following basic facts:
- 1. The orthogonality promoting method is for finding the eigenvector of the minimum eigenvalue of $\mathbf{Y}_0$.
- 2. In the ideal case that the perturbation part is a zero matrix, $\mathbf{G}$ degenerates to its unperturbed counterpart. If positive definiteness can also be ensured, the eigenvector corresponding to the smallest eigenvalue is exactly the true signal.
These two facts inspire us to estimate the error by computing the eigenvector of the matrix after adding the perturbation. Thus, the outline of the theoretical analysis is as follows:
- 1. Estimate the perturbation of the minimum eigenvalue, i.e., the difference between the minimum eigenvalues of the unperturbed and the perturbed matrices;
- 2. With this eigenvalue perturbation, we can then compute the perturbation of the corresponding eigenvector, which is the exact error of our algorithm.
Specifically, we have the following roadmap of theoretical analysis. Section 3.1 presents bound results for each component of the spectral matrix $\mathbf{G}$. Using the results in Section 3.1, the bounds of the eigenvalue perturbation can then be easily obtained in Section 3.2. The relationship between the perturbations of eigenvalues and eigenvectors is presented in Section 3.3, which finally induces the error estimation of our algorithm formally in Section 3.4.
3.1. Analysis of Each Component of the Spectral Matrix
In this part, we will provide bounds for the quantities involved in the matrix $\mathbf{G}$, which are the basic ingredients for estimating the perturbations of the eigenvalue and the eigenvector. In particular, the upper bound treated in Section 3.1.1, the lower bound of the smallest eigenvalue of $\mathbf{G}$, and the norm of the perturbation term will be analysed.
3.1.1. Upper Bound of
Finding this upper bound actually consists in finding the upper bound of its numerator and the lower bound of its denominator. First, we give the upper bound of the numerator, in a statistical sense, in the following lemma.
Lemma 1. We have the following bound, where the constant is as specified. The proof of Lemma 1 is placed in Appendix A and Appendix B. As for the lower bound of the denominator, we borrow a result from ([13], Lemma 3).
Lemma 2. The following holds with probability exceeding the stated level, provided that $m$ is sufficiently large, for some absolute constants. Then, we obtain the required upper bound.
Lemma 3. Under the set-up of Lemmas 1 and 2, we have the following bound with the stated probability, where the shorthand notations are defined accordingly.
3.1.2. Lower Bound of the Smallest Eigenvalue of $\mathbf{G}$
Since $\mathbf{G}$ is a linear combination of $\mathbf{Y}_0$ and $\mathbf{Y}_1$, the lower bound of the smallest eigenvalue of $\mathbf{G}$ can be estimated from the bounds on the eigenvalues of $\mathbf{Y}_0$ and $\mathbf{Y}_1$.
According to (10) and (12), we rewrite $\mathbf{Y}_0$ and $\mathbf{Y}_1$ as products of the form $\mathbf{A}_0\mathbf{A}_0^T$ and $\mathbf{A}_1\mathbf{A}_1^T$, where $\mathbf{A}_0$ and $\mathbf{A}_1$ are termed Gaussian matrices here since their entries are all sampled from the Gaussian distribution. We notice that $\mathbf{A}_0\mathbf{A}_0^T$ and $\mathbf{A}_1\mathbf{A}_1^T$ are Wishart matrices, which can be written as the product of a Gaussian matrix and its adjoint [18]. Moreover, the eigenvalues of a Wishart matrix obey the following classical result.
Theorem 1 ([19], Corollary 5.35). Let $\mathbf{A}$ be an $N \times n$ matrix whose entries are independent standard normal random variables. Then, for every $t \ge 0$, with a probability of at least $1 - 2\exp(-t^2/2)$, the following events hold simultaneously:
$$\sqrt{N} - \sqrt{n} - t \ \le\ s_{\min}(\mathbf{A}) \ \le\ s_{\max}(\mathbf{A}) \ \le\ \sqrt{N} + \sqrt{n} + t,$$
where $s_{\min}(\mathbf{A})$ and $s_{\max}(\mathbf{A})$ stand for the smallest and the largest singular values of $\mathbf{A}$, respectively.
By applying Theorem 1 with $N$, $n$, and $t$ replaced by the corresponding quantities of the Gaussian matrices above, one can see that the following is the case.
In the theoretical analysis, we assume that $m$ is large enough such that the required condition holds. However, it should be observed that a moderate value suffices to output reasonable estimates, as shown in the numerical tests. Based on this assumption, one then has the corresponding bound on the smallest singular value, which holds with probability exceeding the stated level. Similarly, we have the analogous bound for the other Gaussian matrix.
Hence, we obtain the lower bound for the smallest eigenvalue of $\mathbf{G}$. A necessary condition for the validity of our method is that the stated inequality holds, which can be satisfied by choosing the parameters properly.
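A quick numerical sanity check of the singular-value bounds in Theorem 1, which underlie the eigenvalue bounds above, can be run as follows; the matrix size and the value of t are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n, t = 2000, 64, 3.0

A = rng.standard_normal((N, n))          # independent standard normal entries
s = np.linalg.svd(A, compute_uv=False)   # singular values of A

lower = np.sqrt(N) - np.sqrt(n) - t
upper = np.sqrt(N) + np.sqrt(n) + t
print(lower <= s.min(), s.max() <= upper)   # both hold with probability >= 1 - 2*exp(-t^2/2)

# The eigenvalues of the Wishart matrix A^T A are the squared singular values of A,
# so the same bounds control the spectrum of A^T A after squaring.
print(lower ** 2 <= np.linalg.eigvalsh(A.T @ A).min())
```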
3.1.3. The Upper Bound of the Perturbation Term
Now, let us turn to the estimation of the norm of the perturbation term. We denote its entries accordingly. Since each entry is built from variables obeying the normal distribution, with the stated mean and variance, we can derive the following. Letting the squared variable be defined accordingly, it obeys the chi-squared distribution, which is a sub-exponential distribution. With this, we can rewrite the quantity of interest as a sum of centered sub-exponential random variables.
The expectation of this sum is obviously zero.
The variance of this sum is also needed in order to obtain the upper bound. To this end, we need to recall the Bernstein-type inequality for sub-exponential random variables.
Theorem 2. Let $X_1, \ldots, X_N$ be i.i.d. centered sub-exponential random variables with sub-exponential norm denoted as $K$. Then, for every $\epsilon \ge 0$, we have the following:
$$\mathbb{P}\Bigl(\Bigl|\frac{1}{N}\sum_{i=1}^{N} X_i\Bigr| \ge \epsilon\Bigr) \le 2\exp\Bigl(-c\,N\min\Bigl(\frac{\epsilon^2}{K^2}, \frac{\epsilon}{K}\Bigr)\Bigr),$$
where $c > 0$ is an absolute constant. $K$ is the so-called sub-exponential norm, defined as $K = \|X_1\|_{\psi_1} = \sup_{p \ge 1} p^{-1}\bigl(\mathbb{E}|X_1|^p\bigr)^{1/p}$.
. The sub-exponential norm of
is computed in
Appendix C.
Appendix C tells that the sub-exponential norm
in our case. By replacing
and
in Theorem 2, one can then conclude the following result.
Theorem 3. Since , let , then we have the following:where is an absolute constant. Gathering these results, we have the following theorem.
Theorem 4. The above bound holds with the probability stated below, with some absolute constant.
3.2. Estimate of the Eigenvalue Perturbation
To make an estimate of the perturbation of the minimum eigenvalue, we first recall a classical result from matrix perturbation theory [20].
Theorem 5. Let $\mathbf{A}$ and $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{E}$ be Hermitian matrices, and set $\epsilon = \max\{|\mu| : \mu \in \lambda(\mathbf{E})\}$, where $\lambda(\mathbf{E})$ stands for the set of eigenvalues of $\mathbf{E}$. Then, the following is the case:
$$|\tilde{\lambda}_j - \lambda_j| \le \epsilon,$$
where $\lambda_j$ and $\tilde{\lambda}_j$ are the $j$-th smallest among the eigenvalues of $\mathbf{A}$ and $\tilde{\mathbf{A}}$, respectively.
should still be the minimum eigenvalue after adding perturbation in our case. Then,
is simply
that is bounded by (
25). Hence,
can be estimated by the following.
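The following sketch numerically illustrates the Weyl-type bound of Theorem 5 on a random symmetric matrix: the change in each ordered eigenvalue is controlled by the largest eigenvalue magnitude of the perturbation. Matrix size and perturbation scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50

B = rng.standard_normal((n, n))
A = (B + B.T) / 2                          # Hermitian (symmetric) base matrix
E = 0.05 * rng.standard_normal((n, n))
E = (E + E.T) / 2                          # Hermitian perturbation

eps = np.max(np.abs(np.linalg.eigvalsh(E)))        # largest eigenvalue magnitude of E
lam = np.linalg.eigvalsh(A)                        # ascending eigenvalues of A
lam_tilde = np.linalg.eigvalsh(A + E)              # ascending eigenvalues of A + E

print(np.max(np.abs(lam_tilde - lam)), "<=", eps)  # Weyl-type bound from Theorem 5
```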
3.3. Computing the Eigenvector Perturbation
With the upper bound of the eigenvalue perturbation, we can now estimate the perturbation of the eigenvector. The minimum eigenvector of the perturbed matrix is computed by solving the eigenvalue equation (36), where $\mathbf{I}$ denotes the identity matrix here. Without loss of generality, the solution of (36) can be written in the form of the true signal plus a perturbation vector. Then, we have the following.
Therefore, the eigenvector perturbation can be bounded by the following, where (38) is derived based on (37). The form of (39) is a bit complicated, as stated below.
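To complement the derivation above, the sketch below checks numerically that the perturbation of the minimum eigenvector scales like the perturbation size divided by the spectral gap (a Davis–Kahan-type behaviour); it is only an illustration under arbitrary choices, not the exact bound (39).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50

# Diagonal base matrix with a clear gap between the smallest and second-smallest eigenvalues.
A = np.diag(np.concatenate(([0.0], np.linspace(1.0, 3.0, n - 1))))
E = 0.02 * rng.standard_normal((n, n))
E = (E + E.T) / 2

w, V = np.linalg.eigh(A + E)
v = V[:, 0]                                   # perturbed minimum eigenvector
v_true = np.zeros(n); v_true[0] = 1.0         # unperturbed minimum eigenvector

err = min(np.linalg.norm(v - v_true), np.linalg.norm(v + v_true))
gap = 1.0                                     # spectral gap of A at the smallest eigenvalue
print(err, "vs", 2 * np.linalg.norm(E, 2) / gap)   # error is on the order of ||E|| / gap
```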
3.4. Main Result
Notice that this eigenvector perturbation is exactly the estimation error. Then, it is straightforward to deduce our main theorem by combining the above results.
Theorem 6 (Main result). Consider the problem of estimating an arbitrary signal $\mathbf{x}$ from $m$ phaseless measurements (1) under the Gaussian assumption. Suppose the output of Algorithm 1 is $\tilde{\mathbf{z}}$. Under the stated conditions on $m$, $n$, and the algorithm parameters, we have the error estimate (41) for the composite initialization method, which holds with a probability of at least (42), for some absolute constants.
Probability $P$ in (42) can be negative when $m$ and $n$ are too small, which shows the limitation of our analysis, since $m$ and $n$ need to be large enough so as to render the probability reasonable. In the extreme case that the relevant ratios are close to 0, our error estimation expression (41) can be approximated by the simpler form (45), which is verified by numerical experiments later.
4. Numerical Results
In this section, we test the accuracy of the proposed method and compare it with other methods, including the spectral method, the truncated spectral method, the reweighted maximal correlation method, the null vector method, and the orthogonality promoting method. The sampling vectors and the original signal are independently randomly generated. To eliminate the influence of the error brought by the estimation of $\|\mathbf{x}\|$, the original signal is normalized. We chose the stated parameter values for the proposed method in the numerical experiments. All the following simulation results are averaged over 80 independent Monte Carlo realizations.
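A skeleton of the Monte Carlo comparison is sketched below for two of the initializers (null vector and maximal correlation). The trial count, dimensions, oversampling rates, and subset sizes are placeholders rather than the exact experimental settings reported in the figures.

```python
import numpy as np

def null_vector_init(A, psi, k0):
    idx = np.argsort(psi)[:k0]
    w, V = np.linalg.eigh(A[idx].T @ A[idx])
    return V[:, 0]

def max_corr_init(A, psi, k1):
    idx = np.argsort(psi)[-k1:]
    w, V = np.linalg.eigh(A[idx].T @ A[idx])
    return V[:, -1]

def rmse(z, x):
    return min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)

rng = np.random.default_rng(9)
n, trials = 64, 80
for ratio in (2, 4, 6):                        # oversampling rates m/n
    m = ratio * n
    errs = np.zeros((trials, 2))
    for t in range(trials):
        x = rng.standard_normal(n); x /= np.linalg.norm(x)
        A = rng.standard_normal((m, n))
        psi = np.abs(A @ x)
        errs[t] = [rmse(null_vector_init(A, psi, m // 2), x),
                   rmse(max_corr_init(A, psi, m // 6), x)]
    print(ratio, errs.mean(axis=0))            # average RMSE per method
```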
Figure 1 provides the RMSE calculated by (6) versus the oversampling rate $m/n$ for the mentioned initialization methods. Obviously, all methods exhibit better performance as $m/n$ increases. In particular, the proposed initialization method outperforms the other methods. When $m/n$ is large, the composite initialization method performs roughly as closely as the null vector method does, and the convergence behavior coincides with (45). When $m/n$ is small, the proposed method does not lose its accuracy as dramatically as the null vector method does. The proposed algorithm provides the most accurate estimate when the oversampling rate is below the information limit.
Figure 2 illustrates the importance of the initialization method in improving the success rate of the TAF algorithm [12]. In each simulation, each initialization generates an estimate that serves as the initializer for the TAF algorithm. A trial is considered successful if the TAF algorithm returns a result with an RMSE below a small threshold. When $m/n$ is large, all the presented initialization methods ensure an almost 100% success rate. However, when $m/n$ becomes small, differences in the success rates of the various methods gradually appear. Therefore, the composite initialization method can help TAF achieve a better success rate compared with the other two methods.
Table 1 illustrates the CPU time needed for TAF using our method as an initializer, compared with two other typical initialization methods. To distinguish the performance more clearly, the length of the signal is set to a large number. The proposed method and the null vector method need to solve a linear system of equations at each inverse power computation, which makes them more time-consuming than the power computation of the maximal correlation method. However, the proposed method provides a more accurate initializer, which helps TAF converge faster. Hence, the overall efficiency of our algorithm is not far behind the power-type methods, as shown in Table 1.