Article

Extensions of Some Statistical Concepts to the Complex Domain

Mathematics and Statistics, McGill University, Montreal, QC H3A 2K6, Canada
Axioms 2024, 13(7), 422; https://doi.org/10.3390/axioms13070422
Submission received: 22 May 2024 / Revised: 14 June 2024 / Accepted: 20 June 2024 / Published: 22 June 2024
(This article belongs to the Special Issue New Perspectives in Mathematical Statistics)

Abstract

This paper extends principal component analysis, canonical correlation analysis, the Cramer–Rao inequality, and a few other statistical concepts from the real domain to the corresponding complex domain. Optimizations of Hermitian forms under a linear constraint, of a bilinear form under Hermitian-form constraints, and similar maxima/minima problems in the complex domain are discussed. Some vector/matrix differential operators are developed to handle these types of problems. These operators and the associated optimization problems in the complex domain are believed to be new. The operators will also be useful in maximum likelihood estimation problems, as illustrated in the concluding remarks. Detailed steps are given in the derivations so that the methods are easily accessible to everyone.

1. Introduction

In most textbooks on statistics, scalar/vector/matrix-variate random variables in the complex domain and the corresponding statistical analysis in the complex domain are not discussed. However, in a large number of physical situations it is natural, or more convenient, to represent variables in the complex domain, and statistical techniques in the complex domain are then required for data analysis. In the physical science and engineering literature, there are a number of papers dealing with random variables in the complex domain. Data reduction techniques such as principal component analysis in the complex domain seem to be the main topic in these areas. Most of the papers in these applied areas concentrate on developing algorithms for computing eigenvalues and eigenvectors, which are useful and relevant in principal component analysis, independent component analysis, factor analysis, and so on. Statistical analysis in the complex domain is widely used in the analysis of multi-look return signals in radar [1], in multi-task learning in artificial intelligence and machine learning [2], in signal processing [3], in principal component analysis and independent component analysis of meteorological data in the complex domain [4], in the optimal allocation of resources, especially energy resources [5], in holography, microscopy and optical metrology [6], in delayed mixing in speech processing, in biomedical signal analysis, and in financial data modeling [7].
In the present paper, vector/matrix differential operators in the complex domain are defined. Then, these are applied in optimizing a Hermitian form under a linear constraint, a bilinear form under Hermitian-form constraints, etc. Then, as applications of these optimization problems, the real-domain techniques of principal component analysis and canonical correlation analysis are extended to the complex domain. Other statistical concepts such as the Cramer–Rao inequality, least square procedure, and related aspects are extended to the complex domain. Detailed derivations are given so that the methods will be accessible even to beginners.
The following notation is used in this paper: Scalar variables, whether mathematical or random, are denoted by lower-case letters such as $x, y$. Vector/matrix variables are denoted by capital letters such as $X, Y$. Scalar constants are denoted by $a, b$, etc., and vector/matrix constants by $A, B$, etc. The wedge product of the differentials $\mathrm{d}x$ and $\mathrm{d}y$ is defined as $\mathrm{d}x\wedge\mathrm{d}y=-\mathrm{d}y\wedge\mathrm{d}x$, where $x$ and $y$ are two real scalar variables, so that $\mathrm{d}x\wedge\mathrm{d}x=0$, $\mathrm{d}y\wedge\mathrm{d}y=0$. Let $X=(x_{jk})$ be an $m\times n$ matrix with distinct real scalar variables $x_{jk}$ as elements; then $\mathrm{d}X=\wedge_{j=1}^{m}\wedge_{k=1}^{n}\mathrm{d}x_{jk}$. The transpose of a matrix $X$ is denoted by a prime, $X'$. For a $p\times p$ matrix $X$, if $X'=X$ (symmetric), then $\mathrm{d}X=\wedge_{j\ge k}\mathrm{d}x_{jk}=\wedge_{j\le k}\mathrm{d}x_{jk}$. Variables in the complex domain are written with a tilde, such as $\tilde x,\tilde y,\tilde X,\tilde Y$. If $\tilde X$ is a $p\times q$ matrix in the complex domain, then $\tilde X=X_1+iX_2$, $i=\sqrt{-1}$, with $X_1,X_2$ real, and $\mathrm{d}\tilde X$ is defined as $\mathrm{d}\tilde X=\mathrm{d}X_1\wedge\mathrm{d}X_2$. The determinant of a real $p\times p$ matrix $Y$ is written as $|Y|$ or as $\det(Y)$, and if $\tilde Y$ is in the complex domain, then the absolute value of the determinant of $\tilde Y$ is written as $|\det(\tilde Y)|$. Also, $\mathrm{tr}(Y)$ means the trace of the square matrix $Y$. A $p\times p$ real matrix $A$ being positive definite is written as $A>O$, and $\tilde X=\tilde X^{*}>O$ indicates that the matrix $\tilde X$, in the complex domain, is Hermitian positive definite. Other notation is explained when it occurs for the first time.
This paper is organized as follows: Section 2 starts with defining a vector differential operator in the complex domain. Then, with the help of this operator, some optimization problems such as optimizing a linear form subject to Hermitian-form constraint, a Hermitian form under Hermitian-form constraint, and a bilinear form with Hermitian-form constraints are discussed. Then, a matrix differential operator in the complex domain is defined. Differentiations of the trace of a product of matrices and determinant of a matrix, by using the matrix differential operator in the complex domain, are dealt with. Section 3 deals with the extension of principal component analysis to the complex domain. Section 4 delves into the extension of canonical correlation analysis to the complex domain. Section 5 examines the extension of the Cramer–Rao inequality, Cauchy–Schwarz inequality, and the least square estimation procedure to the complex domain. Detailed steps are given in each situation.

2. Optimization Involving Linear Forms, Traces, and Determinants

Here, we will consider linear forms and their differentiations first, and then, we will consider some situations of optimizing a Hermitian form with linear constraint and optimization of a linear form with Hermitian-form constraint. We consider some basic results to start with. Let
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\quad A=A_1+iA_2,\; i=\sqrt{-1},\; \tilde X=X_1+iX_2$$
where A 1 , A 2 , X 1 , X 2 are real vectors. Assume that A is a non-null arbitrary coefficient vector and X ˜ is a known vector random variable in the complex domain.
Theorem 1.
For A and X ˜ as defined above,
$$\tilde u=A^{*}\tilde X\;\Rightarrow\;\frac{\partial\tilde u}{\partial A}=2\tilde X.$$
Proof. 
This can be easily seen from the following: $\tilde u=(A_1'-iA_2')(X_1+iX_2)=A_1'X_1+A_2'X_2+i(A_1'X_2-A_2'X_1)$. Then,
$$\frac{\partial\tilde u}{\partial A_1}=X_1+iX_2\quad\text{and}\quad \frac{\partial\tilde u}{\partial A_2}=X_2-iX_1$$
from the corresponding real variable case. That is,
$$\left(\frac{\partial}{\partial A_1}+i\frac{\partial}{\partial A_2}\right)\tilde u=\frac{\partial\tilde u}{\partial A}=2(X_1+iX_2)=2\tilde X.$$
Now, we consider a slightly more general result. Consider $\tilde u=A^{*}\Sigma_{11}\tilde X$, $\Sigma_{11}=\Sigma_{11}^{*}>O$. Let $A^{c}=(A_1-iA_2)$ and $A^{*}=(A_1'-iA_2')$. Then, from $(2)$ note that $\frac{\partial\tilde u}{\partial A^{c}}=O$ and $\frac{\partial\tilde u}{\partial A^{*}}=O$; $\frac{\partial\tilde u}{\partial\tilde X}=O$; $\frac{\partial\tilde u}{\partial\tilde X^{c}}=2A$ and $\frac{\partial\tilde u}{\partial\tilde X^{*}}=2A$, where $A^{c}$ means the complex conjugate of $A$.
Theorem 2.
For X ˜ and A as defined above, let u ˜ = A * Σ 11 X ˜ . Let Σ 11 be a p × p matrix, free of the elements in A and X ˜ . Then,
$$\frac{\partial\tilde u}{\partial A}=2\Sigma_{11}\tilde X.$$
Proof. 
Since $\Sigma_{11}$ and $\tilde X$ are free of the elements in $A$, we may take $\Sigma_{11}\tilde X=B=B_1+iB_2$, $i=\sqrt{-1}$, with $B_1,B_2$ real. Then, from Theorem 1 the result follows. □
Now, consider a p × 1 vector A and a p × p matrix Σ and the Hermitian form u ˜ = A * Σ A in the complex domain where Σ = Σ * > O is free of the elements in A. Then, we have the following result:
Theorem 3.
For the Hermitian form as defined above, where A = A 1 + i A 2 , Σ = Σ 1 + i Σ 2 , Σ = Σ * > O , A 1 , A 2 , Σ 1 , Σ 2 are real,
$$\frac{\partial}{\partial A}[\tilde u]=\frac{\partial}{\partial A}[A^{*}\Sigma A]=2\Sigma A.$$
Proof. 
Opening up u ˜ , we have the following:
$$\tilde u=A^{*}\Sigma A=(A_1'-iA_2')\Sigma(A_1+iA_2)=A_1'\Sigma A_1+A_2'\Sigma A_2+i(A_1'\Sigma A_2-A_2'\Sigma A_1).$$
Since $\Sigma=\Sigma^{*}$, we have $\Sigma_1'=\Sigma_1$ (real symmetric) and $\Sigma_2'=-\Sigma_2$ (real skew symmetric). Now, $A_1'\Sigma A_1=A_1'\Sigma_1 A_1+iA_1'\Sigma_2 A_1=A_1'\Sigma_1 A_1$ because $A_1'\Sigma_2 A_1=0$ due to $\Sigma_2$ being real skew symmetric. Similarly, $A_2'\Sigma A_2=A_2'\Sigma_1 A_2$. Then, from the results in the real case, we have the following:
$$\frac{\partial}{\partial A_1}[A_1'\Sigma_1 A_1]=2\Sigma_1 A_1,\qquad \frac{\partial}{\partial A_2}[A_2'\Sigma_1 A_2]=2\Sigma_1 A_2.$$
Consider
$$i(A_1'\Sigma A_2-A_2'\Sigma A_1)=i[(A_1'\Sigma_1 A_2-A_2'\Sigma_1 A_1)+i(A_1'\Sigma_2 A_2-A_2'\Sigma_2 A_1)].$$
But $\Sigma_1'=\Sigma_1$ and $(A_2'\Sigma_1 A_1)'=A_1'\Sigma_1 A_2$, both being real $1\times 1$, and hence $A_1'\Sigma_1 A_2-A_2'\Sigma_1 A_1=0$. Also, $A_2'\Sigma_2 A_1=-A_1'\Sigma_2 A_2$ because $\Sigma_2'=-\Sigma_2$, and hence $A_1'\Sigma_2 A_2-A_2'\Sigma_2 A_1=2A_1'\Sigma_2 A_2=-2A_2'\Sigma_2 A_1$. Therefore,
$$i(A_1'\Sigma A_2-A_2'\Sigma A_1)=i^{2}\,2A_1'\Sigma_2 A_2=-i^{2}\,2A_2'\Sigma_2 A_1.$$
Then, from the real case,
$$\frac{\partial}{\partial A_1}\left[i^{2}\,2A_1'\Sigma_2 A_2\right]=2i^{2}\Sigma_2 A_2=2(i\Sigma_2)(iA_2)\;\Rightarrow\;\frac{\partial}{\partial A_1}[A^{*}\Sigma A]=2\Sigma_1 A_1+2(i\Sigma_2)(iA_2)$$
$$\frac{\partial}{\partial A_2}\left[-i^{2}\,2A_2'\Sigma_2 A_1\right]=-2i^{2}\Sigma_2 A_1=2\Sigma_2 A_1\;\Rightarrow\;\frac{\partial}{\partial A_2}[A^{*}\Sigma A]=2\Sigma_1 A_2+2\Sigma_2 A_1$$
from the corresponding real case. Then, from (2)–(5),
$$\frac{\partial}{\partial A}[A^{*}\Sigma A]=\left(\frac{\partial}{\partial A_1}+i\frac{\partial}{\partial A_2}\right)[A^{*}\Sigma A]=[2\Sigma_1 A_1+2(i\Sigma_2)(iA_2)]+i[2\Sigma_1 A_2+2\Sigma_2 A_1]=2\Sigma_1 A_1+2i\Sigma_1 A_2+2i\Sigma_2(A_1+iA_2)=2\Sigma_1 A+2i\Sigma_2 A=2\Sigma A.$$
This establishes the result. □
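Since the operator $\frac{\partial}{\partial A}=\frac{\partial}{\partial A_1}+i\frac{\partial}{\partial A_2}$ is just a pair of real gradients, Theorems 1 and 3 can be checked numerically. The following sketch is only an illustration (numpy is assumed; the array names, sizes, and tolerances are arbitrary): it assembles the complex gradient by central differences and compares it with $2\tilde X$ and $2\Sigma A$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal(p) + 1j * rng.standard_normal(p)
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = M @ M.conj().T                     # Hermitian positive definite test matrix

def complex_grad(f, A, h=1e-6):
    """Apply d/dA = d/dA1 + i d/dA2 componentwise by central differences."""
    g = np.zeros(len(A), dtype=complex)
    for k in range(len(A)):
        e = np.zeros(len(A)); e[k] = h
        d_re = (f(A + e) - f(A - e)) / (2 * h)             # derivative w.r.t. the real part A1_k
        d_im = (f(A + 1j * e) - f(A - 1j * e)) / (2 * h)   # derivative w.r.t. the imaginary part A2_k
        g[k] = d_re + 1j * d_im
    return g

u = lambda A: A.conj() @ X           # linear form A* X (Theorem 1)
q = lambda A: A.conj() @ Sigma @ A   # Hermitian form A* Sigma A (Theorem 3)
print(np.allclose(complex_grad(u, A), 2 * X, atol=1e-5))
print(np.allclose(complex_grad(q, A), 2 * Sigma @ A, atol=1e-4))
```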

2.1. Optimization of a Linear Form Subject to Hermitian-Form Constraint

Consider a linear form u ˜ = A * X ˜ , where A and X ˜ are p × 1 non-null vectors, with A being arbitrary, and X ˜ being a known vector variable with variance Var ( u ˜ ) = A * Σ 11 A , where Σ 11 = Σ 11 * > O is the covariance matrix in X ˜ . Consider the optimization of u ˜ , subject to the constraint that the variance of u ˜ is fixed, say unity, that is, A * Σ 11 A = 1 . Then, we have the following result:
Theorem 4.
For the linear form u ˜ = A * X ˜ and the constraint A * Σ 11 A = 1 , as defined above,
$$\max_{A^{*}\Sigma_{11}A=1}[A^{*}\tilde X]=[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{\frac{1}{2}}.$$
Proof. 
Let $\tilde w=A^{*}\tilde X-\lambda(A^{*}\Sigma_{11}A-1)$, where $\lambda$ is a Lagrangian multiplier. Observe that $\Sigma_{11}=\Sigma_{11}^{*}$ and we assume that it is Hermitian positive definite. From previous results (1) and (2),
$$\frac{\partial\tilde w}{\partial A}=2\tilde X-2\lambda\Sigma_{11}A=O\;\Rightarrow\;\tilde X=\lambda\Sigma_{11}A\;\Rightarrow\;\lambda=A^{*}\tilde X.$$
That is,
$$A=\frac{1}{\lambda}\Sigma_{11}^{-1}\tilde X\;\Rightarrow\;A^{*}=\frac{1}{\lambda^{c}}\tilde X^{*}\Sigma_{11}^{-1}\;\Rightarrow\;\lambda=A^{*}\tilde X=\frac{1}{\lambda^{c}}\tilde X^{*}\Sigma_{11}^{-1}\tilde X.$$
But from $(9)$, $\lambda=A^{*}\tilde X$, which means that the maximum of our linear form is the largest $\lambda$ and the minimum of our linear form is the smallest $\lambda$. But, from $(9)$ and $(10)$, $\lambda\lambda^{c}=|\lambda|^{2}=\tilde X^{*}\Sigma_{11}^{-1}\tilde X$, where $|\lambda|$ means the absolute value of $\lambda$ and $\lambda^{c}$ is the complex conjugate of $\lambda$. Hence,
$$\max_{A^{*}\Sigma_{11}A=1}[\tilde u]=[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{\frac{1}{2}},$$
with $\tilde X^{*}\Sigma_{11}^{-1}\tilde X>0$, being a positive definite Hermitian form. This completes the proof. □
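As a quick numerical illustration of Theorem 4 (a sketch only; numpy is assumed and the sampled data, variable names, and tolerances are arbitrary), random vectors on the constraint set $A^{*}\Sigma_{11}A=1$ never exceed the bound $[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{1/2}$, while a stationary point of the form $A=\Sigma_{11}^{-1}\tilde X/[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{1/2}$ attains it.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
S11 = M @ M.conj().T                                   # Sigma_11, Hermitian positive definite
bound = np.sqrt((X.conj() @ np.linalg.solve(S11, X)).real)

# random vectors on the constraint set A* S11 A = 1 never exceed the bound
vals = []
for _ in range(2000):
    A0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    A = A0 / np.sqrt((A0.conj() @ S11 @ A0).real)
    vals.append(abs(A.conj() @ X))
print(max(vals) <= bound + 1e-12)

# the stationary point A = S11^{-1} X / sqrt(X* S11^{-1} X) satisfies the constraint and attains it
Aopt = np.linalg.solve(S11, X) / bound
print(np.isclose((Aopt.conj() @ S11 @ Aopt).real, 1.0), np.isclose(abs(Aopt.conj() @ X), bound))
```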

2.2. Optimization of a Hermitian Form Subject to Linear Constraint

Now, consider a problem of optimizing A * Σ 11 A subject to A * X ˜ = α fixed, where A and X ˜ are p × 1 vectors and Σ 11 = Σ 11 * > O is a p × p Hermitian positive definite matrix, free of the elements of A. Then, we have the following result:
Theorem 5.
For the Hermitian form A * Σ 11 A and linear form constraint A * X ˜ = α , we have
$$\max_{A^{*}\tilde X=\alpha}[A^{*}\Sigma_{11}A]=+\infty,\qquad \min_{A^{*}\tilde X=\alpha}[A^{*}\Sigma_{11}A]=\frac{|\alpha|^{2}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}.$$
Proof. 
Let $\lambda$ be a Lagrangian multiplier and consider $\tilde w=A^{*}\Sigma_{11}A-\lambda(A^{*}\tilde X-\alpha)$. From (1) and (2), we have
$$\frac{\partial\tilde w}{\partial A}=2\Sigma_{11}A-2\lambda\tilde X=O\;\Rightarrow\;\Sigma_{11}A=\lambda\tilde X\;\Rightarrow\;A^{*}\Sigma_{11}A=\lambda A^{*}\tilde X=\lambda\alpha.$$
From ( 12 ) ,
$$A=\lambda\Sigma_{11}^{-1}\tilde X\;\Rightarrow\;\alpha=A^{*}\tilde X=\lambda^{c}\tilde X^{*}\Sigma_{11}^{-1}\tilde X\;\Rightarrow\;|\lambda|=\frac{|\alpha|}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}$$
because $\tilde X^{*}\Sigma_{11}^{-1}\tilde X$ is a positive definite Hermitian form, where $|\lambda|$ and $|\alpha|$ denote the absolute values of $\lambda$ and $\alpha$, respectively. Then, $\max[A^{*}\Sigma_{11}A]=+\infty$ since $A$ is arbitrary and since the linear restriction cannot eliminate the effect of $A$ fully. Hence, we look for the minimum. From $(13)$,
$$|\lambda|=\frac{|\alpha|}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}\;\Rightarrow\;|\lambda\alpha|=\frac{|\alpha|^{2}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}.$$
Hence,
$$\min_{A^{*}\tilde X=\alpha}[A^{*}\Sigma_{11}A]=\frac{|\alpha|^{2}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}.$$
This completes the proof. □
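For Theorem 5, a similar numerical sketch can be made (illustrative only; numpy is assumed, and the names and tolerances are arbitrary): feasible vectors satisfying $A^{*}\tilde X=\alpha$ are generated by a rank-one correction of random vectors, their Hermitian forms never fall below $|\alpha|^{2}/(\tilde X^{*}\Sigma_{11}^{-1}\tilde X)$, and the minimizer $A=\frac{\alpha^{c}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}\,\Sigma_{11}^{-1}\tilde X$ attains the bound.

```python
import numpy as np

rng = np.random.default_rng(2)
p, alpha = 5, 1.3 - 0.7j
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
S11 = M @ M.conj().T
q = (X.conj() @ np.linalg.solve(S11, X)).real        # X* S11^{-1} X
lower = abs(alpha) ** 2 / q                           # claimed minimum value

# random feasible vectors (A* X = alpha) never fall below the lower bound
vals = []
for _ in range(2000):
    A0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    gamma = np.conj((alpha - A0.conj() @ X) / (X.conj() @ X))   # correction so that A* X = alpha
    A = A0 + gamma * X
    vals.append((A.conj() @ S11 @ A).real)
print(min(vals) >= lower - 1e-10)

# the minimizer A = (alpha^c / q) S11^{-1} X satisfies the constraint and attains the bound
Amin = np.conj(alpha) / q * np.linalg.solve(S11, X)
print(np.isclose(Amin.conj() @ X, alpha), np.isclose((Amin.conj() @ S11 @ Amin).real, lower))
```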

2.3. Differentiation of Traces of Matrix Products and Matrix Differential Operator

Now, we consider a few matrix-variate cases, where the problem is also essentially an optimization involving vector variables. Let $A=(a_{jk})$ be a $p\times q$ matrix of arbitrary elements $a_{jk}$. Let $\tilde X=(\tilde x_{jk})$ be a $p\times q$ matrix in the complex domain with distinct scalar complex variables $\tilde x_{jk}$ as elements. Consider $\tilde u=\mathrm{tr}(A^{*}\tilde X)$. Let $A_j$ and $\tilde X_j$ be the $j$-th columns of $A$ and $\tilde X$, respectively. Then, $\tilde u=\sum_{j=1}^{q}A_j^{*}\tilde X_j$. Let $A_j=A_{j1}+iA_{j2}$, $i=\sqrt{-1}$, where $A_{j1},A_{j2}$ are real $p\times 1$ vectors. Let $\tilde u_j=A_j^{*}\tilde X_j$. Consider the operator
$$\frac{\partial}{\partial A_j}=\left(\frac{\partial}{\partial A_{j1}}+i\frac{\partial}{\partial A_{j2}}\right),\quad \frac{\partial}{\partial A_j^{*}}=\left(\frac{\partial}{\partial A_{j1}'}-i\frac{\partial}{\partial A_{j2}'}\right),\quad \frac{\partial}{\partial A_j^{c}}=\left(\frac{\partial}{\partial A_{j1}}-i\frac{\partial}{\partial A_{j2}}\right)$$
and then $\frac{\partial\tilde u_j}{\partial A_j}=\frac{\partial\tilde u}{\partial A_j}$ since we are taking partial derivatives. Then, we have the following result:
Theorem 6.
Let u ˜ , A , A j , X ˜ j be as defined above. Then,
$$\frac{\partial\tilde u}{\partial A}=\frac{\partial}{\partial A}[\mathrm{tr}(A^{*}\tilde X)]=2\tilde X.$$
Proof. 
From the previous result,
$$\frac{\partial\tilde u_j}{\partial A_j}=\left(\frac{\partial}{\partial A_{j1}}+i\frac{\partial}{\partial A_{j2}}\right)\tilde u_j=\left(\frac{\partial}{\partial A_{j1}}+i\frac{\partial}{\partial A_{j2}}\right)(A_j^{*}\tilde X_j)=2\tilde X_j,\quad j=1,\ldots,q,$$
from Theorem 1. But $\frac{\partial\tilde u_j}{\partial A_j}=\frac{\partial\tilde u}{\partial A_j}$. Now, stack these up side by side as $q$ columns for $j=1,\ldots,q$. Note that $A=[A_1,\ldots,A_q]$, $\frac{\partial}{\partial A}=\left[\frac{\partial}{\partial A_1},\ldots,\frac{\partial}{\partial A_q}\right]$, and $\frac{\partial\tilde u}{\partial A_j}=2\tilde X_j$, and hence $\frac{\partial\tilde u}{\partial A}=2\tilde X$. This completes the proof. □
For the next result to be established, we need a result on matrices, which will be stated here as a lemma.
Lemma 1.
For two $p\times p$ real matrices $A$ and $B$, where $A'=A$ (symmetric) and $B'=-B$ (skew symmetric), $\mathrm{tr}(AB)=0$.
Proof. 
For two $p\times p$ matrices $A_1$ and $B_1$, it is well known that $\mathrm{tr}(A_1)=\mathrm{tr}(A_1')$ and $\mathrm{tr}(A_1B_1)=\mathrm{tr}(B_1A_1)$. Then, for our matrices $A$ and $B$ with $A'=A$, $B'=-B$, we have $\mathrm{tr}(AB)=\mathrm{tr}((AB)')=\mathrm{tr}(B'A')=\mathrm{tr}(-BA)=-\mathrm{tr}(BA)=-\mathrm{tr}(AB)$. That is, $\mathrm{tr}(AB)=-\mathrm{tr}(AB)\Rightarrow 2\,\mathrm{tr}(AB)=0\Rightarrow\mathrm{tr}(AB)=0$. □
For the next problem, let $A=A^{*}$ be a $p\times p$ Hermitian matrix of arbitrary elements $a_{jk}$. Let $\tilde X=\tilde X^{*}$ be a Hermitian matrix in the complex domain with distinct complex scalar variables as elements except for the Hermitian property. Let $A=A_1+iA_2$, $\tilde X=X_1+iX_2$, where $A_1,A_2,X_1,X_2$ are real. Then, $A_1'=A_1$, $X_1'=X_1$ (symmetric) and $A_2'=-A_2$, $X_2'=-X_2$ (skew symmetric). Let $\tilde u=\mathrm{tr}(A^{*}\tilde X)$. Then, we have the following result:
Theorem 7.
Let $A,\tilde X,\frac{\partial}{\partial A},A_1,A_2,X_1,X_2$, and $\tilde u=\mathrm{tr}(A^{*}\tilde X)$ be as defined above. Then,
$$\frac{\partial\tilde u}{\partial A}=2\tilde X-\mathrm{diag}(\tilde X)$$
where diag ( X ˜ ) means a diagonal matrix consisting of only the diagonal elements from X ˜ .
Proof. 
Opening up u ˜ , we have the following:
$$\tilde u=\mathrm{tr}(A^{*}\tilde X)=\mathrm{tr}\{(A_1'-iA_2')(X_1+iX_2)\}=\mathrm{tr}\{A_1'X_1+A_2'X_2+i(A_1'X_2-A_2'X_1)\}=\mathrm{tr}(A_1'X_1+A_2'X_2)$$
since $\mathrm{tr}(A_1'X_2)=0$ and $\mathrm{tr}(A_2'X_1)=0$ by Lemma 1. Now, from the known result for symmetric matrices in the real case, we have
$$\frac{\partial}{\partial A_1}[\mathrm{tr}(A_1'X_1)]=2X_1-\mathrm{diag}(X_1).$$
If the $(j,k)$-th element, for $j\neq k$, in $A_2$ is $+a_{jk}$, then the $(j,k)$-th element in $A_2'$ is $-a_{jk}$ and the $(k,j)$-th element in $A_2$ is $-a_{jk}$. Hence, $\mathrm{tr}(A_2'X_2)=2\sum_{j<k}a_{jk}x_{jk}=2\sum_{j>k}a_{jk}x_{jk}$. Hence,
$$\frac{\partial}{\partial A_2}[\mathrm{tr}(A_2'X_2)]=2X_2$$
since the diagonal elements in A 2 and X 2 are already zeros. Combining ( 15 ) and ( 16 ) , we have
$$\frac{\partial}{\partial A}[\mathrm{tr}(A^{*}\tilde X)]=2\tilde X-\mathrm{diag}(\tilde X)$$
which corresponds to the result in the real case. □

2.4. Differentiation of a Determinant in the Complex Domain

Here, we will start by defining the derivative of a scalar quantity with respect to a matrix, that is, a matrix differential operator. Let $X=(x_{jk})$ be an $m\times n$ real matrix and let $\tilde X=(\tilde x_{jk})$ be an $m\times n$ matrix in the complex domain. Then, we can always write $\tilde X=X_1+iX_2$, where $i=\sqrt{-1}$ and $X_1,X_2$ are real $m\times n$ matrices. Then, the matrix differential operators $\frac{\partial}{\partial X}$ and $\frac{\partial}{\partial\tilde X}$ are defined as the following:
$$\frac{\partial}{\partial X}=\begin{bmatrix}\frac{\partial}{\partial x_{11}} & \cdots & \frac{\partial}{\partial x_{1n}}\\ \vdots & & \vdots\\ \frac{\partial}{\partial x_{m1}} & \cdots & \frac{\partial}{\partial x_{mn}}\end{bmatrix}\quad\text{and}\quad \frac{\partial}{\partial\tilde X}=\left(\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}\right).$$
Consider a p × p nonsingular matrix X ˜ in the complex domain and let | X ˜ | be its determinant. Then, we have the following result:
Theorem 8.
For the p × p nonsingular matrix X ˜ in the complex domain,
$$\frac{\partial}{\partial\tilde X}[|\tilde X^{c}|]=2|\tilde X^{c}|(\tilde X^{*})^{-1}\quad\text{for a general }\tilde X$$
and
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})]\quad\text{for }\tilde X=\tilde X^{*}.$$
Proof. 
The cofactor expansion of a determinant holds in the real and complex domains and it is the following:
$$|\tilde X|=\tilde x_{11}\tilde C_{11}+\ldots+\tilde x_{1p}\tilde C_{1p}=\tilde x_{21}\tilde C_{21}+\ldots+\tilde x_{2p}\tilde C_{2p}=\cdots=\tilde x_{p1}\tilde C_{p1}+\ldots+\tilde x_{pp}\tilde C_{pp}$$
where $\tilde C_{jk}$ is the cofactor of $\tilde x_{jk}$ for all $j$ and $k$. Let $\tilde C=(\tilde C_{jk})$ be the matrix of cofactors. For a general matrix, all the $\tilde x_{jk}$'s are distinct and the corresponding $\tilde C_{jk}$'s are also distinct. Hence, in the real case, taking the partial derivative of the $j$-th line in $(17)$ we have $\frac{\partial}{\partial x_{jk}}[|X|]=C_{jk}$ for all $j$ and $k$. Hence, in the real case
$$\frac{\partial}{\partial X}[|X|]=C=|X|(X^{-1})'=|X|(X')^{-1}.$$
This is a known result. But, in the complex case, the situation is different. Before we tackle the complex-domain situation, we will develop some necessary tools. It is a known result that $\tilde X^{-1}=\frac{1}{|\tilde X|}\tilde C'$. But when $\tilde X=\tilde X^{*}$ (Hermitian), the eigenvalues are real, and hence the determinant is real. Then, $\tilde X^{-1}$ is Hermitian, and thereby $\tilde C$ is also Hermitian. The following results will be helpful when we apply our matrix differential operators to scalar functions of matrices in the complex domain. For two scalar complex variables $\tilde x$ and $\tilde y$ the following can be easily verified:
$$\frac{\partial}{\partial\tilde x}[\tilde x\tilde y]=0,\quad \frac{\partial}{\partial\tilde x^{c}}[\tilde x^{c}\tilde y^{c}]=0,\quad \frac{\partial}{\partial\tilde x}[\tilde x^{c}\tilde y]=2\tilde y,\quad \frac{\partial}{\partial\tilde x}[\tilde x^{c}\tilde y^{c}]=2\tilde y^{c},\quad \frac{\partial}{\partial\tilde x^{c}}[\tilde x\tilde y]=2\tilde y.$$
For convenience, let us consider the term
$$\tilde u_{jk}=\tilde x_{jk}^{c}\tilde C_{jk}^{c}=(x_{1jk}-ix_{2jk})(C_{1jk}-iC_{2jk})=x_{1jk}C_{1jk}-x_{2jk}C_{2jk}-i(x_{1jk}C_{2jk}+x_{2jk}C_{1jk})$$
where x ˜ j k = x 1 j k + i x 2 j k , C ˜ j k = C 1 j k + i C 2 j k , with x 1 j k , x 2 j k , C 1 j k , C 2 j k being real scalar quantities. Note that x ˜ j k = x ˜ k j * when X ˜ is Hermitian. Hence, for example, when we differentiate with respect to x ˜ 21 it is equivalent to differentiating with respect to x ˜ 12 * and vice versa. When x ˜ 12 * C ˜ 12 * is differentiated with respect to x ˜ 12 it is equivalent to differentiating x ˜ 21 C ˜ 21 with respect to x ˜ 12 , and so on. Then,
$$\frac{\partial}{\partial x_{1jk}}[\tilde u_{jk}]=C_{1jk}-iC_{2jk},\quad \frac{\partial}{\partial x_{2jk}}[\tilde u_{jk}]=-C_{2jk}-iC_{1jk},\quad \frac{\partial}{\partial\tilde x_{jk}}[\tilde u_{jk}]=2\tilde C_{jk}^{c}.$$
This is the derivative of the complex conjugate of the ( j , k ) -th element in ( 18 ) for all j and k when X ˜ is a general matrix with distinct scalar complex variables as elements. Then,
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=O,\quad \frac{\partial}{\partial\tilde X}[|\tilde X^{c}|]=2\tilde C^{c}=2|\tilde X^{c}|[\tilde X^{*}]^{-1},\quad \frac{\partial}{\partial\tilde X^{c}}[|\tilde X|]=2\tilde C=2|\tilde X|(\tilde X')^{-1}.$$
When $\tilde X=\tilde X^{*}$ (Hermitian) we have $|\tilde X|=|\tilde X^{*}|=|\tilde X^{c}|$. Then, the result in $(20)$ holds for the $k$-th term in the $j$-th line in $(18)$. But, when $\tilde X$ is Hermitian, there is one more element contributing to $x_{1jk}$ and $x_{2jk}$. This is the $j$-th term in the $k$-th line of $(18)$. Thus, the sum of the contributions coming from these two terms is the derivative of $|\tilde X^{c}|$ with respect to $\tilde x_{jk}$. The sum of the contributions is the following, observing that $x_{1jk}=x_{1kj}$, $C_{1jk}=C_{1kj}$, $x_{2jk}=-x_{2kj}$, $C_{2jk}=-C_{2kj}$:
$$x_{1jk}C_{1jk}-x_{2jk}C_{2jk}-i(x_{1jk}C_{2jk}+x_{2jk}C_{1jk})+x_{1kj}C_{1kj}-x_{2kj}C_{2kj}-i(x_{1kj}C_{2kj}+x_{2kj}C_{1kj})$$
$$=2x_{1jk}C_{1jk}-2x_{2jk}C_{2jk}-i[x_{1jk}(C_{2jk}+C_{2kj})+C_{1jk}(x_{2jk}+x_{2kj})]=2x_{1jk}C_{1jk}-2x_{2jk}C_{2jk}$$
since $C_{2jk}+C_{2kj}=0$ and $x_{2jk}+x_{2kj}=0$.
Now,
$$\frac{\partial}{\partial\tilde x_{jk}}[\tilde x_{jk}^{c}\tilde C_{jk}^{c}+\tilde x_{kj}^{c}\tilde C_{kj}^{c}]=2\tilde C_{jk}^{c}=2\tilde C_{kj}$$
for all $j\neq k$. When $j=k$, the diagonal elements in $\tilde X$ and $\tilde C$ are real, and hence the term occurs only once, and therefore,
$$\frac{\partial}{\partial\tilde x_{jj}}[\tilde x_{jj}^{c}\tilde C_{jj}^{c}]=\tilde C_{jj}^{c}=C_{jj}.$$
Since we are taking the partial derivative, the derivatives are the same as differentiation of | X ˜ c | . Hence,
$$\frac{\partial}{\partial\tilde X}[|\tilde X^{c}|]=2\tilde C^{c}-\mathrm{diag}(\tilde C^{c})=|\tilde X^{*}|[2(\tilde X^{*})^{-1}-\mathrm{diag}((\tilde X^{*})^{-1})]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})]$$
and then,
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})].$$
Here, we have used two properties: for a $p\times p$ matrix $B$, $|B|=|B'|$, and $(B^{c})'=B^{*}=B$ if $B$ is Hermitian. Then, we have the following theorem: □
Theorem 9.
When the p × p nonsingular matrix X ˜ in the complex domain is Hermitian, that is, X ˜ = X ˜ * , then,
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})].$$
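Theorem 9 can be checked numerically by perturbing the free real parameters of a Hermitian matrix in conjugate pairs, exactly as in the proof above. The sketch below is only an illustration (numpy is assumed; the test matrix, step size, and tolerances are arbitrary): it assembles $\frac{\partial}{\partial\tilde X}[|\tilde X|]$ entrywise from central differences and compares it with $|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})]$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, h = 4, 1e-6
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
X = M + M.conj().T + 2 * p * np.eye(p)      # Hermitian (and nonsingular) test matrix

def ddet(E):
    """Central difference of det(X) along a Hermitian perturbation direction E."""
    return (np.linalg.det(X + h * E) - np.linalg.det(X - h * E)).real / (2 * h)

grad = np.zeros((p, p), dtype=complex)
for j in range(p):
    for k in range(p):
        Ejk = np.zeros((p, p)); Ejk[j, k] = 1.0
        Ekj = np.zeros((p, p)); Ekj[k, j] = 1.0
        if j == k:
            grad[j, j] = ddet(Ejk)   # only a real diagonal variable contributes
        else:
            # real-part direction moves (j,k) and (k,j) together; imaginary-part direction moves them oppositely
            grad[j, k] = ddet(Ejk + Ekj) + 1j * ddet(1j * (Ejk - Ekj))

Xinv = np.linalg.inv(X)
rhs = np.linalg.det(X).real * (2 * Xinv - np.diag(np.diag(Xinv)))
print(np.allclose(grad, rhs, rtol=1e-4, atol=1e-4))
```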
In the following sections, we will consider some applications of the results obtained in Section 2.

3. Principal Component Analysis in the Complex Domain

This is an application of the mathematical problem of optimization of a Hermitian form under a Hermitian-form constraint.
In many physical situations, variables occur in pairs, such as time and phase, and hence the most appropriate representation of such variables is through complex variables because a scalar complex variable can be taken as a pair of real variables. Let $\tilde x=x_1+ix_2$, $i=\sqrt{-1}$, with $x_1,x_2$ real scalar variables, be a scalar complex variable. Then, a statistical density associated with $\tilde x$ is a real-valued scalar function $f(\tilde x)$ of $\tilde x$ such that $f(\tilde x)\ge 0$ over the entire complex plane, something like a hill on the complex plane, so that the total volume under the surface is unity, that is, $\int_{\tilde x}f(\tilde x)\,\mathrm{d}\tilde x=1$, where $\mathrm{d}\tilde x=\mathrm{d}x_1\wedge\mathrm{d}x_2$, the wedge product of the differentials $\mathrm{d}x_1$ and $\mathrm{d}x_2$. Then, the center of gravity of the hill $f(\tilde x)$ is at $E[\tilde x]=\int_{\tilde x}\tilde x f(\tilde x)\,\mathrm{d}\tilde x$ and the square of the measure of scatter in $\tilde x$ is $\sigma^{2}=E[(\tilde x-E(\tilde x))(\tilde x-E(\tilde x))^{*}]$, where $E[(\cdot)]$ means the expected value when $\tilde x$ is a scalar complex random variable and $f(\tilde x)$ is its density.
If the scatter is small, then the variable $\tilde x$ is concentrated near the center of gravity $E[\tilde x]$. If $\sigma^{2}$ is large, then $\tilde x$ is spread thin, far and wide, and hence it is more or less unrecognizable. If a large number of scalar variables are being considered as possible variables to be included in a model, then the ones with the larger scatter are the important variables to be included in the model. For convenience, one can consider linear functions of such variables because linear functions also contain the individual variables. A linear function is of the form $\tilde u=a_1^{*}\tilde x_1+\ldots+a_p^{*}\tilde x_p$, where $a_1^{*},\ldots,a_p^{*}$ are constants and $\tilde x_1,\ldots,\tilde x_p$ are scalar complex variables. For example, $a_1^{*}=1$, $a_2^{*}=0=\ldots=a_p^{*}$ gives $\tilde x_1$, where $a^{*}$ indicates the conjugate transpose and, for a scalar quantity $\tilde y$, $\tilde y^{*}=\tilde y^{c}$ only, where $c$ in the exponent indicates the complex conjugate. We may write the linear function as $\tilde u=A^{*}\tilde X$, where
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\; A'=[a_1,\ldots,a_p]\quad\text{and}\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\; \tilde X'=[\tilde x_1,\ldots,\tilde x_p]$$
where a prime indicates the transpose. The expected value of $A^{*}\tilde X$ is $E[\tilde u]=A^{*}E[\tilde X]$ and the variance–covariance matrix, or covariance matrix, in $\tilde X$ is denoted by $\Sigma>O$ (Hermitian positive definite). Then, the variance of $\tilde u$, denoted by $\mathrm{Var}(\tilde u)$, is given by $\mathrm{Var}(\tilde u)=E[A^{*}(\tilde X-E(\tilde X))(\tilde X-E(\tilde X))^{*}A]=A^{*}\Sigma A$. The most important linear function is that linear function having the maximum variance. Hence, we may compute $\max_{A}[A^{*}\Sigma A]$. But, since $\Sigma>O$, $A^{*}\Sigma A$ is a positive definite Hermitian form in $A$ and its maximum over $A$ is at $+\infty$. Thus, unrestricted maximization does not make sense. Without loss of generality we may take $A^{*}A=1$ because this can always be achieved for any non-null $A$. Then, the maximization amounts to maximizing $A^{*}\Sigma A$ within the unit sphere $A^{*}A=1$. There will be a maximum and a minimum in this case. We may incorporate the restriction by using a Lagrangian multiplier. Consider
$$\tilde w=A^{*}\Sigma A-\lambda(A^{*}A-1)$$
where λ is a Lagrangian multiplier. Maximization will be achieved by using the following result, which will be stated as a lemma.
Lemma 2.
Let $\tilde Y=Y_1+iY_2$, $i=\sqrt{-1}$, where $Y_1,Y_2$ are $p\times 1$ vectors with distinct real scalar variables as elements. Let $B=B^{*}$ be a $p\times p$ constant Hermitian matrix. Let the partial differential operator $\frac{\partial}{\partial\tilde Y}$ be defined as
$$\frac{\partial}{\partial\tilde Y}=\frac{\partial}{\partial Y_1}+i\frac{\partial}{\partial Y_2}.$$
Then,
$$\frac{\partial}{\partial\tilde Y}[\tilde Y^{*}B\tilde Y]=2B\tilde Y,\quad B=B^{*}>O,\qquad \frac{\partial}{\partial\tilde Y}[\tilde Y^{*}\tilde Y]=2\tilde Y.$$
Proof. 
Let $B=B^{*}=B_1+iB_2$. When $B$ is Hermitian, $B_1'=B_1$ and $B_2'=-B_2$, that is, $B_1$ is real symmetric and $B_2$ is real skew symmetric. Let $\tilde Y=Y_1+iY_2$, $i=\sqrt{-1}$, with $Y_1,Y_2$ real. Then, $\tilde Y^{*}B\tilde Y=(Y_1'-iY_2')B(Y_1+iY_2)=Y_1'BY_1+Y_2'BY_2+i(Y_1'BY_2-Y_2'BY_1)$, where $Y_1'BY_1=Y_1'B_1Y_1+iY_1'B_2Y_1=Y_1'B_1Y_1$ because $Y_1'B_2Y_1=0$ due to $B_2'=-B_2$. From the real case it is well known that $\frac{\partial}{\partial Y_1}[Y_1'B_1Y_1]=2B_1Y_1$. Similarly, $\frac{\partial}{\partial Y_2}[Y_2'BY_2]=2B_1Y_2$. Now,
$$i\frac{\partial}{\partial Y_1}[Y_1'BY_2-Y_2'BY_1]=i\frac{\partial}{\partial Y_1}[Y_1'B_1Y_2-Y_2'B_1Y_1+i(Y_1'B_2Y_2-Y_2'B_2Y_1)]=0+i^{2}\frac{\partial}{\partial Y_1}[Y_1'B_2Y_2+Y_1'B_2Y_2]=i^{2}\,2B_2Y_2=-2B_2Y_2.$$
Similarly,
$$i\frac{\partial}{\partial Y_2}[Y_1'BY_2-Y_2'BY_1]=-2i^{2}B_2Y_1=2B_2Y_1$$
where we have used two properties. When $B_1'=B_1$, that is, it is real symmetric, we have $Y_1'B_1Y_2=Y_2'B_1Y_1$ because both are $1\times 1$ real, and hence each is equal to its transpose, and therefore the difference is zero. When $B_2'=-B_2$ we have $Y_2'B_2Y_1=-Y_1'B_2Y_2$, because both are $1\times 1$ real, and hence each is equal to its transpose. Then, from the real operator operating on a real linear form the result follows. Similar results hold when differentiating with respect to $Y_2$. Thus, we have the following:
$$\frac{\partial}{\partial Y_1}[\tilde Y^{*}B\tilde Y]=2B_1Y_1-2B_2Y_2=2B_1Y_1+2i^{2}B_2Y_2\quad\text{and}\quad \frac{\partial}{\partial Y_2}[\tilde Y^{*}B\tilde Y]=2B_1Y_2+2B_2Y_1.$$
Therefore,
$$\left(\frac{\partial}{\partial Y_1}+i\frac{\partial}{\partial Y_2}\right)[\tilde Y^{*}B\tilde Y]=2B_1(Y_1+iY_2)+2iB_2(Y_1+iY_2)=2B_1\tilde Y+2iB_2\tilde Y=2B\tilde Y.$$
Hence the result. For $B=I$, the identity matrix, the result on $\tilde Y^{*}\tilde Y$ follows, that is, $\frac{\partial}{\partial\tilde Y}(\tilde Y^{*}\tilde Y)=2\tilde Y$. Now, by using Lemma 2 we can differentiate $\tilde w$ in $(22)$. That is,
$$\frac{\partial\tilde w}{\partial A}=O,\;\frac{\partial\tilde w}{\partial\lambda}=0\;\Rightarrow\;\Sigma A-\lambda A=O\;\Rightarrow\;(\Sigma-\lambda I)A=O\;\Rightarrow\;|\Sigma-\lambda I|=0$$
where | ( · ) | means the determinant of the square matrix ( · ) . From ( 23 ) , Σ A = λ A and pre-multiplying by A * and using the fact that A * A = 1 , we have A * Σ A = λ . Hence,
$$\max_{A^{*}A=1}[A^{*}\Sigma A]=\lambda_1\quad\text{and}\quad \min_{A^{*}A=1}[A^{*}\Sigma A]=\lambda_p$$
where $\lambda_1$ is the largest eigenvalue of $\Sigma=\Sigma^{*}>O$ and $\lambda_p$ is the smallest eigenvalue of $\Sigma$. When $\Sigma$ is Hermitian, all its eigenvalues are real, and when it is Hermitian positive definite, all its eigenvalues are also real and positive. Hence, the procedure is the following: Take the largest eigenvalue of $\Sigma$, say $\lambda_1$. Then, through (23), compute an eigenvector corresponding to $\lambda_1$, that is, solve $\Sigma A=\lambda_1 A$ for an $A$. Then, normalize this eigenvector through $A_1^{*}A_1=1$. That is, if an eigenvector corresponding to $\lambda_1$ is $\alpha_1$, then compute $A_1=\frac{1}{\sqrt{\alpha_1^{*}\alpha_1}}\alpha_1$. This $A_1$ is the normalized eigenvector corresponding to $\lambda_1$. Now, consider $\tilde u_1=A_1^{*}\tilde X$. This $\tilde u_1$ is the first principal component in the sense of the linear function having the maximum variance. Now, take the second largest eigenvalue $\lambda_2$. Go through the same procedure and construct the normalized eigenvector $A_2$ corresponding to $\lambda_2$. Then, $\tilde u_2=A_2^{*}\tilde X$ is the second principal component. Continue the process and stop when the variance of $\tilde u_j$, namely $\lambda_j$, falls below a preassigned number. If there is no preassigned number, then $\tilde u_1,\ldots,\tilde u_p$ will be the $p$ principal components. Here, we have assumed that the eigenvalues of $\Sigma$ are distinct. When the eigenvalues are distinct, we can show that the eigenvectors corresponding to the distinct eigenvalues of a symmetric or Hermitian matrix are orthogonal to each other. Hence, our principal components will be orthogonal to each other in the sense that the joint dispersion in the pair $(\tilde u_i,\tilde u_j)$ is zero for $i\neq j$, or the covariance between $\tilde u_i$ and $\tilde u_j$ is zero when $i\neq j$. The covariance is defined as the following: Let $\tilde U$ be a $p\times 1$ vector in the complex domain and let $\tilde V$ be a $q\times 1$ vector in the complex domain. Then, the covariance of $\tilde U$ on $\tilde V$ is defined and denoted as $\mathrm{Cov}(\tilde U,\tilde V)=E[(\tilde U-E(\tilde U))(\tilde V-E(\tilde V))^{*}]$, whenever this expected value exists, so that when $\tilde V=\tilde U$, $\mathrm{Cov}(\tilde U,\tilde V)=\mathrm{Cov}(\tilde U)$, the covariance matrix in $\tilde U$, and when $p=1$, it is the variance of the scalar complex variable $\tilde u$. □
When the covariance matrix $\Sigma$ is unknown, we may construct sample principal components. Let our population be the $p\times 1$ vector $\tilde X$, $\tilde X'=[\tilde x_1,\ldots,\tilde x_p]$, where $\tilde x_j$, $j=1,\ldots,p$, are distinct scalar complex variables. Consider $n$ independently and identically distributed (iid) such $p$-vectors. Then, we have a simple random sample of size $n$ from $\tilde X$. Then, the sample matrix is the $p\times n$, $n>p$, matrix denoted as the following:
$$\mathbf{\tilde X}=[\tilde X_1,\ldots,\tilde X_n]=\begin{bmatrix}\tilde x_{11} & \tilde x_{12} & \cdots & \tilde x_{1n}\\ \tilde x_{21} & \tilde x_{22} & \cdots & \tilde x_{2n}\\ \vdots & \vdots & & \vdots\\ \tilde x_{p1} & \tilde x_{p2} & \cdots & \tilde x_{pn}\end{bmatrix}.$$
Let the sample average be denoted by $\bar{\tilde X}=\frac{1}{n}[\tilde X_1+\ldots+\tilde X_n]$ and the matrix of sample averages be denoted by the bold letter $\bar{\mathbf{\tilde X}}=[\bar{\tilde X},\ldots,\bar{\tilde X}]$. Then, the sample sum of products matrix $\tilde S$ is given by
$$\tilde S=[\mathbf{\tilde X}-\bar{\mathbf{\tilde X}}][\mathbf{\tilde X}-\bar{\mathbf{\tilde X}}]^{*}=(\tilde s_{jk})$$
where $\tilde s_{jk}=\sum_{r=1}^{n}(\tilde x_{jr}-\bar{\tilde x}_j)(\tilde x_{kr}-\bar{\tilde x}_k)^{*}$. The motivation for using the sample sum of products matrix $\tilde S$ is that $\frac{1}{n-1}\tilde S$ is an unbiased estimator of $\Sigma$. Since we will be normalizing the eigenvectors, we operate with $\tilde S$ itself. Compute the eigenvalues of $\tilde S$. Take the largest eigenvalue of $\tilde S$ and call it $m_1$. Construct an eigenvector corresponding to $m_1$ and normalize it through $M_1^{*}M_1=1$, where $M_1$ is the normalized eigenvector corresponding to $m_1$. Then, $\tilde v_1=M_1^{*}\tilde X$ is the first sample principal component. When the columns of the sample matrix are not linearly related, we have $\tilde S=\tilde S^{*}>O$ (Hermitian positive definite) and all eigenvalues $m_1,\ldots,m_p$ will be positive. We assume that the eigenvalues are distinct, $m_1>m_2>\ldots>m_p$; this will be true almost surely. Now, take $m_2$, construct $M_2$ and the second principal component $\tilde v_2=M_2^{*}\tilde X$, and continue the process. We can show that the covariances between $\tilde v_j$ and $\tilde v_k$ will be zero for all $j\neq k$. This property follows from the fact that when the matrix is symmetric or Hermitian, the eigenvectors corresponding to distinct eigenvalues are orthogonal. When the population $\tilde X$ is $p$-variate complex Gaussian, then $\tilde S$ has a complex Wishart distribution with $n-1$ degrees of freedom and parameter matrix $\Sigma$. The distributions of the largest, smallest, and $j$-th largest eigenvalues and the corresponding eigenvectors of $\tilde S$ in the complex domain are given in [8].
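The sample procedure above reduces to a Hermitian eigendecomposition of $\tilde S$. The following sketch is an illustration only (numpy is assumed; the simulated data, array names, and sample sizes are arbitrary): it computes the sample principal components of a complex data matrix and confirms that distinct components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 200
# simulate a p-variate complex sample (any complex data matrix of shape (p, n) would do)
C = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Xs = C @ (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

xbar = Xs.mean(axis=1, keepdims=True)
S = (Xs - xbar) @ (Xs - xbar).conj().T      # sample sum of products matrix, Hermitian

m, V = np.linalg.eigh(S)                     # real eigenvalues, orthonormal eigenvectors
order = np.argsort(m)[::-1]                  # arrange as m_1 > m_2 > ... > m_p
m, V = m[order], V[:, order]

# component scores M_j^* (X - Xbar) for each observation (eigh already gives M_j^* M_j = 1)
scores = V.conj().T @ (Xs - xbar)
# sample covariances between distinct components are (numerically) zero: off-diagonals vanish
print(np.round(scores @ scores.conj().T / (n - 1), 6))
```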

4. Canonical Correlation Analysis in the Complex Domain

This is an application of the mathematical problem of optimization of a bilinear form in the complex domain, under two Hermitian-form constraints. The following application is regarding the prediction of one set of variables by using another set of variables.
Consider two sets of scalar complex variables
S 1 = { x ˜ 1 , . . . , x ˜ p }   and   S 2 = { y ˜ 1 , . . . , y ˜ q }
where p need not be equal to q. Consider the appended vector
$$\tilde Z=\begin{bmatrix}\tilde X\\ \tilde Y\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\quad \tilde Y=\begin{bmatrix}\tilde y_1\\ \vdots\\ \tilde y_q\end{bmatrix},\quad E[\tilde X]=\mu_x,\quad E[\tilde Y]=\mu_y,$$
$$\Sigma=E\left\{\begin{bmatrix}\tilde X-\mu_x\\ \tilde Y-\mu_y\end{bmatrix}[\tilde X^{*}-\mu_x^{*},\;\tilde Y^{*}-\mu_y^{*}]\right\}=\mathrm{Cov}(\tilde Z)=\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$$
where Σ 11 = Cov ( X ˜ ) , Σ 22 = Cov ( Y ˜ ) , Σ 12 = Cov ( X ˜ , Y ˜ ) , and Σ 12 * = Σ 21 , and vice-versa. That is, Σ is the covariance matrix in Z ˜ , Σ 11 is the covariance matrix in X ˜ , Σ 22 is the covariance matrix in Y ˜ , and so on. Also, Σ = Σ * > O , Σ 11 = Σ 11 * > O , Σ 22 = Σ 22 * > O . Our aim is to predict the variables in the set S 1 by using the variables in the set S 2 and vice-versa, and obtain the “best” predictors; “best” in the sense of having the maximum joint dispersion. In order to represent each set S 1 and S 2 , we will take arbitrary linear functions of the variables in each set. Consider linear functions u ˜ = A * X ˜ and v ˜ = B * Y ˜ , where
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\quad B=\begin{bmatrix}b_1\\ \vdots\\ b_q\end{bmatrix},\quad \tilde u=A^{*}\tilde X=a_1^{*}\tilde x_1+\ldots+a_p^{*}\tilde x_p,\quad \tilde v=B^{*}\tilde Y=b_1^{*}\tilde y_1+\ldots+b_q^{*}\tilde y_q,$$
where $\tilde X$ and $\tilde Y$ are listed above already. Since $a_j$ and $b_j$ are scalar constant quantities, $a_j^{*}=a_j^{c}$, $j=1,\ldots,p$, and $b_j^{*}=b_j^{c}$, $j=1,\ldots,q$. Variances of linear functions have already been considered in Section 2. Therefore, $\mathrm{Var}(\tilde u)=A^{*}\Sigma_{11}A$, $\mathrm{Var}(\tilde v)=B^{*}\Sigma_{22}B$ and $\mathrm{Cov}(\tilde u,\tilde v)=A^{*}\Sigma_{12}B$, $\Sigma_{21}=\Sigma_{12}^{*}$, $B^{*}\Sigma_{21}A=\mathrm{Cov}(\tilde v,\tilde u)$. Here, $\Sigma_{12}$ and $\Sigma_{21}$ can be taken as measures of joint dispersion or joint variation between $\tilde X$ and $\tilde Y$, and $A^{*}\Sigma_{12}B=\mathrm{Cov}(\tilde u,\tilde v)$ as the joint dispersion between $\tilde u$ and $\tilde v$. As a criterion for the “best” predictors, we may require that the pair $\tilde u$ and $\tilde v$ have the maximum joint variation: the best predictor of $\tilde u$ by using $\tilde v$ is taken as that pair having the maximum $A^{*}\Sigma_{12}B$, and the best predictor of $\tilde v$ by using $\tilde u$ as that pair having the maximum value of $B^{*}\Sigma_{21}A$. Since covariances depend upon the units of measurement of the variables involved, we may take a scale-free covariance by taking
$$\rho=\frac{\mathrm{Cov}(\tilde u,\tilde v)}{\sqrt{\mathrm{Var}(\tilde u)}\sqrt{\mathrm{Var}(\tilde v)}}=\frac{A^{*}\Sigma_{12}B}{\sqrt{[A^{*}\Sigma_{11}A][B^{*}\Sigma_{22}B]}}.$$
Further, as explained in Section 2, without loss of generality we may take $A^{*}\Sigma_{11}A=1$ and $B^{*}\Sigma_{22}B=1$, or confine the bilinear form (hyperboloid) within unit positive definite Hermitian forms (ellipsoids) in order to prevent it from going to $+\infty$. Hence, our procedure simplifies to optimizing $A^{*}\Sigma_{12}B$ subject to the conditions $A^{*}\Sigma_{11}A=1$, $B^{*}\Sigma_{22}B=1$ and computing the pair of $A$ and $B$ which will maximize $A^{*}\Sigma_{12}B$. As before, we may use the Lagrangian multipliers $\lambda_1$ and $\lambda_2$ and consider the function
$$\tilde w=A^{*}\Sigma_{12}B-\lambda_1(A^{*}\Sigma_{11}A-1)-\lambda_2(B^{*}\Sigma_{22}B-1).$$
In order to optimize this w ˜ we need one result on differentiation of a bilinear form, which will be stated as a lemma.
Lemma 3.
Let $\tilde X=X_1+iX_2$, $i=\sqrt{-1}$, where $X_1,X_2$ are real $p\times 1$ vectors of distinct real scalar variables $x_{j1}$ and $x_{j2}$, respectively, with $\tilde x_j=x_{j1}+ix_{j2}$, $j=1,\ldots,p$. Let $\tilde Y=Y_1+iY_2$, where $Y_1,Y_2$ are real $q\times 1$ vectors of distinct real scalar variables $y_{j1}$ and $y_{j2}$, respectively, with $\tilde y_j=y_{j1}+iy_{j2}$, $j=1,\ldots,q$, where $x_{j1},x_{j2},y_{j1},y_{j2}$ are real. Let the partial differential operators be as defined in Section 2, namely,
$$\frac{\partial}{\partial X_1}=\begin{bmatrix}\frac{\partial}{\partial x_{11}}\\ \vdots\\ \frac{\partial}{\partial x_{p1}}\end{bmatrix},\quad \frac{\partial}{\partial X_2}=\begin{bmatrix}\frac{\partial}{\partial x_{12}}\\ \vdots\\ \frac{\partial}{\partial x_{p2}}\end{bmatrix},\quad \frac{\partial}{\partial\tilde X}=\left(\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}\right)$$
and similar operators involving Y ˜ = Y 1 + i Y 2 . Then,
$$\frac{\partial}{\partial\tilde X}[\tilde X^{*}A\tilde Y]=2A\tilde Y\quad\text{and}\quad \frac{\partial}{\partial\tilde Y}[\tilde Y^{*}A^{*}\tilde X]=2A^{*}\tilde X.$$
Proof. 
Opening up X ˜ * A Y ˜ we have the following:
$$\tilde X^{*}A\tilde Y=(X_1'-iX_2')A(Y_1+iY_2)=X_1'AY_1+X_2'AY_2+i(X_1'AY_2-X_2'AY_1).$$
Then, from the known results in the real case, we have the following:
$$\frac{\partial}{\partial X_1}[\tilde X^{*}A\tilde Y]=AY_1+iAY_2\quad\text{and}\quad \frac{\partial}{\partial X_2}[\tilde X^{*}A\tilde Y]=AY_2-iAY_1$$
irrespective of whether A is real or in the complex domain. Then,
$$\frac{\partial}{\partial\tilde X}[\tilde X^{*}A\tilde Y]=\left(\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}\right)[\tilde X^{*}A\tilde Y]=2A\tilde Y.$$
Similarly, $\frac{\partial}{\partial\tilde Y}[\tilde Y^{*}A^{*}\tilde X]=2A^{*}\tilde X$. This completes the proof. □
Now, differentiating w ˜ in ( 25 ) , we have the following:
$$\frac{\partial\tilde w}{\partial A}=O\;\Rightarrow\;\Sigma_{12}B-\lambda_1\Sigma_{11}A=O$$
$$\frac{\partial\tilde w^{*}}{\partial B}=O\;\Rightarrow\;\Sigma_{21}A-\lambda_2\Sigma_{22}B=O.$$
Now, premultiply ( 26 ) by A * and ( 27 ) by B * to obtain the following, observing that Σ 21 = Σ 12 * :
$$A^{*}\Sigma_{12}B=\lambda_1,\quad B^{*}\Sigma_{21}A=\lambda_2,\quad \lambda_2=\lambda_1^{c}$$
or λ 1 = λ , λ 2 = λ c . Take B from ( 27 ) and substitute in ( 26 ) to obtain the following:
$$(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}-\lambda\lambda^{c}I)A=O,\quad \lambda\lambda^{c}=|\lambda|^{2}.$$
This shows that $\lambda\lambda^{c}=|\lambda|^{2}=\mu$ is an eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, where the matrix is $p\times p$. From symmetry, it follows that $\mu$ is also an eigenvalue of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, where the matrix is $q\times q$. Hence, all the nonzero $\mu$'s are common to both of these matrices. Hence, the procedure is the following: If $p\le q$, then compute the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$; otherwise, compute the eigenvalues of the other matrix $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$; both will give the same nonzero eigenvalues. Let $\mu_1$ be the largest and $\mu_r$ the smallest of the nonzero eigenvalues. Then, we have the results
$$\max_{A^{*}\Sigma_{11}A=1,\,B^{*}\Sigma_{22}B=1}[A^{*}\Sigma_{12}B]=\sqrt{\mu_1}$$
and
$$\min_{A^{*}\Sigma_{11}A=1,\,B^{*}\Sigma_{22}B=1}[A^{*}\Sigma_{12}B]=\sqrt{\mu_r}.$$
Then, the procedure is the following: If $p\le q$, then compute all the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Let the largest eigenvalue be $\mu_1$. Then, compute one eigenvector corresponding to $\mu_1$, using the equation $(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}-\mu_1\Sigma_{11})A=O$. Let it be $A_{11}$. Then, normalize it through $A_{11}^{*}\Sigma_{11}A_{11}=1$, that is, compute $A_1=\frac{1}{\sqrt{A_{11}^{*}\Sigma_{11}A_{11}}}A_{11}$. Then, compute $\tilde u_1=A_1^{*}\tilde X$. Then, use the same eigenvalue $\mu_1$ and compute one eigenvector from the equation $(\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}-\mu_1\Sigma_{22})B=O$. Let it be $B_{11}$. Then, normalize it through $B_{11}^{*}\Sigma_{22}B_{11}=1$, that is, compute $B_1=\frac{1}{\sqrt{B_{11}^{*}\Sigma_{22}B_{11}}}B_{11}$. Now, compute $\tilde v_1=B_1^{*}\tilde Y$. Then, $(\tilde u_1,\tilde v_1)$ is the first pair of canonical variables, in the sense that $\tilde u_1$ is the best predictor of $\tilde v_1$ and vice-versa. Now, take the second largest eigenvalue $\mu_2$ of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Then, compute one eigenvector corresponding to $\mu_2$, that is, solve the equation $(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}-\mu_2\Sigma_{11})A=O$. Let $A_{21}$ be that eigenvector. Normalize it, that is, compute $A_2=\frac{1}{\sqrt{A_{21}^{*}\Sigma_{11}A_{21}}}A_{21}$. Now, compute $\tilde u_2=A_2^{*}\tilde X$. Use the same $\mu_2$ and solve for $B$ from the equation $(\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}-\mu_2\Sigma_{22})B=O$. Let $B_{21}$ be one solution. Then, normalize it, that is, compute $B_2=\frac{1}{\sqrt{B_{21}^{*}\Sigma_{22}B_{21}}}B_{21}$. Now, compute $\tilde v_2=B_2^{*}\tilde Y$. Then, $(\tilde u_2,\tilde v_2)$ is the second pair of canonical variables. Continue the process until $\mu_j$ falls below a preassigned limit. If there is no such preassigned limit, then compute all the pairs, that is, $p$ pairs if $p\le q$, and $q$ pairs otherwise. If $q<p$, then start with the computation of the eigenvalues of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ and proceed parallel to the steps used in the case $p\le q$. Observe that the symmetric format of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is $\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}$. This form is also available from the same starting Equation $(28)$. The symmetric format can always be written in the form $C^{*}C$ for some matrix $C$, and hence the symmetric form is either Hermitian positive definite or Hermitian positive semi-definite, and therefore all the nonzero eigenvalues are positive. Let us assume that the eigenvalues $\mu_1,\mu_2,\ldots$ are distinct, $\mu_1>\mu_2>\ldots>\mu_p$ if $p\le q$. It is a known result that eigenvectors corresponding to distinct eigenvalues of Hermitian or symmetric matrices are orthogonal to each other. Hence, $\tilde u_1,\tilde u_2,\ldots$ are non-correlated. Similarly, $\tilde v_1,\tilde v_2,\ldots$ are non-correlated.
If $\Sigma_{11},\Sigma_{12},\Sigma_{22}$ are not available, then take a simple random sample of size $n$, $n>p+q$, from $\tilde Z=\begin{bmatrix}\tilde X\\ \tilde Y\end{bmatrix}$. Then, compute the sample sum of products matrix
$$\tilde S=\begin{bmatrix}\tilde S_{11} & \tilde S_{12}\\ \tilde S_{21} & \tilde S_{22}\end{bmatrix},$$
as for the case of principal component analysis. Now, continue with the $\tilde S_{jk}$'s in place of the $\Sigma_{jk}$'s. Then, we will obtain the pairs of sample canonical variables. Some distributional aspects of sample canonical variables and the sample canonical correlation matrix $\tilde U=\tilde S_{11}^{-\frac{1}{2}}\tilde S_{12}\tilde S_{22}^{-1}\tilde S_{21}\tilde S_{11}^{-\frac{1}{2}}$ are discussed in [8]. Note that when $\tilde S_{11}$ is $1\times 1$, then we have the square of the multiple correlation coefficient in $\tilde U=\tilde u$, which is scalar. In this case, our starting set $S_1$ will have only one complex scalar variable and the set $S_2$ will have $q$ variables. Here, the problem is to predict the one variable in $S_1$ by using the variables in $S_2$. The exact distributions of the canonical correlation matrix $\tilde U$ and the square of the absolute value of the multiple correlation coefficient $\tilde u$ are available in explicit forms in [8].
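A compact numerical sketch of the sample canonical correlation procedure follows (illustrative only; numpy is assumed, and the simulated data and names are arbitrary). It extracts the first pair of sample canonical variables and checks that their sample correlation equals $\sqrt{\mu_1}$, where $\mu_1$ is the largest eigenvalue of $\tilde S_{11}^{-1}\tilde S_{12}\tilde S_{22}^{-1}\tilde S_{21}$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q, n = 3, 4, 500
Z = rng.standard_normal((p + q, n)) + 1j * rng.standard_normal((p + q, n))
Z[:p] += 0.6 * Z[p:p + p]                   # induce some dependence between the two sets

Zc = Z - Z.mean(axis=1, keepdims=True)
S = Zc @ Zc.conj().T
S11, S12 = S[:p, :p], S[:p, p:]
S21, S22 = S[p:, :p], S[p:, p:]

# nonzero eigenvalues mu of S11^{-1} S12 S22^{-1} S21 (here p <= q)
mu, Avecs = np.linalg.eig(np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S21)))
k = np.argmax(mu.real)                      # largest mu gives the first canonical pair
A1 = Avecs[:, k]
A1 = A1 / np.sqrt((A1.conj() @ S11 @ A1).real)    # normalize via A* S11 A = 1
B1 = np.linalg.solve(S22, S21 @ A1)               # B proportional to S22^{-1} S21 A (from (27))
B1 = B1 / np.sqrt((B1.conj() @ S22 @ B1).real)    # normalize via B* S22 B = 1

u1 = A1.conj() @ Zc[:p]                     # first pair of sample canonical variables
v1 = B1.conj() @ Zc[p:]
r1 = abs(np.vdot(v1, u1)) / np.sqrt(np.vdot(u1, u1).real * np.vdot(v1, v1).real)
print(np.isclose(r1, np.sqrt(mu.real[k])))  # first sample canonical correlation = sqrt(mu_1)
```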

5. Covariance and Correlation in the Complex Domain

Consider scalar complex variables first. Let $\tilde x_1$ and $\tilde y_1$ be two scalar complex variables or scalar variables in the complex domain. Then, the mean values or expected values of $\tilde x_1$ and $\tilde y_1$ are, respectively, $E[\tilde x_1]=\int_{\tilde x_1}\tilde x_1 f(\tilde x_1)\,\mathrm{d}\tilde x_1$ and $E[\tilde y_1]=\int_{\tilde y_1}\tilde y_1 g(\tilde y_1)\,\mathrm{d}\tilde y_1$, where $E[\cdot]$ means the expected value of $[\cdot]$, and $f$ and $g$ are the densities of $\tilde x_1$ and $\tilde y_1$, respectively. Let $\tilde x=\tilde x_1-E[\tilde x_1]$ and $\tilde y=\tilde y_1-E[\tilde y_1]$. Let $\tilde x=x_{11}+ix_{12}$, $i=\sqrt{-1}$, with $x_{11},x_{12}$ real scalar variables, and let $\tilde y=y_{11}+iy_{12}$, with $y_{11},y_{12}$ real scalar variables. Then, $E[\tilde x]=0$ and $E[\tilde y]=0$. Then, the variance, denoted by $\mathrm{Var}(\cdot)$, and covariance, denoted by $\mathrm{Cov}(\cdot,\cdot)$, are the following:
$$\mathrm{Var}(\tilde x_1)=\mathrm{Var}(\tilde x)=E[\tilde x\tilde x^{*}]=\sigma_{11},\quad \mathrm{Var}(\tilde y_1)=\mathrm{Var}(\tilde y)=E[\tilde y\tilde y^{*}]=\sigma_{22},\quad \mathrm{Cov}(\tilde x_1,\tilde y_1)=\mathrm{Cov}(\tilde x,\tilde y)=E[\tilde x\tilde y^{*}]=\sigma_{12}$$
where, for example, $\tilde x^{*}=\tilde x^{c}$, with $*$ indicating the conjugate transpose and $c$ indicating the conjugate only. Here, we have only scalar variables, and hence the complex conjugate transpose is only the complex conjugate. For convenience, we have used the notation $\sigma_{11},\sigma_{22},\sigma_{12}$. Then, $\sigma_{21}=E[\tilde y\tilde x^{*}]=E[\tilde y\tilde x^{c}]=\mathrm{Cov}(\tilde y,\tilde x)$. For a scalar complex variable $\tilde x=x_{11}+ix_{12}$, $\tilde x^{*}=(x_{11}+ix_{12})^{*}=x_{11}-ix_{12}=\tilde x^{c}$. Let us examine the variances of the sum and difference. Let
$$\tilde u=\frac{\tilde x_1-E[\tilde x_1]}{\sqrt{\mathrm{Var}(\tilde x_1)}}=\frac{\tilde x}{\sqrt{\sigma_{11}}},\quad \tilde v=\frac{\tilde y_1-E[\tilde y_1]}{\sqrt{\mathrm{Var}(\tilde y_1)}}=\frac{\tilde y}{\sqrt{\sigma_{22}}}.$$
Then,
$$\mathrm{Var}(\tilde u+\tilde v)=E[(\tilde u+\tilde v-E(\tilde u+\tilde v))(\tilde u+\tilde v-E(\tilde u+\tilde v))^{*}]=E\left[\frac{\tilde x\tilde x^{*}}{\sigma_{11}}+\frac{\tilde y\tilde y^{*}}{\sigma_{22}}+\frac{\tilde x\tilde y^{*}+\tilde y\tilde x^{*}}{\sqrt{\sigma_{11}\sigma_{22}}}\right]$$
This can be simplified as the following:
$$\mathrm{Var}(\tilde u+\tilde v)=2+2\,\frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{2\sqrt{\sigma_{11}\sigma_{22}}}\ge 0$$
$$\mathrm{Var}(\tilde u-\tilde v)=\frac{E[\tilde x\tilde x^{*}]}{\sigma_{11}}+\frac{E[\tilde y\tilde y^{*}]}{\sigma_{22}}-\frac{E[\tilde x\tilde y^{*}+\tilde y\tilde x^{*}]}{\sqrt{\sigma_{11}\sigma_{22}}}=2-2\,\frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{2\sqrt{\sigma_{11}\sigma_{22}}}\ge 0$$
From ( 29 ) and ( 30 ) we have
$$-1\le \frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{2\sqrt{\sigma_{11}\sigma_{22}}}\le 1.$$
Let us examine the quantity, Cov ( x ˜ , y ˜ ) + Cov ( y ˜ , x ˜ ) = E [ x ˜ y ˜ * ] + E [ y ˜ x ˜ * ] . Note that
$$E[\tilde x\tilde y^{*}]=E[(x_{11}+ix_{12})(y_{11}-iy_{12})]=E[x_{11}y_{11}+x_{12}y_{12}+i(x_{12}y_{11}-x_{11}y_{12})]$$
$$E[\tilde y\tilde x^{*}]=E[x_{11}y_{11}+x_{12}y_{12}+i(y_{12}x_{11}-y_{11}x_{12})]$$
$$E[\tilde x\tilde y^{*}]+E[\tilde y\tilde x^{*}]=\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)=2E[x_{11}y_{11}+x_{12}y_{12}]=2E[\Re(\tilde x\tilde y^{*})]$$
Hence,
$$\frac{1}{2}\,\frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{\sqrt{\sigma_{11}\sigma_{22}}}=E\left[\frac{x_{11}y_{11}+x_{12}y_{12}}{\sqrt{\sigma_{11}\sigma_{22}}}\right]=E\left[\frac{\Re(\tilde x\tilde y^{*})}{\sqrt{\sigma_{11}\sigma_{22}}}\right].$$
Therefore, we may define the correlation coefficient in the complex domain, denoted by r ˜ , as the following:
$$\tilde r=\frac{\Re(\mathrm{Cov}(\tilde x,\tilde y))}{\sqrt{\sigma_{11}\sigma_{22}}},\quad -1\le\tilde r\le 1$$
where $\Re(\cdot)$ denotes the real part of $(\cdot)$. Also, note that $|\tilde x\tilde y^{*}|^{2}=|\tilde x|^{2}|\tilde y|^{2}$. This will motivate us to examine the dot product of two vectors in the complex domain and the Cauchy–Schwarz inequality in the complex domain.
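In sample terms, the correlation coefficient $\tilde r$ defined above is obtained from the real part of the sample covariance. A small sketch (illustrative only; numpy is assumed and the simulated data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = 0.5 * x + (rng.standard_normal(n) + 1j * rng.standard_normal(n))   # correlated with x

xc, yc = x - x.mean(), y - y.mean()
cov_xy = np.mean(xc * yc.conj())            # sample Cov(x, y) = average of x y*
s11 = np.mean(xc * xc.conj()).real          # sample Var(x)
s22 = np.mean(yc * yc.conj()).real          # sample Var(y)

r = cov_xy.real / np.sqrt(s11 * s22)        # r = Re(Cov(x, y)) / sqrt(s11 s22)
print(r, -1.0 <= r <= 1.0)                  # r always lies in [-1, 1]
```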

5.1. The Cauchy–Schwarz Inequality in the Complex Domain

Let $\tilde X$, $\tilde X'=[\tilde x_1,\ldots,\tilde x_p]$, and $\tilde Y$, $\tilde Y'=[\tilde y_1,\ldots,\tilde y_p]$, be two $p\times 1$ vectors in the complex domain. We will define the dot product between $\tilde X$ and $\tilde Y$ as $\tilde X\cdot\tilde Y=\tilde X^{*}\tilde Y$. Then, using $|\tilde x_j^{c}\tilde y_j|=|\tilde x_j||\tilde y_j|$ and the real-case Cauchy–Schwarz inequality on the moduli,
$$|\tilde X\cdot\tilde Y|^{2}=|\tilde X^{*}\tilde Y|^{2}=|\tilde x_1^{c}\tilde y_1+\ldots+\tilde x_p^{c}\tilde y_p|^{2}\le[\,|\tilde x_1||\tilde y_1|+\ldots+|\tilde x_p||\tilde y_p|\,]^{2}\le(|\tilde x_1|^{2}+\ldots+|\tilde x_p|^{2})(|\tilde y_1|^{2}+\ldots+|\tilde y_p|^{2})$$
$$=(x_{11}^{2}+x_{12}^{2}+\ldots+x_{p1}^{2}+x_{p2}^{2})(y_{11}^{2}+y_{12}^{2}+\ldots+y_{p1}^{2}+y_{p2}^{2})\;\Rightarrow\;|\tilde X\cdot\tilde Y|\le|\tilde X|\,|\tilde Y|.$$
Thus, the Cauchy–Schwarz inequality holds for the complex domain also.

5.2. Minimum-Variance Unbiased Estimators in the Complex Domain

In the class of linear estimators A * X ˜ for the parametric function g ( θ ) , which linear function is the minimum-variance unbiased estimator for g ( θ ) ? Let
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\quad E[A^{*}\tilde X]=A^{*}E[\tilde X]=A^{*}\tilde\mu=g(\theta),$$
and $\mathrm{Var}(A^{*}\tilde X)=A^{*}\Sigma A$, $E[\tilde X]=\tilde\mu$, where $\Sigma$ is the covariance matrix in $\tilde X$. Our aim here is to minimize $A^{*}\Sigma A$ subject to the constraint $A^{*}\tilde\mu=g(\theta)$ (given). Let $\lambda$ be a Lagrangian multiplier and let $\tilde w=A^{*}\Sigma A-\lambda(A^{*}\tilde\mu-g(\theta))$. Then,
$$\frac{\partial\tilde w}{\partial A}=O\;\Rightarrow\;2\Sigma A-2\lambda\tilde\mu=O\;\Rightarrow\;\Sigma A=\lambda\tilde\mu.$$
That is, A * Σ A = λ A * μ ˜ = λ g ( θ ) . Hence, g ( θ ) multiplied by the minimum value of λ gives the minimum of A * Σ A . From ( 34 ) ,
$$A=\lambda\Sigma^{-1}\tilde\mu\;\Rightarrow\;g(\theta)=A^{*}\tilde\mu=\lambda^{c}\tilde\mu^{*}\Sigma^{-1}\tilde\mu\;\Rightarrow\;\lambda^{c}=\frac{g(\theta)}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}.$$
Then,
$$\lambda\,g(\theta)=\frac{gg^{*}}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}=\frac{|g|^{2}}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}$$
is the minimum value of the variance of our linear estimator. Hence, the minimum-variance unbiased estimator is
$$T(\tilde X)=\frac{g}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}[\tilde\mu^{*}\Sigma^{-1}\tilde X].$$
Note also that
$$g=A^{*}\tilde\mu=(A^{*}\Sigma^{\frac{1}{2}})(\Sigma^{-\frac{1}{2}}\tilde\mu)\;\Rightarrow\;|g|\le\sqrt{A^{*}\Sigma A}\,\sqrt{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}\;\Rightarrow\;A^{*}\Sigma A\ge\frac{|g|^{2}}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}.$$
The first inequality follows from the inequality established in Section 5.1.

5.3. Cramer–Rao-Type Inequality in the Complex Domain

Let $\tilde x_1,\ldots,\tilde x_n$ be a simple random sample from the population designated by the density $f(\tilde x_j)$, where $\tilde x_j$, $j=1,\ldots,n$, are independently and identically distributed (iid). Then, the joint density is $L=\prod_{j=1}^{n}f(\tilde x_j)$. Let $T(\tilde x_1,\ldots,\tilde x_n)$, denoted as $T(\tilde X)$, be a statistic with the expected value $g(\theta)$. That is, $E[T]=\int_{\tilde X}T\,L\,\mathrm{d}\tilde X=g(\theta)$. Differentiating with respect to $\theta$, we have
$$\frac{\partial g(\theta)}{\partial\theta}=\int_{\tilde X}T\left(\frac{\partial L}{\partial\theta}\right)\mathrm{d}\tilde X=\int_{\tilde X}T\left(\frac{\partial\ln L}{\partial\theta}\right)L\,\mathrm{d}\tilde X=E\left[T\left(\frac{\partial\ln L}{\partial\theta}\right)\right]$$
where we have assumed that the support of X ˜ is free of θ and differentiation inside the integral is valid. But, from the total integral being one, we have
$$\int_{\tilde X}L\,\mathrm{d}\tilde X=1\;\Rightarrow\;\int_{\tilde X}\left(\frac{\partial\ln L}{\partial\theta}\right)L\,\mathrm{d}\tilde X=0\;\Rightarrow\;E\left[\frac{\partial\ln L}{\partial\theta}\right]=0.$$
From $(35)$ and $(36)$, we have $E\left[T\left(\frac{\partial\ln L}{\partial\theta}\right)\right]=\mathrm{Cov}\left(T,\frac{\partial\ln L}{\partial\theta}\right)$ because, for any two scalar random variables $u$ and $v$ real, or $\tilde u$ and $\tilde v$ in the complex domain, $\mathrm{Cov}(u,v)=E[(u-E(u))(v-E(v))]=E[u(v-E(v))]=E[(u-E(u))v]$, and the corresponding results in the complex domain also hold. Therefore, since $E\left[\frac{\partial\ln L}{\partial\theta}\right]=0$, $E\left[T\left(\frac{\partial\ln L}{\partial\theta}\right)\right]=\mathrm{Cov}\left(T,\frac{\partial\ln L}{\partial\theta}\right)$. Then,
$$\left|\mathrm{Cov}\left(T,\frac{\partial\ln L}{\partial\theta}\right)\right|\le\sqrt{\mathrm{Var}(T)}\,\sqrt{\mathrm{Var}\left(\frac{\partial\ln L}{\partial\theta}\right)}\;\Rightarrow\;\mathrm{Var}(\tilde T)\ge\frac{\left|\frac{\partial g}{\partial\theta}\right|^{2}}{\mathrm{Var}\left(\frac{\partial\ln L}{\partial\theta}\right)}.$$
Note that
$$\mathrm{Var}\left(\frac{\partial\ln L}{\partial\theta}\right)=E\left[\left(\frac{\partial\ln L}{\partial\theta}\right)^{2}\right]=n\,\mathrm{Var}\left(\frac{\partial\ln f}{\partial\theta}\right)=n\,E\left[\left(\frac{\partial\ln f}{\partial\theta}\right)^{2}\right].$$
This shows that the Cramer–Rao-type inequality holds in the complex domain also.

5.4. Least Square Estimation in Linear Models in the Complex Domain

Let us examine whether the least square procedure, in the class of linear models, holds in the complex domain also. Let x ˜ 1 , . . . , x ˜ k be preassigned complex numbers or observations on k random variables in the complex domain. Let y ˜ be a scalar complex variable. Then, a linear model of x ˜ 1 , . . . , x ˜ k for predicting y ˜ , can be of the following form:
$$\tilde y=a_0^{c}+\beta^{*}\tilde X,\quad \beta=\begin{bmatrix}a_1\\ \vdots\\ a_k\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_k\end{bmatrix}.$$
But, in order to predict y ˜ by using linear predictors we must know the conditional distribution of y ˜ , given x ˜ 1 , . . . , x ˜ k , and also the conditional expectation must be linear in x ˜ 1 , . . . , x ˜ k . If the conditional distribution is not known, then we may use a distribution-free procedure. One such procedure is the estimation procedure by using the least square method. In this method, we set up a corresponding model of the following form for the j-th observation on y ˜ , namely, y ˜ j , for j = 1 , . . . , n , where n > k + 1 is the sample size.
$$\tilde y_j=a_0+a_1\tilde x_{1j}+\ldots+a_k\tilde x_{kj}+\epsilon_j,\quad j=1,\ldots,n,$$
corresponding to the linear model in the real case, where $\epsilon_j$ is the random part, or the sum total of contributions coming from unknown factors, corresponding to $\tilde y_j$. Then, if we sum up the observations and divide by $n$ we obtain the sample averages $\bar{\tilde y}$, $\bar{\tilde x}_r$, $r=1,\ldots,k$, where, for example, $\bar{\tilde x}_r=\frac{1}{n}\sum_{j=1}^{n}\tilde x_{rj}$. Then, from $(39)$, we have
$$\bar{\tilde y}=a_0+a_1\bar{\tilde x}_1+\ldots+a_k\bar{\tilde x}_k+0\;\Rightarrow\;a_0=\bar{\tilde y}-a_1\bar{\tilde x}_1-\ldots-a_k\bar{\tilde x}_k\;\Rightarrow\;a_0^{c}=\bar{\tilde y}^{c}-a_1^{c}\bar{\tilde x}_1^{c}-\ldots-a_k^{c}\bar{\tilde x}_k^{c}.$$
We have taken the error sum as zero without much loss of generality. Since a 0 is available from ( 40 ) , we may rewrite ( 39 ) as follows:
$$\tilde y_j-\bar{\tilde y}=a_1(\tilde x_{1j}-\bar{\tilde x}_1)+\ldots+a_k(\tilde x_{kj}-\bar{\tilde x}_k)+\epsilon_j-\bar\epsilon,\quad j=1,\ldots,n.$$
We may write all the equations in ( 41 ) together as U ˜ = Z ˜ β + e , where
$$\tilde U=\begin{bmatrix}\tilde y_1-\bar{\tilde y}\\ \vdots\\ \tilde y_n-\bar{\tilde y}\end{bmatrix},\quad \tilde Z=\begin{bmatrix}\tilde x_{11}-\bar{\tilde x}_1 & \cdots & \tilde x_{k1}-\bar{\tilde x}_k\\ \vdots & & \vdots\\ \tilde x_{1n}-\bar{\tilde x}_1 & \cdots & \tilde x_{kn}-\bar{\tilde x}_k\end{bmatrix},\quad e=\begin{bmatrix}\tilde e_1\\ \vdots\\ \tilde e_n\end{bmatrix},\quad \beta=\begin{bmatrix}a_1\\ \vdots\\ a_k\end{bmatrix}.$$
Note that U ˜ is an n × 1 matrix, Z ˜ is an n × k matrix, β is a k × 1 matrix, and e ˜ is an n × 1 matrix. Then, the sum of squares of the absolute values of the errors is the following:
$$\tilde e^{*}\tilde e=(\tilde U-\tilde Z\beta)^{*}(\tilde U-\tilde Z\beta).$$
In the least square procedure, we minimize this sum of squares of the absolute values of the errors, and then estimate the parameter vector $\beta$. Note that $(\tilde U-\tilde Z\beta)^{*}=\tilde U^{*}-\beta^{*}\tilde Z^{*}$ and
$$\frac{\partial}{\partial\beta}[\tilde e^{*}\tilde e]=O\;\Rightarrow\;-2\tilde Z^{*}\tilde U+2\tilde Z^{*}\tilde Z\beta=O\;\Rightarrow\;\beta=(\tilde Z^{*}\tilde Z)^{-1}\tilde Z^{*}\tilde U$$
where we have assumed that $\tilde Z^{*}\tilde Z$ is a nonsingular matrix because the $\tilde x_{rj}$'s are preassigned numbers, and hence $\tilde Z$ can be taken as a full-rank matrix with rank $k<n$. Then, the estimated $\beta$, as per the least square estimate, again denoted by $\beta$, is $\beta=(\tilde Z^{*}\tilde Z)^{-1}\tilde Z^{*}\tilde U$ and the estimated model for $\tilde y$ is $a_0^{c}+\beta^{*}\tilde X$, $\tilde X'=[\tilde x_1,\ldots,\tilde x_k]$, $\beta^{*}=\tilde U^{*}\tilde Z(\tilde Z^{*}\tilde Z)^{-1}$, or the estimated $\tilde y$ is
$$\tilde y=a_0^{c}+\beta^{*}\tilde X=a_0^{c}+\tilde U^{*}\tilde Z(\tilde Z^{*}\tilde Z)^{-1}\tilde X$$
where $\beta$ is available from $(43)$ and $a_0^{c}=\bar{\tilde y}-\beta^{*}\bar{\tilde X}$, $\bar{\tilde X}'=[\bar{\tilde x}_1,\ldots,\bar{\tilde x}_k]$. This shows that the least square procedure in the complex domain runs parallel to that in the real domain. If $\tilde e$ is assumed to have an $n$-variate complex Gaussian distribution, then the inference problems also run parallel to those in the real domain.
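A minimal numerical sketch of the centered least square procedure in (41)–(43) follows (illustrative only; numpy is assumed, and the simulated regressors, coefficients, and noise level are arbitrary). The normal equations $\tilde Z^{*}\tilde Z\beta=\tilde Z^{*}\tilde U$ recover the coefficients, and the intercept is recovered from (40).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 3
Xreg = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))   # one row per observation
beta_true = np.array([1 - 2j, 0.5 + 1j, -0.3j])
a0 = 2.0 + 1.0j
noise = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = a0 + Xreg @ beta_true + noise            # y_j = a_0 + a_1 x_1j + ... + a_k x_kj + eps_j

# center, as in (41)-(42): U holds y_j - ybar, the rows of Z hold (x_rj - xbar_r)
U = y - y.mean()
Z = Xreg - Xreg.mean(axis=0, keepdims=True)

beta_hat = np.linalg.solve(Z.conj().T @ Z, Z.conj().T @ U)   # beta = (Z* Z)^{-1} Z* U
a0_hat = y.mean() - Xreg.mean(axis=0) @ beta_hat             # a_0 = ybar - a_1 xbar_1 - ... - a_k xbar_k
print(np.allclose(beta_hat, beta_true, atol=0.05), np.allclose(a0_hat, a0, atol=0.05))
```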

6. Concluding Remarks

In this paper, we have introduced vector/matrix differential operators in the complex domain. These differential operators in the complex domain are believed to be new. With the help of these operators, we have examined the optimization of a linear form with a Hermitian-form constraint, optimization of a Hermitian form with a linear-form as well as a Hermitian-form constraint, and optimization of a bilinear form with Hermitian-form constraints, where the linear forms and bilinear forms involve vectors and matrices in the complex domain. As applications of these optimization problems, we have extended principal component analysis and canonical correlation analysis to the complex domain. Also extended to the complex domain are the Cramer–Rao inequality, the Cauchy–Schwarz inequality, minimum-variance unbiased estimation, and least square analysis. If we use the general definition of a density $f(X)$ as a real-valued scalar function such that $f(X)\ge 0$ in the domain of $X$ and $\int_{X}f(X)\,\mathrm{d}X=1$, where the argument $X$ may be scalar or vector or matrix or a sequence of matrices in the real or complex domains [8], then the structures of the joint density, marginal density, conditional density, etc., will be parallel to those in the real domain. Then, we will be able to extend Bayesian analysis to the complex domain. One can also explore extending other multivariate statistical techniques such as factor analysis, classification problems, cluster analysis, analysis of variance, analysis of covariance, etc., to the complex domain. These are some of the open problems. Since the likelihood function $L$ is a product of densities at the observed sample point, in the simple random sample case, this $L$ will be a real-valued scalar function. Then, one can extend the maximum likelihood method of estimation to the complex domain. For example, in the $p$-variate complex Gaussian case, with a simple random sample of size $n$, we have
$$\ln L=c-n\ln|\tilde\Sigma|-\sum_{j=1}^{n}(\tilde X_j-\tilde\mu)^{*}\tilde\Sigma^{-1}(\tilde X_j-\tilde\mu)=c-n\ln|\tilde\Sigma|-\mathrm{tr}(\tilde\Sigma^{-1}\tilde S)-n(\bar{\tilde X}-\tilde\mu)^{*}\tilde\Sigma^{-1}(\bar{\tilde X}-\tilde\mu)$$
where $c$ is a constant, $\bar{\tilde X}$ is the sample average vector, $\tilde S$ is the sample sum of squares and cross-products matrix, $\tilde\mu$ is the population mean value, and $\tilde\Sigma>O$ is the Hermitian positive definite covariance matrix. We have already established results on vector and matrix differential operators in the complex domain operating on the trace and the determinant involving a Hermitian positive definite matrix. Hence, all the terms in $\frac{\partial}{\partial\tilde\mu}[\ln L]=O$ and $\frac{\partial}{\partial\tilde\Sigma}[\ln L]=O$ are defined, and such equations have already been solved in our discussion of the operators operating on traces and determinants. Thus, we will see that $\frac{\partial}{\partial\tilde\mu}[\ln L]=O$ yields the sample average as the estimate/estimator of $\tilde\mu$, and $\frac{\partial}{\partial\tilde\Sigma}[\ln L]=O$, with the estimate of $\tilde\mu$ substituted, yields the estimate/estimator of $\tilde\Sigma$ as $\frac{1}{n}\tilde S$. One can also examine maximum likelihood estimation involving other scalar/vector/matrix-variate densities in the complex domain. The above are some of the open problems.
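As a small illustration of the maximum likelihood estimates mentioned above (a sketch under the stated complex Gaussian model; numpy is assumed and the simulation parameters are arbitrary), the sample average and $\frac{1}{n}\tilde S$ recover $\tilde\mu$ and $\tilde\Sigma$ from simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
p, n = 3, 5000
mu = np.array([1 + 1j, -2j, 0.5])
C = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = C @ C.conj().T / (2 * p)                 # true Hermitian positive definite covariance

# draw a circular complex Gaussian sample with mean mu and covariance Sigma
L = np.linalg.cholesky(Sigma)
W = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
Xs = mu[:, None] + L @ W

mu_hat = Xs.mean(axis=1)                         # MLE of mu: the sample average
D = Xs - mu_hat[:, None]
Sigma_hat = D @ D.conj().T / n                   # MLE of Sigma: (1/n) S
print(np.allclose(mu_hat, mu, atol=0.1), np.allclose(Sigma_hat, Sigma, atol=0.1))
```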

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Deng, X. Texture Analysis and Physical Interpretation of Polarimetric SAR Data. Ph.D. Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2016.
  2. Du, L.; Liu, H.; Wang, P.; Feng, B.; Pan, M.; Bao, Z. Noise robust radar HRRP target recognition based on multitask factor analysis with small training data size. IEEE Trans. Signal Process. 2012, 60, 3546–3560.
  3. Hellings, C.; Gogler, P.; Utschick, W. Composite real principal component analysis of complex signals. In Proceedings of the 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 2216–2220.
  4. Horel, J.D. Complex principal component analysis: Theory and examples. J. Clim. Appl. Meteorol. 1984, 23, 1660–1673.
  5. Katkovnik, V.; Ponomarenko, M.; Egiazarian, K. Complex-valued image denoising based on group-wise complex-domain sparsity. arXiv 2017, arXiv:1711.00362.
  6. Liu, J.; Xu, X.; Zhang, F.; Gao, Y.; Gao, W. Modeling of spatial distribution characteristics of high proportion renewable energy based on complex principal component analysis. In Proceedings of the 2020 IEEE Sustainable Power and Energy Conference (iSPEC), Chengdu, China, 23–25 November 2020; pp. 193–198.
  7. Morup, M.; Madsen, K.H.; Hansen, L.K. Shifted independent component analysis. In Proceedings of the 7th International Conference on Independent Component Analysis and Signal Separation (ICA 2007), London, UK, 9–12 September 2007; pp. 89–96.
  8. Mathai, A.M.; Provost, S.B.; Haubold, H.J. Multivariate Statistical Analysis in the Real and Complex Domains; Springer Nature: Cham, Switzerland, 2022.