Article

Extensions of Some Statistical Concepts to the Complex Domain

Mathematics and Statistics, McGill University, Montreal, QC H3A 2K6, Canada
Axioms 2024, 13(7), 422; https://doi.org/10.3390/axioms13070422
Submission received: 22 May 2024 / Revised: 14 June 2024 / Accepted: 20 June 2024 / Published: 22 June 2024
(This article belongs to the Special Issue New Perspectives in Mathematical Statistics)

Abstract

This paper extends principal component analysis, canonical correlation analysis, the Cramer–Rao inequality, and a few other statistical concepts from the real domain to the corresponding complex domain. Optimizations of Hermitian forms under a linear constraint, of a bilinear form under Hermitian-form constraints, and similar maxima/minima problems in the complex domain are discussed. Some vector/matrix differential operators are developed to handle these types of problems. These operators and the associated optimization problems in the complex domain are believed to be new. The operators will also be useful in maximum likelihood estimation problems, as illustrated in the concluding remarks. Detailed steps are given in the derivations so that the methods are easily accessible to everyone.

1. Introduction

In most textbooks on statistics, scalar/vector/matrix-variate random variables in the complex domain and the corresponding statistical analysis in the complex domain are not discussed. However, in a large number of physical situations it is natural, or more convenient, to represent variables in the complex domain, and statistical techniques in the complex domain are then required for data analysis. In the physical science and engineering literature, there are a number of papers dealing with random variables in the complex domain. Data reduction techniques such as principal component analysis in the complex domain seem to be the main topic in these areas. Most of the papers in these applied areas concentrate on developing algorithms for computing eigenvalues and eigenvectors, which are useful and relevant in principal component analysis, independent component analysis, factor analysis, and so on. Statistical analysis in the complex domain is widely used in the analysis of multi-look return signals in radar [1], in multi-task learning in artificial intelligence and machine learning [2], in signal processing [3], in principal component analysis and independent component analysis of meteorological data in the complex domain [4], in the optimal allocation of resources, especially energy resources [5], in holography, microscopy and optical metrology [6], in delayed mixing in speech processing, in biomedical signal analysis, and in financial data modeling [7].
In the present paper, vector/matrix differential operators in the complex domain are defined. Then, these are applied in optimizing a Hermitian form under a linear constraint, a bilinear form under Hermitian-form constraints, etc. Then, as applications of these optimization problems, the real-domain techniques of principal component analysis and canonical correlation analysis are extended to the complex domain. Other statistical concepts such as the Cramer–Rao inequality, least square procedure, and related aspects are extended to the complex domain. Detailed derivations are given so that the methods will be accessible even to beginners.
The following notation is used in this paper: Scalar variables, whether mathematical or random, are denoted by lower-case letters such as $x, y$. Vector/matrix variables are denoted by capital letters such as $X, Y$. Scalar constants are denoted by $a, b$, etc., and vector/matrix constants by $A, B$, etc. The wedge product of the differentials $\mathrm{d}x$ and $\mathrm{d}y$ is defined as $\mathrm{d}x\wedge\mathrm{d}y=-\mathrm{d}y\wedge\mathrm{d}x$, where $x$ and $y$ are two real scalar variables, so that $\mathrm{d}x\wedge\mathrm{d}x=0$, $\mathrm{d}y\wedge\mathrm{d}y=0$. Let $X=(x_{jk})$ be an $m\times n$ matrix with distinct real scalar variables $x_{jk}$ as elements; then $\mathrm{d}X=\wedge_{j=1}^{m}\wedge_{k=1}^{n}\mathrm{d}x_{jk}$. The transpose of a matrix $X$ is denoted by a prime, $X'$. For a $p\times p$ matrix $X$, if $X'=X$ (symmetric), then $\mathrm{d}X=\wedge_{j\ge k}\mathrm{d}x_{jk}=\wedge_{j\le k}\mathrm{d}x_{jk}$. Variables in the complex domain are written with a tilde, such as $\tilde x,\tilde y,\tilde X,\tilde Y$. If $\tilde X$ is a $p\times q$ matrix in the complex domain, then $\tilde X=X_1+iX_2$, $i=\sqrt{-1}$, with $X_1,X_2$ real, and $\mathrm{d}\tilde X$ is defined as $\mathrm{d}\tilde X=\mathrm{d}X_1\wedge\mathrm{d}X_2$. The determinant of a real $p\times p$ matrix $Y$ is written as $|Y|$ or as $\det(Y)$, and if $\tilde Y$ is in the complex domain, then the absolute value of the determinant of $\tilde Y$ is written as $|\det(\tilde Y)|$. Also, $\mathrm{tr}(Y)$ means the trace of the square matrix $Y$. A $p\times p$ real matrix $A$ being positive definite is written as $A>O$, and $\tilde X=\tilde X^{*}>O$ indicates that the matrix $\tilde X$, in the complex domain, is Hermitian positive definite. Other notation is explained when it occurs for the first time.
This paper is organized as follows: Section 2 starts with defining a vector differential operator in the complex domain. Then, with the help of this operator, some optimization problems such as optimizing a linear form subject to Hermitian-form constraint, a Hermitian form under Hermitian-form constraint, and a bilinear form with Hermitian-form constraints are discussed. Then, a matrix differential operator in the complex domain is defined. Differentiations of the trace of a product of matrices and determinant of a matrix, by using the matrix differential operator in the complex domain, are dealt with. Section 3 deals with the extension of principal component analysis to the complex domain. Section 4 delves into the extension of canonical correlation analysis to the complex domain. Section 5 examines the extension of the Cramer–Rao inequality, Cauchy–Schwarz inequality, and the least square estimation procedure to the complex domain. Detailed steps are given in each situation.

2. Optimization Involving Linear Forms, Traces, and Determinants

Here, we will consider linear forms and their differentiations first, and then, we will consider some situations of optimizing a Hermitian form with linear constraint and optimization of a linear form with Hermitian-form constraint. We consider some basic results to start with. Let
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\quad A=A_1+iA_2,\; i=\sqrt{-1},\; \tilde X=X_1+iX_2$$
where A 1 , A 2 , X 1 , X 2 are real vectors. Assume that A is a non-null arbitrary coefficient vector and X ˜ is a known vector random variable in the complex domain.
Theorem 1.
For A and X ˜ as defined above,
$$\tilde u=A^{*}\tilde X\;\Rightarrow\;\frac{\partial\tilde u}{\partial A}=2\tilde X.$$
Proof. 
This can be easily seen from the following: $\tilde u=(A_1'-iA_2')(X_1+iX_2)=A_1'X_1+A_2'X_2+i(A_1'X_2-A_2'X_1)$. Then,
$$\frac{\partial\tilde u}{\partial A_1}=X_1+iX_2\quad\text{and}\quad \frac{\partial\tilde u}{\partial A_2}=X_2-iX_1$$
from the corresponding real variable case. That is,
$$\left(\frac{\partial}{\partial A_1}+i\frac{\partial}{\partial A_2}\right)\tilde u=\frac{\partial\tilde u}{\partial A}=2(X_1+iX_2)=2\tilde X.$$
Now, we consider a slightly more general result. Consider $\tilde u=A^{*}\Sigma_{11}\tilde X$, $\Sigma_{11}=\Sigma_{11}^{*}>O$. Let $A^{c}=(A_1-iA_2)$ and $A^{*}=(A_1'-iA_2')$. Then, from $(2)$ note that $\frac{\partial\tilde u}{\partial A^{c}}=O$ and $\frac{\partial\tilde u}{\partial A^{*}}=O$; $\frac{\partial\tilde u}{\partial\tilde X}=O$; $\frac{\partial\tilde u}{\partial\tilde X^{c}}=2A$ and $\frac{\partial\tilde u}{\partial\tilde X^{*}}=2A$, where $A^{c}$ means the complex conjugate of $A$.
Theorem 2.
For X ˜ and A as defined above, let u ˜ = A * Σ 11 X ˜ . Let Σ 11 be a p × p matrix, free of the elements in A and X ˜ . Then,
$$\frac{\partial\tilde u}{\partial A}=2\Sigma_{11}\tilde X.$$
Proof. 
Since $\Sigma_{11}$ and $\tilde X$ are free of the elements in $A$, we may take $\Sigma_{11}\tilde X=B=B_1+iB_2$, $i=\sqrt{-1}$, with $B_1,B_2$ real. Then, from Theorem 1 the result follows. □
Now, consider a p × 1 vector A and a p × p matrix Σ and the Hermitian form u ˜ = A * Σ A in the complex domain where Σ = Σ * > O is free of the elements in A. Then, we have the following result:
Theorem 3.
For the Hermitian form as defined above, where A = A 1 + i A 2 , Σ = Σ 1 + i Σ 2 , Σ = Σ * > O , A 1 , A 2 , Σ 1 , Σ 2 are real,
$$\frac{\partial}{\partial A}[\tilde u]=\frac{\partial}{\partial A}[A^{*}\Sigma A]=2\Sigma A.$$
Proof. 
Opening up u ˜ , we have the following:
$$\tilde u=A^{*}\Sigma A=(A_1'-iA_2')\Sigma(A_1+iA_2)=A_1'\Sigma A_1+A_2'\Sigma A_2+i(A_1'\Sigma A_2-A_2'\Sigma A_1).$$
Since $\Sigma=\Sigma^{*}$, we have $\Sigma_1'=\Sigma_1$ (real symmetric) and $\Sigma_2'=-\Sigma_2$ (real skew symmetric). Now, $A_1'\Sigma A_1=A_1'\Sigma_1 A_1+iA_1'\Sigma_2 A_1=A_1'\Sigma_1 A_1$ because $A_1'\Sigma_2 A_1=0$ due to $\Sigma_2$ being real skew symmetric. Similarly, $A_2'\Sigma A_2=A_2'\Sigma_1 A_2$. Then, from the results in the real case, we have the following:
$$\frac{\partial}{\partial A_1}[A_1'\Sigma_1 A_1]=2\Sigma_1 A_1,\qquad \frac{\partial}{\partial A_2}[A_2'\Sigma_1 A_2]=2\Sigma_1 A_2.$$
Consider
$$i(A_1'\Sigma A_2-A_2'\Sigma A_1)=i[(A_1'\Sigma_1 A_2-A_2'\Sigma_1 A_1)+i(A_1'\Sigma_2 A_2-A_2'\Sigma_2 A_1)].$$
But $\Sigma_1'=\Sigma_1$ and $(A_2'\Sigma_1 A_1)'=A_1'\Sigma_1 A_2$, both being real $1\times 1$, and hence $A_1'\Sigma_1 A_2-A_2'\Sigma_1 A_1=0$. Also, $A_2'\Sigma_2 A_1=-A_1'\Sigma_2 A_2$ because $\Sigma_2'=-\Sigma_2$, and hence $A_1'\Sigma_2 A_2-A_2'\Sigma_2 A_1=2A_1'\Sigma_2 A_2=-2A_2'\Sigma_2 A_1$. Therefore,
$$i(A_1'\Sigma A_2-A_2'\Sigma A_1)=i^{2}\,2A_1'\Sigma_2 A_2=-i^{2}\,2A_2'\Sigma_2 A_1.$$
Then, from the real case,
$$\frac{\partial}{\partial A_1}\left[i^{2}\,2A_1'\Sigma_2 A_2\right]=2i^{2}\Sigma_2 A_2=2(i\Sigma_2)(iA_2)\;\Rightarrow\;\frac{\partial}{\partial A_1}[A^{*}\Sigma A]=2\Sigma_1 A_1+2(i\Sigma_2)(iA_2)$$
$$\frac{\partial}{\partial A_2}\left[-i^{2}\,2A_2'\Sigma_2 A_1\right]=-2i^{2}\Sigma_2 A_1=2\Sigma_2 A_1\;\Rightarrow\;\frac{\partial}{\partial A_2}[A^{*}\Sigma A]=2\Sigma_1 A_2+2\Sigma_2 A_1$$
from the corresponding real case. Then, from (2)–(5),
$$\frac{\partial}{\partial A}[A^{*}\Sigma A]=\left(\frac{\partial}{\partial A_1}+i\frac{\partial}{\partial A_2}\right)[A^{*}\Sigma A]=[2\Sigma_1 A_1+2(i\Sigma_2)(iA_2)]+i[2\Sigma_1 A_2+2\Sigma_2 A_1]=2\Sigma_1 A_1+2i\Sigma_1 A_2+2i\Sigma_2(A_1+iA_2)=2\Sigma_1 A+2i\Sigma_2 A=2\Sigma A.$$
This establishes the result. □
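Since the operator $\frac{\partial}{\partial A}=\frac{\partial}{\partial A_1}+i\frac{\partial}{\partial A_2}$ is just a pair of real gradients, Theorems 1 and 3 can be checked numerically. The following sketch is only an illustration (numpy is assumed; the array names, sizes, and tolerances are arbitrary): it assembles the complex gradient by central differences and compares it with $2\tilde X$ and $2\Sigma A$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal(p) + 1j * rng.standard_normal(p)
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = M @ M.conj().T                     # Hermitian positive definite test matrix

def complex_grad(f, A, h=1e-6):
    """Apply d/dA = d/dA1 + i d/dA2 componentwise by central differences."""
    g = np.zeros(len(A), dtype=complex)
    for k in range(len(A)):
        e = np.zeros(len(A)); e[k] = h
        d_re = (f(A + e) - f(A - e)) / (2 * h)             # derivative w.r.t. the real part A1_k
        d_im = (f(A + 1j * e) - f(A - 1j * e)) / (2 * h)   # derivative w.r.t. the imaginary part A2_k
        g[k] = d_re + 1j * d_im
    return g

u = lambda A: A.conj() @ X           # linear form A* X (Theorem 1)
q = lambda A: A.conj() @ Sigma @ A   # Hermitian form A* Sigma A (Theorem 3)
print(np.allclose(complex_grad(u, A), 2 * X, atol=1e-5))
print(np.allclose(complex_grad(q, A), 2 * Sigma @ A, atol=1e-4))
```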

2.1. Optimization of a Linear Form Subject to Hermitian-Form Constraint

Consider a linear form u ˜ = A * X ˜ , where A and X ˜ are p × 1 non-null vectors, with A being arbitrary, and X ˜ being a known vector variable with variance Var ( u ˜ ) = A * Σ 11 A , where Σ 11 = Σ 11 * > O is the covariance matrix in X ˜ . Consider the optimization of u ˜ , subject to the constraint that the variance of u ˜ is fixed, say unity, that is, A * Σ 11 A = 1 . Then, we have the following result:
Theorem 4.
For the linear form u ˜ = A * X ˜ and the constraint A * Σ 11 A = 1 , as defined above,
$$\max_{A^{*}\Sigma_{11}A=1}[A^{*}\tilde X]=[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{\frac{1}{2}}.$$
Proof. 
Let $\tilde w=A^{*}\tilde X-\lambda(A^{*}\Sigma_{11}A-1)$, where $\lambda$ is a Lagrangian multiplier. Observe that $\Sigma_{11}=\Sigma_{11}^{*}$ and we assume that it is Hermitian positive definite. From previous results (1) and (2),
$$\frac{\partial\tilde w}{\partial A}=2\tilde X-2\lambda\Sigma_{11}A=O\;\Rightarrow\;\tilde X=\lambda\Sigma_{11}A\;\Rightarrow\;\lambda=A^{*}\tilde X.$$
That is,
$$A=\frac{1}{\lambda}\Sigma_{11}^{-1}\tilde X\;\Rightarrow\;A^{*}=\frac{1}{\lambda^{c}}\tilde X^{*}\Sigma_{11}^{-1}\;\Rightarrow\;\lambda=A^{*}\tilde X=\frac{1}{\lambda^{c}}\tilde X^{*}\Sigma_{11}^{-1}\tilde X.$$
But from $(9)$, $\lambda=A^{*}\tilde X$, which means that the maximum of our linear form is the largest $\lambda$ and the minimum of our linear form is the smallest $\lambda$. But, from $(9)$ and $(10)$, $\lambda\lambda^{c}=|\lambda|^{2}=\tilde X^{*}\Sigma_{11}^{-1}\tilde X$, where $|\lambda|$ means the absolute value of $\lambda$ and $\lambda^{c}$ is the complex conjugate of $\lambda$. Hence,
$$\max_{A^{*}\Sigma_{11}A=1}[\tilde u]=[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{\frac{1}{2}},$$
with $\tilde X^{*}\Sigma_{11}^{-1}\tilde X>0$, being a positive definite Hermitian form. This completes the proof. □
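As a quick numerical illustration of Theorem 4 (a sketch only; numpy is assumed and the sampled data, variable names, and tolerances are arbitrary), random vectors on the constraint set $A^{*}\Sigma_{11}A=1$ never exceed the bound $[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{1/2}$, while a stationary point of the form $A=\Sigma_{11}^{-1}\tilde X/[\tilde X^{*}\Sigma_{11}^{-1}\tilde X]^{1/2}$ attains it.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
S11 = M @ M.conj().T                                   # Sigma_11, Hermitian positive definite
bound = np.sqrt((X.conj() @ np.linalg.solve(S11, X)).real)

# random vectors on the constraint set A* S11 A = 1 never exceed the bound
vals = []
for _ in range(2000):
    A0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    A = A0 / np.sqrt((A0.conj() @ S11 @ A0).real)
    vals.append(abs(A.conj() @ X))
print(max(vals) <= bound + 1e-12)

# the stationary point A = S11^{-1} X / sqrt(X* S11^{-1} X) satisfies the constraint and attains it
Aopt = np.linalg.solve(S11, X) / bound
print(np.isclose((Aopt.conj() @ S11 @ Aopt).real, 1.0), np.isclose(abs(Aopt.conj() @ X), bound))
```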

2.2. Optimization of a Hermitian Form Subject to Linear Constraint

Now, consider a problem of optimizing A * Σ 11 A subject to A * X ˜ = α fixed, where A and X ˜ are p × 1 vectors and Σ 11 = Σ 11 * > O is a p × p Hermitian positive definite matrix, free of the elements of A. Then, we have the following result:
Theorem 5.
For the Hermitian form A * Σ 11 A and linear form constraint A * X ˜ = α , we have
$$\max_{A^{*}\tilde X=\alpha}[A^{*}\Sigma_{11}A]=+\infty,\qquad \min_{A^{*}\tilde X=\alpha}[A^{*}\Sigma_{11}A]=\frac{|\alpha|^{2}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}.$$
Proof. 
Let $\lambda$ be a Lagrangian multiplier and consider $\tilde w=A^{*}\Sigma_{11}A-\lambda(A^{*}\tilde X-\alpha)$. From (1) and (2), we have
$$\frac{\partial\tilde w}{\partial A}=2\Sigma_{11}A-2\lambda\tilde X=O\;\Rightarrow\;\Sigma_{11}A=\lambda\tilde X\;\Rightarrow\;A^{*}\Sigma_{11}A=\lambda A^{*}\tilde X=\lambda\alpha.$$
From ( 12 ) ,
$$A=\lambda\Sigma_{11}^{-1}\tilde X\;\Rightarrow\;\alpha=A^{*}\tilde X=\lambda^{c}\tilde X^{*}\Sigma_{11}^{-1}\tilde X\;\Rightarrow\;|\lambda|=\frac{|\alpha|}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}$$
because $\tilde X^{*}\Sigma_{11}^{-1}\tilde X$ is a positive definite Hermitian form, where $|\lambda|$ and $|\alpha|$ denote the absolute values of $\lambda$ and $\alpha$, respectively. Then, $\max[A^{*}\Sigma_{11}A]=+\infty$ since $A$ is arbitrary and since the linear restriction cannot eliminate the effect of $A$ fully. Hence, we look for the minimum. From $(13)$,
$$|\lambda|=\frac{|\alpha|}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}\;\Rightarrow\;|\lambda\alpha|=\frac{|\alpha|^{2}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}.$$
Hence,
$$\min_{A^{*}\tilde X=\alpha}[A^{*}\Sigma_{11}A]=\frac{|\alpha|^{2}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}.$$
This completes the proof. □
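For Theorem 5, a similar numerical sketch can be made (illustrative only; numpy is assumed, and the names and tolerances are arbitrary): feasible vectors satisfying $A^{*}\tilde X=\alpha$ are generated by a rank-one correction of random vectors, their Hermitian forms never fall below $|\alpha|^{2}/(\tilde X^{*}\Sigma_{11}^{-1}\tilde X)$, and the minimizer $A=\frac{\alpha^{c}}{\tilde X^{*}\Sigma_{11}^{-1}\tilde X}\,\Sigma_{11}^{-1}\tilde X$ attains the bound.

```python
import numpy as np

rng = np.random.default_rng(2)
p, alpha = 5, 1.3 - 0.7j
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
S11 = M @ M.conj().T
q = (X.conj() @ np.linalg.solve(S11, X)).real        # X* S11^{-1} X
lower = abs(alpha) ** 2 / q                           # claimed minimum value

# random feasible vectors (A* X = alpha) never fall below the lower bound
vals = []
for _ in range(2000):
    A0 = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    gamma = np.conj((alpha - A0.conj() @ X) / (X.conj() @ X))   # correction so that A* X = alpha
    A = A0 + gamma * X
    vals.append((A.conj() @ S11 @ A).real)
print(min(vals) >= lower - 1e-10)

# the minimizer A = (alpha^c / q) S11^{-1} X satisfies the constraint and attains the bound
Amin = np.conj(alpha) / q * np.linalg.solve(S11, X)
print(np.isclose(Amin.conj() @ X, alpha), np.isclose((Amin.conj() @ S11 @ Amin).real, lower))
```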

2.3. Differentiation of Traces of Matrix Products and Matrix Differential Operator

Now, we consider a few matrix-variate cases, where the problem is also essentially an optimization involving vector variables. Let $A=(a_{jk})$ be a $p\times q$ matrix of arbitrary elements $a_{jk}$. Let $\tilde X=(\tilde x_{jk})$ be a $p\times q$ matrix in the complex domain with distinct scalar complex variables $\tilde x_{jk}$ as elements. Consider $\tilde u=\mathrm{tr}(A^{*}\tilde X)$. Let $A_j$ and $\tilde X_j$ be the $j$-th columns of $A$ and $\tilde X$, respectively. Then, $\tilde u=\sum_{j=1}^{q}A_j^{*}\tilde X_j$. Let $A_j=A_{j1}+iA_{j2}$, $i=\sqrt{-1}$, where $A_{j1},A_{j2}$ are real $p\times 1$ vectors. Let $\tilde u_j=A_j^{*}\tilde X_j$. Consider the operator
$$\frac{\partial}{\partial A_j}=\left(\frac{\partial}{\partial A_{j1}}+i\frac{\partial}{\partial A_{j2}}\right),\quad \frac{\partial}{\partial A_j^{*}}=\left(\frac{\partial}{\partial A_{j1}'}-i\frac{\partial}{\partial A_{j2}'}\right),\quad \frac{\partial}{\partial A_j^{c}}=\left(\frac{\partial}{\partial A_{j1}}-i\frac{\partial}{\partial A_{j2}}\right)$$
and then $\frac{\partial\tilde u_j}{\partial A_j}=\frac{\partial\tilde u}{\partial A_j}$ since we are taking partial derivatives. Then, we have the following result:
Theorem 6.
Let u ˜ , A , A j , X ˜ j be as defined above. Then,
$$\frac{\partial\tilde u}{\partial A}=\frac{\partial}{\partial A}[\mathrm{tr}(A^{*}\tilde X)]=2\tilde X.$$
Proof. 
From the previous result,
$$\frac{\partial\tilde u_j}{\partial A_j}=\left(\frac{\partial}{\partial A_{j1}}+i\frac{\partial}{\partial A_{j2}}\right)\tilde u_j=\left(\frac{\partial}{\partial A_{j1}}+i\frac{\partial}{\partial A_{j2}}\right)(A_j^{*}\tilde X_j)=2\tilde X_j,\quad j=1,\ldots,q,$$
from Theorem 1. But $\frac{\partial\tilde u_j}{\partial A_j}=\frac{\partial\tilde u}{\partial A_j}$. Now, stack these up side by side as $q$ columns for $j=1,\ldots,q$. Note that $A=[A_1,\ldots,A_q]$, $\frac{\partial}{\partial A}=\left[\frac{\partial}{\partial A_1},\ldots,\frac{\partial}{\partial A_q}\right]$, and $\frac{\partial\tilde u}{\partial A_j}=2\tilde X_j$, and hence $\frac{\partial\tilde u}{\partial A}=2\tilde X$. This completes the proof. □
For the next result to be established, we need a result on matrices, which will be stated here as a lemma.
Lemma 1.
For two $p\times p$ real matrices $A$ and $B$, where $A'=A$ (symmetric) and $B'=-B$ (skew symmetric), $\mathrm{tr}(AB)=0$.
Proof. 
For two $p\times p$ matrices $A_1$ and $B_1$, it is well known that $\mathrm{tr}(A_1)=\mathrm{tr}(A_1')$ and $\mathrm{tr}(A_1B_1)=\mathrm{tr}(B_1A_1)$. Then, for our matrices $A$ and $B$ with $A'=A$, $B'=-B$, we have $\mathrm{tr}(AB)=\mathrm{tr}((AB)')=\mathrm{tr}(B'A')=\mathrm{tr}(-BA)=-\mathrm{tr}(BA)=-\mathrm{tr}(AB)$. That is, $\mathrm{tr}(AB)=-\mathrm{tr}(AB)\Rightarrow 2\,\mathrm{tr}(AB)=0\Rightarrow\mathrm{tr}(AB)=0$. □
For the next problem, let $A=A^{*}$ be a $p\times p$ Hermitian matrix of arbitrary elements $a_{jk}$. Let $\tilde X=\tilde X^{*}$ be a Hermitian matrix in the complex domain with distinct complex scalar variables as elements except for the Hermitian property. Let $A=A_1+iA_2$, $\tilde X=X_1+iX_2$, where $A_1,A_2,X_1,X_2$ are real. Then, $A_1'=A_1$, $X_1'=X_1$ (symmetric) and $A_2'=-A_2$, $X_2'=-X_2$ (skew symmetric). Let $\tilde u=\mathrm{tr}(A^{*}\tilde X)$. Then, we have the following result:
Theorem 7.
Let $A,\tilde X,\frac{\partial}{\partial A},A_1,A_2,X_1,X_2$, and $\tilde u=\mathrm{tr}(A^{*}\tilde X)$ be as defined above. Then,
$$\frac{\partial\tilde u}{\partial A}=2\tilde X-\mathrm{diag}(\tilde X)$$
where diag ( X ˜ ) means a diagonal matrix consisting of only the diagonal elements from X ˜ .
Proof. 
Opening up u ˜ , we have the following:
$$\tilde u=\mathrm{tr}(A^{*}\tilde X)=\mathrm{tr}\{(A_1'-iA_2')(X_1+iX_2)\}=\mathrm{tr}\{A_1'X_1+A_2'X_2+i(A_1'X_2-A_2'X_1)\}=\mathrm{tr}(A_1'X_1+A_2'X_2)$$
since $\mathrm{tr}(A_1'X_2)=0$ and $\mathrm{tr}(A_2'X_1)=0$ by Lemma 1. Now, from the known result for symmetric matrices in the real case, we have
$$\frac{\partial}{\partial A_1}[\mathrm{tr}(A_1'X_1)]=2X_1-\mathrm{diag}(X_1).$$
If the $(j,k)$-th element, for $j\neq k$, in $A_2$ is $+a_{jk}$, then the $(j,k)$-th element in $A_2'$ is $-a_{jk}$ and the $(k,j)$-th element in $A_2$ is $-a_{jk}$. Hence, $\mathrm{tr}(A_2'X_2)=2\sum_{j<k}a_{jk}x_{jk}=2\sum_{j>k}a_{jk}x_{jk}$. Hence,
$$\frac{\partial}{\partial A_2}[\mathrm{tr}(A_2'X_2)]=2X_2$$
since the diagonal elements in A 2 and X 2 are already zeros. Combining ( 15 ) and ( 16 ) , we have
$$\frac{\partial}{\partial A}[\mathrm{tr}(A^{*}\tilde X)]=2\tilde X-\mathrm{diag}(\tilde X)$$
which corresponds to the result in the real case. □

2.4. Differentiation of a Determinant in the Complex Domain

Here, we will start by defining the derivative of a scalar quantity with respect to a matrix, that is, a matrix differential operator. Let $X=(x_{jk})$ be an $m\times n$ real matrix and let $\tilde X=(\tilde x_{jk})$ be an $m\times n$ matrix in the complex domain. Then, we can always write $\tilde X=X_1+iX_2$, where $i=\sqrt{-1}$ and $X_1,X_2$ are real $m\times n$ matrices. Then, the matrix differential operators $\frac{\partial}{\partial X}$ and $\frac{\partial}{\partial\tilde X}$ are defined as the following:
$$\frac{\partial}{\partial X}=\begin{bmatrix}\frac{\partial}{\partial x_{11}} & \cdots & \frac{\partial}{\partial x_{1n}}\\ \vdots & & \vdots\\ \frac{\partial}{\partial x_{m1}} & \cdots & \frac{\partial}{\partial x_{mn}}\end{bmatrix}\quad\text{and}\quad \frac{\partial}{\partial\tilde X}=\left(\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}\right).$$
Consider a p × p nonsingular matrix X ˜ in the complex domain and let | X ˜ | be its determinant. Then, we have the following result:
Theorem 8.
For the p × p nonsingular matrix X ˜ in the complex domain,
$$\frac{\partial}{\partial\tilde X}[|\tilde X^{c}|]=2|\tilde X^{c}|(\tilde X^{*})^{-1}\quad\text{for a general }\tilde X$$
and
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})]\quad\text{for }\tilde X=\tilde X^{*}.$$
Proof. 
The cofactor expansion of a determinant holds in the real and complex domains and it is the following:
$$|\tilde X|=\tilde x_{11}\tilde C_{11}+\ldots+\tilde x_{1p}\tilde C_{1p}=\tilde x_{21}\tilde C_{21}+\ldots+\tilde x_{2p}\tilde C_{2p}=\cdots=\tilde x_{p1}\tilde C_{p1}+\ldots+\tilde x_{pp}\tilde C_{pp}$$
where $\tilde C_{jk}$ is the cofactor of $\tilde x_{jk}$ for all $j$ and $k$. Let $\tilde C=(\tilde C_{jk})$ be the matrix of cofactors. For a general matrix, all the $\tilde x_{jk}$'s are distinct and the corresponding $\tilde C_{jk}$'s are also distinct. Hence, in the real case, taking the partial derivative of the $j$-th line in $(17)$ we have $\frac{\partial}{\partial x_{jk}}[|X|]=C_{jk}$ for all $j$ and $k$. Hence, in the real case
$$\frac{\partial}{\partial X}[|X|]=C=|X|(X^{-1})'=|X|(X')^{-1}.$$
This is a known result. But, in the complex case, the situation is different. Before we tackle the complex-domain situation, we will develop some necessary tools. It is a known result that $\tilde X^{-1}=\frac{1}{|\tilde X|}\tilde C'$. But when $\tilde X=\tilde X^{*}$ (Hermitian), the eigenvalues are real, and hence the determinant is real. Then, $\tilde X^{-1}$ is Hermitian, and thereby $\tilde C$ is also Hermitian. The following results will be helpful when we apply our matrix differential operators to scalar functions of matrices in the complex domain. For two scalar complex variables $\tilde x$ and $\tilde y$ the following can be easily verified:
$$\frac{\partial}{\partial\tilde x}[\tilde x\tilde y]=0,\quad \frac{\partial}{\partial\tilde x^{c}}[\tilde x^{c}\tilde y^{c}]=0,\quad \frac{\partial}{\partial\tilde x}[\tilde x^{c}\tilde y]=2\tilde y,\quad \frac{\partial}{\partial\tilde x}[\tilde x^{c}\tilde y^{c}]=2\tilde y^{c},\quad \frac{\partial}{\partial\tilde x^{c}}[\tilde x\tilde y]=2\tilde y.$$
For convenience, let us consider the term
$$\tilde u_{jk}=\tilde x_{jk}^{c}\tilde C_{jk}^{c}=(x_{1jk}-ix_{2jk})(C_{1jk}-iC_{2jk})=x_{1jk}C_{1jk}-x_{2jk}C_{2jk}-i(x_{1jk}C_{2jk}+x_{2jk}C_{1jk})$$
where x ˜ j k = x 1 j k + i x 2 j k , C ˜ j k = C 1 j k + i C 2 j k , with x 1 j k , x 2 j k , C 1 j k , C 2 j k being real scalar quantities. Note that x ˜ j k = x ˜ k j * when X ˜ is Hermitian. Hence, for example, when we differentiate with respect to x ˜ 21 it is equivalent to differentiating with respect to x ˜ 12 * and vice versa. When x ˜ 12 * C ˜ 12 * is differentiated with respect to x ˜ 12 it is equivalent to differentiating x ˜ 21 C ˜ 21 with respect to x ˜ 12 , and so on. Then,
$$\frac{\partial}{\partial x_{1jk}}[\tilde u_{jk}]=C_{1jk}-iC_{2jk},\quad \frac{\partial}{\partial x_{2jk}}[\tilde u_{jk}]=-C_{2jk}-iC_{1jk},\quad \frac{\partial}{\partial\tilde x_{jk}}[\tilde u_{jk}]=2\tilde C_{jk}^{c}.$$
This is the derivative of the complex conjugate of the ( j , k ) -th element in ( 18 ) for all j and k when X ˜ is a general matrix with distinct scalar complex variables as elements. Then,
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=O,\quad \frac{\partial}{\partial\tilde X}[|\tilde X^{c}|]=2\tilde C^{c}=2|\tilde X^{c}|[\tilde X^{*}]^{-1},\quad \frac{\partial}{\partial\tilde X^{c}}[|\tilde X|]=2\tilde C=2|\tilde X|(\tilde X')^{-1}.$$
When $\tilde X=\tilde X^{*}$ (Hermitian) we have $|\tilde X|=|\tilde X^{*}|=|\tilde X^{c}|$. Then, the result in $(20)$ holds for the $k$-th term in the $j$-th line in $(18)$. But, when $\tilde X$ is Hermitian, there is one more element contributing to $x_{1jk}$ and $x_{2jk}$. This is the $j$-th term in the $k$-th line of $(18)$. Thus, the sum of the contributions coming from these two terms is the derivative of $|\tilde X^{c}|$ with respect to $\tilde x_{jk}$. The sum of the contributions is the following, observing that $x_{1jk}=x_{1kj}$, $C_{1jk}=C_{1kj}$, $x_{2jk}=-x_{2kj}$, $C_{2jk}=-C_{2kj}$:
$$x_{1jk}C_{1jk}-x_{2jk}C_{2jk}-i(x_{1jk}C_{2jk}+x_{2jk}C_{1jk})+x_{1kj}C_{1kj}-x_{2kj}C_{2kj}-i(x_{1kj}C_{2kj}+x_{2kj}C_{1kj})$$
$$=2x_{1jk}C_{1jk}-2x_{2jk}C_{2jk}-i[x_{1jk}(C_{2jk}+C_{2kj})+C_{1jk}(x_{2jk}+x_{2kj})]=2x_{1jk}C_{1jk}-2x_{2jk}C_{2jk}$$
since $C_{2jk}+C_{2kj}=0$ and $x_{2jk}+x_{2kj}=0$.
Now,
$$\frac{\partial}{\partial\tilde x_{jk}}[\tilde x_{jk}^{c}\tilde C_{jk}^{c}+\tilde x_{kj}^{c}\tilde C_{kj}^{c}]=2\tilde C_{jk}^{c}=2\tilde C_{kj}$$
for all $j\neq k$. When $j=k$, the diagonal elements in $\tilde X$ and $\tilde C$ are real, and hence the term occurs only once, and therefore,
$$\frac{\partial}{\partial\tilde x_{jj}}[\tilde x_{jj}^{c}\tilde C_{jj}^{c}]=\tilde C_{jj}^{c}=C_{jj}.$$
Since we are taking the partial derivative, the derivatives are the same as differentiation of | X ˜ c | . Hence,
$$\frac{\partial}{\partial\tilde X}[|\tilde X^{c}|]=2\tilde C^{c}-\mathrm{diag}(\tilde C^{c})=|\tilde X^{*}|[2(\tilde X^{*})^{-1}-\mathrm{diag}((\tilde X^{*})^{-1})]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})]$$
and then,
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})].$$
Here, we have used two properties: for a $p\times p$ matrix $B$, $|B|=|B'|$, and $(B^{c})'=B^{*}=B$ if $B$ is Hermitian. Then, we have the following theorem: □
Theorem 9.
When the p × p nonsingular matrix X ˜ in the complex domain is Hermitian, that is, X ˜ = X ˜ * , then,
$$\frac{\partial}{\partial\tilde X}[|\tilde X|]=|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})].$$
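Theorem 9 can be checked numerically by perturbing the free real parameters of a Hermitian matrix in conjugate pairs, exactly as in the proof above. The sketch below is only an illustration (numpy is assumed; the test matrix, step size, and tolerances are arbitrary): it assembles $\frac{\partial}{\partial\tilde X}[|\tilde X|]$ entrywise from central differences and compares it with $|\tilde X|[2\tilde X^{-1}-\mathrm{diag}(\tilde X^{-1})]$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, h = 4, 1e-6
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
X = M + M.conj().T + 2 * p * np.eye(p)      # Hermitian (and nonsingular) test matrix

def ddet(E):
    """Central difference of det(X) along a Hermitian perturbation direction E."""
    return (np.linalg.det(X + h * E) - np.linalg.det(X - h * E)).real / (2 * h)

grad = np.zeros((p, p), dtype=complex)
for j in range(p):
    for k in range(p):
        Ejk = np.zeros((p, p)); Ejk[j, k] = 1.0
        Ekj = np.zeros((p, p)); Ekj[k, j] = 1.0
        if j == k:
            grad[j, j] = ddet(Ejk)   # only a real diagonal variable contributes
        else:
            # real-part direction moves (j,k) and (k,j) together; imaginary-part direction moves them oppositely
            grad[j, k] = ddet(Ejk + Ekj) + 1j * ddet(1j * (Ejk - Ekj))

Xinv = np.linalg.inv(X)
rhs = np.linalg.det(X).real * (2 * Xinv - np.diag(np.diag(Xinv)))
print(np.allclose(grad, rhs, rtol=1e-4, atol=1e-4))
```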
In the following sections, we will consider some applications of the results obtained in Section 2.

3. Principal Component Analysis in the Complex Domain

This is an application of the mathematical problem of optimization of a Hermitian form under a Hermitian-form constraint.
In many physical situations, variables occur in pairs, such as time and phase, and hence the most appropriate representation of such variables is through complex variables because a scalar complex variable can be taken as a pair of real variables. Let $\tilde x=x_1+ix_2$, $i=\sqrt{-1}$, with $x_1,x_2$ real scalar variables, be a scalar complex variable. Then, a statistical density associated with $\tilde x$ is a real-valued scalar function $f(\tilde x)$ of $\tilde x$ such that $f(\tilde x)\ge 0$ over the entire complex plane, something like a hill on the complex plane, so that the total volume under the surface is unity, that is, $\int_{\tilde x}f(\tilde x)\,\mathrm{d}\tilde x=1$, where $\mathrm{d}\tilde x=\mathrm{d}x_1\wedge\mathrm{d}x_2$, the wedge product of the differentials $\mathrm{d}x_1$ and $\mathrm{d}x_2$. Then, the center of gravity of the hill $f(\tilde x)$ is at $E[\tilde x]=\int_{\tilde x}\tilde x f(\tilde x)\,\mathrm{d}\tilde x$ and the square of the measure of scatter in $\tilde x$ is $\sigma^{2}=E[(\tilde x-E(\tilde x))(\tilde x-E(\tilde x))^{*}]$, where $E[(\cdot)]$ means the expected value when $\tilde x$ is a scalar complex random variable and $f(\tilde x)$ is its density.
If the scatter is small, then the variable $\tilde x$ is concentrated near the center of gravity $E[\tilde x]$. If $\sigma^{2}$ is large, then $\tilde x$ is spread thin, far and wide, and hence it is more or less unrecognizable. If a large number of scalar variables are being considered as possible variables to be included in a model, then the ones with the larger scatter are the important variables to be included in the model. For convenience, one can consider linear functions of such variables because linear functions also contain the individual variables. A linear function is of the form $\tilde u=a_1^{*}\tilde x_1+\ldots+a_p^{*}\tilde x_p$, where $a_1^{*},\ldots,a_p^{*}$ are constants and $\tilde x_1,\ldots,\tilde x_p$ are scalar complex variables. For example, $a_1^{*}=1$, $a_2^{*}=0=\ldots=a_p^{*}$ gives $\tilde x_1$, where $a^{*}$ indicates the conjugate transpose and, for a scalar quantity $\tilde y$, $\tilde y^{*}=\tilde y^{c}$ only, where $c$ in the exponent indicates the complex conjugate. We may write the linear function as $\tilde u=A^{*}\tilde X$, where
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\; A'=[a_1,\ldots,a_p]\quad\text{and}\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\; \tilde X'=[\tilde x_1,\ldots,\tilde x_p]$$
where a prime indicates the transpose. The expected value of $A^{*}\tilde X$ is $E[\tilde u]=A^{*}E[\tilde X]$ and the variance–covariance matrix, or covariance matrix, in $\tilde X$ is denoted by $\Sigma>O$ (Hermitian positive definite). Then, the variance of $\tilde u$, denoted by $\mathrm{Var}(\tilde u)$, is given by $\mathrm{Var}(\tilde u)=E[A^{*}(\tilde X-E(\tilde X))(\tilde X-E(\tilde X))^{*}A]=A^{*}\Sigma A$. The most important linear function is that linear function having the maximum variance. Hence, we may compute $\max_{A}[A^{*}\Sigma A]$. But, since $\Sigma>O$, $A^{*}\Sigma A$ is a positive definite Hermitian form in $A$ and its maximum over $A$ is at $+\infty$. Thus, unrestricted maximization does not make sense. Without loss of generality we may take $A^{*}A=1$ because this can always be achieved for any non-null $A$. Then, the maximization amounts to maximizing $A^{*}\Sigma A$ within the unit sphere $A^{*}A=1$. There will be a maximum and a minimum in this case. We may incorporate the restriction by using a Lagrangian multiplier. Consider
$$\tilde w=A^{*}\Sigma A-\lambda(A^{*}A-1)$$
where λ is a Lagrangian multiplier. Maximization will be achieved by using the following result, which will be stated as a lemma.
Lemma 2.
Let $\tilde Y=Y_1+iY_2$, $i=\sqrt{-1}$, where $Y_1,Y_2$ are $p\times 1$ vectors with distinct real scalar variables as elements. Let $B=B^{*}$ be a $p\times p$ constant Hermitian matrix. Let the partial differential operator $\frac{\partial}{\partial\tilde Y}$ be defined as
$$\frac{\partial}{\partial\tilde Y}=\frac{\partial}{\partial Y_1}+i\frac{\partial}{\partial Y_2}.$$
Then,
$$\frac{\partial}{\partial\tilde Y}[\tilde Y^{*}B\tilde Y]=2B\tilde Y,\quad B=B^{*}>O,\qquad \frac{\partial}{\partial\tilde Y}[\tilde Y^{*}\tilde Y]=2\tilde Y.$$
Proof. 
Let $B=B^{*}=B_1+iB_2$. When $B$ is Hermitian, $B_1'=B_1$ and $B_2'=-B_2$, that is, $B_1$ is real symmetric and $B_2$ is real skew symmetric. Let $\tilde Y=Y_1+iY_2$, $i=\sqrt{-1}$, with $Y_1,Y_2$ real. Then, $\tilde Y^{*}B\tilde Y=(Y_1'-iY_2')B(Y_1+iY_2)=Y_1'BY_1+Y_2'BY_2+i(Y_1'BY_2-Y_2'BY_1)$, where $Y_1'BY_1=Y_1'B_1Y_1+iY_1'B_2Y_1=Y_1'B_1Y_1$ because $Y_1'B_2Y_1=0$ due to $B_2'=-B_2$. From the real case it is well known that $\frac{\partial}{\partial Y_1}[Y_1'B_1Y_1]=2B_1Y_1$. Similarly, $\frac{\partial}{\partial Y_2}[Y_2'BY_2]=2B_1Y_2$. Now,
$$i\frac{\partial}{\partial Y_1}[Y_1'BY_2-Y_2'BY_1]=i\frac{\partial}{\partial Y_1}[Y_1'B_1Y_2-Y_2'B_1Y_1+i(Y_1'B_2Y_2-Y_2'B_2Y_1)]=0+i^{2}\frac{\partial}{\partial Y_1}[Y_1'B_2Y_2+Y_1'B_2Y_2]=i^{2}\,2B_2Y_2=-2B_2Y_2.$$
Similarly,
$$i\frac{\partial}{\partial Y_2}[Y_1'BY_2-Y_2'BY_1]=-2i^{2}B_2Y_1=2B_2Y_1$$
where we have used two properties. When $B_1'=B_1$, that is, it is real symmetric, we have $Y_1'B_1Y_2=Y_2'B_1Y_1$ because both are $1\times 1$ real, and hence each is equal to its transpose, and therefore the difference is zero. When $B_2'=-B_2$ we have $Y_2'B_2Y_1=-Y_1'B_2Y_2$, because both are $1\times 1$ real, and hence each is equal to its transpose. Then, from the real operator operating on a real linear form the result follows. Similar results hold when differentiating with respect to $Y_2$. Thus, we have the following:
$$\frac{\partial}{\partial Y_1}[\tilde Y^{*}B\tilde Y]=2B_1Y_1-2B_2Y_2=2B_1Y_1+2i^{2}B_2Y_2\quad\text{and}\quad \frac{\partial}{\partial Y_2}[\tilde Y^{*}B\tilde Y]=2B_1Y_2+2B_2Y_1.$$
Therefore,
$$\left(\frac{\partial}{\partial Y_1}+i\frac{\partial}{\partial Y_2}\right)[\tilde Y^{*}B\tilde Y]=2B_1(Y_1+iY_2)+2iB_2(Y_1+iY_2)=2B_1\tilde Y+2iB_2\tilde Y=2B\tilde Y.$$
Hence the result. For $B=I$, the identity matrix, the result on $\tilde Y^{*}\tilde Y$ follows, that is, $\frac{\partial}{\partial\tilde Y}(\tilde Y^{*}\tilde Y)=2\tilde Y$. Now, by using Lemma 2 we can differentiate $\tilde w$ in $(22)$. That is,
$$\frac{\partial\tilde w}{\partial A}=O,\;\frac{\partial\tilde w}{\partial\lambda}=0\;\Rightarrow\;\Sigma A-\lambda A=O\;\Rightarrow\;(\Sigma-\lambda I)A=O\;\Rightarrow\;|\Sigma-\lambda I|=0$$
where | ( · ) | means the determinant of the square matrix ( · ) . From ( 23 ) , Σ A = λ A and pre-multiplying by A * and using the fact that A * A = 1 , we have A * Σ A = λ . Hence,
$$\max_{A^{*}A=1}[A^{*}\Sigma A]=\lambda_1\quad\text{and}\quad \min_{A^{*}A=1}[A^{*}\Sigma A]=\lambda_p$$
where $\lambda_1$ is the largest eigenvalue of $\Sigma=\Sigma^{*}>O$ and $\lambda_p$ is the smallest eigenvalue of $\Sigma$. When $\Sigma$ is Hermitian, all its eigenvalues are real, and when it is Hermitian positive definite, all its eigenvalues are also real and positive. Hence, the procedure is the following: Take the largest eigenvalue of $\Sigma$, say $\lambda_1$. Then, through (23), compute an eigenvector corresponding to $\lambda_1$, that is, solve $\Sigma A=\lambda_1 A$ for an $A$. Then, normalize this eigenvector through $A_1^{*}A_1=1$. That is, if an eigenvector corresponding to $\lambda_1$ is $\alpha_1$, then compute $A_1=\frac{1}{\sqrt{\alpha_1^{*}\alpha_1}}\alpha_1$. This $A_1$ is the normalized eigenvector corresponding to $\lambda_1$. Now, consider $\tilde u_1=A_1^{*}\tilde X$. This $\tilde u_1$ is the first principal component in the sense of the linear function having the maximum variance. Now, take the second largest eigenvalue $\lambda_2$. Go through the same procedure and construct the normalized eigenvector $A_2$ corresponding to $\lambda_2$. Then, $\tilde u_2=A_2^{*}\tilde X$ is the second principal component. Continue the process and stop when the variance of $\tilde u_j$, namely $\lambda_j$, falls below a preassigned number. If there is no preassigned number, then $\tilde u_1,\ldots,\tilde u_p$ will be the $p$ principal components. Here, we have assumed that the eigenvalues of $\Sigma$ are distinct. When the eigenvalues are distinct, we can show that the eigenvectors corresponding to the distinct eigenvalues of a symmetric or Hermitian matrix are orthogonal to each other. Hence, our principal components will be orthogonal to each other in the sense that the joint dispersion in the pair $(\tilde u_i,\tilde u_j)$ is zero for $i\neq j$, or the covariance between $\tilde u_i$ and $\tilde u_j$ is zero when $i\neq j$. The covariance is defined as the following: Let $\tilde U$ be a $p\times 1$ vector in the complex domain and let $\tilde V$ be a $q\times 1$ vector in the complex domain. Then, the covariance of $\tilde U$ on $\tilde V$ is defined and denoted as $\mathrm{Cov}(\tilde U,\tilde V)=E[(\tilde U-E(\tilde U))(\tilde V-E(\tilde V))^{*}]$, whenever this expected value exists, so that when $\tilde V=\tilde U$, $\mathrm{Cov}(\tilde U,\tilde V)=\mathrm{Cov}(\tilde U)$, the covariance matrix in $\tilde U$, and when $p=1$, it is the variance of the scalar complex variable $\tilde u$. □
When the covariance matrix $\Sigma$ is unknown, we may construct sample principal components. Let our population be the $p\times 1$ vector $\tilde X$, $\tilde X'=[\tilde x_1,\ldots,\tilde x_p]$, where $\tilde x_j$, $j=1,\ldots,p$, are distinct scalar complex variables. Consider $n$ independently and identically distributed (iid) such $p$-vectors. Then, we have a simple random sample of size $n$ from $\tilde X$. Then, the sample matrix is the $p\times n$, $n>p$, matrix denoted as the following:
$$\mathbf{\tilde X}=[\tilde X_1,\ldots,\tilde X_n]=\begin{bmatrix}\tilde x_{11} & \tilde x_{12} & \cdots & \tilde x_{1n}\\ \tilde x_{21} & \tilde x_{22} & \cdots & \tilde x_{2n}\\ \vdots & \vdots & & \vdots\\ \tilde x_{p1} & \tilde x_{p2} & \cdots & \tilde x_{pn}\end{bmatrix}.$$
Let the sample average be denoted by $\bar{\tilde X}=\frac{1}{n}[\tilde X_1+\ldots+\tilde X_n]$ and the matrix of sample averages be denoted by the bold letter $\bar{\mathbf{\tilde X}}=[\bar{\tilde X},\ldots,\bar{\tilde X}]$. Then, the sample sum of products matrix $\tilde S$ is given by
$$\tilde S=[\mathbf{\tilde X}-\bar{\mathbf{\tilde X}}][\mathbf{\tilde X}-\bar{\mathbf{\tilde X}}]^{*}=(\tilde s_{jk})$$
where $\tilde s_{jk}=\sum_{r=1}^{n}(\tilde x_{jr}-\bar{\tilde x}_j)(\tilde x_{kr}-\bar{\tilde x}_k)^{*}$. The motivation for using the sample sum of products matrix $\tilde S$ is that $\frac{1}{n-1}\tilde S$ is an unbiased estimator of $\Sigma$. Since we will be normalizing the eigenvectors, we operate with $\tilde S$ itself. Compute the eigenvalues of $\tilde S$. Take the largest eigenvalue of $\tilde S$ and call it $m_1$. Construct an eigenvector corresponding to $m_1$ and normalize it through $M_1^{*}M_1=1$, where $M_1$ is the normalized eigenvector corresponding to $m_1$. Then, $\tilde v_1=M_1^{*}\tilde X$ is the first sample principal component. When the columns of the sample matrix are not linearly related, we have $\tilde S=\tilde S^{*}>O$ (Hermitian positive definite) and all eigenvalues $m_1,\ldots,m_p$ will be positive. We assume that the eigenvalues are distinct, $m_1>m_2>\ldots>m_p$; this will be true almost surely. Now, take $m_2$, construct $M_2$ and the second principal component $\tilde v_2=M_2^{*}\tilde X$, and continue the process. We can show that the covariances between $\tilde v_j$ and $\tilde v_k$ will be zero for all $j\neq k$. This property follows from the fact that when the matrix is symmetric or Hermitian, the eigenvectors corresponding to distinct eigenvalues are orthogonal. When the population $\tilde X$ is $p$-variate complex Gaussian, then $\tilde S$ has a complex Wishart distribution with $n-1$ degrees of freedom and parameter matrix $\Sigma$. The distributions of the largest, smallest, and $j$-th largest eigenvalues and the corresponding eigenvectors of $\tilde S$ in the complex domain are given in [8].
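The sample procedure above reduces to a Hermitian eigendecomposition of $\tilde S$. The following sketch is an illustration only (numpy is assumed; the simulated data, array names, and sample sizes are arbitrary): it computes the sample principal components of a complex data matrix and confirms that distinct components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 200
# simulate a p-variate complex sample (any complex data matrix of shape (p, n) would do)
C = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Xs = C @ (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)

xbar = Xs.mean(axis=1, keepdims=True)
S = (Xs - xbar) @ (Xs - xbar).conj().T      # sample sum of products matrix, Hermitian

m, V = np.linalg.eigh(S)                     # real eigenvalues, orthonormal eigenvectors
order = np.argsort(m)[::-1]                  # arrange as m_1 > m_2 > ... > m_p
m, V = m[order], V[:, order]

# component scores M_j^* (X - Xbar) for each observation (eigh already gives M_j^* M_j = 1)
scores = V.conj().T @ (Xs - xbar)
# sample covariances between distinct components are (numerically) zero: off-diagonals vanish
print(np.round(scores @ scores.conj().T / (n - 1), 6))
```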

4. Canonical Correlation Analysis in the Complex Domain

This is an application of the mathematical problem of optimization of a bilinear form in the complex domain, under two Hermitian-form constraints. The following application is regarding the prediction of one set of variables by using another set of variables.
Consider two sets of scalar complex variables
S 1 = { x ˜ 1 , . . . , x ˜ p }   and   S 2 = { y ˜ 1 , . . . , y ˜ q }
where p need not be equal to q. Consider the appended vector
$$\tilde Z=\begin{bmatrix}\tilde X\\ \tilde Y\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\quad \tilde Y=\begin{bmatrix}\tilde y_1\\ \vdots\\ \tilde y_q\end{bmatrix},\quad E[\tilde X]=\mu_x,\quad E[\tilde Y]=\mu_y,$$
$$\Sigma=E\left\{\begin{bmatrix}\tilde X-\mu_x\\ \tilde Y-\mu_y\end{bmatrix}[\tilde X^{*}-\mu_x^{*},\;\tilde Y^{*}-\mu_y^{*}]\right\}=\mathrm{Cov}(\tilde Z)=\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$$
where Σ 11 = Cov ( X ˜ ) , Σ 22 = Cov ( Y ˜ ) , Σ 12 = Cov ( X ˜ , Y ˜ ) , and Σ 12 * = Σ 21 , and vice-versa. That is, Σ is the covariance matrix in Z ˜ , Σ 11 is the covariance matrix in X ˜ , Σ 22 is the covariance matrix in Y ˜ , and so on. Also, Σ = Σ * > O , Σ 11 = Σ 11 * > O , Σ 22 = Σ 22 * > O . Our aim is to predict the variables in the set S 1 by using the variables in the set S 2 and vice-versa, and obtain the “best” predictors; “best” in the sense of having the maximum joint dispersion. In order to represent each set S 1 and S 2 , we will take arbitrary linear functions of the variables in each set. Consider linear functions u ˜ = A * X ˜ and v ˜ = B * Y ˜ , where
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\quad B=\begin{bmatrix}b_1\\ \vdots\\ b_q\end{bmatrix},\quad \tilde u=A^{*}\tilde X=a_1^{*}\tilde x_1+\ldots+a_p^{*}\tilde x_p,\quad \tilde v=B^{*}\tilde Y=b_1^{*}\tilde y_1+\ldots+b_q^{*}\tilde y_q,$$
where $\tilde X$ and $\tilde Y$ are listed above already. Since $a_j$ and $b_j$ are scalar constant quantities, $a_j^{*}=a_j^{c}$, $j=1,\ldots,p$, and $b_j^{*}=b_j^{c}$, $j=1,\ldots,q$. Variances of linear functions have already been considered in Section 2. Therefore, $\mathrm{Var}(\tilde u)=A^{*}\Sigma_{11}A$, $\mathrm{Var}(\tilde v)=B^{*}\Sigma_{22}B$ and $\mathrm{Cov}(\tilde u,\tilde v)=A^{*}\Sigma_{12}B$, $\Sigma_{21}=\Sigma_{12}^{*}$, $B^{*}\Sigma_{21}A=\mathrm{Cov}(\tilde v,\tilde u)$. Here, $\Sigma_{12}$ and $\Sigma_{21}$ can be taken as measures of joint dispersion or joint variation between $\tilde X$ and $\tilde Y$, and $A^{*}\Sigma_{12}B=\mathrm{Cov}(\tilde u,\tilde v)$ as the joint dispersion between $\tilde u$ and $\tilde v$. As a criterion for the “best” predictors, we may require that the pair $\tilde u$ and $\tilde v$ have the maximum joint variation: the best predictor of $\tilde u$ by using $\tilde v$ is taken as that pair having the maximum $A^{*}\Sigma_{12}B$, and the best predictor of $\tilde v$ by using $\tilde u$ as that pair having the maximum value of $B^{*}\Sigma_{21}A$. Since covariances depend upon the units of measurement of the variables involved, we may take a scale-free covariance by taking
$$\rho=\frac{\mathrm{Cov}(\tilde u,\tilde v)}{\sqrt{\mathrm{Var}(\tilde u)}\sqrt{\mathrm{Var}(\tilde v)}}=\frac{A^{*}\Sigma_{12}B}{\sqrt{[A^{*}\Sigma_{11}A][B^{*}\Sigma_{22}B]}}.$$
Further, as explained in Section 2, without loss of generality we may take $A^{*}\Sigma_{11}A=1$ and $B^{*}\Sigma_{22}B=1$, or confine the bilinear form (hyperboloid) within unit positive definite Hermitian forms (ellipsoids) in order to prevent it from going to $+\infty$. Hence, our procedure simplifies to optimizing $A^{*}\Sigma_{12}B$ subject to the conditions $A^{*}\Sigma_{11}A=1$, $B^{*}\Sigma_{22}B=1$ and computing the pair of $A$ and $B$ which will maximize $A^{*}\Sigma_{12}B$. As before, we may use the Lagrangian multipliers $\lambda_1$ and $\lambda_2$ and consider the function
$$\tilde w=A^{*}\Sigma_{12}B-\lambda_1(A^{*}\Sigma_{11}A-1)-\lambda_2(B^{*}\Sigma_{22}B-1).$$
In order to optimize this w ˜ we need one result on differentiation of a bilinear form, which will be stated as a lemma.
Lemma 3.
Let $\tilde X=X_1+iX_2$, $i=\sqrt{-1}$, where $X_1,X_2$ are real $p\times 1$ vectors of distinct real scalar variables $x_{j1}$ and $x_{j2}$, respectively, with $\tilde x_j=x_{j1}+ix_{j2}$, $j=1,\ldots,p$. Let $\tilde Y=Y_1+iY_2$, where $Y_1,Y_2$ are real $q\times 1$ vectors of distinct real scalar variables $y_{j1}$ and $y_{j2}$, respectively, with $\tilde y_j=y_{j1}+iy_{j2}$, $j=1,\ldots,q$, where $x_{j1},x_{j2},y_{j1},y_{j2}$ are real. Let the partial differential operators be as defined in Section 2, namely,
$$\frac{\partial}{\partial X_1}=\begin{bmatrix}\frac{\partial}{\partial x_{11}}\\ \vdots\\ \frac{\partial}{\partial x_{p1}}\end{bmatrix},\quad \frac{\partial}{\partial X_2}=\begin{bmatrix}\frac{\partial}{\partial x_{12}}\\ \vdots\\ \frac{\partial}{\partial x_{p2}}\end{bmatrix},\quad \frac{\partial}{\partial\tilde X}=\left(\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}\right)$$
and similar operators involving Y ˜ = Y 1 + i Y 2 . Then,
$$\frac{\partial}{\partial\tilde X}[\tilde X^{*}A\tilde Y]=2A\tilde Y\quad\text{and}\quad \frac{\partial}{\partial\tilde Y}[\tilde Y^{*}A^{*}\tilde X]=2A^{*}\tilde X.$$
Proof. 
Opening up X ˜ * A Y ˜ we have the following:
$$\tilde X^{*}A\tilde Y=(X_1'-iX_2')A(Y_1+iY_2)=X_1'AY_1+X_2'AY_2+i(X_1'AY_2-X_2'AY_1).$$
Then, from the known results in the real case, we have the following:
$$\frac{\partial}{\partial X_1}[\tilde X^{*}A\tilde Y]=AY_1+iAY_2\quad\text{and}\quad \frac{\partial}{\partial X_2}[\tilde X^{*}A\tilde Y]=AY_2-iAY_1$$
irrespective of whether A is real or in the complex domain. Then,
$$\frac{\partial}{\partial\tilde X}[\tilde X^{*}A\tilde Y]=\left(\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}\right)[\tilde X^{*}A\tilde Y]=2A\tilde Y.$$
Similarly, $\frac{\partial}{\partial\tilde Y}[\tilde Y^{*}A^{*}\tilde X]=2A^{*}\tilde X$. This completes the proof. □
Now, differentiating w ˜ in ( 25 ) , we have the following:
$$\frac{\partial\tilde w}{\partial A}=O\;\Rightarrow\;\Sigma_{12}B-\lambda_1\Sigma_{11}A=O$$
$$\frac{\partial\tilde w^{*}}{\partial B}=O\;\Rightarrow\;\Sigma_{21}A-\lambda_2\Sigma_{22}B=O.$$
Now, premultiply ( 26 ) by A * and ( 27 ) by B * to obtain the following, observing that Σ 21 = Σ 12 * :
$$A^{*}\Sigma_{12}B=\lambda_1,\quad B^{*}\Sigma_{21}A=\lambda_2,\quad \lambda_2=\lambda_1^{c}$$
or λ 1 = λ , λ 2 = λ c . Take B from ( 27 ) and substitute in ( 26 ) to obtain the following:
$$(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}-\lambda\lambda^{c}I)A=O,\quad \lambda\lambda^{c}=|\lambda|^{2}.$$
This shows that $\lambda\lambda^{c}=|\lambda|^{2}=\mu$ is an eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, where the matrix is $p\times p$. From symmetry, it follows that $\mu$ is also an eigenvalue of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, where the matrix is $q\times q$. Hence, all the nonzero $\mu$'s are common to both of these matrices. Hence, the procedure is the following: If $p\le q$, then compute the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$; otherwise, compute the eigenvalues of the other matrix $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$; both will give the same nonzero eigenvalues. Let $\mu_1$ be the largest and $\mu_r$ the smallest of the nonzero eigenvalues. Then, we have the results
$$\max_{A^{*}\Sigma_{11}A=1,\,B^{*}\Sigma_{22}B=1}[A^{*}\Sigma_{12}B]=\sqrt{\mu_1}$$
and
$$\min_{A^{*}\Sigma_{11}A=1,\,B^{*}\Sigma_{22}B=1}[A^{*}\Sigma_{12}B]=\sqrt{\mu_r}.$$
Then, the procedure is the following: If $p\le q$, then compute all the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Let the largest eigenvalue be $\mu_1$. Then, compute one eigenvector corresponding to $\mu_1$, using the equation $(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}-\mu_1\Sigma_{11})A=O$. Let it be $A_{11}$. Then, normalize it through $A_{11}^{*}\Sigma_{11}A_{11}=1$, that is, compute $A_1=\frac{1}{\sqrt{A_{11}^{*}\Sigma_{11}A_{11}}}A_{11}$. Then, compute $\tilde u_1=A_1^{*}\tilde X$. Then, use the same eigenvalue $\mu_1$ and compute one eigenvector from the equation $(\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}-\mu_1\Sigma_{22})B=O$. Let it be $B_{11}$. Then, normalize it through $B_{11}^{*}\Sigma_{22}B_{11}=1$, that is, compute $B_1=\frac{1}{\sqrt{B_{11}^{*}\Sigma_{22}B_{11}}}B_{11}$. Now, compute $\tilde v_1=B_1^{*}\tilde Y$. Then, $(\tilde u_1,\tilde v_1)$ is the first pair of canonical variables, in the sense that $\tilde u_1$ is the best predictor of $\tilde v_1$ and vice-versa. Now, take the second largest eigenvalue $\mu_2$ of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Then, compute one eigenvector corresponding to $\mu_2$, that is, solve the equation $(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}-\mu_2\Sigma_{11})A=O$. Let $A_{21}$ be that eigenvector. Normalize it, that is, compute $A_2=\frac{1}{\sqrt{A_{21}^{*}\Sigma_{11}A_{21}}}A_{21}$. Now, compute $\tilde u_2=A_2^{*}\tilde X$. Use the same $\mu_2$ and solve for $B$ from the equation $(\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}-\mu_2\Sigma_{22})B=O$. Let $B_{21}$ be one solution. Then, normalize it, that is, compute $B_2=\frac{1}{\sqrt{B_{21}^{*}\Sigma_{22}B_{21}}}B_{21}$. Now, compute $\tilde v_2=B_2^{*}\tilde Y$. Then, $(\tilde u_2,\tilde v_2)$ is the second pair of canonical variables. Continue the process until $\mu_j$ falls below a preassigned limit. If there is no such preassigned limit, then compute all the pairs, that is, $p$ pairs if $p\le q$, and $q$ pairs otherwise. If $q<p$, then start with the computation of the eigenvalues of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ and proceed parallel to the steps used in the case $p\le q$. Observe that the symmetric format of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is $\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}$. This form is also available from the same starting Equation $(28)$. The symmetric format can always be written in the form $C^{*}C$ for some matrix $C$, and hence the symmetric form is either Hermitian positive definite or Hermitian positive semi-definite, and therefore all the nonzero eigenvalues are positive. Let us assume that the eigenvalues $\mu_1,\mu_2,\ldots$ are distinct, $\mu_1>\mu_2>\ldots>\mu_p$ if $p\le q$. It is a known result that eigenvectors corresponding to distinct eigenvalues of Hermitian or symmetric matrices are orthogonal to each other. Hence, $\tilde u_1,\tilde u_2,\ldots$ are non-correlated. Similarly, $\tilde v_1,\tilde v_2,\ldots$ are non-correlated.
If $\Sigma_{11},\Sigma_{12},\Sigma_{22}$ are not available, then take a simple random sample of size $n$, $n>p+q$, from $\tilde Z=\begin{bmatrix}\tilde X\\ \tilde Y\end{bmatrix}$. Then, compute the sample sum of products matrix
$$\tilde S=\begin{bmatrix}\tilde S_{11} & \tilde S_{12}\\ \tilde S_{21} & \tilde S_{22}\end{bmatrix},$$
as for the case of principal component analysis. Now, continue with the $\tilde S_{jk}$'s in place of the $\Sigma_{jk}$'s. Then, we will obtain the pairs of sample canonical variables. Some distributional aspects of sample canonical variables and the sample canonical correlation matrix $\tilde U=\tilde S_{11}^{-\frac{1}{2}}\tilde S_{12}\tilde S_{22}^{-1}\tilde S_{21}\tilde S_{11}^{-\frac{1}{2}}$ are discussed in [8]. Note that when $\tilde S_{11}$ is $1\times 1$, then we have the square of the multiple correlation coefficient in $\tilde U=\tilde u$, which is scalar. In this case, our starting set $S_1$ will have only one complex scalar variable and the set $S_2$ will have $q$ variables. Here, the problem is to predict the one variable in $S_1$ by using the variables in $S_2$. The exact distributions of the canonical correlation matrix $\tilde U$ and the square of the absolute value of the multiple correlation coefficient $\tilde u$ are available in explicit forms in [8].
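A compact numerical sketch of the sample canonical correlation procedure follows (illustrative only; numpy is assumed, and the simulated data and names are arbitrary). It extracts the first pair of sample canonical variables and checks that their sample correlation equals $\sqrt{\mu_1}$, where $\mu_1$ is the largest eigenvalue of $\tilde S_{11}^{-1}\tilde S_{12}\tilde S_{22}^{-1}\tilde S_{21}$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q, n = 3, 4, 500
Z = rng.standard_normal((p + q, n)) + 1j * rng.standard_normal((p + q, n))
Z[:p] += 0.6 * Z[p:p + p]                   # induce some dependence between the two sets

Zc = Z - Z.mean(axis=1, keepdims=True)
S = Zc @ Zc.conj().T
S11, S12 = S[:p, :p], S[:p, p:]
S21, S22 = S[p:, :p], S[p:, p:]

# nonzero eigenvalues mu of S11^{-1} S12 S22^{-1} S21 (here p <= q)
mu, Avecs = np.linalg.eig(np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S21)))
k = np.argmax(mu.real)                      # largest mu gives the first canonical pair
A1 = Avecs[:, k]
A1 = A1 / np.sqrt((A1.conj() @ S11 @ A1).real)    # normalize via A* S11 A = 1
B1 = np.linalg.solve(S22, S21 @ A1)               # B proportional to S22^{-1} S21 A (from (27))
B1 = B1 / np.sqrt((B1.conj() @ S22 @ B1).real)    # normalize via B* S22 B = 1

u1 = A1.conj() @ Zc[:p]                     # first pair of sample canonical variables
v1 = B1.conj() @ Zc[p:]
r1 = abs(np.vdot(v1, u1)) / np.sqrt(np.vdot(u1, u1).real * np.vdot(v1, v1).real)
print(np.isclose(r1, np.sqrt(mu.real[k])))  # first sample canonical correlation = sqrt(mu_1)
```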

5. Covariance and Correlation in the Complex Domain

Consider scalar complex variables first. Let $\tilde x_1$ and $\tilde y_1$ be two scalar complex variables or scalar variables in the complex domain. Then, the mean values or expected values of $\tilde x_1$ and $\tilde y_1$ are, respectively, $E[\tilde x_1]=\int_{\tilde x_1}\tilde x_1 f(\tilde x_1)\,\mathrm{d}\tilde x_1$ and $E[\tilde y_1]=\int_{\tilde y_1}\tilde y_1 g(\tilde y_1)\,\mathrm{d}\tilde y_1$, where $E[\cdot]$ means the expected value of $[\cdot]$, and $f$ and $g$ are the densities of $\tilde x_1$ and $\tilde y_1$, respectively. Let $\tilde x=\tilde x_1-E[\tilde x_1]$ and $\tilde y=\tilde y_1-E[\tilde y_1]$. Let $\tilde x=x_{11}+ix_{12}$, $i=\sqrt{-1}$, with $x_{11},x_{12}$ real scalar variables, and let $\tilde y=y_{11}+iy_{12}$, with $y_{11},y_{12}$ real scalar variables. Then, $E[\tilde x]=0$ and $E[\tilde y]=0$. Then, the variance, denoted by $\mathrm{Var}(\cdot)$, and covariance, denoted by $\mathrm{Cov}(\cdot,\cdot)$, are the following:
$$\mathrm{Var}(\tilde x_1)=\mathrm{Var}(\tilde x)=E[\tilde x\tilde x^{*}]=\sigma_{11},\quad \mathrm{Var}(\tilde y_1)=\mathrm{Var}(\tilde y)=E[\tilde y\tilde y^{*}]=\sigma_{22},\quad \mathrm{Cov}(\tilde x_1,\tilde y_1)=\mathrm{Cov}(\tilde x,\tilde y)=E[\tilde x\tilde y^{*}]=\sigma_{12}$$
where, for example, $\tilde x^{*}=\tilde x^{c}$, with $*$ indicating the conjugate transpose and $c$ indicating the conjugate only. Here, we have only scalar variables, and hence the complex conjugate transpose is only the complex conjugate. For convenience, we have used the notation $\sigma_{11},\sigma_{22},\sigma_{12}$. Then, $\sigma_{21}=E[\tilde y\tilde x^{*}]=E[\tilde y\tilde x^{c}]=\mathrm{Cov}(\tilde y,\tilde x)$. For a scalar complex variable $\tilde x=x_{11}+ix_{12}$, $\tilde x^{*}=(x_{11}+ix_{12})^{*}=x_{11}-ix_{12}=\tilde x^{c}$. Let us examine the variances of the sum and difference. Let
$$\tilde u=\frac{\tilde x_1-E[\tilde x_1]}{\sqrt{\mathrm{Var}(\tilde x_1)}}=\frac{\tilde x}{\sqrt{\sigma_{11}}},\quad \tilde v=\frac{\tilde y_1-E[\tilde y_1]}{\sqrt{\mathrm{Var}(\tilde y_1)}}=\frac{\tilde y}{\sqrt{\sigma_{22}}}.$$
Then,
$$\mathrm{Var}(\tilde u+\tilde v)=E[(\tilde u+\tilde v-E(\tilde u+\tilde v))(\tilde u+\tilde v-E(\tilde u+\tilde v))^{*}]=E\left[\frac{\tilde x\tilde x^{*}}{\sigma_{11}}+\frac{\tilde y\tilde y^{*}}{\sigma_{22}}+\frac{\tilde x\tilde y^{*}+\tilde y\tilde x^{*}}{\sqrt{\sigma_{11}\sigma_{22}}}\right]$$
This can be simplified as the following:
$$\mathrm{Var}(\tilde u+\tilde v)=2+2\,\frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{2\sqrt{\sigma_{11}\sigma_{22}}}\ge 0$$
$$\mathrm{Var}(\tilde u-\tilde v)=\frac{E[\tilde x\tilde x^{*}]}{\sigma_{11}}+\frac{E[\tilde y\tilde y^{*}]}{\sigma_{22}}-\frac{E[\tilde x\tilde y^{*}+\tilde y\tilde x^{*}]}{\sqrt{\sigma_{11}\sigma_{22}}}=2-2\,\frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{2\sqrt{\sigma_{11}\sigma_{22}}}\ge 0$$
From ( 29 ) and ( 30 ) we have
$$-1\le \frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{2\sqrt{\sigma_{11}\sigma_{22}}}\le 1.$$
Let us examine the quantity, Cov ( x ˜ , y ˜ ) + Cov ( y ˜ , x ˜ ) = E [ x ˜ y ˜ * ] + E [ y ˜ x ˜ * ] . Note that
$$E[\tilde x\tilde y^{*}]=E[(x_{11}+ix_{12})(y_{11}-iy_{12})]=E[x_{11}y_{11}+x_{12}y_{12}+i(x_{12}y_{11}-x_{11}y_{12})]$$
$$E[\tilde y\tilde x^{*}]=E[x_{11}y_{11}+x_{12}y_{12}+i(y_{12}x_{11}-y_{11}x_{12})]$$
$$E[\tilde x\tilde y^{*}]+E[\tilde y\tilde x^{*}]=\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)=2E[x_{11}y_{11}+x_{12}y_{12}]=2E[\Re(\tilde x\tilde y^{*})]$$
Hence,
$$\frac{1}{2}\,\frac{\mathrm{Cov}(\tilde x,\tilde y)+\mathrm{Cov}(\tilde y,\tilde x)}{\sqrt{\sigma_{11}\sigma_{22}}}=E\left[\frac{x_{11}y_{11}+x_{12}y_{12}}{\sqrt{\sigma_{11}\sigma_{22}}}\right]=E\left[\frac{\Re(\tilde x\tilde y^{*})}{\sqrt{\sigma_{11}\sigma_{22}}}\right].$$
Therefore, we may define the correlation coefficient in the complex domain, denoted by r ˜ , as the following:
$$\tilde r=\frac{\Re(\mathrm{Cov}(\tilde x,\tilde y))}{\sqrt{\sigma_{11}\sigma_{22}}},\quad -1\le\tilde r\le 1$$
where $\Re(\cdot)$ denotes the real part of $(\cdot)$. Also, note that $|\tilde x\tilde y^{*}|^{2}=|\tilde x|^{2}|\tilde y|^{2}$. This will motivate us to examine the dot product of two vectors in the complex domain and the Cauchy–Schwarz inequality in the complex domain.
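In sample terms, the correlation coefficient $\tilde r$ defined above is obtained from the real part of the sample covariance. A small sketch (illustrative only; numpy is assumed and the simulated data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = 0.5 * x + (rng.standard_normal(n) + 1j * rng.standard_normal(n))   # correlated with x

xc, yc = x - x.mean(), y - y.mean()
cov_xy = np.mean(xc * yc.conj())            # sample Cov(x, y) = average of x y*
s11 = np.mean(xc * xc.conj()).real          # sample Var(x)
s22 = np.mean(yc * yc.conj()).real          # sample Var(y)

r = cov_xy.real / np.sqrt(s11 * s22)        # r = Re(Cov(x, y)) / sqrt(s11 s22)
print(r, -1.0 <= r <= 1.0)                  # r always lies in [-1, 1]
```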

5.1. The Cauchy–Schwarz Inequality in the Complex Domain

Let $\tilde X$, $\tilde X'=[\tilde x_1,\ldots,\tilde x_p]$, and $\tilde Y$, $\tilde Y'=[\tilde y_1,\ldots,\tilde y_p]$, be two $p\times 1$ vectors in the complex domain. We will define the dot product between $\tilde X$ and $\tilde Y$ as $\tilde X\cdot\tilde Y=\tilde X^{*}\tilde Y$. Then, using $|\tilde x_j^{c}\tilde y_j|=|\tilde x_j||\tilde y_j|$ and the real-case Cauchy–Schwarz inequality on the moduli,
$$|\tilde X\cdot\tilde Y|^{2}=|\tilde X^{*}\tilde Y|^{2}=|\tilde x_1^{c}\tilde y_1+\ldots+\tilde x_p^{c}\tilde y_p|^{2}\le[\,|\tilde x_1||\tilde y_1|+\ldots+|\tilde x_p||\tilde y_p|\,]^{2}\le(|\tilde x_1|^{2}+\ldots+|\tilde x_p|^{2})(|\tilde y_1|^{2}+\ldots+|\tilde y_p|^{2})$$
$$=(x_{11}^{2}+x_{12}^{2}+\ldots+x_{p1}^{2}+x_{p2}^{2})(y_{11}^{2}+y_{12}^{2}+\ldots+y_{p1}^{2}+y_{p2}^{2})\;\Rightarrow\;|\tilde X\cdot\tilde Y|\le|\tilde X|\,|\tilde Y|.$$
Thus, the Cauchy–Schwarz inequality holds for the complex domain also.

5.2. Minimum-Variance Unbiased Estimators in the Complex Domain

In the class of linear estimators A * X ˜ for the parametric function g ( θ ) , which linear function is the minimum-variance unbiased estimator for g ( θ ) ? Let
$$A=\begin{bmatrix}a_1\\ \vdots\\ a_p\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_p\end{bmatrix},\quad E[A^{*}\tilde X]=A^{*}E[\tilde X]=A^{*}\tilde\mu=g(\theta),$$
and $\mathrm{Var}(A^{*}\tilde X)=A^{*}\Sigma A$, $E[\tilde X]=\tilde\mu$, where $\Sigma$ is the covariance matrix in $\tilde X$. Our aim here is to minimize $A^{*}\Sigma A$ subject to the constraint $A^{*}\tilde\mu=g(\theta)$ (given). Let $\lambda$ be a Lagrangian multiplier and let $\tilde w=A^{*}\Sigma A-\lambda(A^{*}\tilde\mu-g(\theta))$. Then,
$$\frac{\partial\tilde w}{\partial A}=O\;\Rightarrow\;2\Sigma A-2\lambda\tilde\mu=O\;\Rightarrow\;\Sigma A=\lambda\tilde\mu.$$
That is, A * Σ A = λ A * μ ˜ = λ g ( θ ) . Hence, g ( θ ) multiplied by the minimum value of λ gives the minimum of A * Σ A . From ( 34 ) ,
$$A=\lambda\Sigma^{-1}\tilde\mu\;\Rightarrow\;g(\theta)=A^{*}\tilde\mu=\lambda^{c}\tilde\mu^{*}\Sigma^{-1}\tilde\mu\;\Rightarrow\;\lambda^{c}=\frac{g(\theta)}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}.$$
Then,
$$\lambda\,g(\theta)=\frac{gg^{*}}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}=\frac{|g|^{2}}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}$$
is the minimum value of the variance of our linear estimator. Hence, the minimum-variance unbiased estimator is
$$T(\tilde X)=\frac{g}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}[\tilde\mu^{*}\Sigma^{-1}\tilde X].$$
Note also that
$$g=A^{*}\tilde\mu=(A^{*}\Sigma^{\frac{1}{2}})(\Sigma^{-\frac{1}{2}}\tilde\mu)\;\Rightarrow\;|g|\le\sqrt{A^{*}\Sigma A}\,\sqrt{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}\;\Rightarrow\;A^{*}\Sigma A\ge\frac{|g|^{2}}{\tilde\mu^{*}\Sigma^{-1}\tilde\mu}.$$
The first inequality follows from the inequality established in Section 5.1.

5.3. Cramer–Rao-Type Inequality in the Complex Domain

Let $\tilde x_1,\ldots,\tilde x_n$ be a simple random sample from the population designated by the density $f(\tilde x_j)$, where $\tilde x_j$, $j=1,\ldots,n$, are independently and identically distributed (iid). Then, the joint density is $L=\prod_{j=1}^{n}f(\tilde x_j)$. Let $T(\tilde x_1,\ldots,\tilde x_n)$, denoted as $T(\tilde X)$, be a statistic with the expected value $g(\theta)$. That is, $E[T]=\int_{\tilde X}T\,L\,\mathrm{d}\tilde X=g(\theta)$. Differentiating with respect to $\theta$, we have
$$\frac{\partial g(\theta)}{\partial\theta}=\int_{\tilde X}T\left(\frac{\partial L}{\partial\theta}\right)\mathrm{d}\tilde X=\int_{\tilde X}T\left(\frac{\partial\ln L}{\partial\theta}\right)L\,\mathrm{d}\tilde X=E\left[T\left(\frac{\partial\ln L}{\partial\theta}\right)\right]$$
where we have assumed that the support of X ˜ is free of θ and differentiation inside the integral is valid. But, from the total integral being one, we have
$$\int_{\tilde X}L\,\mathrm{d}\tilde X=1\;\Rightarrow\;\int_{\tilde X}\left(\frac{\partial\ln L}{\partial\theta}\right)L\,\mathrm{d}\tilde X=0\;\Rightarrow\;E\left[\frac{\partial\ln L}{\partial\theta}\right]=0.$$
From $(35)$ and $(36)$, we have $E\left[T\left(\frac{\partial\ln L}{\partial\theta}\right)\right]=\mathrm{Cov}\left(T,\frac{\partial\ln L}{\partial\theta}\right)$ because, for any two scalar random variables $u$ and $v$ real, or $\tilde u$ and $\tilde v$ in the complex domain, $\mathrm{Cov}(u,v)=E[(u-E(u))(v-E(v))]=E[u(v-E(v))]=E[(u-E(u))v]$, and the corresponding results in the complex domain also hold. Therefore, since $E\left[\frac{\partial\ln L}{\partial\theta}\right]=0$, $E\left[T\left(\frac{\partial\ln L}{\partial\theta}\right)\right]=\mathrm{Cov}\left(T,\frac{\partial\ln L}{\partial\theta}\right)$. Then,
$$\left|\mathrm{Cov}\left(T,\frac{\partial\ln L}{\partial\theta}\right)\right|\le\sqrt{\mathrm{Var}(T)}\,\sqrt{\mathrm{Var}\left(\frac{\partial\ln L}{\partial\theta}\right)}\;\Rightarrow\;\mathrm{Var}(\tilde T)\ge\frac{\left|\frac{\partial g}{\partial\theta}\right|^{2}}{\mathrm{Var}\left(\frac{\partial\ln L}{\partial\theta}\right)}.$$
Note that
$$\mathrm{Var}\left(\frac{\partial\ln L}{\partial\theta}\right)=E\left[\left(\frac{\partial\ln L}{\partial\theta}\right)^{2}\right]=n\,\mathrm{Var}\left(\frac{\partial\ln f}{\partial\theta}\right)=n\,E\left[\left(\frac{\partial\ln f}{\partial\theta}\right)^{2}\right].$$
This shows that the Cramer–Rao-type inequality holds in the complex domain also.

5.4. Least Square Estimation in Linear Models in the Complex Domain

Let us examine whether the least square procedure, in the class of linear models, holds in the complex domain also. Let x ˜ 1 , . . . , x ˜ k be preassigned complex numbers or observations on k random variables in the complex domain. Let y ˜ be a scalar complex variable. Then, a linear model of x ˜ 1 , . . . , x ˜ k for predicting y ˜ , can be of the following form:
$$\tilde y=a_0^{c}+\beta^{*}\tilde X,\quad \beta=\begin{bmatrix}a_1\\ \vdots\\ a_k\end{bmatrix},\quad \tilde X=\begin{bmatrix}\tilde x_1\\ \vdots\\ \tilde x_k\end{bmatrix}.$$
But, in order to predict y ˜ by using linear predictors we must know the conditional distribution of y ˜ , given x ˜ 1 , . . . , x ˜ k , and also the conditional expectation must be linear in x ˜ 1 , . . . , x ˜ k . If the conditional distribution is not known, then we may use a distribution-free procedure. One such procedure is the estimation procedure by using the least square method. In this method, we set up a corresponding model of the following form for the j-th observation on y ˜ , namely, y ˜ j , for j = 1 , . . . , n , where n > k + 1 is the sample size.
$$\tilde y_j=a_0+a_1\tilde x_{1j}+\ldots+a_k\tilde x_{kj}+\epsilon_j,\quad j=1,\ldots,n,$$
corresponding to the linear model in the real case, where $\epsilon_j$ is the random part, or the sum total of contributions coming from unknown factors, corresponding to $\tilde y_j$. Then, if we sum up the observations and divide by $n$ we obtain the sample averages $\bar{\tilde y}$, $\bar{\tilde x}_r$, $r=1,\ldots,k$, where, for example, $\bar{\tilde x}_r=\frac{1}{n}\sum_{j=1}^{n}\tilde x_{rj}$. Then, from $(39)$, we have
$$\bar{\tilde y}=a_0+a_1\bar{\tilde x}_1+\ldots+a_k\bar{\tilde x}_k+0\;\Rightarrow\;a_0=\bar{\tilde y}-a_1\bar{\tilde x}_1-\ldots-a_k\bar{\tilde x}_k\;\Rightarrow\;a_0^{c}=\bar{\tilde y}^{c}-a_1^{c}\bar{\tilde x}_1^{c}-\ldots-a_k^{c}\bar{\tilde x}_k^{c}.$$
We have taken the error sum as zero without much loss of generality. Since a 0 is available from ( 40 ) , we may rewrite ( 39 ) as follows:
$$\tilde y_j-\bar{\tilde y}=a_1(\tilde x_{1j}-\bar{\tilde x}_1)+\ldots+a_k(\tilde x_{kj}-\bar{\tilde x}_k)+\epsilon_j-\bar\epsilon,\quad j=1,\ldots,n.$$
We may write all the equations in ( 41 ) together as U ˜ = Z ˜ β + e , where
$$\tilde U=\begin{bmatrix}\tilde y_1-\bar{\tilde y}\\ \vdots\\ \tilde y_n-\bar{\tilde y}\end{bmatrix},\quad \tilde Z=\begin{bmatrix}\tilde x_{11}-\bar{\tilde x}_1 & \cdots & \tilde x_{k1}-\bar{\tilde x}_k\\ \vdots & & \vdots\\ \tilde x_{1n}-\bar{\tilde x}_1 & \cdots & \tilde x_{kn}-\bar{\tilde x}_k\end{bmatrix},\quad e=\begin{bmatrix}\tilde e_1\\ \vdots\\ \tilde e_n\end{bmatrix},\quad \beta=\begin{bmatrix}a_1\\ \vdots\\ a_k\end{bmatrix}.$$
Note that U ˜ is an n × 1 matrix, Z ˜ is an n × k matrix, β is a k × 1 matrix, and e ˜ is an n × 1 matrix. Then, the sum of squares of the absolute values of the errors is the following:
$$\tilde e^{*}\tilde e=(\tilde U-\tilde Z\beta)^{*}(\tilde U-\tilde Z\beta).$$
In the least square procedure, we minimize this sum of squares of the absolute values of the errors, and then estimate the parameter vector $\beta$. Note that $(\tilde U-\tilde Z\beta)^{*}=\tilde U^{*}-\beta^{*}\tilde Z^{*}$ and
$$\frac{\partial}{\partial\beta}[\tilde e^{*}\tilde e]=O\;\Rightarrow\;-2\tilde Z^{*}\tilde U+2\tilde Z^{*}\tilde Z\beta=O\;\Rightarrow\;\beta=(\tilde Z^{*}\tilde Z)^{-1}\tilde Z^{*}\tilde U$$
where we have assumed that $\tilde Z^{*}\tilde Z$ is a nonsingular matrix because the $\tilde x_{rj}$'s are preassigned numbers, and hence $\tilde Z$ can be taken as a full-rank matrix with rank $k<n$. Then, the estimated $\beta$, as per the least square estimate, again denoted by $\beta$, is $\beta=(\tilde Z^{*}\tilde Z)^{-1}\tilde Z^{*}\tilde U$ and the estimated model for $\tilde y$ is $a_0^{c}+\beta^{*}\tilde X$, $\tilde X'=[\tilde x_1,\ldots,\tilde x_k]$, $\beta^{*}=\tilde U^{*}\tilde Z(\tilde Z^{*}\tilde Z)^{-1}$, or the estimated $\tilde y$ is
$$\tilde y=a_0^{c}+\beta^{*}\tilde X=a_0^{c}+\tilde U^{*}\tilde Z(\tilde Z^{*}\tilde Z)^{-1}\tilde X$$
where $\beta$ is available from $(43)$ and $a_0^{c}=\bar{\tilde y}-\beta^{*}\bar{\tilde X}$, $\bar{\tilde X}'=[\bar{\tilde x}_1,\ldots,\bar{\tilde x}_k]$. This shows that the least square procedure in the complex domain runs parallel to that in the real domain. If $\tilde e$ is assumed to have an $n$-variate complex Gaussian distribution, then the inference problems also run parallel to those in the real domain.
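A minimal numerical sketch of the centered least square procedure in (41)–(43) follows (illustrative only; numpy is assumed, and the simulated regressors, coefficients, and noise level are arbitrary). The normal equations $\tilde Z^{*}\tilde Z\beta=\tilde Z^{*}\tilde U$ recover the coefficients, and the intercept is recovered from (40).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 3
Xreg = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))   # one row per observation
beta_true = np.array([1 - 2j, 0.5 + 1j, -0.3j])
a0 = 2.0 + 1.0j
noise = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = a0 + Xreg @ beta_true + noise            # y_j = a_0 + a_1 x_1j + ... + a_k x_kj + eps_j

# center, as in (41)-(42): U holds y_j - ybar, the rows of Z hold (x_rj - xbar_r)
U = y - y.mean()
Z = Xreg - Xreg.mean(axis=0, keepdims=True)

beta_hat = np.linalg.solve(Z.conj().T @ Z, Z.conj().T @ U)   # beta = (Z* Z)^{-1} Z* U
a0_hat = y.mean() - Xreg.mean(axis=0) @ beta_hat             # a_0 = ybar - a_1 xbar_1 - ... - a_k xbar_k
print(np.allclose(beta_hat, beta_true, atol=0.05), np.allclose(a0_hat, a0, atol=0.05))
```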

6. Concluding Remarks

In this paper, we have introduced vector/matrix differential operators in the complex domain. These differential operators in the complex domain are believed to be new. With the help of these operators, we have examined the optimization of a linear form with a Hermitian-form constraint, optimization of a Hermitian form with a linear-form as well as a Hermitian-form constraint, and optimization of a bilinear form with Hermitian-form constraints, where the linear forms and bilinear forms involve vectors and matrices in the complex domain. As applications of these optimization problems, we have extended principal component analysis and canonical correlation analysis to the complex domain. Also extended to the complex domain are the Cramer–Rao inequality, the Cauchy–Schwarz inequality, minimum-variance unbiased estimation, and least square analysis. If we use the general definition of a density $f(X)$ as a real-valued scalar function such that $f(X)\ge 0$ in the domain of $X$ and $\int_{X}f(X)\,\mathrm{d}X=1$, where the argument $X$ may be scalar or vector or matrix or a sequence of matrices in the real or complex domains [8], then the structures of the joint density, marginal density, conditional density, etc., will be parallel to those in the real domain. Then, we will be able to extend Bayesian analysis to the complex domain. One can also explore extending other multivariate statistical techniques such as factor analysis, classification problems, cluster analysis, analysis of variance, analysis of covariance, etc., to the complex domain. These are some of the open problems. Since the likelihood function $L$ is a product of densities at the observed sample point, in the simple random sample case, this $L$ will be a real-valued scalar function. Then, one can extend the maximum likelihood method of estimation to the complex domain. For example, in the $p$-variate complex Gaussian case, with a simple random sample of size $n$, we have
$$\ln L=c-n\ln|\tilde\Sigma|-\sum_{j=1}^{n}(\tilde X_j-\tilde\mu)^{*}\tilde\Sigma^{-1}(\tilde X_j-\tilde\mu)=c-n\ln|\tilde\Sigma|-\mathrm{tr}(\tilde\Sigma^{-1}\tilde S)-n(\bar{\tilde X}-\tilde\mu)^{*}\tilde\Sigma^{-1}(\bar{\tilde X}-\tilde\mu)$$
where $c$ is a constant, $\bar{\tilde X}$ is the sample average vector, $\tilde S$ is the sample sum of squares and cross-products matrix, $\tilde\mu$ is the population mean value, and $\tilde\Sigma>O$ is the Hermitian positive definite covariance matrix. We have already established results on vector and matrix differential operators in the complex domain operating on the trace and the determinant involving a Hermitian positive definite matrix. Hence, all the terms in $\frac{\partial}{\partial\tilde\mu}[\ln L]=O$ and $\frac{\partial}{\partial\tilde\Sigma}[\ln L]=O$ are defined, and such equations have already been solved in our discussion of the operators operating on traces and determinants. Thus, we will see that $\frac{\partial}{\partial\tilde\mu}[\ln L]=O$ yields the sample average as the estimate/estimator of $\tilde\mu$, and $\frac{\partial}{\partial\tilde\Sigma}[\ln L]=O$, with the estimate of $\tilde\mu$ substituted, yields the estimate/estimator of $\tilde\Sigma$ as $\frac{1}{n}\tilde S$. One can also examine maximum likelihood estimation involving other scalar/vector/matrix-variate densities in the complex domain. The above are some of the open problems.
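As a small illustration of the maximum likelihood estimates mentioned above (a sketch under the stated complex Gaussian model; numpy is assumed and the simulation parameters are arbitrary), the sample average and $\frac{1}{n}\tilde S$ recover $\tilde\mu$ and $\tilde\Sigma$ from simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
p, n = 3, 5000
mu = np.array([1 + 1j, -2j, 0.5])
C = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = C @ C.conj().T / (2 * p)                 # true Hermitian positive definite covariance

# draw a circular complex Gaussian sample with mean mu and covariance Sigma
L = np.linalg.cholesky(Sigma)
W = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
Xs = mu[:, None] + L @ W

mu_hat = Xs.mean(axis=1)                         # MLE of mu: the sample average
D = Xs - mu_hat[:, None]
Sigma_hat = D @ D.conj().T / n                   # MLE of Sigma: (1/n) S
print(np.allclose(mu_hat, mu, atol=0.1), np.allclose(Sigma_hat, Sigma, atol=0.1))
```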

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Deng, X. Texture Analysis and Physical Interpretation of Polarimetric SAR Data. Ph.D. Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2016.
  2. Du, L.; Liu, H.; Wang, P.; Feng, B.; Pan, M.; Bao, Z. Noise robust radar HRRP target recognition based on multitask factor analysis with small training data size. IEEE Trans. Signal Process. 2012, 60, 3546–3560.
  3. Hellings, C.; Gogler, P.; Utschick, W. Composite real principal component analysis of complex signals. In Proceedings of the 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 2216–2220.
  4. Horel, J.D. Complex principal component analysis: Theory and examples. J. Clim. Appl. Meteorol. 1984, 23, 1660–1673.
  5. Katkovnik, V.; Ponomarenko, M.; Egiazarian, K. Complex-valued image denoising based on group-wise complex-domain sparsity. arXiv 2017, arXiv:1711.00362.
  6. Liu, J.; Xu, X.; Zhang, F.; Gao, Y.; Gao, W. Modeling of spatial distribution characteristics of high proportion renewable energy based on complex principal component analysis. In Proceedings of the 2020 IEEE Sustainable Power and Energy Conference (iSPEC), Chengdu, China, 23–25 November 2020; pp. 193–198.
  7. Morup, M.; Madsen, K.H.; Hansen, L.K. Shifted independent component analysis. In Proceedings of the 7th International Conference on Independent Component Analysis and Signal Separation (ICA 2007), London, UK, 9–12 September 2007; pp. 89–96.
  8. Mathai, A.M.; Provost, S.B.; Haubold, H.J. Multivariate Statistical Analysis in the Real and Complex Domains; Springer Nature: Cham, Switzerland, 2022.