1. Introduction
The observations of many real-world systems can be studied as multiple time series. Provided that the pairwise relationships between the time series are implicitly or explicitly defined, it is common to refer to these data models as graph signals [1]. In most cases, only one feature representing the pairwise relationships is considered. The pairwise relationships which are not explicitly stated can either assume some implicit value, such as a zero covariance (i.e., being uncorrelated), or they should be assumed to be undefined (i.e., unknown). The graph edges can also indicate statistical or causal dependencies [2]. Consequently, graph signals can be defined as random vectors with incomplete knowledge of their statistics. The random variables then represent the nodes of the graph, and the pairwise relationships are the graph edges.
The graph variables can be arranged into a random vector or matrix in an arbitrary order. One can also assume a graph search and follow a path over the graph edges through the graph nodes to construct the random vector. The random vectors and matrices can be conveniently processed using the well-established framework of linear algebra combined with methods of statistical signal processing [3] and machine learning. In the literature on graph signal processing, the mainstream approach assumes a frequency domain representation of graph signals using the singular value decomposition (SVD) of the graph adjacency, incidence, or Laplacian matrices [4].
In general, a random vector or matrix is fully statistically described by the joint distribution of all the constituent random variables. A comprehensive survey of multivariate distributions can be found in [5]. Mathematical expressions of multivariate distributions may contain the pairwise correlations explicitly, as is the case for the multivariate normal distribution, but in most cases, the correlations must be calculated from other distribution parameters. The main challenge in fitting multivariate distributions to observed data is that the number of observations required to achieve a certain goodness-of-fit grows exponentially with the number of dimensions (the curse of dimensionality). In addition, multivariate distributions are often difficult to sample from, and numerically complex algorithms are required to perform statistical inferences [6].
A related problem of generating random variables with defined moments has been considered in the literature. In particular, a linear transformation was utilized in [7,8] to generate nonnormal variates with defined univariate skewness, kurtosis, and pairwise covariances. This method was extended in [9] to account for multivariate skewness and kurtosis.
In this paper, it is assumed that a random vector is described by the marginal distributions and the pairwise correlations of its elements. The task is to define a generative model to efficiently generate multivariate samples satisfying the given statistical constraints without resorting to fitting a multivariate distribution to the observed data. It is proposed to approximate the unknown multivariate distribution with the defined statistical constraints by a multivariate mixture distribution having independent marginals. In addition, it is shown that the component distributions of the marginal mixture distributions can be conjugate distributions. This approach enables the definition of a universal procedure for constructing generative models of multivariate graph signals that can be readily sampled. The limitation of the proposed procedure is that there is a tradeoff between how accurately the marginal distributions can be approximated and the achievable pairwise correlations. However, it is likely that the proposed generative procedure could be further modified to improve this tradeoff.
The rest of the paper is organized as follows. The research problem is stated in Section 2. The generative model of graph signals is introduced in Section 3. Numerical examples for bivariate distributions are presented in Section 4. The obtained results are discussed in Section 5, and Section 6 concludes the paper.
2. Problem Statement
Assume that a sufficient number of discrete-time observations of $N$ stochastic time series, $X_i(t)$, $i = 1, \ldots, N$, have been collected, so the following quantities can be determined with good accuracy:

the marginal densities, $f_i(x_i)$ of $X_i$, for $i = 1, \ldots, N$; and (C1)

the covariances, $\operatorname{cov}[X_i, X_j] = \mathrm{E}[X_i X_j] - \mathrm{E}[X_i]\,\mathrm{E}[X_j]$, for $i \neq j$, (C2)

where $X_i$ is a random variable representing the samples in the $i$-th time series, and $\mathrm{E}[\cdot]$ denotes the expectation. Note that the existence of time-invariant densities implies stationarity as well as knowledge of all the moments of the univariate random variable $X_i$. Furthermore, only continuous multivariate distributions are considered in this paper, and, unless otherwise stated, $i, j = 1, \ldots, N$.
If the constraints C1 and C2 are determined from real-world observations with sufficient accuracy, both constraints are guaranteed to be consistent, and there must exist at least one corresponding multivariate distribution. However, given an arbitrary set of marginal distributions and (unnormalized) covariances, there may, in general, be no multivariate distribution satisfying these constraints. The constraint C2 may be modified to assume the normalized correlation coefficients instead of the covariances.
The joint moments including the covariances and the corresponding Pearson correlation coefficients can be estimated from data by the method of sample moments [10]. The marginal densities can be efficiently estimated from data by various nonparametric methods using histograms, kernels [11], and diffusion [12]. However, estimating the joint probability density of all $N$ variates may be problematic, since it may require a very large number of observations, especially for more complex distributions in many dimensions [13]. Consequently, given the marginal distributions $f_i$ and the pairwise covariances $\operatorname{cov}[X_i, X_j]$, the task is to construct a generative model for generating random samples of the vector, $\mathbf{X} = (X_1, \ldots, X_N)^T$, whose elements satisfy the constraints C1 and C2 given above. No other constraints are adopted in this paper, although it may be desirable to require that the generative model is also numerically efficient. In addition, the generative procedure to obtain random samples satisfying the constraints C1 and C2 should be sufficiently general in order to allow for different types of marginal distributions, including the case of non-identical marginal distributions.
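To make the constraints concrete, consider the following minimal Python sketch that estimates C1 and C2 from data; the array layout, the kernel estimator (scipy.stats.gaussian_kde), and the illustrative covariance matrix are assumptions for the example, not part of the problem statement.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data: T observations of N = 3 stationary time series, as an (N, T) array.
rng = np.random.default_rng(0)
N, T = 3, 10_000
X = rng.multivariate_normal(mean=np.zeros(N),
                            cov=[[1.0, 0.5, 0.2],
                                 [0.5, 1.0, 0.3],
                                 [0.2, 0.3, 1.0]],
                            size=T).T

# Constraint C1: nonparametric (kernel) estimates of the marginal densities f_i.
marginals = [gaussian_kde(X[i]) for i in range(N)]

# Constraint C2: pairwise covariances and correlation coefficients by sample moments.
C = np.cov(X)        # cov[X_i, X_j]
R = np.corrcoef(X)   # normalized correlation coefficients
print(np.round(C, 2), np.round(R, 2), sep="\n")
```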
3. Constructing a Multivariate Distribution from Its Marginals
The constraints C1 and C2 do not uniquely define the joint distribution $f(x_1, \ldots, x_N)$. In particular, the marginal distributions can be used to obtain all general and central moments of the individual random variables $X_i$, whereas the pairwise covariances are the only joint statistics assumed to be known. Provided that the marginals $f_i$ are of the same, more common type, the corresponding multivariate density may have already been identified in the literature [5]. However, even if the mathematical expression of the desired multivariate distribution $f$ is available, it may be too complex to sample from, or to accurately fit to the observations using, for example, least squares regression or other parameter estimation methods [3]. Another strategy, which is investigated in this paper, is to construct the joint distribution $f$ from the known marginals $f_i$, $i = 1, \ldots, N$, under the covariance constraints.
Proposition 1. The joint density $f$ with the given marginals $f_i$, $i = 1, \ldots, N$, under mild covariance constraints can be well approximated by the mixture distribution,

$$f(x_1, \ldots, x_N) \approx \sum_{k=1}^{K} w_k\, f_k(x_1, \ldots, x_N) \qquad (1)$$

of $K$ joint component densities $f_k$, and the weighting factors, $w_k \ge 0$ and $\sum_{k=1}^{K} w_k = 1$.

The mixture decomposition (1) of $f$ again requires that not only all the components $f_k$ are identified but also that they can be effectively sampled from. In order to overcome the latter challenge, it is newly proposed to adopt an independence assumption, and express the component densities as a product of the individual densities, i.e., the mixture decomposition (1) is rewritten as,

$$f(x_1, \ldots, x_N) \approx \sum_{k=1}^{K} w_k \prod_{i=1}^{N} f_{ik}(x_i). \qquad (2)$$

The advantage of assuming the mixture decomposition (2) is that it is generally much easier to sample from univariate than from multivariate distributions. The disadvantage of the decomposition (2) is that the independence assumption limits the achievable pairwise correlations between the variables $X_i$.
Denote $\mu_{ik} = \mathrm{E}[X_i]$ and $\sigma_{ik}^2 = \operatorname{var}[X_i]$ under the component density $f_{ik}$. The marginal distributions corresponding to the decomposition (2) are obtained as

$$f_i(x_i) = \sum_{k=1}^{K} w_k\, f_{ik}(x_i) \qquad (3)$$

whilst the bivariate marginal densities are computed as

$$f_{ij}(x_i, x_j) = \sum_{k=1}^{K} w_k\, f_{ik}(x_i)\, f_{jk}(x_j), \quad i \neq j. \qquad (4)$$

The corresponding mean value is

$$\mathrm{E}[X_i] = \sum_{k=1}^{K} w_k\, \mu_{ik} \qquad (5)$$

and the second moments are computed as

$$\mathrm{E}[X_i X_j] = \sum_{k=1}^{K} w_k\, \mu_{ik}\, \mu_{jk}, \quad i \neq j. \qquad (6)$$

Consequently and importantly, for $i \neq j$, the pairwise covariances are given by the mixture coefficients $w_k$ and the mean values $\mu_{ik}$ of the component distributions $f_{ik}$.
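Making the last claim explicit, combining the reconstructed Equations (5) and (6) gives the pairwise covariance in terms of the mixture parameters only,

$$\operatorname{cov}[X_i, X_j] = \mathrm{E}[X_i X_j] - \mathrm{E}[X_i]\,\mathrm{E}[X_j] = \sum_{k=1}^{K} w_k\, \mu_{ik}\, \mu_{jk} - \Big(\sum_{k=1}^{K} w_k\, \mu_{ik}\Big) \Big(\sum_{k=1}^{K} w_k\, \mu_{jk}\Big), \quad i \neq j,$$

so that, for fixed weights $w_k$, the achievable covariances are controlled entirely by the component means $\mu_{ik}$.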
Given the marginals $f_i$ and the covariances $\operatorname{cov}[X_i, X_j]$, the values of $w_k$ and $\mu_{ik}$ must satisfy Equations (5) and (6). This represents a system of $N + \binom{N}{2}$ equations with $K(N+1)$ unknowns, where $\binom{N}{2} = N(N-1)/2$ is a binomial coefficient. Provided that the $K$ coefficients $w_k$ can be determined by the $N$ univariate decompositions (3), the number of unknowns can be reduced to $NK$. Then the number of degrees of freedom as a difference between the number of equations and the number of unknowns, i.e., $N + \binom{N}{2} - NK$, versus the distribution dimension $N$ is shown in Figure 1 for several different values of $K$. The points above the horizontal dashed line in Figure 1 indicate the underdetermined cases when the number of unknowns is greater than the number of constraints. The unique solution may exist, if $K = (N+1)/2$, i.e., when $N$ is odd.
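As a quick check of this counting argument, the following short sketch tabulates the degrees-of-freedom expression reconstructed above; the function name is ours, and the counting follows the system of Equations (5) and (6).

```python
from math import comb

def degrees_of_freedom(N: int, K: int) -> int:
    """Constraints (N means + C(N,2) covariances) minus the N*K unknown
    component means, the weights being fixed by the decompositions (3)."""
    return N + comb(N, 2) - N * K

# Zero indicates a potentially unique solution, i.e., K = (N + 1) / 2 for odd N.
for N in range(2, 9):
    print(N, [degrees_of_freedom(N, K) for K in (2, 3, 4)])
```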
Bivariate Case
For $N = 2$ and $K = 2$, the number of degrees of freedom is $N + \binom{N}{2} - NK = -1$. Assuming two random variables $X$ and $Y$, the mixture decomposition (2) can be rewritten as,

$$f(x, y) = \alpha\, f_{X1}(x)\, f_{Y1}(y) + (1 - \alpha)\, f_{X2}(x)\, f_{Y2}(y) \qquad (7)$$

where $0 \le \alpha \le 1$. Note that, for $\alpha = 0$ or $\alpha = 1$, the variables $X$ and $Y$ are assumed to be independent, and thus, uncorrelated. The corresponding marginal distributions are the mixtures

$$f_X(x) = \alpha\, f_{X1}(x) + (1 - \alpha)\, f_{X2}(x), \quad f_Y(y) = \alpha\, f_{Y1}(y) + (1 - \alpha)\, f_{Y2}(y) \qquad (8)$$

having the following first and second order statistics

$$\mu_X = \alpha\, \mu_{X1} + (1 - \alpha)\, \mu_{X2}, \quad \sigma_X^2 = \alpha\, \sigma_{X1}^2 + (1 - \alpha)\, \sigma_{X2}^2 + \alpha (1 - \alpha) (\mu_{X1} - \mu_{X2})^2, \quad \operatorname{cov}[X, Y] = \alpha (1 - \alpha) (\mu_{X1} - \mu_{X2}) (\mu_{Y1} - \mu_{Y2}) \qquad (9)$$

where the mean and the variance of $f_{X1}$ are denoted as $\mu_{X1}$ and $\sigma_{X1}^2$, respectively. Similar notation is used for the other component distributions. Note that the correlation between the variables $X$ and $Y$ increases with the difference of the means of the corresponding component distributions. Conversely, the variables $X$ and $Y$ are uncorrelated, if either the means $\mu_{X1} = \mu_{X2}$, or $\mu_{Y1} = \mu_{Y2}$. The expression, $\alpha (1 - \alpha)$, is maximized for $\alpha = 1/2$.
Since $\mu_X = \alpha\, \mu_{X1} + (1 - \alpha)\, \mu_{X2}$, the mean values of the component distributions in (7) can be expressed as functions of one parameter. For example, choosing $\mu_{X1}$ as such a parameter, the other means are computed as

$$\mu_{X2} = \frac{\mu_X - \alpha\, \mu_{X1}}{1 - \alpha}, \quad \mu_{Y1} = \mu_Y + (1 - \alpha)\, \Delta_Y, \quad \mu_{Y2} = \mu_Y - \alpha\, \Delta_Y, \quad \Delta_Y = \frac{\operatorname{cov}[X, Y]}{\alpha (1 - \alpha)\, \Delta_X} \qquad (10)$$

by solving the equations for $\mu_X$, $\mu_Y$ and $\operatorname{cov}[X, Y]$ given in (9), where $\Delta_X = \mu_{X1} - \mu_{X2}$ and $\Delta_Y = \mu_{Y1} - \mu_{Y2}$. Substituting the expressions in (9), the Pearson correlation coefficient is computed as,

$$\rho_{XY} = \frac{\alpha (1 - \alpha)\, \Delta_X\, \Delta_Y}{\sqrt{\big(\bar{\sigma}_X^2 + \alpha (1 - \alpha)\, \Delta_X^2\big) \big(\bar{\sigma}_Y^2 + \alpha (1 - \alpha)\, \Delta_Y^2\big)}} \qquad (11)$$

where $\bar{\sigma}_X^2 = \alpha\, \sigma_{X1}^2 + (1 - \alpha)\, \sigma_{X2}^2$ and $\bar{\sigma}_Y^2 = \alpha\, \sigma_{Y1}^2 + (1 - \alpha)\, \sigma_{Y2}^2$. Consequently, the combined variances $\bar{\sigma}_X^2$ and $\bar{\sigma}_Y^2$ limit the achievable correlations between the variables $X$ and $Y$ in the generative model (7). Only when $\bar{\sigma}_X^2 = \bar{\sigma}_Y^2 = 0$, can the correlation coefficient reach the maximum magnitude, $|\rho_{XY}| = 1$.
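The reconstructed expression (11) can be verified by Monte Carlo; the sketch below assumes normal component distributions with a shared component label per $(x, y)$ pair, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 1_000_000, 0.5
mu_x1, mu_x2, mu_y1, mu_y2 = 1.0, -1.0, 2.0, -2.0   # component means
s_x, s_y = 0.5, 0.8                                  # common component std deviations

# Sample the bivariate mixture (7): one shared component label per (x, y) pair.
k = rng.random(n) < alpha
x = np.where(k, rng.normal(mu_x1, s_x, n), rng.normal(mu_x2, s_x, n))
y = np.where(k, rng.normal(mu_y1, s_y, n), rng.normal(mu_y2, s_y, n))

# Closed form (11), with sigma^2 = s^2 + a(1-a) Delta^2 for equal component variances.
a, dx, dy = alpha, mu_x1 - mu_x2, mu_y1 - mu_y2
rho_theory = (a * (1 - a) * dx * dy
              / np.sqrt((s_x**2 + a * (1 - a) * dx**2)
                        * (s_y**2 + a * (1 - a) * dy**2)))
print(rho_theory, np.corrcoef(x, y)[0, 1])   # should agree closely
```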
The mixture decompositions of the marginals defined in (8) can be obtained using different strategies. The marginal distributions defined by the constraint C1 can be common univariate distributions, or they can be defined as univariate mixture distributions from the outset. The latter case can be resolved by curve fitting to the observed data, so here, we investigate the former case.
Proposition 2. The marginal distributions (8) can be approximated by conjugate mixtures. The conjugate mixtures are of the same type as the resulting marginal distributions, but they have their parameters determined by the constraints (9).

Hence, assume that the mixture distributions $f_{X1}$ and $f_{X2}$ in (8) are obtained by a linear transformation of the marginal distribution $f_X$, i.e., let [14]

$$f_{X1}(x) = \frac{1}{s}\, f_X\!\left(\frac{x - \delta_1}{s}\right), \quad f_{X2}(x) = \frac{1}{s}\, f_X\!\left(\frac{x - \delta_2}{s}\right) \qquad (12)$$

where the shifts $\delta_1 \neq \delta_2$, whereas the scaling $0 < s < 1$ and $s^2 = 1 - \epsilon$, for some small $\epsilon > 0$, to satisfy the variance constraint in (9). The marginal distribution $f_Y$ is approximated similarly, and independently from $f_X$.
Substituting (12) into (8), the value of the mixture coefficient $\alpha$ can be determined to optimize the goodness of fit. In particular, the conjugate mixture distributions can be locally linearized about a given point, as indicated in Figure 2. Then, for small shifts, the distributions can be approximated as linear functions, i.e.,

$$f_X(x) \approx g\, x + o, \quad f_{X1}(x) \approx g_1 (x - \delta_1) + o, \quad f_{X2}(x) \approx g_2 (x - \delta_2) + o \qquad (13)$$

where $g$, $g_1$, and $g_2$ are the gradients, $o$ is the common offset, and $\delta_1$, $\delta_2$ are the shifts. Substituting into (8) with $g_1 \approx g_2 \approx g$, we obtain

$$\alpha\, f_{X1}(x) + (1 - \alpha)\, f_{X2}(x) \approx g\, x + o - g\, \big(\alpha\, \delta_1 + (1 - \alpha)\, \delta_2\big) \qquad (14)$$

where the approximation error is crucially independent of the actual value of $x$. In the case when the shifts $\delta_1$ and $\delta_2$ have opposite signs, a rule of thumb for choosing the value of the mixture coefficient $\alpha$ is obtained as

$$\alpha\, \delta_1 + (1 - \alpha)\, \delta_2 = 0, \quad \textrm{i.e.,} \quad \alpha = \frac{\delta_2}{\delta_2 - \delta_1} \qquad (15)$$

so that, $\mu_X = \alpha\, \mu_{X1} + (1 - \alpha)\, \mu_{X2}$ as in (9). Hence, the value of $\alpha$ can be chosen somewhat arbitrarily as long as the condition (15) and the constraints (9) are satisfied.
Alternatively, for conjugate mixture components, the decomposition (8) can be rewritten as,

$$f_X(x) \approx \alpha\, f_X(x - \delta_1) + (1 - \alpha)\, f_X(x - \delta_2). \qquad (16)$$

Equation (16) can be understood as a linear digital filtering of the signal $f_X$ in the variable $x$. The filter coefficients $\alpha$ and $(1 - \alpha)$ are separated by the lag $(\delta_1 - \delta_2)$. The approximation (16) is then more exact, provided that the filter does not distort the filtered signal $f_X$, i.e., when the filter bandwidth is wider than the bandwidth of the signal [15]. The signal and filter bandwidths are determined by the magnitudes of their Fourier transforms. In particular, since the signal $f_X$ is also a distribution, we can assume the characteristic function of $X$, i.e., $\Phi_X(\omega) = \mathrm{E}\big[e^{\mathrm{j} \omega X}\big]$, which is known or can be obtained for most univariate distributions. The filter bandwidth is obtained by computing the magnitude of its transfer function $H(\omega)$, i.e.,

$$|H(\omega)| = \big|\alpha\, e^{\mathrm{j} \omega \delta_1} + (1 - \alpha)\, e^{\mathrm{j} \omega \delta_2}\big|. \qquad (17)$$
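A short sketch of this bandwidth argument, assuming a normal marginal (for which $|\Phi_X(\omega)| = e^{-\sigma^2 \omega^2 / 2}$) and illustrative, symmetric shifts:

```python
import numpy as np

alpha, d1, d2 = 0.5, 0.3, -0.3   # mixture weight and component shifts (illustrative)
sigma = 1.0                      # std of the normal marginal f_X
omega = np.linspace(0.0, 8.0, 400)

# Magnitude of the two-tap filter transfer function (17).
H = np.abs(alpha * np.exp(1j * omega * d1) + (1 - alpha) * np.exp(1j * omega * d2))

# Characteristic function magnitude of the normal signal f_X.
Phi = np.exp(-0.5 * (sigma * omega) ** 2)

# Values of |H| near 1 across the signal band (where |Phi| is non-negligible)
# indicate little distortion; smaller shifts widen the filter bandwidth.
print(H[Phi > 1e-3].min())
```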
4. Numerical Examples
The case of the following three bivariate distributions is considered: normal, gamma, and normal-exponential distributions [14]. Although generating correlated normal samples is straightforward, which is a rare exception among multivariate distributions, the normal distribution is considered mainly to validate the proposed generative model.
The first experiment investigates approximations of the selected univariate distributions by a mixture of the two component distributions defined in (8), i.e., the approximation, $\hat{f}_X(x) = \alpha\, f_{X1}(x) + (1 - \alpha)\, f_{X2}(x)$, of $f_X(x)$. The approximation accuracy is quantified by the Kullback–Leibler (KL) divergence between the target distribution $f_X$ and its mixture approximation $\hat{f}_X$. The KL divergence is defined as,

$$D_{\mathrm{KL}}\big(f_X \,\|\, \hat{f}_X\big) = \int_{-\infty}^{\infty} f_X(x) \log \frac{f_X(x)}{\hat{f}_X(x)}\, \mathrm{d}x. \qquad (18)$$

Moreover, using Jensen's inequality for the logarithm, it is straightforward to show that

$$D_{\mathrm{KL}}\big(f_X \,\|\, \hat{f}_X\big) \ge 0. \qquad (19)$$

In order to reduce the number of free parameters, the mixture components are assumed to be the conjugates of the target distribution, i.e., $f_{X1}$ and $f_{X2}$ are of the same type as $f_X$, and have the means, $\mu_X + \delta$, and $\mu_X - \delta$, where $\delta > 0$. Consequently, $\alpha = 1/2$ in all the experiments, in accordance with (15).
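As an illustration, the KL divergence (18) can be evaluated by numerical quadrature for a normal target and its conjugate two-component mixture; the reduced component variance $\sigma^2 - \delta^2$ follows the normal-distribution setup described in the next paragraph, and the parameter values are assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma, delta = 0.0, 1.0, 0.4       # target N(mu, sigma^2) and displacement delta
s_c = np.sqrt(sigma**2 - delta**2)     # reduced component std (variance constraint (9))

f  = lambda x: norm.pdf(x, mu, sigma)
fh = lambda x: 0.5 * norm.pdf(x, mu + delta, s_c) + 0.5 * norm.pdf(x, mu - delta, s_c)

# KL divergence (18) by quadrature; the integrand vanishes in the tails.
kl, _ = quad(lambda x: f(x) * np.log(f(x) / fh(x)), mu - 10 * sigma, mu + 10 * sigma)
print(kl)   # small values indicate an accurate mixture approximation
```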
In the case of the normal distribution, there are two distribution parameters, i.e., the mean $\mu_X$ and the variance $\sigma_X^2$. In order to account for the variance constraint in (9), the variances of the component distributions $f_{X1}$ and $f_{X2}$ have been equally reduced to, $\sigma_{X1}^2 = \sigma_{X2}^2 = \sigma_X^2 - \delta^2$. The gamma distribution also has two parameters, i.e., the shape $k$, and the scale $\theta$. Given the scale $\theta$, the shapes of the two component distributions $f_{X1}$ and $f_{X2}$ are set to, $k \pm \delta / \theta$, respectively. The normal-exponential distribution (or, exponentially modified normal distribution) is described by three parameters, i.e., the mean and the variance of the normal distribution and the rate of the exponential distribution. The variance of the normal distribution of both components $f_{X1}$ and $f_{X2}$ was reduced to $\sigma_X^2 - \delta^2$, and the variance of the exponential distribution was left unchanged.
The KL values for all three distributions considered are shown on a log-scale in Figure 3. It is observed that the approximations $\hat{f}_X$ of $f_X$ are visually accurate when the log-KL values are sufficiently small. Hence, the mixture approximation $\hat{f}_X$ is rather accurate for some parameter values of the target distribution, and mainly for smaller displacements $\delta$, as expected.
Next, we investigate the achievable magnitudes of the correlation coefficient between the random variables $X$ and $Y$, which are both generated by the mixture approximations (8). The same three marginal distributions are considered, i.e., normal, gamma, and normal-exponential distributions with the same parameters as in Figure 3. Here, the benefit of defining the bivariate distributions as the mixtures (7) to generate correlated random samples becomes evident. In particular, with the probability $\alpha$, the distributions $f_{X1}$ and $f_{Y1}$ are sampled independently, and with the probability $(1 - \alpha)$, the samples $X$ and $Y$ are independently generated from the distributions $f_{X2}$ and $f_{Y2}$, respectively. Thus, the correlated bivariate samples are generated by independently sampling from the four univariate distributions $f_{X1}$, $f_{X2}$, $f_{Y1}$, and $f_{Y2}$. The generation of normal samples is trivial, and readily available in many numerical software packages. For the gamma distribution, the generator of gamma samples is either available (e.g., in Matlab as the function gamrnd), or the gamma random number generator can be constructed [16]. Finally, the normal-exponential distributed samples are simply the sum of samples from the two underlying distributions.
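For concreteness, a minimal sketch of this sampling procedure for gamma marginals, using NumPy's gamma generator; the shape, scale, and displacement values are illustrative, and the component shapes $k \pm \delta/\theta$ follow the conjugate construction described above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 500_000, 0.5
k, theta, delta = 4.0, 1.0, 0.8     # target gamma shape/scale and mean displacement

# Conjugate component shapes, so the component means are k*theta +/- delta.
k1, k2 = k + delta / theta, k - delta / theta

# One shared component label per pair, then independent draws from the
# four univariate component distributions.
c = rng.random(n) < alpha
x = np.where(c, rng.gamma(k1, theta, n), rng.gamma(k2, theta, n))
y = np.where(c, rng.gamma(k1, theta, n), rng.gamma(k2, theta, n))

print(x.mean(), y.mean())           # both approximately k * theta
print(np.corrcoef(x, y)[0, 1])      # positive correlation from the shared label
```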
The achievable correlation coefficient has been measured empirically over a large number of bivariate random samples. The results are shown in Figure 4. The curves are in good agreement with the theoretical values given by expression (11). More importantly, the following conclusion can be drawn from comparing the corresponding results in Figure 3 and Figure 4. The accurate approximation of the marginal distributions by the proposed generative mixture model limits the achievable values of the correlation coefficient, with the attainable magnitude depending on the specific distributions and their parameters considered.
5. Discussion
The full statistical description of multiple time series may be difficult or impossible to obtain from a limited number of observations. The incomplete statistics give rise to interesting signal processing problems involving graph signals. The graph signals can be more formally defined as follows.
Definition 1. The graph signal is a set of random variables representing space-time observations of stochastic processes or phenomena, for which some marginal or conditional distributions and some statistical moments are known. However, the statistical description is incomplete in the sense that a full joint distribution of all random variables cannot be uniquely determined from prior knowledge and the available observations.
Provided that graph signals are represented as random vectors or matrices, many techniques of statistical signal processing and even machine learning can be used. For instance, Bayesian inference of parameters and unobserved model states requires a full statistical description of the observed samples as a joint or conditional density. Learning the model may require generating enough labeled data that are consistent with the observations. Knowledge of the joint distribution is also important for the controllability and observability of stochastic systems, for instance, to make optimum decisions under uncertainty. In these cases, constructing a generative model of graph observations and graph responses is crucial.
In this paper, the generative model of multiple random observations is constrained by the marginal distributions and the second order statistics. This does not uniquely define the corresponding joint distribution, but it can be further constrained by other higher order moments [7,8]. The behavior of the higher and lower order moments is described by Hölder's inequality [17]. An open research problem is whether there always exists at least one multivariate distribution given the set of marginal distributions and the pairwise correlations or covariances; such a distribution is guaranteed to exist if the observations have been obtained from a real-world system or a simulated model. There are multivariate distributions, such as the multivariate Cauchy distribution, for which the correlations cannot be defined. In addition, the marginal or conditional distributions may have some of their parameters undefined, which increases the number of degrees of freedom. The unknown parameters could then be estimated from the known moments or other known statistics. Furthermore, practical problems often require generating samples that are correlated in more than one dimension.
In general, there is a tradeoff between accurately approximating the marginals as mixture distributions and the achievable magnitude of the correlation coefficient, as indicated by Equation (11). This has been observed in numerical examples involving two random variables assuming conjugate components in the mixture approximations of the marginal distributions. The main advantage of the proposed generative model is that it can be readily sampled. The accuracy–correlation tradeoff could be improved by assuming nonconjugate mixture component distributions or by considering other types of generative models, albeit at the cost of the sampling efficacy. Alternatively, the pairwise covariance of samples $X_i$ and $Y_i$ can be reduced by simple averaging. In particular, let $\bar{X} = \frac{1}{M} \sum_{i=1}^{M} X_i$, and, $\bar{Y} = \frac{1}{M} \sum_{i=1}^{M} Y_i$, and assume that the samples $X_i$ and $Y_j$ are otherwise uncorrelated. Thus, let $\operatorname{cov}[X_i, Y_i] = \operatorname{cov}[X, Y]$, for $i \le M_0$, whereas $\operatorname{cov}[X_i, Y_i] = 0$, for $i > M_0$. It is then straightforward to show that the correlation coefficient defined in (11) changes to

$$\bar{\rho}_{XY} = \frac{M_0}{M}\, \rho_{XY}. \qquad (20)$$
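A minimal sketch of this averaging strategy, under the covariance pattern assumed above (only $M_0$ of the $M$ averaged pairs are correlated); the correlated-pair construction and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_blocks, M, M0 = 200_000, 4, 1     # average M pairs, of which M0 are correlated

def correlated_pairs(size, rho=0.8):
    """Standard normal pairs with correlation rho (illustrative generator)."""
    x = rng.standard_normal(size)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(size)
    return x, y

# M0 correlated pairs and (M - M0) independent pairs per block, then averaged.
xc, yc = correlated_pairs((n_blocks, M0))
xi = rng.standard_normal((n_blocks, M - M0))
yi = rng.standard_normal((n_blocks, M - M0))
x_bar = np.concatenate([xc, xi], axis=1).mean(axis=1)
y_bar = np.concatenate([yc, yi], axis=1).mean(axis=1)

print(np.corrcoef(x_bar, y_bar)[0, 1])   # approximately (M0 / M) * 0.8 = 0.2
```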
Another strategy worth exploring is to investigate the kernel approximations of multivariate densities [11] and also the bounds of these densities. For instance, for any events $A$ and $B$, the joint probability can be bounded as

$$\max\big(0, \Pr[A] + \Pr[B] - 1\big) \le \Pr[A \cap B] \le \min\big(\Pr[A], \Pr[B]\big). \qquad (21)$$

Then, for $A = \{X \le x\}$ and $B = \{Y \le y\}$, the joint cumulative distribution function $F(x, y)$ can be bounded as

$$\max\big(0, F_X(x) + F_Y(y) - 1\big) \le F(x, y) \le \min\big(F_X(x), F_Y(y)\big). \qquad (22)$$

For $N = 2$, the bivariate joint cumulative distribution is obtained by integrating Equation (2), i.e.,

$$F(x, y) = \sum_{k=1}^{K} w_k\, F_{Xk}(x)\, F_{Yk}(y) \qquad (23)$$

Assuming Equation (22), it can be bounded as

$$\max\big(0, F_X(x) + F_Y(y) - 1\big) \le \sum_{k=1}^{K} w_k\, F_{Xk}(x)\, F_{Yk}(y) \le \min\big(F_X(x), F_Y(y)\big). \qquad (24)$$
Moreover, in many stochastic systems, the densities evolve in time, so the discrete mixture (1) could be rewritten for the case of continuous time $t$ as

$$f(x_1, \ldots, x_N) = \int w(t)\, f_t(x_1, \ldots, x_N)\, \mathrm{d}t \qquad (25)$$

where $w(t)$ is another probability distribution, i.e., $w(t) \ge 0$ and $\int w(t)\, \mathrm{d}t = 1$. The expression (25) then represents a mean-time density of the system response.
Lastly, the generative model constructed in this paper exactly fits the first and second statistical moments whilst approximating the marginal distributions. An alternative strategy to construct a generative model may instead emphasize fitting the marginal distributions exactly while relaxing the constraints involving the statistical moments.