1. Introduction
The atmospheric motion is rich in scales. In many cases, the formation of weather and climate patterns can be attributed to the interaction between a few scale ranges, such as that between the synoptic eddies and the jet stream, or that among the synoptic eddies, the low-frequency variability, and the mean flow, as exemplified by the storm track in the Northern Hemisphere (e.g., [1,2,3]), the blocking high (e.g., [4,5]), the sudden stratospheric warming (e.g., [6,7] and references therein), and the North Atlantic Oscillation (e.g., [8,9,10,11,12,13]), to name but a few. This makes multiscale interaction a central issue in dynamic meteorology.
An important problem in multiscale interaction is how energy is transferred across scales; this transfer is closely related to the fundamental processes in atmospheric flows, namely, barotropic instability and baroclinic instability. The extratropical atmosphere is special in that, while on the whole the available potential energy is cascaded toward smaller scales, the kinetic energy is inversely transferred upward to larger scales (e.g., [14,15]). More specifically, there exists a symbiotic relationship between the synoptic-scale and planetary-scale disturbances. In a three-scale setting, Cai and Mak [1] found that the former are produced and maintained through extracting energy from the zonal flow via the Reynolds stress; they then supply energy to the latter through upscale energy transfer, while the latter form regions of enhanced baroclinicity where the former preferentially grow. This kind of energetic cycle has been employed to interpret many weather and climate phenomena. In the case of atmospheric blocking, for example, Holopainen and Fortelius [16] identified an enhanced transfer of eddy kinetic energy (KE) to the mean flow over the storm tracks. Hansen and Chen [17] found that the nonlinear interaction between the cyclone-scale and planetary-scale waves is essential to Atlantic blocking, while baroclinic amplification plays the most important role in the formation of the Pacific one. After the maintenance stage, the eddy kinetic energy is converted back to eddy available potential energy (APE), leading to the decay of the event [18].
Another example is the inverse relationship between the boreal wintertime Pacific jet strength and the storm-track intensity. It has long been observed that, in winters when the jet stream over the North Pacific is extremely strong, the storm track is unexpectedly weak. This observation, which is at odds with the prediction of classical linear baroclinic instability theory, has attracted enormous attention from the atmospheric community. For example, based on energy budget diagnostics, Chang [19] and Nakamura and Sampe [20] found that, in that case, the synoptic waves tend to be trapped in the upper troposphere and hence are less efficient at tapping APE from the background baroclinicity. In some other studies, it has been argued (e.g., [21,22]) that the inverse KE transfer may suppress the APE release and hence contribute to the inverse relationship. Considering that atmospheric processes, cyclogenesis in particular, tend to be localized in space and time, Liang and Robinson [23] established, on the basis of the multiscale window transform of Liang and Anderson [24], a methodology for local multiscale atmospheric energetics analysis. The resulting interscale energy transfers turn out to bear a Lie bracket form, reminiscent of the Poisson bracket in Hamiltonian dynamics, and, furthermore, they naturally have the property that the energy redistributed among scales is conserved (see [25]). To distinguish them from other transfers, they have been called "canonical transfers" ever since. With them, it is shown that, at each spatial location, there exists a local Lorenz cycle that allows one to trace the origin of any observed energy burst. Recently, by analyzing the local Lorenz cycles, it has been found that, in boreal winters, the storms over the North Pacific are actually generated at latitudes well north of the jet core [26]. This greatly lowers the relevance of the jet strength to the storm-track intensity, and hence the inverse relationship is actually not much of a surprise. That is to say, linear baroclinic instability theory may still hold [27], and the inverse relationship should be attributed to internal dynamics.
Energy and entropy are two parallel basic notions in physics. Naturally, the transfer of entropy, i.e., information transfer or information flow, across scales makes another fundamental problem in multiscale interaction. This problem, however, has been mostly overlooked in the past, mainly due to a lack of appropriate theory and research methodology. (So far, only a few studies have touched this issue, e.g., [28,29].) In physics, entropy may serve as an objective functional to be optimized in order to determine or regulate the distribution of energy. Without knowledge of the information flow across scales, the understanding of multiscale interaction is therefore incomplete. For example, in a classical energetics analysis, the emergence of a structure on another scale is driven by the Reynolds stress or Reynolds stress-like quantities, which are essentially correlations between perturbation fields (e.g., [30]). While causation implies correlation, correlation does not necessarily imply causation ([31]; see Section 2 below). Thus, it is likely that an interaction with a two-way energy flow may in fact have a one-way causation (as in the example below). We hence want to investigate this issue, in the hope of shedding some light on this heretofore dark side of the multiscale interaction problem. We emphasize that this is just a first step into a rather profound field; it is by no means our intention to solve all the problems. In the following, we first give a brief introduction to a recently developed theory of information flow (Section 2), then introduce the atmospheric model (Section 3) and its solution. The model is reduced approximately to a low-dimensional dynamical system (Section 4), with which the information flows across the scales are computed through ensemble prediction (Section 5). As we will see, remarkably, the system has a nearly bottom-up causation, consistent with how a reductionist in evolutionary biology would view the emergence of a higher-level organization out of independent, lower-level entities. The study is summarized in Section 6.
2. A Brief Review of the Theory of Causation and Information Flow That Pertains to This Study
Causal inference is a problem lying at the heart of science. In many disciplines, it makes a direct research objective. It is also an important topic in philosophy, as it forms a "guide to higher understanding" [32]. However, it is very challenging; in fact, it has been identified as "one of the biggest challenges" in the science of big data [33]. During the past thirty years, a consensus has been reached that the widely applicable physical notion of information flow, or information transfer as it may appear in the literature, is logically associated with causality: the latter is the key to the former, while the former provides not only the magnitude but also the direction of the latter. Realizing that information flow is a real physical notion, Liang and Kleeman [34] put the problem on a rigorous footing and obtained in a closed form the information flow between the components of a two-dimensional dynamical system. This formalism was soon generalized by Majda and Harlim [35] to a setting with two subspaces. Recently, it was successfully extended by Liang [36] to systems of arbitrary dimensionality. The following is a brief review of the work that pertains to this study.
We begin by stating a principle or an observational fact about causality:
If the evolution of an event, say, $x_1$, is independent of another one, say, $x_2$, then the causality from $x_2$ to $x_1$ is zero.
Since it is the only quantitatively stated fact about causality, all previous empirical or half-empirical causality formalisms have attempted to verify it in applications. Considering its importance, it has been referred to as the principle of nil causality [36]. Recently, Smirnov [37] systematically examined the traditional formalisms, i.e., transfer entropy analysis and Granger causality testing, and found that they cannot verify the principle in a wide range of situations; similar conclusions have been drawn by Lizier and Prokopenko [38]. We will soon see that, within our framework, this principle turns out to be a proven theorem.
Now, consider an n-dimensional continuous-time stochastic system for state variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:
$$ \frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) + \mathbf{B}(\mathbf{x}, t)\,\dot{\mathbf{w}}, \qquad (1) $$
where $\mathbf{F} = (F_1, F_2, \ldots, F_n)$ may be arbitrary nonlinear functions of $\mathbf{x}$ and $t$, $\dot{\mathbf{w}}$ is a vector of white noise, and $\mathbf{B} = (b_{ij})$ is the matrix of perturbation amplitudes, which may also be any functions of $\mathbf{x}$ and $t$. Here, we adopt the convention in physics and do not distinguish deterministic and random variables; in probability theory, they are usually distinguished with capital and lower-case symbols. Assume that $\mathbf{F}$ and $\mathbf{B}$ are both differentiable with respect to $\mathbf{x}$ and $t$. We then have the following theorem [36]:
Theorem 1. For the system (1), the rate of information flowing from $x_2$ to $x_1$ (in nats per unit time) is
$$ T_{2\to1} = -E\left[\frac{1}{\rho_1}\int_{\mathbb{R}^{n-2}}\frac{\partial\,(F_1\,\rho_{\not 2})}{\partial x_1}\,dx_3\,dx_4\cdots dx_n\right] + \frac{1}{2}\,E\left[\frac{1}{\rho_1}\int_{\mathbb{R}^{n-2}}\frac{\partial^2 (g_{11}\,\rho_{\not 2})}{\partial x_1^2}\,dx_3\,dx_4\cdots dx_n\right], \qquad (2) $$
where $\rho_{\not 2}$ signifies $\int_{\mathbb{R}}\rho\,dx_2$, $E$ stands for mathematical expectation, $g_{11} = \sum_k b_{1k}b_{1k}$, $\rho_1$ is the marginal probability density function (pdf) of $x_1$, $\rho_{2|1}$ is the pdf of $x_2$ conditioned on $x_1$, and $\rho$ is the joint pdf of $\mathbf{x}$. If $T_{2\to1} = 0$, then $x_2$ is not causal to $x_1$; otherwise, it is causal, and the absolute value measures the magnitude of the causality from $x_2$ to $x_1$. For discrete-time mappings, the information flow takes a much more complicated form; see [36].
In the absence of noise, this is precisely the result of [34], which was obtained there based on a heuristic argument.
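To make the connection concrete, the noise-free, two-dimensional special case can be read off from Equation (2) (our own spelling-out; set $n = 2$ and $\mathbf{B} = 0$, so that $\rho_{\not 2} = \rho_1$ and the inner integral over $x_3, \ldots, x_n$ is empty):
$$ T_{2\to1} = -E\left[\frac{1}{\rho_1}\,\frac{\partial (F_1\,\rho_1)}{\partial x_1}\right] = -\iint_{\mathbb{R}^2} \rho_{2|1}(x_2\,|\,x_1)\,\frac{\partial (F_1\,\rho_1)}{\partial x_1}\, dx_1\, dx_2, $$
which is, up to notation, the closed form obtained in [34].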
There is a nice property of the above information flow:
Theorem 2. (Principle of nil causality) If, in Equation (1), neither $F_1$ nor $g_{11}$ depends on $x_2$, then $T_{2\to1} = 0$.
Note that this is precisely the principle of nil causality. Remarkably, here it appears as a proven theorem, whereas the classical ansatz-like formalisms fail to verify it in many problems (e.g., [37]).
In the case with only two time series (no dynamical system is given), we have the following result [31]:
Theorem 3. Given two time series $x_1$ and $x_2$, under the assumption of a linear model with additive noise, the maximum likelihood estimator (mle) of the rate of information flowing from $x_2$ to $x_1$ is
$$ \hat T_{2\to1} = \frac{C_{11}\,C_{12}\,C_{2,d1} - C_{12}^2\,C_{1,d1}}{C_{11}^2\,C_{22} - C_{11}\,C_{12}^2}, \qquad (4) $$
where $C_{ij}$ is the sample covariance between $x_i$ and $x_j$, and $C_{i,dj}$ the sample covariance between $x_i$ and a series derived from $x_j$ using the Euler forward differencing scheme: $\dot x_{j,n} = (x_{j,n+k} - x_{j,n})/(k\,\Delta t)$, with $k \ge 1$ some integer.
Equation (4) is rather concise in form; it involves only common statistics, i.e., sample covariances. In other words, a combination of some sample covariances gives a quantitative measure of the causality between the time series. This makes causality analysis, which otherwise would be complicated with the classical empirical/half-empirical methods, very easy. Nonetheless, note that Equation (4) cannot replace (3); it is just the mle of the latter. A statistical significance test must be performed before a causal inference is made based on the computed $\hat T_{2\to1}$. For details, refer to [31].
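To illustrate how little is involved, here is a minimal Python sketch of the estimator in Equation (4) (our own illustrative code, not from [31]; the function name, the toy example, and the default choice $k = 1$ are ours):

import numpy as np

def liang_information_flow(x1, x2, dt=1.0, k=1):
    """MLE of the rate of information flow from x2 to x1, Equation (4):
    a combination of sample covariances, with dx1/dt approximated by
    Euler forward differencing over k steps."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    dx1 = (x1[k:] - x1[:-k]) / (k * dt)        # series derived from x1
    x1t, x2t = x1[:-k], x2[:-k]                # align lengths
    C = np.cov(np.vstack([x1t, x2t, dx1]))     # 3x3 sample covariance matrix
    C11, C12, C22 = C[0, 0], C[0, 1], C[1, 1]
    C1d1, C2d1 = C[0, 2], C[1, 2]              # covariances with the derived series
    num = C11 * C12 * C2d1 - C12**2 * C1d1
    den = C11**2 * C22 - C11 * C12**2
    return num / den                           # nats per unit time

# Toy usage (hypothetical AR-type example in which x2 drives x1 but not vice versa):
rng = np.random.default_rng(1)
n = 50_000
x1, x2 = np.zeros(n), np.zeros(n)
for t in range(n - 1):
    x2[t + 1] = 0.7 * x2[t] + rng.normal(scale=0.5)
    x1[t + 1] = 0.5 * x1[t] + 0.4 * x2[t] + rng.normal(scale=0.5)
print(liang_information_flow(x1, x2))   # flow x2 -> x1: clearly nonzero
print(liang_information_flow(x2, x1))   # flow x1 -> x2: expected to be small

In practice, as stressed above, a statistical significance test must accompany the estimate before any causal inference is drawn.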
Considering the long-standing debate, ever since George Berkeley in 1710, over correlation versus causation, we may rewrite (4) in terms of linear correlation coefficients, which immediately implies [31] that, in the linear sense, causation implies correlation, but correlation does not imply causation. In fact, suppose there is no correlation between $x_1$ and $x_2$, i.e., the correlation coefficient $r_{12} = 0$ and hence $C_{12} = 0$; then, by Equation (4), $\hat T_{2\to1} = 0$. However, from $\hat T_{2\to1} = 0$, one cannot conclude that $r_{12} = 0$.
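Written out explicitly under the notation of Equation (4) (our own worked step):
$$ C_{12} = 0 \;\Longrightarrow\; \hat T_{2\to1} = \frac{C_{11}\cdot 0\cdot C_{2,d1} - 0^2\cdot C_{1,d1}}{C_{11}^2\,C_{22} - C_{11}\cdot 0^2} = 0, $$
whereas $\hat T_{2\to1} = 0$ only requires the numerator $C_{12}\,(C_{11}C_{2,d1} - C_{12}C_{1,d1})$ to vanish, which may well happen with $C_{12} \ne 0$.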
Causality can be normalized in order to reveal the relative importance of a causal relation. However, the normalization is by no means as trivial as that for covariance, considering that information flow is asymmetric in direction ($T_{2\to1} \ne T_{1\to2}$ in general) and, in addition, there is no property like the Cauchy–Schwarz inequality, which is what makes covariance normalizable. In [40], a way of normalization is given, but a complete solution is yet to be sought.
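For reference, the normalization proposed in [40] is, as we read it, of the following form (the notation here is ours; see [40] for the precise construction): the flow is weighed against the other mechanisms that change the marginal entropy $H_1$,
$$ \tau_{2\to1} = \frac{T_{2\to1}}{Z_{2\to1}}, \qquad Z_{2\to1} = |T_{2\to1}| + \left|\frac{dH_1^{*}}{dt}\right| + \left|\frac{dH_1^{\mathrm{noise}}}{dt}\right|, $$
where $dH_1^{*}/dt$ is the rate of change of $H_1$ due to $x_1$'s own dynamics and $dH_1^{\mathrm{noise}}/dt$ that due to the stochastic forcing, so that $|\tau_{2\to1}| \le 1$ measures the relative importance of the causality from $x_2$ to $x_1$.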
The above formalism has been validated with many benchmark systems (e.g., [36]), such as the baker transformation, the Hénon map, the Kaplan–Yorke map, and the Rössler system, to name a few. In particular, Equation (4) has been validated with touchstone problems where the traditional Granger causality test and transfer entropy analysis fail. An example is the highly chaotic anticipatory system described in [41], which, with Equation (4), turns out not to be a problem at all.
The formalism has been successfully applied to the studies of many real-world problems, among them the causal relation between El Niño and the Indian Ocean Dipole [31], tropical cyclone genesis prediction [42], near-wall turbulence [28], global climate change [43,44], and financial time series analysis [40], to name but a few. Here, we particularly want to mention the study by Stips et al. [43] who, through examining with Equation (4) the causality between the CO₂ index and the surface air temperature, identified a causal relation that reverses with time scale. They found that, during the past century, CO₂ emission indeed drives the recent global warming; the causal relation is one-way, i.e., from CO₂ to the global mean atmospheric temperature. Moreover, they were able to find how the causality is distributed over the globe, thanks to the quantitative nature of our formalism. However, on a time scale of 1000 years or longer, the causality is completely reversed; that is to say, on a paleoclimate scale, it is global warming that causes the CO₂ concentration to rise!
5. Information Flow between the Scales of the Model Atmosphere
As we showed above, the interactions among the first four EOF modes can be utilized to study the multiscale interactions typical of the problem of concern, as the modes occur on different time scales. In order to examine the information flow between the modes, we make random draws for the initial state $(x_1, x_2, x_3, x_4)$ from a pool of values and then, starting from these initial conditions, run the system forward to generate an ensemble of solutions. The initial values are assumed to obey a normal distribution, with a prescribed mean vector and a covariance matrix proportional to the $4\times4$ identity matrix. Here, the variance is set rather small in order for the trajectories to stay under effective control. The sample space is assumed to be a bounded (compact) domain, which makes sense if we do not make too long an integration, as made evident in Figure 10, where the trajectory of a sample path is plotted. The space is discretized using a uniform spacing (the same for the four dimensions), and the probability density functions are then estimated at each time step by counting the bins in the coarse-grained space.
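A minimal Python sketch of this ensemble/bin-counting procedure is given below (our own illustration: the tendency `F`, the mean vector, the variance, the domain, and the bin number are placeholders, not the values used in this study):

import numpy as np

rng = np.random.default_rng(0)

def F(x):
    """Placeholder tendency of a 4-component reduced system (hypothetical)."""
    x1, x2, x3, x4 = x[..., 0], x[..., 1], x[..., 2], x[..., 3]
    return np.stack([x2, -x1, 5.0 * x4, -5.0 * x3], axis=-1)

n_ens, dt, n_steps = 100_000, 0.01, 200
mean = np.array([1.0, 0.0, 0.5, 0.0])            # placeholder mean vector
x = rng.normal(mean, 0.1, size=(n_ens, 4))       # small variance: tight initial cloud

# Coarse-grained sample space: the same edges and spacing in all four dimensions.
edges = [np.linspace(-2.0, 2.0, 21)] * 4

pdfs = []
for step in range(n_steps):
    x = x + dt * F(x)                            # forward (Euler) integration
    if step % 10 == 0:
        rho, _ = np.histogramdd(x, bins=edges, density=True)
        pdfs.append(rho)                         # estimated joint pdf at selected steps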
To compute the information flows among the four components, recall the deterministic version of Equation (2), written here for a general ordered pair of components:
$$ T_{j\to i} = -E\left[\frac{1}{\rho_i}\int \frac{\partial\,(F_i\,\rho_{\not j})}{\partial x_i}\,\prod_{k\ne i,j} dx_k\right]. $$
When the system is initialized with values in a rather limited domain, the integration can be easily evaluated; in this study, the infinite integration domain is replaced by the bounded sample space introduced above. The computed evolutions of the information flows versus time are plotted in Figure 11.
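Continuing the illustration, a discretized evaluation of $T_{j\to i}$ from the binned pdf might look as follows (again our own sketch, with the same placeholder tendency; `info_flow` is a hypothetical helper, not code from this study):

import numpy as np

rng = np.random.default_rng(0)

def F(x):
    """Same placeholder 4-component tendency as in the previous sketch."""
    x1, x2, x3, x4 = x[..., 0], x[..., 1], x[..., 2], x[..., 3]
    return np.stack([x2, -x1, 5.0 * x4, -5.0 * x3], axis=-1)

edges = [np.linspace(-2.0, 2.0, 21)] * 4
dxs = [e[1] - e[0] for e in edges]
centers = [0.5 * (e[:-1] + e[1:]) for e in edges]
grid = np.stack(np.meshgrid(*centers, indexing="ij"), axis=-1)  # 4D grid of bin centers

def info_flow(sample, j, i):
    """Deterministic T_{j->i}: minus the expectation of (1/rho_i) times the
    integral of d(F_i * rho_noj)/dx_i over the components other than x_i, x_j,
    with the pdf estimated by bin counting on the coarse-grained sample space."""
    rho, _ = np.histogramdd(sample, bins=edges, density=True)   # joint pdf
    Fi = F(grid)[..., i]                                        # F_i at bin centers
    rho_noj = np.expand_dims(rho.sum(axis=j) * dxs[j], axis=j)  # rho with x_j integrated out
    others = [k for k in range(4) if k not in (i, j)]
    vol = np.prod([dxs[k] for k in others])
    inner = (Fi * rho_noj).sum(axis=tuple(others)) * vol        # function of (x_i, x_j)
    ax_i = 0 if i < j else 1                                    # x_i axis after the reduction
    dinner = np.gradient(inner, dxs[i], axis=ax_i)              # d/dx_i of the inner integral
    rho_ij = rho.sum(axis=tuple(others)) * vol                  # joint pdf of (x_i, x_j)
    rho_i = rho_ij.sum(axis=1 - ax_i) * dxs[j]                  # marginal pdf of x_i
    shape = [1, 1]; shape[ax_i] = -1
    rho_i_b = rho_i.reshape(shape)
    term = np.divide(dinner, rho_i_b, out=np.zeros_like(dinner), where=rho_i_b > 0)
    return -(rho_ij * term).sum() * dxs[i] * dxs[j]             # -E[...] over the pdf

# Ensemble run: print a within-pair flow and a cross-pair flow every 50 steps.
n_ens, dt = 100_000, 0.01
x = rng.normal([1.0, 0.0, 0.5, 0.0], 0.1, size=(n_ens, 4))
for step in range(200):
    x = x + dt * F(x)
    if step % 50 == 0:
        print(step, info_flow(x, j=1, i=0), info_flow(x, j=3, i=0))  # T_{2->1}, T_{4->1}

With this decoupled placeholder tendency, the cross-pair flow $T_{4\to1}$ should come out near zero, consistent with the principle of nil causality (Theorem 2), whereas the within-pair flow $T_{2\to1}$ does not; for the actual reduced QG system, the full set of twelve flows is what Figure 11 displays.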
For a system with four components, there are in general $4 \times 3 = 12$ information flows, one for each ordered pair of components. As we have shown before, the four components make two pairs, i.e., $(x_1, x_2)$ and $(x_3, x_4)$, which essentially represent two scales. Thus, the cross-scale information flows are those between modes (1, 2) and modes (3, 4). In Figure 11, $T_{2\to1}$ and $T_{1\to2}$ are overwhelmingly large (note the different scale range in the first two subplots); second to them are $T_{4\to3}$ and $T_{3\to4}$. By the property of causality (ideally, a nonzero information flow implies causality), that is to say, $x_1$ and $x_2$ are mutually causal, and so are $x_3$ and $x_4$. These are the information flows within the respective scales. These causal patterns are similar to that between the displacement and the linear momentum of a harmonic oscillator, as is shown in Liang [36].
From the table of the coefficients of the reduced system, indeed, to the first order each pair evolves essentially as a two-dimensional harmonic oscillator, just as the computed $T_{2\to1}$ and $T_{1\to2}$ (and likewise $T_{4\to3}$ and $T_{3\to4}$) would imply.
The other information flows are interscale. Strictly speaking, there exist flows in both directions (small scale ⟶ large scale and large scale ⟶ small scale). However, by observation, $T_{4\to1}$ and $T_{3\to2}$ are much larger than the others. This asymmetric flow structure indicates that the causation between the scales is dominantly one-way, i.e., from the higher-frequency modes (modes 3 and 4) to the lower-frequency modes (modes 1 and 2).
It should be mentioned that what has been solved is actually the QG equation for the perturbation field; the mean flow is not included in the four components of the reduced system. However, its influence has been embedded in the system. Here, we give it an evaluation.
For notational convenience, let $x_0$ denote the "mean component." Since here the mean flow is prescribed, it does not vary in time, so there is no way to examine the influence of the other components on it. That is to say, there is no basis for studying $T_{j\to0}$, but nonetheless we can evaluate $T_{0\to j}$, $j = 1, 2, 3, 4$. We know, from Bayes' rule, that
$$ \rho(x_0, x_1, \ldots, x_4) = \rho_{0|1\cdots4}(x_0\,|\,x_1, \ldots, x_4)\;\rho(x_1, \ldots, x_4). $$
Since the mean flow is prescribed, it is certain; $\rho_{0|1\cdots4}$ is hence in fact a $\delta$-function centered at the prescribed value. Thus, upon integration with respect to $x_0$, the whole conditional term is equal to 1, and $\rho_{\not 0}$ is simply $\rho(x_1, \ldots, x_4)$. This substituted into (48) yields
$$ T_{0\to j} = 0, \qquad j = 1, 2, 3, 4, $$
by the compactness of the support of $\rho$. That is to say, the information flow between the mean flow and the higher-frequency components, if it exists, cannot be toward the higher modes. In other words, if existing, it must be one-way, i.e., in the direction upward toward the mean.
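To spell out the last step (our own completion of the argument, using the deterministic flow formula recalled earlier): with $\rho_{\not 0} = \rho(x_1, \ldots, x_4)$, the flow collapses to a boundary term,
$$ T_{0\to j} = -E\left[\frac{1}{\rho_j}\int \frac{\partial (F_j\,\rho)}{\partial x_j}\, dx_{\not 0\not j}\right] = -\int_{\mathbb{R}} \frac{\partial}{\partial x_j}\left(\int F_j\,\rho\; dx_{\not 0\not j}\right) dx_j = -\left[\int F_j\,\rho\; dx_{\not 0\not j}\right]_{\mathrm{boundary}} = 0, $$
where $dx_{\not 0\not j}$ denotes integration over the components other than $x_0$ and $x_j$, and the boundary term vanishes because $\rho$ is compactly supported.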
It should be emphasized that, generally, the mean flow should also have a distribution, and hence the information flow may not be this easy to evaluate. In this case, however, as we have shown in the preceding section, the variation around the mean is so small that it can be neglected in forming the low-dimensional system. In any event, for this particular case, by computation the information flow, and hence the causation, is essentially one-way, i.e., from the high-frequency modes to the low-frequency modes.
We want to mention that EOF analysis has been used here to reduce the model order. Its advantages include orthonormality, the concentration of variance in the lowest modes, and so on. Its limitations are also well known; the most serious one is that the EOF modes may not be real modes in the physical sense. Here, what we are investigating is the information exchange between processes on different temporal scales, and, fortunately, the principal components of the lowest modes do reflect such temporal variabilities (Figure 9). In a more general situation, however, this may not be true. We hope that advanced methods, such as the efficient model-reduction strategy recently developed by Majda and Qi [50], can help here.
An alternative approach is to use Equation (4) to estimate the information flow from data, rather than computing it directly, and hence avoid solving a large-dimensional Liouville equation (the curse of dimensionality). However, another issue then arises: Theorem 3 relies on the assumption of Gaussianity. Though Equation (4) has also been successfully applied to some highly nonlinear systems, e.g., the chaotic anticipatory system in [41] (see [31]), caution should be exercised, as non-Gaussianity may be significant in realistic atmospheres. These are, in any case, topics for future studies; here, as a first step, we only consider what we have generated with the QG model.
6. Discussion and Conclusions
How processes on different scales interact to form weather and climate patterns is one of the central issues in dynamic meteorology. Traditionally, it is studied by diagnosing the exchange of energy (such as in the Lorenz cycle), or, equivalently, of momentum/angular momentum, between the scales. However, it has long been realized that multiscale energetics based on the governing equations alone may not be enough. In a nonlinear dynamical system, as time moves on, two highly correlated events may soon lose their correlation, while, on the other hand, two completely irrelevant events could turn out to be correlated in the end. As remarked by Corning [51], the underlying causal efficacy may actually be missing in the equations or "rules". In addition, in the classical multiscale formalism, cyclogenesis is driven by the Reynolds stress, which is essentially the linear correlation between the perturbation fields. As we proved earlier, while causation implies correlation, correlation does not necessarily imply causation. That said, the traditional perspective on the problem may be limited.
In physics, entropy is another concept as important as energy. The transference of entropy results in a flow of information, but how information flows or transfers across scales has been overlooked in dynamic meteorology, in contrast to the extensively studied energy transfer. Recently, information flow has been rigorously formulated in the framework of dynamical systems; it proves to satisfy the "principle of nil causality" (see [36]), an observational fact which previous formalisms have endeavored to verify in real applications. In this study, this formalism is applied to study the information flow among the scales within a three-dimensional quasigeostrophic (QG) circulation. The basic flow is a zonal jet mimicking the atmospheric jet stream. We chose a period when the system is in equilibrium, with an energetic scenario typical of a mid-latitude atmosphere: the mean state is releasing available potential energy to the eddies, while the latter feed kinetic energy back to the mean state. We first solved the 3D QG equation; then, for the period of concern, performed a principal component analysis and obtained the EOF modes to construct a basis. It has been shown that these modes characterize the desired temporal scales. The state variable, i.e., the streamfunction, is then expanded with the aid of the basis, and the expansion is truncated at the fourth term. By inverting a 3D elliptic differential operator, the QG equation is converted into a four-dimensional dynamical system. The study of the information flows among the scales is then converted into the investigation of the information flows among the components of this low-dimensional system.
Initialized with an ensemble of streamfunctions drawn randomly according to a normal distribution, the system is integrated forward and, at each step, a probability density function is estimated, which, by Formula (2), allows us to obtain the desired information flow pairs. By computation, mode 1 and mode 2, which represent the long temporal scale, are mutually causal, functioning like the components of a 2D harmonic oscillator; this is also the case for mode 3 and mode 4, which represent the motion on the short scale. These are the information flows within the respective scales. The interscale flows are significant only for that from mode 4 to mode 1 and that from mode 3 to mode 2, i.e., from the modal pair (3,4) to the modal pair (1,2). In addition, the possibility that the mean state has information flow to these four modes is excluded. That is to say, for this particular problem, the information flow is mostly one-way, from the higher-frequency modes to the lower-frequency modes. Hence, underlying the multiscale interaction here is mostly a bottom-up causation.
Bottom-up causation, or the information flow from lower levels to higher levels, is actually seen in many natural and social phenomena. In investigating the transitions in biological complexity, for example, a reductionist will view the emergence of new, higher-level, aggregate entities as a result of lower-level entities (e.g., [52,53,54]). Similarly, it is found that some simple computer networks may transition from a low-traffic state to a highly congested state, entailing a flow of information from a combination of independent objects to a collective pattern representing a higher level of organization. Above all, in statistical physics [55,56], bottom-up causation provides the theoretical foundation, based on which the macroscopic thermodynamic properties can be traced back to random molecular motions.
However, we did not exclude the existence of information flow the other way around; it is just comparatively weak in this example. Top-down causation has been found in many fields. For example, in community ecology, it has been argued that host community-level structures may determine the disease dynamics and hence control the constituent populations (e.g., [57]). Nonetheless, here we have shown that a prescribed mean flow is unlikely to have information flow to the anomalies.
Of course, the result here is just for a particular case with a reduced-order model; in reality, the problem could be much more complicated, depending on the stage of the evolving state. In addition, for simplicity, we have adopted a rigid-lid assumption at the top and an idealized boundary condition (no density perturbation) at the bottom, although the simplified model does reproduce the desired downward transfer of available potential energy and upward transfer of kinetic energy. Nonetheless, the resulting interaction scenario is encouraging and in agreement with those in complex systems, although it is quite different from the corresponding energetic cycle. This result, though preliminary at this stage, may help better understand mean flow–eddy interaction and provide deeper insight into phenomena such as cyclogenesis, atmospheric blocking, and sudden stratospheric warming, to name a few. On the other hand, the asymmetric causation (mostly bottom-up) provides an observational basis for the parameterization of subgrid processes in numerical models, such as the stochastic closure scheme of Majda et al. [58]. All of these are interesting and deserve further investigation. We want to emphasize that information flow is a large field in atmospheric research, and the present study makes only a first attempt; much is yet to be explored in the future.