1. Introduction
The Shannon entropy plays an important role as a descriptive statistic in various disciplines linked to the spatial domain, e.g., ecology, social sciences, urban planning [
1,
2,
3,
4] but often without entirely taking into account all the characteristics of the spatial or the spatio-temporal dimension as already proposed [
5,
6,
7,
8,
9,
10]. Nevertheless, the focus and motivation are often intended for the quantification the spatial or spatio-temporal structuring of the information provided by a categorical variable of interest. Entropy, as measuring the level of homogeneity and randomness, has been seen in the literature as a good candidate. There are many different alternative approaches to entropy, for example see [
11,
12] in the context of spatio-temporal clustering which can provide ways of understanding the structuring of the data, though, not necessarily with a direct way of quantifying it. Our purpose in this paper is to propose a framework that would reconcile classical approaches involving entropy as a metric with more recent literature [
5,
6,
7,
8,
9,
10]. The goal is how to better take into account the spatio-temporal embedding of the information that would accommodate an entropy approach.
In the classical approach, an underlying spatial ’structural support’ is usually considered, using a categorical variable
S identifying a set of sub-regions of the whole studied region. For socio-economic studies
S is often a fixed set of administrative units often linked to population sizes. In geographical studies it can be preferred to use either regular grids or elaborated units based for example on land conditions or climate, e.g., agro-ecological zoning systems, [
13,
14]. Then, for a given statistic that can be mapped to each sub-region (i.e.,
), such as fraction of coverage of a specific land cover, frequency of unemployment, number of public buildings, it is possible to quantify the spatial structuration of that category
c by the Shannon entropy:
where
is the proportion of
c’s that are in the sub-region
s, i.e.,
is the number of entities with the characteristic
c found in sub-region
s among
in the whole population of entities
N (e.g., persons). For land cover
is the fraction of land occupation in
s but relative to the whole studied area; one might be also interested in the
, the fraction of
c within
s and so the entropy
. Note the notation
is without ambiguity referring to the joint distribution of
S and
C,
(a matrix), as well as
to the distribution of variable
C, i.e., the vector
and idem for
. So, Equation (
1) is the entropy of the conditional distribution of the categorical variable
S, knowing
. The categorical variable
C expresses
c as one of its category, e.g.,
C corresponds to a land cover classification, to a socio-professional indicator, to a building typology or a simple dichotomy between cases and at risk of a specific disease.
It provides a quantification describing the repartition of each single category
c, spatially across the sub-regions, e.g., the entropy is maximum if the distribution is uniform, or, reaching very small values when segregation in a few sub-regions occurs (0 if
c is concentrated in only one sub-region
s). However, the spatial organisation of the region in
s sub-regions is not taken into account. Any permutation of the values would give the same entropy, only the semantic attached to the sub-regions is rooting down a spatial understanding for
c. Nonetheless, a sub-region system as such, often represents a level of aggregation of the observations within each sub-regions
s. The number of sub-regions may be too small to convey sufficient statistical information about
topological information between the sub-regions and multiple scale integration may be looked for in the regioning system. Here, ‘topological’ is understood as spatial organisation and configuration, e.g., proximity, connection, homogeneity, between and within observations or units where the observations are in. The interplay of proximities of categories and multiple co-occurrences have been proposed to define spatial entropy and spatio-temporal entropy measures [
15] but do not pertain easily spatial or spatio-temporal graphical representations even though local indices are possible. From (
1) a further decomposition of the bivariate information
(see
Section 2) expresses the role of the spatial support
S. Despite not being able to fully provide a spatial entropy measure for
C, it is a useful tool when focusing on characterising a regional system
S or comparing two regional systems
S and
, say encompassing a change of scale, for a range of economical and socio-demographic variables (i.e., a series of variables like
C). Questions such as “which spatial scale provides the most or least disparity?” can then be approached. It is not particularly useful when no
a priori spatial regional system makes sense as in landscape ecology but the scale aspect does. So, the decomposition approach constitutes a basis for a framework to the spatial or spatio-temporal information related to
C when using appropriate spatial or spatio-temporal descriptors that leads to a range of spatio-temporal entropy measures (see
Section 3).
Landscape ecology has provided a range of spatial and topological descriptors, e.g., richness, adjacency, patchiness, connectivity, that help to describe how the spatio-temporally information from a categorical variable
C is organised and its role into understanding associated ecological processes [
1,
16], including the role of entropy [
17]. The temporal evolution of sizes and shapes of patches
per categories of a variable
C are the consequences of the underlying spatio-temporal processes involved. Therefore, depicting the information structuring of these spatial-temporal descriptors in interaction, using the entropy, would contribute to this endeavour. Instead of using external spatial descriptors, linked to a fixed spatial support as with the above description of
S, this paper proposes to use the variables patch-size,
and patch-shape
to be combined with the information from
C in order to decompose their joint entropy.
A spatial patch can be defined as an homogeneous zone according to a category c and can be also understood as a cluster. When observations are recorded per elementary units with proportions falling in that unit (also known as compositional data), a patch may be defined using a minimum proportion for the same category c, i.e., enough observations with a category c in one unit then considered as a patch or part of a bigger patch. For compositional data, the patch can take into account a fuzziness (as a degree of membership of a patch) due to decreasing values of the proportion of the category c. Note that with such compositional data, patches of different categories may then overlap. Depending on the modelling choice, separation of the patches can be operated, for example using the dominant category among the categories in the overlapping patches.
Similarly to the spatial structuration, Equation (
1) can also be written for
T a time structure of the observations, with
t being a sub-period of the whole time period of observations defined by the categorical variable
T. Order and proximities of the
ts allow to define a patch as homogeneous temporal zone according to a category
c from
C. A temporal patch is then also associated with variable descriptors such as
for a temporal patch size and temporal patch shape. With compositional data, a temporal ’shape’,
can take the form of a pattern of increasing and decreasing proportion values which becomes close to the notion of motifs, i.e., succession of specific categories. The latter can be also achieved from borrowing concepts involved in permutation entropy [
18,
19,
20] to integrate time flow in the dynamic of the categories, e.g., increase of a proportion of a category
c from past to future, motif as increase followed by a decrease, motifs due to pre-defined possible successions of categories. Therefore, Size and shape of patches of
C are seen here as the basics of the spatio-temporal structuration of
C applicable in various domains, e.g., physical geography, social geography, demography etc. For a land cover data, knowing the different sizes and shapes for a particular vegetation configuration will help understanding its ecology, e.g., invasive species; in urban planning these sizes and shapes will contribute to analyse social segregation and in epidemiology, sizes and shapes may relate to contagion paths and outbreak mechanisms.
The paper proposes a framework approach integrating the Shannon decomposition theorem (
Section 2) using these spatio-temporal descriptors. The
modus operandi of this framework is detailed in
Section 7 and illustrated with a land cover evolution data in
Section 8. The three major steps are:
(i) defining patches rules,
(ii) extracting the multiway information crossing spatio-temporal patch characteristics and C, and,
(iii) quantifying and mapping the spatio-temporal information from entropy decomposition and related methods. This framework, termed the
patch size and shape entropy (PsishENT) framework, is based on the Shannon entropy and existing spatio-temporal approaches of the Shannon entropy itself [
6,
8,
15] on the rendered information in (ii) (
Section 3). As part of (iii) a multiway correspondence analysis can be used [
21,
22] (
Section 5) which is related to the concept of mutual information reminded in
Section 2. This multiway analysis provides a decomposition for which, each part has an interpretation similar to a product of the spatial, temporal and categorical distributions, therefore providing after a transformation a simple entropy decomposition (see
Section 2 and
Section 5). These three major steps of the framework are detailed within their potential sub-steps in the next few sections before summarising the approach in
Section 7.
2. Using Shannon’s Multivariate Decomposition Entropy
Equation (
1) shows that the historical approach into a spatial entropy introduced the conditional entropy as a natural way forward. When considering all the categories of the variable of interest
C, so expending for all
c’s of
C and using the joint entropy, (
1) becomes:
known as the entropy decomposition theorem [
23,
24], where the roles of
S and
C in the bivariate distribution can be swapped as expressed by Equations (
3) and (
4). Note that this presentation is not limited to the spatial context and
S or
C can be any categorical variables.
is then the entropy for the overall distribution of the
c categories of the variable
C in the considered region, without explicit integration of the role of the spatial dimension.
is the mathematical expectation of Formula (
1) over all
c values and expresses the role of
C in the potential structuration of the sub-regions, i.e., if
is small then
C contributes substantially in highlighting differences (non uniformity) in
S. It implies a spatial configuration due to
C in the sub-regions but without knowing which categories are the most involved. The decomposition involving
, expressing how
S contributes in describing
C distribution, or, how
S influences
C non-uniformity, might be more interesting in representing spatially the impact of the variable
C for example by visualising the
S sub-regions using the statistic
, for each sub-region
s. This normalisation called from now on,
conditional entropy ratio, is a normalisation adapted to the analysis of parts of the conditional entropy.
A normalisation of the Shannon entropy such as
, allows to get a span between 0 and 1, i.e., 1 for ’completely’ uniform (
u) distribution. If the former normalisation (
) has the advantage of being self-referring, mapping
is independent of the number of categories used and allows sub-regions comparisons and the above statistic is the same:
Using the normalisation respective to uniform distribution, Equations (
3) and (
4) become:
The decomposition theorem of the entropy is not specific to
S and
C, only a bivariate imformation is required. Recently [
10] used the entropy decomposition theorem with a bivariate information referring to the categories,
C and spatially adjacent categories by then allowing a decomposition of the entropy of the spatial contiguity of categories from the adjacency distribution, i.e., similarly to co-occurrences of order 2, [
6].
Using Equation (
3), one gets
which from Equation (
4) is also
therefore:
defining the Mutual Information (MI) between the two variables
c and
S. Then from Equation (
3) or (
4):
leads to another way of defining the mutual information that is by the Kullback-Leibler divergence between
and
, i.e., the joint distribution and its approximation under the hypothesis of independence,
From Equations (10) and (11), if S and C are statistically independent, i.e., , or similarly the c profiles in different sub-regions are all the ’same’ (proportionals), then we have additivity of their respective entropy when considering the joint information. It does not mean that C is not structured spatially, only that the structuration S is expressing a common spatial structure (irrespective to c’s). Another structuration might reflect otherwise.
With Spatial and Temporal Supports
The entropy decomposition theorem, in the form of Equation (
10), is easily extendable to multivariate situations, within a spatial or non-spatial context:
for
p categorical variables
, with the conceptually easily generalisable mutual information of the
p variables:
. Within a spatio-temporal context for one categorical variable
C, this takes the form:
generalising Equation (
4) or (
3).
These different formulations provide ways of decomposing and representing graphically each component as patterns, e.g., a map of the for all s at chosen t (intervals or sub-periods) or as time series plot at chosen sub-regions s.
3. Taking into Account Spatio-Temporal Relative Proximities
The structuration of the observations from knowing their distribution jointly for
S,
T and
C leads to the multivariate decomposition theorem of the classical Shannon entropy but again no topological properties are really involved. However, as only the three-way data table
containing the distribution of occurrences of observations is used, it is also possible to use a distribution co-occurrences instead [
6,
15]. By then, the decomposition theorem will be framed within a spatio-temporal entropy measure. For a chosen order of co-occurrence
k, counting the number of co-occurrences among the observations
with
is made from considering the observations in a manifold
within an Euclidean space, e.g.,:
From this three-way table of counts of co-occurrences, a three-variate distribution of co-occurrences [
6] is achieved, i.e., a spatio-temporal distribution of
C that can be used with the Shannon entropy decompositions, i.e., Equations (
13) to (
16). For each cell
of the three-way data table
, any non-negative indicator positively correlated, across
, with count of observations can also lead to a three-variate distribution-like table that can be used with the Shannon entropy decompositions formula, e.g., a local version of the distance-ratio weight used in [
5]:
The local computation within each
, of co-occurrences distributions, or of distance-ratio weights are subject to a border effect that is not encountered with the occurrences distributions. However, it is easy to modify formulations (
17) or (
18) to allow overlaps but enforcing at least one of the
to be in
and the others within a small distance,
, to the border. That distance needs to be smaller than
, by then minimising the over-count of co-occurrences, and, if
is relatively smaller than the average distance between two observations in
, the estimation of
will not be too affected, i.e., proximities across the border will be taken into account without smoothing too much the values across neighbouring
’s. Without these overlaps, there could be under-estimation for the co-occurrences or distance-ratio statistics when a large number of observations are made close to borders.
With a Symmetric or Non-Symmetric Spatio-Temporal Approach
In integrating the spatio-temporal approach of co-occurrences, the approach taken in the previous sub-section has been non-symmetric. Multiple observations were identified first with their category
c, then their geolocation, spatio-temporally were taken into account within a
, i.e., a semantic bias was focusing on the
c’s observations scattered spatio-temporally. So, in definitions (
17) or (
18) the distances were spatial distances at time
t within the sub-region
s,
. To be fully symmetric the co-occurrence definition needs to be:
In the definition (
19),
is the spatial dimensional space in which the regional system
S is embedded, similarly for
as a temporal dimensional space and
a variable space where categorical variables can be expressed. The distances in
and
are the natural Euclidean distances and in
, proximities can be expressed as 0 or 1 or using a dissimilarity taking into account closeness between categories. Then, a distance in
has to be chosen, e.g., sum of the distances in each dimension, their product, their maximum?
The equivalence of this definition to the former definition in (
17) for particular settings highlights in fact the substantial conceptual difference. Implicitly, in definition (
17) there was no distance
per se for time
T,
being a snapshot of the spatial sub-region
s at time
t, neither for categories
C, i.e., implicit infinite distance for different categories or times, making the two definitions equivalent. Combining arithmetically distances in each sub-space or building a multidimensional distance is not straightforward due to the different scales and semantics involved. Therefore, it might be more appropriate to use a distance-rule across the three spaces
,
,
, such as:
instead of a distance in
. Noticeably, the definition (
19) establishes now a co-occurrence not just for
c but
s and
t too, as a joint category
, then from (
20), the criterion
is enough to record a co-occurrence of observations, here of order
. However, the co-occurrence "of what?" can take different forms. The first line in definition (
19) is modulated with set of chosen rules, i.e., the set of strict values in (
19) are complemented by another distance-rule based criterion, allowing to adopt multiple categorisations of the co-occurrence, therefore multiple co-occurrences at once. For example, if for each pairs of observations in the co-occurrence (of order
),
, then
,
and
are valid spatial categorisations (
S) for this co-occurrence, idem with
T and
C. This sort of fuzzy characterisation effectively removes the problem of the ’border effect’ mentioned in the previous section. The majority across each categorical variable could also characterise a co-occurrence, e.g.,
satisfying definition (
20) and
with
,
with
,
with
giving a categorisation of the co-occurrence as
, so not necessarily reflecting any of these observations.
Similarly, the local distance-ratio weight definition is asymmetric by essence but
S or
T can be focused on, not just
C. A fully symmetric version, looking at categories defined as
, leads to indicators that can take various forms depending on the choice of distances, e.g., closer to its definition as global indice [
5], or to its spatio-temporal version [
25,
26]:
From playing symmetrical roles in the data table
, as it does for the occurrence distribution used for the joint Shannon entropy, Equations (
13) to (
16) can be fully expressed within the spatio-temporal entropy approaches of
k-co-occurrences or localised indices such as the distance-ratio. As a consequence when replacing
S and
T, the structural framework of sub-regions and calendar chunks, by topological descriptors of
C such as patches size or shapes, allows the framework to study directly spatio-temporal topological interactions of
C, i.e., topological relations between a labelling from
C with a spatial labelling from
C and a temporal labelling from
C.
4. Constructing the Spatial and Temporal Patches Characteristics
Considering of spatial and temporal patches as embedding the spatio-temporal structuring context for
Section 2 has a twofold outcome. First, from categorising spatio-temporally the variable of interest
C, it enables to relate different parts of the entropy decomposition to the spatial or the temporal or the spatio-temporal processes involved with
C. Second, it allows a topological interpretation compatible with the spatio-temporal entropy approaches with proximities from
Section 3.
The data structure concerning the spatio-temporal distribution for the categorical variable C is either a compositional data per areal units or a set of single observations, each available at a point or areal unit. For a compositional data, a vector of the counts for each category represents the distribution of C in each unit. In the case of single observations only a single value from is an attribute of that observation. In the following of the paper, these will be termed compositional data and observational data respectively; without further description an observation will refer to both types.
The spatial or temporal patch criteria once established, patch size and patch shape can be defined accordingly. The categorical variables and will identify spatial and temporal patches across all c’s. As defined in the introduction, the generic definition of a patch is about connected observations of the same category. For compositional data, a chain or group of adjacent units will make a patch with a minimum proportion of c in each unit. For observational data, the connection of the observations with c have to be established using distance threshold (spatially, temporally or spatio-temporally). Then a patch is the set of points (or basic geometries) that encapsulate the observations which can be identified as the graph of the connected observations or by the convex hull of the observations or any other shape containing these observations. For both types of data, overlaps of patches may occurs. The patch size is defined by the count observations being part of, or falling into, the patch. Those remarks are valid for spatial and temporal patches and and define and as patch size categorical variables. Note that if the range of sizes values is too large, groups of sizes may be defining the categories in and .
With this generic definition of patches, shapes will be referring to the geometry of the patch for spatial aspects and geometry for time. When fuzziness of the patch is taken into account, for example with a proportion above a minimum required to be qualified as patch of c’s for compositional data or with a semantic distance across c categories for observational data, geometry and geometry are describing the shape. The reflects the degree of membership. They can be referred as flat patterns ( or ) or profile patterns ( and ). If for no specific shape categorisation can be made, with and , clustering the shapes from geometric measures such as perimeter, volume, principal axes compactness, etc. can be used to further categorise the shape to be used as .
Motifs, defined when the patch criterion includes the possibility of having more than one category
c in the patch, from proximity relations, define other types of shapes. A spatial motif may be for example, the shape of a patch with two categories,
and
with
being dominant (related to size), the motif with
dominant being more likely to be included as well. It can also involves a topological relation, e.g.,
most often in the North of
, or
’s surrounded by
’s. It can corresponds to a patch composite as suggests the latter examples. A temporal motif may be a sequence of first
observations for a number of time units followed by a number of time units with
, etc. The definition of the categories of shapes, as pattern, as motifs or both is of course a matter of the application in ecology, in economy, or epidemiology, as well as the level of complexity desired [
20,
27].
Focusing on the temporal dimension, the permutation entropy can be modulated by a distance, a meaningful difference, between observations when assessing their order, and so the occurrences of specific permuted patterns. This fuzzy assessment of the order is important when willing to separate really meaningful changes from smaller random changes. A similar refinement of the patterns or motifs has been proposed in [
20] with an example on distance to the mid point within a pattern of length 3. For a given time series, the members of a permutation class
can be defined as:
where
refers to one of the
permutations of the triple (1,2,3), implicitly referring to the length of the pattern
l with a lag
. For example, if three values are ranked like
then the triplet belongs to the pattern or motif of the permutation
. It is a sequence with an increase between
and (
and a decrease between
and
to a value higher than
. In [
20], two groups for any permutation are differentiated, if
or not, making
into a
representing a larger increase followed by a smaller (relatively to mid point) and
representing a smaller increase followed by a larger. For categorical variables, this presentation supposes either there is a predefined ordinal relationship between the categories or a compositional data where the motifs are worked on the proportions of a given category
c. The permutation approach ensuring that all alternatives motifs are to be used in the entropy is not necessary or always welcome. Rules to define a range of specific patterns can replace the full permutation approach. Besides varying the parameter
and
l one may be interested in simple patterns of increase or decrease with
but also allowing the patches to join up for various length of increases or decreases, i.e., when
becomes prominent.
The categorical variables , , and are replacing spatial and temporal categorisation of S and T. They are not used any more to pinpoint an observation in the time flow of space but characterise spatio-temporally the ’locality’ of where and when the observation occurred. The goal of this ’locality’ will be to encompass the local ’topology’ in space and time that is induced by the observations of C in the neighbourhood. Then, the spatio-temporal support exogenous to C processes disappears to become an inherent part of C. Note that the categorical variables and can also be considered as background information, a the spatio-temporal ’support’ similar to what S and T were providing but with the fundamental difference that is changing across time and across the space. Therefore they can be used directly only for entropy decomposition only at a specific time for or specific spatial unit for but also within a ’cumulative’ approach, e.g., describing all the set of all spatial patches at given times.
Once a set of specific topological characteristics linked to the spatio-temporal distribution of
C are chosen, the joint distribution is established, from occurrences along with various choices of ’counting’ statistics leading to the three-variate distribution of interest (
Section 3) and the entropy decomposition theorem(s) (
Section 2) can be used. The next section proposes an alternative decomposition setting on which entropy can apply.
5. Using Multiway Correspondence Analysis
An important part of the PsishENT framework comes from the fact that the Shannon decomposition theorem(s) of
Section 2 is based on working out a joint distribution to produce from the observations the multiway contingency table before using conditional probability properties. Equations (
12) and (
13), involving the mutual information, reflect the role played by the statistical independence of the categorical variables involved to build the joint distribution. Therefore, analysing the structure of independence of the multiway contingency table representing this joint distribution contributes to the spatio-temporal characterisation induced by
C. The correspondence analysis of a two-way contingency table [
28,
29] provides a decomposition of the
statistic of independence using a Singular Value Decomposition (SVD) of a specific matrix:
, where the
s are the singular values of the matrix of the
, using the vectors
and
as weights in the sum of squares and inner product for each dimensional variables
I and
J [
21]. In [
21], this presentation has been extended to analysing a multiway table using tensor algebra as an extension of matrix calculus. The decomposition, say for a generic three-way data contingency table, of the
for
(where
S,
T and
C are here taken as generic categorical variables in the PsishENT framework, e.g.,
,
,
C), and
being a normalised measure correlated to the proportion of occurrences for the observations with categories
s,
t and
c can be written:
where
and similarly for the other component vectors. Equation (
23) can be written:
where
,
, and
are the vectors of 1’s with corresponding dimensions, e.g.,
of length the number of categories in
S, and
. As in the SVD, the
are the maximum weighted sum of squares of a projection of the tensor
onto rank-one tensors
. The rank-one tensors
are the one reaching maximum singular values according to the PTA
k algorithm used for the multiway correspondence analysis [
21], the FCA
k method.
If the vectors
,
, and
were non-negative, a simple normalisation would make Equation (
24) a decomposition like a weighted sum of latent joint distributions of independent variables. This is already the case for
, as
is the joint distribution of
S,
T,
C as if they were independent and,
. For any given
, with a non-negative tensor
, with
and
,
idem for the other components, then
. So, the FCA
k method, after providing the tensor decomposition of the statistic
with
the weighted distance to 1 of the ratio to independence (
), would provide an interpretation of the associations expressed in each optimal rank-one tensors, in terms of additive entropy across the dimensions. Multiway correspondence analysis proposes then an alternative to the mutual information as a metric measuring associations between involved variables. From its set of latent variables, each rescaled rank-one tensor would express a spatio-temporal structuring in interaction with
C extracted for the initial multiway data table within an independence paradigm. Ratios such as,
or
would highlight the entropic contribution from
to the information structuring extracted from the rank-one tensor.
However, the PTA
k algorithm used in the FCA
k method is not a non-negative tensor decomposition, but has the property of providing a nested decomposition (within a hierarchical system) similarly to SVD, which existing non-negative tensor decomposition algorithms (NNTF) do not possess [
30]. So besides for
, the
,
, and
will have negative entries, just because of orthogonality constraints set up in the algorithm. However, for each rank-one tensor
, the tensor:
termed the CTR-tensor, satisfies the positivity and corresponds to a product of distributions as
, from Equation (
23),
idem for the other components. Each
is a relative contribution (CTR) of the category
s to the component
of the
rth rank-one tensor, which contributes at
of the whole decomposition or
to the departure from complete independence used in 2-way correspondence analysis [
28] and multiway correspondence analysis (FCAk) [
21]. Therefore,
quantifies the role of each combination
within the rank-one tensor and is expressing its spatio-temporal structuring in interaction with
C. Ratios such as,
or
highlight the entropic contribution to the relative importance from
S in the information structuring extracted from the rank-one tensor. Linked the
is the rank-one tensor itself for which a non-negative approximation would allow a similar entropy decomposition.
Instead of using an NNTF, analytic solutions to extract meaningful positive rank-one tensors from an optimal decomposition such as SVD or Equation (
24) have been proposed [
31,
32], mostly used as initialisation of an NNTF algorithm though with optimality on their own. Following the approach in [
31] a rank-one tensor of order
can be decomposed as:
where
and
are respectively the positive and negative parts of a vector
u, i.e.,
with
and
otherwise,
and
otherwise. From this definition,
, so
. Because of the tensor product and non-overlaps of
and
, it is easy to see that each non-zero cell in
comes from exactly one term in the right hand side of Equation (
28), so in one term either in
or
by then defined. Moreover, as
,
, all rank-one tensors involved Equation (
28) are orthogonal by construction. The orthogonality occurs for two vectors of the tensor product in between two rank-one tensors in either
or
, and at least once between rank-one tensors from these two groups. Therefore,
and
have a minimal non-negative decomposition of maximum
rank-one tensors. For example if
,
and
.
Now, each rank-one tensor
in Equation (
24) can be analytically decomposed as
and
with their respective non-negative rank-one tensors decomposition that can be interpreted similarly to a
above.
8. Illustrative Example of Land Cover Forecasts
The PsishENT framework offering a range of analyses based on entropy decomposition to highlight spatio-temporal information structuring, the purpose of this example is to show the most simple and illustrative aspects and its flexibility. The data comes from a climate simulation using a Land Surface Model (LSM) predicting the plant functional types (
pfts) between 2014 and 2100 [
33]. Plant functional types describe the vegetation that constitutes the land cover, e.g., boreal broadleaf shrubs, C3 grass. The LSM is driven under a climate forcing scenario, here the RCP8.5 defined by the Intergovernmental Panel on Climate Change (IPCC). RCP8.5 represents a trajectory of concentration of greenhouse gas that would occur for a targetted radiative forcing in 2100, here of 8.5 W/m
; this would mean a global average warming of +3.7
C in 2050 [
34].
For each spatial grid cell (here with a resolution of
of latitude and longitude) a fraction of occupation of each
pft is estimated within the forecasting at each simulation time step. So, the data used here corresponds to a compositional data. The full list of
pfts used in the LSM ORCHIDEE (ORganizing Carbon and Hydrology in Dynamic EcosystEms), with the version ORCHIDEE_HLveg [
33,
35] is given in Appendix. Note that
’bare ground’ is also taken as a
pft. To come back to an observational data one can transform the data such as considering the dominant
pft in each of the single grid cell with its fraction as a weight or considering each grid cell as an observation for each
pft with a weight, i.e., multiple observations for a given
c (a
pft), a
pft, with common spatio-temporal positions. To determine the patches the description using weights was used but dominant categories as summary was also used to represent the data graphically.
Figure 2 displays the distribution (as proportion of cells over a year) of the dominant
pfts. From the year 2025, the already higher spatial proportion of
pft9 dominance, boreal needleleaf summergreen, than most
pfts, keeps increasing from 25% to almost 40% in 2099 and in the meantime
pft13, boreal broadleaf shrubs, decreases from 25% to 10%. From 2039 to 2099,
pft10 dominance, C3 grass, halved, while in the meantime
pft4 doubles and
pft6 increases from 1% to 7%, temperate needleleaf evergreen and temperate broadleaf summergreen respectively. The boreal needleleaf evergreen,
pft7, shows a sudden drop in 2059 from 5% to 2%, after a drop of 5% between 2014 and 2025 (halved). In
Figure 3, the exact evolution of the proportions of occupation for
pft9,
pft13, and
pft4 are coherent with what has been described, so far, but the information is not quantified.
Figure 4 confirms spatially the changes observed in
Figure 2, from looking at the dominant
pft per spatial grid cell at the three years 2020, 2050 and 2100.
pft9 is increasing mostly in Russia;
pft13 is disappearing from the Fennoscandia region and southern Russia to appear in northern Russia replacing
pft10 there;
pft4 and
pft6 are replacing
pft10,
pft7 and
pft13 in the Fennoscandia area.
Spatial patches of size 1, i.e., one grid cell, were created for fractions of a
pft category greater than 15%. Grid cells belonging to more than one patch (i.e., more than one
pft category with an occupation greater than 15%) occurred every year with on average a grid cell belonging to 2.2 patches (median is 2, maximum is 6). Then adjacent patches of size 1 for the same
pft generated spatial patches of various sizes for a given year and a given
pft. In
Figure 5, the temporal evolution of the distribution of patch sizes are displayed where sizes have been grouped into 7 classes: 1, 2,
,
,
,
,
, with for example
grouping patches of sizes 26, 27, …, 50. From 2050, patches of class size 1 have an important increase with a bump between 2060 and 2080, classes 2, >2 and >25 show a steady increase whilst the number of patches from classes >7 and >50 are relatively decreasing; >100 relatively stable.
The variation in vertical spread at years 2020, 2050 and 2100 in
Figure 5 can be linked to the results in
Table 1. Indeed in 2020 the curves can be grouped in three: size 1, sizes 2 to >7 and sizes >25 to sizes >100, in 2050 the spread appears less structured and in 2100, size 1 group is important as the grouping sizes >50 and >100. However, much care is needed here as in
Table 1 it is the frequencies of grid cells involved in
and only the number of patches in
Figure 5.
For temporal patches the distribution of sizes have a median of 68 a mean of 59 and a third quartile of 87 out of a potential of 87 successive points from 2014–2100 (the total length). pft1, bare ground is the pft with the most uniform distribution in temporal patches . Pfts 4, 10, 12, 14, were represented equally in medium range patch size and high range patch size (very little in small range patch sizes); Pfts 6, 7 and 8 were more in medium range patch size than high range patch size (very little in small range patch sizes) whilst pfts 5, 9 and 13 were concentrated in high range temporal patch sizes.
In
Table 1 is reported at years 2020, 2050 nd 2100 the decomposition of the Shannon entropy using the normalisation relative to a uniform distribution and given in Equation (
8). The closer to 1
is, the more uniform the distribution is. Due to the normalisation lines 1 and 2, for example, add up to give line 5, once applied the coefficients e.g.,
. The spatial patch sizes
as well as
C alones show an increase of entropy while
shows a decrease highlighting the increasing effect of
C in determining the sizes
. However, the conditional entropy
is already quite low in 2020, highlighting the dependence of
classes of the sizes of spatial patches from the
pfts categories in
C.
From this table (
Table 1) and parts involved in the conditional entropies one can represent spatially the heterogeneity due to spatial sizes
that are revealed by the occurring patch sizes per spatial grid. In
Figure 6 are geographically mapped the parts contributing to the conditional entropy for
C knowing local spatial sizes
, i.e., the sum of the
for all the sizes
. The closer to 0 the more homogeneous the distribution of
C is as due to the spatial sizes involved in the local patches. The theoretical maximum heterogeneity is the value given in
Table 1 if all sizes
were involved, so
Figure 6 is mapping the % of that maximum value as indicated in Equation (6). Where there was no patches mapped values are missing and can be interpreted as uniformity in
C because of no patches found. Changes in homogeneity given the local spatial patch sizes are quite dramatic and shows more changes than only the dominant
pft recorded per spatial grid as in
Figure 4. The two figures are indeed complementary. Over the 2020-2100 period, one obverses in
Figure 6 a loss of homogeneity given the patch sizes in the Fennoscandia area whilst a slight increase in homogeneity is seen in western and southern east Russia between 2020 and 2050 followed by a slighter decrease at 2100. Northern Russia shows a decrease in homogeneity between 2020 and 2050 followed by an increase at 2100.
Similarly, in
Figure 7 is represented the conditional entropy ratio for
where local patches of
C values were used to map the local effect. Overall over the 2020-2100 period, there was an increase in homogeneity as the conditional entropy is decreasing (see
Table 1). Spatially there is an increase in homogeneity of patch sizes given the involved
ptfs (
C) in all areas, so either less variation in
pfts or in their patch sizes.
Integrating time patches sizes
can be done in various ways using the PsishENT framework, e.g., decompositions as in
Section 3 or using the multiway correspondence analysis (
Section 5). The latter one is enabling an additive entropy decomposition of modelled spatio-temporal interaction of
C from each rank-one tensors. Then, for a chosen time, e.g., 2020, 2050 and 2100, and a chosen class in
C or the local dominant
C category (
pft), a map of a score built at each grid cell from rank-one components weights for
C,
and
can be used to render the information structuring provided by selected rank-one tensors from the multiway correspondence analysis.The score can be also the corresponding CTR-tensor to render the contributing influence at a grid cell.
Using multivariate occurrences of
gives a
contingency table analysed by the multiway correspondence analysis. The rank-one tensor of independence of the three variables
,
and
C, i.e., corresponding to
in Equation (
24) has its components from the multiway table margins, in
Table 2 along with other rank-one tensors CTRs also in
Table 3.
It represents
of variability of the data, i.e.,
as expressed in Equation (
24). Large spatial patches,
and
, are most frequent as well as long time patches,
, but recording the count of grid cells involved
per patch size creates an expected monotonic increases.
pft9,
pft13,
pft10 and
pft1 are the most frequent patches. Associations across the 3 dimensions (
,
,
C as
pfts) are linked to the CTRs and the signs of the coordinates, in the decomposition (
24)) and Equation (
27). Signed CTRs for the rank-one tensors are reported in
Table 2 and
Table 3. For example, for the rank-one tensor representing
of variability (or
within the
left after complete independence captured by the first rank-one tensor,
),
is mostly associated with
pft13 and
pft9 whilst
is with
pft12 and to a less extent with
pft5, all with mostly very long and long time patches,
and
(the time component is the same as for complete independence). For the rank-one tensor of
of variability (with the same spatial patch size component),
pft1 is associated with small time size patches 1, 2 and medium sizes
, opposed to
pft9 with large time patches
.
For the complete independence, rank-one tensor with
of variability, which is also the CTR-tensor, the normalised entropy is
with
,
and
. Therefore, within this
of variability where large spatio-temporal patches of mostly of
pft9,
pft13 but also
pft10 or
pft1, temporal patches are more structuring than spatial and distinction of
pfts. CTRs entropy decomposition for the first best tensors of the FCA3 optimisation are in
Table 4.
Note for the last two tensors (
and
of variability), the structuring due to
pfts becomes more important as the entropy for
C becomes smaller. In
Figure 8, maps of the first three CTR-tensors are complementing the quantifications of the information from
Table 2 and
Table 4. For each grid cell, the geometric mean of the product of the component weights (for the local
,
and
C) as each score from Equation (
27) were signed with the local
C component weight in order to highlight the differences in
pfts. This gives a spatial
intensity of the patterns of spatial sizes
, temporal patches
and the categorical variable
C (here the
pfts). The differentiation due to the sign of
C weights is useful here but multiple maps per
pfts could be used instead which would allow not to focus only on the dominant
pft.
If the
Figure 2,
Figure 3 and
Figure 4 are very informative on the land cover evolution for this data, they do not allow quantification of the different roles of
C and the spatio-temporal embedding. The PsishENT framework provides this type of information as well help to characterise each influence from other graphical representations. First of all,
pfts categories have variant patch sizes (time and space). Some
pfts categories are, along time, increasingly explaining the patch sizes distributions which are related to increased homogeneity e.g.,
pft9 (boreal needleleaf summergreen) evolution to larger patches. A tendency to increase of a spatial fragmentation is also quantified (see
Table 1) which are localised in
Figure 6 and
Figure 7. Using correspondence analysis (FCA
k) or the spatio-temporal multiway table with spatial and time patch sizes with
C (pfts) enabled a double quantification (
Table 2 and
Table 4) in specific patterns of associations (each rank-one tensor) and using entropy to evaluate the structuring aspect of the components in the tensors. Spatio-temporal intensity of the effects could be mapped (
Figure 8). If fortunately the PsishENT approach allows to retrieve some tendencies seen using simple graphics, the quantification are useful and some hidden patterns can be also detected such as
pft12 (mosses) disappearing the north of Fennoscandia and Russia (see
Table 2 and
Figure 8).