A Comment on this article was published on 14 April 2018; see Entropy 2018, 20(4), 285.
Article

Remarks on the Maximum Entropy Principle with Application to the Maximum Entropy Theory of Ecology

Marco Favretti
Dipartimento di Matematica “Tullio Levi-Civita”, Università degli Studi di Padova, 35122 Padova, Italy
Entropy 2018, 20(1), 11; https://doi.org/10.3390/e20010011
Submission received: 1 November 2017 / Revised: 27 November 2017 / Accepted: 21 December 2017 / Published: 27 December 2017
(This article belongs to the Special Issue Entropy and Its Applications across Disciplines)

Abstract

In the first part of the paper, we work out the consequences of the fact that Jaynes' Maximum Entropy Principle, when translated into mathematical terms, is a constrained extremum problem for an entropy function H(p) expressing the uncertainty associated with the probability distribution p. Consequently, if two observers use different independent variables p or g(p), the associated entropy functions have to be defined accordingly, and they are different in the general case. In the second part, we apply our findings to an analysis of the foundations of the Maximum Entropy Theory of Ecology (M.E.T.E.), a purely statistical model of an ecological community. Since the theory has received considerable attention from the scientific community, we hope to give a useful contribution to that community by showing that the procedure of application of MEP, in the light of the theory developed in the first part, suffers from some inconsistencies. We exhibit an alternative formulation which is free from these limitations and which gives different results.

1. Introduction

The Maximum Entropy Principle (MEP) (E.T. Jaynes, 1957; see [1,2]) is a powerful inference principle which allows one to determine the probability distribution that describes a system on the basis of the information available, usually in the form of averages of observables (random variables) of interest for the system. It is based on: (i) the enumeration of the system states i = 1, …, N; (ii) the introduction of one or more functions that translate the information available on the system into constraints on the probability, i.e., as f(p) = c, where c is a vector of average values; and (iii) a function measuring the uncertainty associated with a probability distribution that is a candidate to describe the system. We consider here only systems with a finite number of states, since the extra generality of considering an infinite number of states is not necessary for the aims of this work. The sought distribution is the one for which the uncertainty is maximal on the set of distributions for which f(p) = c. This distribution thus represents the least biased estimate on the basis of the given information, whether or not its prescriptions are empirically confirmed by experiments. Each of the above outlined Steps (i)–(iii) has a profound meaning and impact on the implementation of the MEP procedure. For example, if the states of the system are a priori equally probable, the uncertainty function to use is the Shannon entropy H(p), while if they are statistically described by a prior distribution q, the uncertainty function is (minus) the relative entropy D(p|q). If the system states are the result of a coarse-graining procedure from a bigger set of a priori equiprobable states, then a prior distribution q which counts the number of coarse-grained states has to be used in D(p|q) in order to render the MEP procedure invariant with respect to the coarse graining, as explained in [3]. See also [4], where a number of different probability distributions used in ecology are derived for a system of individuals arranged in different cells, by simply making different assumptions on the individuals (indistinguishable or not), on the constraints and on the coarse graining. The variety of MEP responses simply reflects the fact that different pieces of information are considered; if the predictions fail to be confirmed by experiment, this is the signal that relevant information for the statistical description of the system under study has been neglected.
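To make Steps (i)–(iii) concrete, here is a minimal numerical sketch (ours, not taken from [1,2]): ten states, a single mean-value constraint, and Shannon entropy maximized with an off-the-shelf solver. All numbers are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

N = 10                                   # (i) enumerate the states i = 1, ..., N
states = np.arange(1, N + 1)
c = 3.0                                  # (ii) available information: an average value

def neg_entropy(p):                      # (iii) the uncertainty function, here -H(p)
    return np.sum(p * np.log(p))

cons = [{'type': 'eq', 'fun': lambda p: p.sum() - 1.0},      # normalization
        {'type': 'eq', 'fun': lambda p: states @ p - c}]     # f(p) = c
res = minimize(neg_entropy, np.full(N, 1.0 / N),
               constraints=cons, bounds=[(1e-12, 1.0)] * N)
p_hat = res.x   # numerically p_hat[i] is proportional to exp(-lambda*(i+1))

Up to solver tolerance, the maximizer coincides with the exponential distribution that the Lagrange multipliers method yields in closed form.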
A basic tenet of the Maximum Entropy Principle is that two observers who are given the same information, expressed by Steps (i) and (ii) above, must obtain the same result upon application of the MEP, i.e., the solution to the constrained extremum problem has to be unique. Notice that the MEP procedure does not specify the probability variable p = (p_1, …, p_N) to use; thus, if one observer uses p and another uses p′, related to p by a one-to-one transformation p = g(p′), and they set up the constraints in the form f(p) = c and f(g(p′)) = c, respectively, they are translating into mathematical terms the same information on the system. This is the same degree of freedom one has in physics when using different (but related by an invertible transformation) systems of coordinates to describe the position of a point. Therefore, to ensure that the results of the application of the MEP procedure are independent of the choice of the variable used, the uncertainty function has to be defined accordingly. Proposition 1 in Section 2 below is the main tool to determine the form of the uncertainty function when using g-related distributions.
The analysis of this subtle point of the application of MEP is the main aim of this paper. We argue that, since there seems to be no preferred choice for p, all choices have to be considered equivalent; therefore, the related uncertainty functions are also equivalent and can be derived from a single given one. The question is: which translation into mathematical terms of the initial information is entitled to use the Shannon entropy as its uncertainty function? We give an answer by resorting to the Boltzmann statistical derivation of the entropy function, which has the same form as the Shannon one. This statistical procedure, based on independent tosses of indistinguishable particles into bins, can be applied to different models and allows one to determine from a sort of “first principle” the form of the entropy function to use. We think that there is room for further investigation of this subtle issue of the application of MEP.
In the second part of the paper, we apply the above findings to a specific application of MEP to ecology, the Maximum Entropy Theory of Ecology (M.E.T.E.), as presented in [5]. M.E.T.E. is a purely statistical model of an ecological community. Using the Maximum Entropy Principle, the theory aims at inferring the form of some of the most used distributions in ecology from the knowledge of macroscopic information on the system: the number of individuals, the number of species and the total metabolic requirement. MEP has been applied in the field of statistical ecology mainly to derive one of the most important probability distributions, the Species Abundance Distribution (SAD), but not exclusively (see [6], or [7] where it is used for modeling species geographic distributions with presence-only data). Several papers ([3,8,9]) have revised the application of the MEP inference principle to ecological problems, stressing the importance of the choice of the system's coordinates [10,11,12], of the prior distribution [4,8] and of the entropy function [3,6]. The interest of [5] is that it considers simultaneously information on the distribution of individuals among species (SAD) and on the distribution of metabolic energy rate among individuals. In this richer scenario, a number of probability distributions for different observables of interest are derived. The output of the work has grown to become a comprehensive theory of the application of MEP to ecological problems, called the Maximum Entropy Theory of Ecology (see the book [13] and [14]), which has been extensively tested with real ecosystem data in [15,16,17]. In Section 4.1 we review the assumptions of the model in [5] and the related MEP solution; in Section 4.2 we propose an alternative application of the MEP procedure starting from the same initial information but giving a different result. In Section 5 we discuss the non-equivalence of the two procedures using Proposition 1 below and motivate why the MEP solution in [5] is flawed by some inconsistencies.

2. MEP Formulation Using Different Variables

The Maximum Entropy Principle is generally expressed as: p is the distribution that maximizes the Shannon entropy H(p) over the set of distributions allowed by the constraints. Let p′, defined by p′ = g^{-1}(p) with g invertible, be a different choice of the variable used to express the probability distribution and the constraints, and suppose that we solve this constrained extremum problem by the Lagrange multipliers method. Then the following Proposition holds (see Appendix A for a proof).
Proposition 1.
p̂ is the constrained extremum of H(p) over the set f(p) = c if and only if p̂′ = g^{-1}(p̂) is the constrained extremum of H(g(p′)) over the set f(g(p′)) = c.
From the above Proposition 1 we learn that, if we are given the entropy function and the constraints as (p, f(p), H(p)) and we want to use another variable p′ = g^{-1}(p), then to find the same MEP solution we have to transform the constraints and the entropy function accordingly, using (p′, f∘g(p′), H∘g(p′)). In this way, changing variable and finding the maximum are operations whose order can be interchanged. A natural question arises in this setting: which couple (p, f(p)), translating into mathematical terms the initial information on the system, “is justified” in using the entropy function in Shannon form? We give a partial answer to this question in Section 3 below.
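Proposition 1 can be checked numerically. The sketch below is our construction, with illustrative assumptions: a diagonal invertible transformation g(p′) = w·p′ (componentwise) and a mean constraint. It solves the extremum problem in both variables and verifies that the two maximizers are g-related.

import numpy as np
from scipy.optimize import minimize

N = 8
states = np.arange(1, N + 1)
w = states / states.mean()      # g(p') = w * p' componentwise, an invertible map
c = 2.5

H = lambda p: -np.sum(p * np.log(p))
f = lambda p: np.array([p.sum() - 1.0, states @ p - c])   # constraints f(p) = c

# Extremum of H(p) subject to f(p) = 0
resA = minimize(lambda p: -H(p), np.full(N, 1.0 / N),
                constraints=[{'type': 'eq', 'fun': f}],
                bounds=[(1e-12, 1.0)] * N)
p_hat = resA.x

# Same information in the variable p' = g^{-1}(p): entropy H(g(p')) and
# constraints f(g(p')) = 0, as Proposition 1 prescribes
resB = minimize(lambda q: -H(w * q), np.full(N, 1.0 / N) / w,
                constraints=[{'type': 'eq', 'fun': lambda q: f(w * q)}],
                bounds=[(1e-12, None)] * N)
print(np.allclose(w * resB.x, p_hat, atol=1e-5))   # True: the two extrema are g-related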
Note that the invariance requirement with respect to the choice of the independent variable is different from the requirement, introduced in the axiomatic derivation of MEP in [18], of invariance of the form of H(p) with respect to a transformation of the type y = Γ(x), where x ∈ D are the (possibly infinitely many) states of a system and p = p(x). For a system with finitely many states, the transformation Γ (called a coordinate transformation in [18]) reduces to a relabeling of the states. The Shannon entropy H(p) = −Σ_{i=1}^{N} p_i ln p_i, being additive, is clearly invariant in form, expressing the fact that different observers may use different labels for the N system states. This relabeling transformation
p'_k = g_k(p_1, \dots, p_N) = p_{\varphi(k)}, \qquad k = 1, \dots, N
where φ is a permutation of the N state labels, is thus a particular case of the more general g considered in the above Proposition 1. It is clearly an invertible transformation for which H(p) and H(g(p′)) coincide, but this is no longer assured for more general transformations g; see Section 3 below.

3. A Simple, Ecologically Oriented Example

To throw light on the questions considered above, we take a very simple system, used in a number of physical models, that for our purposes can be considered as a minimal example of an ecological community. Consider a set ℐ of N_0 identical individuals (balls) which belong to S species (indistinguishable urns). Suppose that each species contains at least one individual, so that the maximal number of individuals allowed to one species is N = N_0 − (S − 1). Suppose that the species labels can be interchanged, so that the observable quantity is not the number of individuals n_α of a given species α but the number S_n of species which have exactly n individuals, for n = 1, …, N.
Suppose that the set of species 𝒮 is equipped with the uniform probability 1/S and consider the random variable ν : 𝒮 → 𝒩, α ↦ ν(α) = n_α, with 𝒩 = {1, …, N}, that assigns to a species its abundance. Let ϕ(n) be the density associated with the random variable ν (in the sequel P denotes probability)
\phi(n) = P(\nu = n) = P(\alpha \in \nu^{-1}(n)) = \frac{|\nu^{-1}(n)|}{S} = \frac{S_n}{S}, \qquad n \in \mathcal{N}   (1)
where |A| denotes the number of elements of A. Set for later use 𝒮_n = ν^{-1}(n) ⊂ 𝒮, so that |𝒮_n| = S_n.
Let the set of individuals ℐ be equipped with the uniform probability 1/N_0, let α : ℐ → 𝒮, i ↦ α(i), be the assignment of each individual to its species, and consider the random variable ν∘α : ℐ → 𝒩, i ↦ ν(α(i)). Moreover, consider the following subset ℐ_n of ℐ, obtained by grouping together all the individuals belonging to the species having n individuals (we call this subset the “multispecies” n):
\mathcal{I}_n = \{\, i \in \mathcal{I} : \nu(\alpha(i)) = n \,\} \subseteq \mathcal{I}, \qquad I_n = |\mathcal{I}_n|   (2)
As a straightforward consequence, we have that I_n = n S_n. Let ϕ_I(n) be the density associated with the random variable ν∘α, that is,
\phi_I(n) = P(\nu \circ \alpha = n) = P(i \in \mathcal{I}_n) = \frac{I_n}{N_0}, \qquad n \in \mathcal{N}   (3)
Both ϕ and ϕ_I are probability distributions on 𝒩; since I_n = n S_n, it is easy to see that they are related by the invertible transformation ϕ_I = g(ϕ) defined as
\phi_I(n) = g_n(\phi) = \frac{n S}{N_0}\, \phi(n), \qquad n = 1, \dots, N   (4)
In ecology, the distribution ϕ is called the Species Abundance Distribution (SAD) and is probably the most important (and investigated) indicator of how an ecological community is structured. While ϕ(n) gives the probability of extracting a species of abundance n from the set of species, ϕ_I(n) gives the probability that an individual extracted from the set ℐ belongs to a species of abundance n. Real communities are composed of many species with few individuals (rare species) and a few species which contain the majority of the individuals (common species). As a consequence, the two probability distributions are very different.
The information on the system summarized by the numbers N_0, S sets the following equalities
N_0 = \sum_{n=1}^{N} n\, S_n = \sum_{n=1}^{N} I_n \qquad \text{and} \qquad S = \sum_{n=1}^{N} S_n = \sum_{n=1}^{N} \frac{1}{n}\, I_n   (5)
If we divide the two equalities above by S or N_0, respectively, we obtain the following equations to be satisfied by ϕ
\frac{N_0}{S} = \sum_{n=1}^{N} n\, \phi(n), \qquad 1 = \sum_{n=1}^{N} \phi(n)   (6)
and by ϕ_I, respectively,
1 = \sum_{n=1}^{N} \phi_I(n), \qquad \frac{S}{N_0} = \sum_{n=1}^{N} \frac{1}{n}\, \phi_I(n)   (7)
Constraints (6) and (7) are g-related, that is, ϕ satisfies Equation (6) if and only if g(ϕ) satisfies Equation (7). Note that in the first case we are constraining the weighted arithmetic average of 1, …, N, while in the second we are constraining the weighted harmonic average. Note also that the actual information being used is the ratio N_0/S, or equivalently S/N_0, so that the MEP description of the system is independent of the size N_0, S of the system, provided that the ratio N_0/S is kept constant.
Suppose that two different observers translate into mathematical terms the information contained in the data N_0, S using the two variables ϕ and ϕ_I introduced above, and that both invoke the MEP to find the least informative probability distribution allowed by the constraints. The application of MEP with the entropy function H(ϕ) = −Σ_{n=1}^{N} ϕ(n) ln ϕ(n) and Constraint (6) gives, upon application of the Lagrange multipliers method, the geometric probability distribution
\hat{\phi}(n) = \frac{e^{-\lambda n}}{Z(\lambda)}, \qquad Z(\lambda) = \sum_{n=1}^{N} e^{-\lambda n}   (8)
while the MEP procedure with the same form of entropy function, H(ϕ_I) = −Σ_{n=1}^{N} ϕ_I(n) ln ϕ_I(n), and Constraint (7) gives the solution
\hat{\phi}_I(n) = \frac{e^{-\gamma/n}}{Z(\gamma)}, \qquad Z(\gamma) = \sum_{n=1}^{N} e^{-\gamma/n}   (9)
whose corresponding SAD is
\tilde{\phi}(n) = g^{-1}(\hat{\phi}_I)(n) = \frac{N_0}{S\, n}\, \frac{e^{-\gamma/n}}{Z(\gamma)}   (10)
It is easy to see that \hat{\phi} ≠ g^{-1}(\hat{\phi}_I); therefore, with this example we have shown that changing independent variables and applying the MEP with the Shannon entropy are non-commuting operations in the general case. Had we used Constraint (7) with the transformed entropy
H(g^{-1}(\phi_I)) = -\sum_{n=1}^{N} \phi_I(n)\, \frac{N_0}{n S}\, \ln\!\left[ \phi_I(n)\, \frac{N_0}{n S} \right]   (11)
we would have obtained the solution g(\hat{\phi}), with \hat{\phi} as in Equation (8). Note that the transformed entropy in Equation (11) does not have the form of a relative entropy function; therefore, it cannot be obtained by introducing a suitable prior probability distribution. Note also that the non-commutativity considered above is different from the non-commutativity of the MEP procedure with respect to coarse graining of the system states considered in [3,4].
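The non-commutativity can also be seen numerically. The following sketch (the community sizes are illustrative assumptions of ours) solves the two MEP problems for their multipliers and compares the resulting SADs:

import numpy as np
from scipy.optimize import brentq

S, N0 = 20, 100
N = N0 - (S - 1)                      # maximal abundance, as in Section 3
n = np.arange(1, N + 1)

# MEP with H(phi) and Constraint (6): the geometric solution (8)
mean_arith = lambda lam: (n * np.exp(-lam * n)).sum() / np.exp(-lam * n).sum()
lam = brentq(lambda l: mean_arith(l) - N0 / S, 1e-6, 10.0)
phi_hat = np.exp(-lam * n); phi_hat /= phi_hat.sum()

# MEP with H(phi_I) and Constraint (7): solution (9), a harmonic-mean constraint
mean_harm = lambda g: ((1 / n) * np.exp(-g / n)).sum() / np.exp(-g / n).sum()
gam = brentq(lambda g: mean_harm(g) - S / N0, -50.0, 0.0)
phiI_hat = np.exp(-gam / n); phiI_hat /= phiI_hat.sum()

# Pull back with g^{-1} to obtain the SAD (10) and compare with (8)
phi_tilde = (N0 / (S * n)) * phiI_hat
phi_tilde /= phi_tilde.sum()          # guard against numerical drift
print(np.abs(phi_hat - phi_tilde).max())   # clearly non-zero: the two SADs differ

The maximal pointwise difference between the two SADs is far above solver tolerance, confirming that the two formulations are genuinely inequivalent.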

Which One is the Correct Application of MEP?

Both \hat{\phi}(n) in Equation (8) and \tilde{\phi}(n) in Equation (10) result from the application of the MEP starting from the same initial information N_0, S on the system, and they provide non-equivalent answers to the question of how the abundance of individuals is distributed among the species. Since the sets of distributions allowed by the constraints (the feasible sets) in the two formulations, Equations (6) and (7), are related by an invertible transformation, D_I = g(D), the non-equivalence must reside in the choice of the form of the entropy functions, which are not g-related according to Proposition 1. On the basis of the same Proposition, once the form of the entropy function is established in a given formulation (p, f(p), H(p)) of MEP, it is also determined for all g-related formulations. It thus seems necessary to invoke a sort of first principle to establish the correct form of the entropy function in the initial MEP formulation.
A possible approach could be to compare the form of the resulting probability distribution against field data. This is not entirely satisfactory, since the discrepancy between the MEP solution and the empirical curve could be due to the fact that we have neglected (or are not aware of) some information on the system which is relevant for the question addressed. In this case, the discrepancy between the model and the empirical pattern is a signal that other factors shaping the probability distribution are acting on the system. Incidentally, for the problem considered above, it is generally acknowledged in the ecological literature (see e.g., [5]) that the geometric SAD in Equation (8) has a poor fit with the empirical SAD, while the one introduced in Equation (10) could provide a better fit, since it is more flexible (it can be monotonically decreasing or display an interior mode, depending on the value of the Lagrange multiplier).
At least for systems with a discrete number of states, another criterion can be introduced, which uses the combinatorial argument providing one of the justifications for the form of the Shannon entropy, historically preceding the axiomatic derivation in [18].
This is the celebrated Boltzmann problem of statistical mechanics [19]. It is useful to briefly review its solution here, because it tells us how to determine the correct form of the entropy function from first principles in cases where its determination is not obvious. It deals with the distribution of K indistinguishable particles (individuals) among M equally spaced energy levels ϵ = 1, …, M. Boltzmann's original formulation dealt with arbitrarily spaced energy levels, but this extra generality is not needed here. If n_ϵ is the number of particles having energy ϵ and n = (n_1, …, n_M) is the vector of occupation numbers of the energy levels, the logarithm of the probability of n is
\ln W(\mathbf{n}) = \ln \left[ \frac{1}{M^{K}}\, \frac{K!}{n_1! \cdots n_M!} \right]   (12)
which, using the Stirling approximation ln n! ≈ n ln n, can be rewritten as
\ln W(\mathbf{n}) = K \ln K - \sum_{\epsilon} n_\epsilon \ln n_\epsilon   (13)
The maximization of ln W(n) under the constraints Σ_ϵ n_ϵ = K and Σ_ϵ ϵ n_ϵ = E_0 gives the most common, i.e., least informative, vector of occupation numbers. The solution of the constrained maximization problem is the celebrated Boltzmann–Gibbs distribution
n_\epsilon = K\, \frac{e^{-\beta \epsilon}}{\sum_{\epsilon=1}^{M} e^{-\beta \epsilon}}, \qquad \psi(\epsilon) = \frac{n_\epsilon}{K} = \frac{e^{-\beta \epsilon}}{\sum_{\epsilon=1}^{M} e^{-\beta \epsilon}}   (14)
Note that, writing n_ϵ = ψ(ϵ) K, ln W in Equation (13) coincides, up to a constant factor, with
H(\psi) = -\sum_{\epsilon} \psi(\epsilon) \ln \psi(\epsilon)   (15)
In the same manner, if we have S indistinguishable species to be assigned to the N levels (abundances) of 𝒩, let s = (S_1, …, S_N) be the occupation vector, with constraints Σ_n S_n = S and Σ_n n S_n = N_0. As before, setting S ϕ(n) = S_n, the least informative probability distribution is
\phi(n) = \frac{e^{-\lambda n}}{\sum_{n} e^{-\lambda n}} = \frac{e^{-\lambda n}}{Z(\lambda)}   (16)
and the associated entropy is
H(\phi) = -\sum_{n} \phi(n) \ln \phi(n)   (17)
Therefore, the form of the entropy function for the formulation of the MEP problem given by Equation (6), using the variable ϕ, can be derived by the Boltzmann combinatorial argument.
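For a community small enough to enumerate exhaustively, the Boltzmann argument can be verified directly: among all occupation vectors compatible with (S, N_0), the one of maximal multiplicity W(s) is close to the geometric distribution (16). A brute-force sketch, with toy values of our choosing:

from itertools import product
from math import factorial, log

S, N0 = 6, 10                  # toy community: 6 species, 10 individuals
N = N0 - S + 1                 # abundances range over 1, ..., N

best, best_logW = None, float("-inf")
for s in product(range(S + 1), repeat=N):          # candidate vectors (S_1, ..., S_N)
    if sum(s) != S:
        continue
    if sum((k + 1) * sk for k, sk in enumerate(s)) != N0:
        continue
    # ln W(s) up to an additive constant, cf. Equation (12) with species as particles
    logW = log(factorial(S)) - sum(log(factorial(sk)) for sk in s)
    if logW > best_logW:
        best, best_logW = s, logW

print(best, [sk / S for sk in best])   # most probable vector; s/S decays roughly geometrically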
Let us see which entropy function results if we apply the Boltzmann argument to the formulation given by Equation (7) using ϕ_I. Remember that
(\mathrm{A}):\ \phi(n) = P(\alpha \in \mathcal{S}_n), \qquad (\mathrm{B}):\ \phi_I(n) = P(i \in \mathcal{I}_n)   (18)
so basically in (A) we are “tossing” S species into the N bins 𝒮_n, while in (B) we are “tossing” N_0 individuals into the N multispecies ℐ_n. In the Boltzmann argument the tosses are supposed to be independent; this is true for (A) but not for (B). Indeed, a multispecies ℐ_n is the grouping of S_n species, each having n individuals, so the number of individuals in ℐ_n can be updated only by multiples of n, i.e., by adding or subtracting a species of n individuals. Hence, tosses of single individuals do not represent independent tosses. If we want to support the choice of the entropy function by the Boltzmann argument, we have to consider tosses of groups of n individuals. Let i_n be the number of individuals tossed into ℐ_n. The related occupation vector then has the form
\mathbf{i} = \left( \frac{i_1}{1}, \dots, \frac{i_n}{n}, \dots, \frac{i_N}{N} \right), \qquad W(\mathbf{i}) = \frac{1}{N^{N_0}}\, \frac{N_0!}{\left(\frac{i_1}{1}\right)! \cdots \left(\frac{i_N}{N}\right)!}   (19)
which leads to an entropy (recall that i_n = N_0 ϕ_I(n)) of the form
H(\phi_I) = -\sum_{n} \frac{N_0}{n}\, \phi_I(n)\, \ln\!\left[ \frac{N_0}{n}\, \phi_I(n) \right]   (20)
and, under Constraints (7) above, to the solution
\hat{\phi}_I(n) = \frac{n S}{N_0}\, \frac{e^{-\lambda n}}{Z(\lambda)} = g_n(\hat{\phi})   (21)
We have shown that, in the case where the probability distribution being used can be linked, at least conceptually, to independent repetitions of an experiment, we have a criterion to derive the correct form of the entropy function from a first principle. Of course, to build a general theory this simple example deserves further generalization.

4. Maximum Entropy Theory of Ecology

4.1. Review of M.E.T.E. Assumptions

Let us review the assumptions of M.E.T.E. as presented in [5]. In [5], various metrics (probability distributions) are derived, but here we content ourselves with the analysis of the derivation of two of them, namely the Species Abundance Distribution ϕ(n) and the distribution ψ(ϵ) of metabolic energy among the individuals.
Following [5], we consider a community made of N_0 individuals living on an area A_0, subdivided into S species and having total metabolic energy rate E_0. Assume that each species has at least one individual and that the individual metabolic rate is a discrete quantity which ranges from 1 to a maximum M in a suitable energy unit. With these assumptions, the abundance of a species α is ν(α) ∈ {1, …, N} with N = N_0 − S + 1, and the individual metabolic rate is m(i) ∈ ℳ = {1, …, M}, with M = E_0 − N_0 + 1. Each individual is labelled by a double label (α, ϵ), indicating the species α = 1, …, S and the metabolic rate ϵ = 1, …, M. Here we depart from [5] by assuming, for the sake of simplicity, that the individual metabolic rate is a discrete quantity, while in [5] it is assumed to be a continuous one. This change does not affect our argument, since for dealing with the continuous case it is enough to substitute the sums with integrals.
The information on the system is limited to the values N_0, S, E_0 and A_0, but for our aims the knowledge of A_0 is not relevant. Therefore, we are dealing with a non-spatial model of a community of S non-interacting species. On the basis of this sole knowledge we would like to answer two questions: (Q1) How is the energy distributed among the individuals? (Q2) How are the individuals distributed among the species? Note that, since the species labels can be exchanged, the correct formulation of (Q2) is: how many of the species have exactly n individuals, for n = 1, …, N? This is the information contained in ϕ(n).
Both questions can be answered in a non-deterministic manner: for (Q2) we use the density ϕ introduced in Equation (1), while for (Q1) we introduce the map m : ℐ → ℳ, i ↦ m(i), that we will consider as a random variable, with density ψ(ϵ) on ℳ
\psi(\epsilon) = P(m = \epsilon) = P(i \in m^{-1}(\epsilon)) = \frac{I_\epsilon}{N_0}, \qquad \epsilon \in \mathcal{M}
where ℐ_ϵ = m^{-1}(ϵ) ⊂ ℐ and I_ϵ = |ℐ_ϵ|. For later use, consider also the joint probability distribution
P(n, \epsilon) = P(i \in \mathcal{I}_n,\; i \in \mathcal{I}_\epsilon) = \frac{1}{N_0}\, |\mathcal{I}_n \cap \mathcal{I}_\epsilon|   (22)
having marginals ϕ_I(n) and ψ(ϵ), respectively, since the space ℐ is doubly partitioned into multispecies and energy classes.

4.2. Our Solution of M.E.T.E. Problem

The following constraints contain all the information (N_0, S, E_0) available on the system
\sum_{\epsilon} \psi(\epsilon) = 1, \qquad \sum_{\epsilon} \epsilon\, \psi(\epsilon) = \frac{E_0}{N_0}   (23)
\sum_{n} \phi(n) = 1, \qquad \sum_{n} n\, \phi(n) = \frac{N_0}{S}   (24)
and a straightforward application of MEP with the constraints in Equation (23) and the entropy in Equation (15) [with the constraints in Equation (24) and the entropy in Equation (17), respectively] gives the answers ψ̂ in Equation (14) [ϕ̂ in Equation (16)] to the above questions (Q1) and (Q2), i.e.,
\hat{\psi}(\epsilon) = \frac{e^{-\beta \epsilon}}{Z(\beta)} = P(i \in \mathcal{I}_\epsilon), \qquad \hat{\phi}(n) = \frac{e^{-\lambda n}}{Z(\lambda)} = P(\alpha \in \mathcal{S}_n)   (25)
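Numerically, the two multipliers are fixed by two independent scalar equations; the sketch below (the state variables N_0, S, E_0 are toy values of ours) solves them by bisection:

import numpy as np
from scipy.optimize import brentq

S, N0, E0 = 50, 500, 2000            # toy state variables
N, M = N0 - S + 1, E0 - N0 + 1       # abundance and metabolic-rate ranges
n, eps = np.arange(1, N + 1), np.arange(1, M + 1)

def mean(x, t):                      # mean of x under the law e^{-t x} / Z(t)
    w = np.exp(-t * (x - x.min()))   # shifted for numerical stability; the mean is unchanged
    return (x * w).sum() / w.sum()

lam = brentq(lambda t: mean(n, t) - N0 / S, 1e-8, 20.0)      # Constraint (24)
beta = brentq(lambda t: mean(eps, t) - E0 / N0, 1e-8, 20.0)  # Constraint (23)

phi_hat = np.exp(-lam * n);    phi_hat /= phi_hat.sum()      # Equation (16)
psi_hat = np.exp(-beta * eps); psi_hat /= psi_hat.sum()      # Equation (14)
Q = np.outer(phi_hat, psi_hat)       # the joint law (30) below: a product, hence no coupling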
Basically, we are solving the Boltzmann problem of statistical mechanics twice. Now, if one is interested in the joint probability
Q(n, \epsilon) = P(\alpha \in \mathcal{S}_n,\; i \in \mathcal{I}_\epsilon)   (26)
we proceed as before, introducing the vector c = (C_{1,1}, …, C_{N,M}), where C_{n,ϵ} counts the number of times the object (α, i) is assigned to the bin 𝒮_n × ℐ_ϵ. Hence
W(\mathbf{c}) = \frac{1}{(N M)^{N_0 S}}\, \frac{(N_0 S)!}{C_{1,1}! \cdots C_{N,M}!}   (27)
with
\ln W(\mathbf{c}) = N_0 S \ln (N_0 S) - \sum_{n,\epsilon} C_{n,\epsilon} \ln C_{n,\epsilon}
to be maximized under the decoupled constraints (here Σ_ϵ C_{n,ϵ} = S_n and Σ_n C_{n,ϵ} = n_ϵ)
\sum_{n,\epsilon} C_{n,\epsilon} = N_0 S, \qquad \sum_{n,\epsilon} n\, C_{n,\epsilon} = N_0, \qquad \sum_{n,\epsilon} \epsilon\, C_{n,\epsilon} = E_0   (28)
The related entropy is, setting C_{n,ϵ} = Q(n, ϵ) N_0 S,
H_{N \times M}(Q) = -\sum_{n,\epsilon} Q(n,\epsilon) \ln Q(n,\epsilon)   (29)
and the maximum entropy distribution under Constraint (28) can be shown to be
Q(n, \epsilon) = \phi(n)\, \psi(\epsilon) = \frac{e^{-\lambda n}}{Z(\lambda)}\, \frac{e^{-\beta \epsilon}}{Z(\beta)}   (30)
We claim that Equation (30) is the correct solution of the MEP problem dealt with in [5,13], in the sense that the choice of the entropy function adopted is supported by the Boltzmann argument. We do not claim that this solution has a better fit with empirically derived patterns than other ones, but only that it is derived in a consistent way. In this solution, the two random variables ν and m are uncorrelated; therefore, the energy and species constraints can be dealt with separately, giving Equations (14) and (16), or at the same time. This is a well-known property of MEP: if the constraints concern only the marginals of a joint probability, then the MEP solution is the product of two one-dimensional distributions, as the sketch below illustrates numerically.
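A direct numerical check of this product property (the sizes and constraint values are illustrative assumptions of ours):

import numpy as np
from scipy.optimize import minimize

N, M = 4, 5                                  # small ranges, enough to test the property
n, e = np.arange(1, N + 1), np.arange(1, M + 1)
cons = [{'type': 'eq', 'fun': lambda q: q.sum() - 1.0},
        {'type': 'eq', 'fun': lambda q: q.reshape(N, M).sum(axis=1) @ n - 2.0},   # mean abundance
        {'type': 'eq', 'fun': lambda q: q.reshape(N, M).sum(axis=0) @ e - 3.0}]   # mean energy
res = minimize(lambda q: np.sum(q * np.log(q)),      # i.e., maximize H_{NxM}(Q), Eq. (29)
               np.full(N * M, 1.0 / (N * M)),
               constraints=cons, bounds=[(1e-12, 1.0)] * (N * M))
Q = res.x.reshape(N, M)
print(np.allclose(Q, np.outer(Q.sum(axis=1), Q.sum(axis=0)), atol=1e-5))   # True: Q factorizes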

5. Analysis of the Application of MEP in M.E.T.E.

The starting point of the M.E.T.E. theory is the following probability distribution in Equation (31) on 𝒩 × ℳ, introduced in [5], formula (4c). Quoting [5], page 2702, above formula (1a): “R(n, ϵ) is the probability that if a species is picked at random from the species list, then it has abundance n, and if an individual is picked at random from that species with abundance n, then its metabolic rate is ϵ” (is between [ϵ, ϵ + dϵ] in the original continuum energy formulation; italics ours):
R(n, \epsilon) = \phi(n)\, \Theta(\epsilon \,|\, n)   (31)
This is a delicate point: we think that using “that species” makes the above definition logically inconsistent, because if there is more than one species with abundance n it is impossible to know which species is being picked, and so the probability of picking an individual with a given metabolic rate is not uniquely defined. One can readily convince oneself of this by considering a toy model of the system along the lines of the example in Box 7.1 of [13], page 144, but with at least two species having equal abundance and containing individuals with the same metabolic rate. The only way for us to be logically consistent is to substitute “that species” with “a species”. This is equivalent to picking an individual of given metabolic rate from the multispecies ℐ_n. In this paper we have thus intended the definition of R(n, ϵ) as amended in this way.
Note, however, that in the sequel of [5], below formula (4c), it is (correctly) written “a species”. Note also that the same ambiguity is present in the book [13], Chapter 7, pages 142 and 143. In our framework, we thus rephrase the definition as
R(n, \epsilon) = \phi(n)\, P(i \in \mathcal{I}_\epsilon \,|\, i \in \mathcal{I}_n)   (32)
and, using the definition of conditional probability P(A, B) = P(A) P(B|A) and Equations (4) and (22), also as
R(n, \epsilon) = \phi(n)\, \frac{P(i \in \mathcal{I}_\epsilon,\; i \in \mathcal{I}_n)}{P(i \in \mathcal{I}_n)} = \phi(n)\, \frac{P(n, \epsilon)}{\phi_I(n)} = \frac{N_0}{S\, n}\, P(n, \epsilon)   (33)
Therefore, the probability distributions P(n, ϵ) and R(n, ϵ) are related by an invertible transformation g (we use the same symbol for the sake of simplicity)
P(n, \epsilon) = g_n(R(n, \epsilon)) = \frac{n S}{N_0}\, R(n, \epsilon)
Moreover, the information N_0, S, E_0 sets the following constraints on R(n, ϵ); see (1a)–(1c) in [5]:
\sum_{n,\epsilon} n\, R(n, \epsilon) = \frac{N_0}{S}   (34)
\sum_{n,\epsilon} \epsilon\, n\, R(n, \epsilon) = \frac{E_0}{S}   (35)
\sum_{n,\epsilon} R(n, \epsilon) = 1   (36)
hence R(n, ϵ) is also properly normalized. In [5], the MEP procedure is applied as follows: R(n, ϵ) is the probability distribution that maximizes the Shannon entropy
H(R) = -\sum_{n,\epsilon} R(n, \epsilon) \ln R(n, \epsilon)   (37)
on the set defined by Constraints (34)–(36) above. It is therefore the least informative probability distribution on the basis of the sole knowledge of the macroscopic ratios E_0/S and N_0/S. Note that, as observed in [5], if the previous ratios are known, then E_0/N_0 is also known.
Standard application of the MEP gives the probability distribution depending on the unknown multipliers λ and β (see [5], formula (6))
R(n, \epsilon) = \frac{e^{-\lambda n - \beta \epsilon n}}{Z(\lambda, \beta)}   (38)
The Lagrange multipliers λ and β have to be determined by inserting Equation (38) into the constraints in Equations (34) and (35), but an analytical solution of the resulting equations does not exist. Adopting the approximations of [5], we are led to the following explicit formula for the marginal (the SAD distribution)
\Phi(n) = \sum_{\epsilon} R(n, \epsilon) \approx \frac{1}{\ln \lambda^{-1}}\, \frac{e^{-\lambda n}}{n}   (39)
which is the celebrated Fisher log-series ([20]), and to
\Psi(\epsilon) = \frac{S}{N_0} \sum_{n} n\, R(n, \epsilon) \approx \frac{\lambda \beta\, e^{-\lambda - \beta \epsilon}}{(1 - e^{-\lambda - \beta \epsilon})^{2}}   (40)
for the energy marginal. Moreover, the conditional probability is
\Theta(\epsilon \,|\, n) \approx \beta\, n\, e^{-\beta n \epsilon}   (41)
What is striking in the above result is that it prescribes a non-zero correlation (Equation (41)) between the distribution of energy among individuals and the distribution of abundance among species, even though nothing in the initial information seems to hint at a possible correlation. This is, for us, the signal of a flaw in the above application of MEP.
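The numerical step left implicit above can be sketched as follows (toy state variables of ours; convergence of the root finder from this starting point is not guaranteed): insert Equation (38) into Constraints (34) and (35) and solve for (λ, β), then inspect the conditional laws.

import numpy as np
from scipy.optimize import fsolve
from scipy.special import logsumexp

S, N0, E0 = 20, 100, 500               # toy state variables
N, M = N0 - S + 1, E0 - N0 + 1
n = np.arange(1, N + 1)[:, None]       # abundance index, as a column
eps = np.arange(1, M + 1)[None, :]     # metabolic rate index, as a row

def residuals(x):                      # insert (38) into Constraints (34)-(35)
    lam, beta = x
    logw = -lam * n - beta * n * eps   # log of the unnormalized R(n, eps)
    logZ = logsumexp(logw)
    mean_n = np.exp(logsumexp(logw, b=n) - logZ)
    mean_ne = np.exp(logsumexp(logw, b=n * eps) - logZ)
    return [mean_n - N0 / S, mean_ne - E0 / S]

lam, beta = fsolve(residuals, x0=[0.1, 0.01])
logw = -lam * n - beta * n * eps
R = np.exp(logw - logsumexp(logw))

# The conditional law Theta(eps | n) = R[n-1] / R[n-1].sum() changes shape with n
# (approximately beta*n*exp(-beta*n*eps), Eq. (41)): abundance and metabolism come
# out coupled, whereas the product solution derived below is free of this correlation.
theta_1 = R[0] / R[0].sum()
theta_5 = R[4] / R[4].sum()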

The Correct Form of Entropy Function of M.E.T.E. Problem

Our aim now is to derive, using the Boltzmann argument, the form of the entropy function for the MEP problem in the variable R(n, ϵ) with the constraints on energy and species in Equations (34)–(36). Since R(n, ϵ) is not a joint probability distribution, we derive the entropy function for the distribution P(n, ϵ) = P(i ∈ ℐ_n, i ∈ ℐ_ϵ) in Equation (22), which is related to R(n, ϵ) by the change of variable (33), and then use Proposition 1. The g-related constraints of Equations (34)–(36) for P(n, ϵ) are
\sum_{n,\epsilon} P(n, \epsilon) = 1, \qquad \sum_{n,\epsilon} \frac{1}{n}\, P(n, \epsilon) = \frac{S}{N_0}, \qquad \sum_{n,\epsilon} \epsilon\, P(n, \epsilon) = \frac{E_0}{N_0}   (42)
Note that, using P(n, ϵ), the constraints bear only on the marginals of P(n, ϵ); hence, unlike for R(n, ϵ), the MEP solution is the product of the marginals, see Equation (48) below. Given the occupation vector i = (i_1, …, i_N) of the N_0 individuals in the N multispecies, for every n we have to distribute the i_n individuals among the M energy groups ℐ_ϵ, and each arrangement i_n = (i_{n,1}, …, i_{n,M}) with i_n = Σ_ϵ i_{n,ϵ} has probability
W_n(\mathbf{i}_n) = \frac{1}{M^{i_n}}\, \frac{i_n!}{i_{n,1}! \cdots i_{n,M}!}   (43)
Therefore, taking into account that we have i_n/n independent tosses in the bin ℐ_n, we have to use the multiplicity factor W(i) in Equation (19):
W = W(\mathbf{i}) \prod_{n=1}^{N} W_n(\mathbf{i}_n)
hence
\ln W = \ln W(\mathbf{i}) + \ln \prod_{n=1}^{N} W_n(\mathbf{i}_n)   (44)
The maximum of ln W is reached when both terms on the right-hand side are maximized, but the first term, ln W(i), is independent of the arrangement in the energy levels. Therefore, we can maximize the second term on the right-hand side of Equation (44) for fixed values of i and then maximize the first term with respect to i. Now,
\ln \prod_{n=1}^{N} W_n(\mathbf{i}_n) = \sum_{n} \ln W_n(\mathbf{i}_n) = -\sum_{n,\epsilon} i_{n,\epsilon} \ln i_{n,\epsilon} + f(\mathbf{i})   (45)
is to be maximized under the N + 1 constraints (note that the constraint Σ_{n,ϵ} i_{n,ϵ} = N_0 is already enforced, since Σ_n i_n = N_0)
\sum_{n,\epsilon} \epsilon\, i_{n,\epsilon} = E_0 \qquad \text{and} \qquad \sum_{\epsilon} i_{n,\epsilon} = i_n, \quad n = 1, \dots, N   (46)
The unconstrained extremum problem for the Lagrange function
G = -\sum_{n,\epsilon} i_{n,\epsilon} \ln i_{n,\epsilon} - \beta \Big( \sum_{n,\epsilon} \epsilon\, i_{n,\epsilon} - E_0 \Big) - \sum_{n} \mu_n \Big( \sum_{\epsilon} i_{n,\epsilon} - i_n \Big)
gives the solution
i_{n,\epsilon} = i_n\, \frac{e^{-\beta \epsilon}}{\sum_{\epsilon} e^{-\beta \epsilon}} = i_n\, \frac{e^{-\beta \epsilon}}{Z(\beta)}   (47)
Since i_{n,ϵ} = P(n, ϵ) N_0 and i_n = ϕ_I(n) N_0, we have
P(n, \epsilon) = \phi_I(n)\, \frac{e^{-\beta \epsilon}}{Z(\beta)}   (48)
and, to get the complete solution, we have to maximize W(i), which has already been dealt with in the previous section, Equations (19) and (20). Therefore, the MEP procedure for the distribution P(n, ϵ) with the constraints in Equation (42) gives the solution
P(n, \epsilon) = \frac{n S}{N_0}\, \frac{e^{-\lambda n}}{Z(\lambda)}\, \frac{e^{-\beta \epsilon}}{Z(\beta)}   (49)
Using the change of variables in Equation (33) and Proposition 1, we find the solution of the MEP problem in [5] with respect to the variable R(n, ϵ):
R(n, \epsilon) = \frac{N_0}{S\, n}\, P(n, \epsilon) = \frac{e^{-\lambda n}}{Z(\lambda)}\, \frac{e^{-\beta \epsilon}}{Z(\beta)}   (50)
which coincides with Q(n, ϵ) in Equation (30).

6. Discussion

A non-trivial task in the application of the MEP is the translation into mathematical terms (i.e., as constraints on the sought probability distribution) of the information available on the system. In some cases one may hesitate between mathematically equivalent formalizations. What is not so well known is that the use of the Shannon entropy H(p) may not be justified in all cases. We have provided a criterion, based on Boltzmann's method of the most probable occupation vector, to derive the correct form of the entropy function in a given formulation. Note that the problem addressed here is different from the search for a suitable prior distribution for the relative entropy D(p|q). In the second part of the paper we have applied this analysis to critically examine the Maximum Entropy Theory of Ecology (M.E.T.E.). Even if the application of MEP contained in M.E.T.E. seems flawed, the resulting SAD proposed by M.E.T.E. is the Fisher log-series, which is widely accepted in the ecological community and considered to give a best fit for many ecological communities (although the use of the log-series SAD has been questioned recently, see [21]). Therefore, the predictions of species abundance based on M.E.T.E. may well be in good agreement with field data. What seems hard to give a sound basis in M.E.T.E. is the claim that the information contained in N_0, S, E_0 produces a coupling between the distribution of abundance among species and the distribution of metabolic energy among individuals.

Acknowledgments

We thank A. Maritan, S. Suweis, A. Tovo, M. Formentin and M. Pavon for fruitful discussions on the subject of this paper.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Proof of Proposition 1.
To seek the constrained extrema of H(p) over the set f(p) − c = 0, c ∈ ℝ^k, we introduce the Lagrange function
G(p, \lambda) = H(p) + \sum_{\alpha=1}^{k} \lambda_\alpha \left( f_\alpha(p) - c_\alpha \right)   (A1)
Then p̂ is a constrained extremum if and only if, for all i,
\left( \nabla_p G(\hat{p}, \lambda) \right)_i = \left( \nabla H(\hat{p}) \right)_i + \sum_{\alpha=1}^{k} \lambda_\alpha\, \frac{\partial f_\alpha(\hat{p})}{\partial p_i} = \left( \nabla H(\hat{p}) + df^{T}(\hat{p})\, \lambda \right)_i = 0   (A2)
Let H′(p′) = H(g(p′)) and f′(p′) = f(g(p′)), where p = g(p′) is invertible, and introduce
G'(p', \mu) = H'(p') + \sum_{\alpha=1}^{k} \mu_\alpha \left( f'_\alpha(p') - c_\alpha \right)   (A3)
Then p̂′ is a constrained extremum over the set f′(p′) = c if and only if, for all i,
\left( \nabla_{p'} G'(\hat{p}', \mu) \right)_i = \left( \nabla H'(\hat{p}') \right)_i + \sum_{\alpha=1}^{k} \mu_\alpha\, \frac{\partial f'_\alpha(\hat{p}')}{\partial p'_i} = 0   (A4)
that is,
\sum_{j} \frac{\partial H}{\partial p_j}(g(\hat{p}'))\, \frac{\partial g_j(\hat{p}')}{\partial p'_i} + \sum_{j,\alpha} \mu_\alpha\, \frac{\partial f_\alpha}{\partial p_j}(g(\hat{p}'))\, \frac{\partial g_j(\hat{p}')}{\partial p'_i} = 0   (A5)
or, in compact notation,
dg^{T}(\hat{p}') \left[ \nabla_p H(g(\hat{p}')) + df^{T}(g(\hat{p}'))\, \mu \right] = 0   (A6)
Since g is invertible, det dg ≠ 0; hence ∇_p H(g(p̂′)) + df^T(g(p̂′)) μ = 0, which holds if and only if p̂ = g(p̂′) is a constrained extremum for H(p), with λ = μ. ☐

References

  1. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620. [Google Scholar] [CrossRef]
  2. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  3. Banavar, J.R.; Maritan, A.; Volkov, I. Applications of the principle of maximum entropy: From physics to ecology. J. Phys. Condens. Matter 2010, 22, 063101. [Google Scholar] [CrossRef] [PubMed]
  4. Haegeman, B.; Etienne, R.S. Entropy maximization and the spatial distribution of species. Am. Nat. 2010, 175, E74–E90. [Google Scholar] [CrossRef] [PubMed]
  5. Harte, J.; Zillio, T.; Conlisk, E.; Smith, A.B. Maximum entropy and the state-variable approach to macroecology. Ecology 2008, 89, 2700–2711. [Google Scholar]
  6. Dewar, R.C.; Porté, A. Statistical mechanics unifies different ecological patterns. J. Theor. Biol. 2008, 251, 389–403. [Google Scholar]
  7. Phillips, S.J.; Anderson, R.P.; Schapire, R.E. Maximum entropy modeling of species geographic distributions. Ecol. Model. 2006, 190, 231–259. [Google Scholar] [CrossRef]
  8. Pueyo, S.; He, F.; Zillio, T. The maximum entropy formalism and the idiosyncratic theory of biodiversity. Ecol. Lett. 2007, 10, 1017–1028. [Google Scholar] [CrossRef] [PubMed]
  9. Frank, S.A. Measurement scale in maximum entropy models of species abundance. J. Evol. Biol. 2011, 24, 485–496. [Google Scholar] [CrossRef] [PubMed]
  10. Haegeman, B.; Loreau, M. Limitations of entropy maximization in ecology. Oikos 2008, 117, 1700–1710. [Google Scholar] [CrossRef]
  11. Shipley, B. Limitations of entropy maximization in ecology: A reply to Haegeman and Loreau. Oikos 2009, 118, 152–159. [Google Scholar] [CrossRef]
  12. Haegeman, B.; Loreau, M. Trivial and non trivial applications of entropy maximization in ecology: A reply to Shipley. Oikos 2009, 118, 1270–1278. [Google Scholar] [CrossRef]
 13. Harte, J. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics; Oxford University Press: Oxford, UK, 2011. [Google Scholar]
 14. Harte, J.; Newman, E.A. Maximum information entropy: A foundation for ecological theory. Trends Ecol. Evol. 2014, 29, 384–389. [Google Scholar] [CrossRef] [PubMed]
 15. White, E.P.; Thibault, K.M.; Xiao, X. Characterizing species abundance distributions across taxa and ecosystems using a simple maximum entropy model. Ecology 2012, 93, 1772–1778. [Google Scholar] [CrossRef] [PubMed]
  16. McGlinn, D.J.; Xiao, X.; Kitzes, J.; White, E.P. Exploring the spatially explicit predictions of the Maximum Entropy Theory of Ecology. Glob. Ecol. Biogeogr. 2015, 24, 675–684. [Google Scholar] [CrossRef]
  17. Xiao, X.; McGlinn, D.J.; White, E.P. A strong test of the maximum entropy theory of ecology. Am. Nat. 2015, 185, E70–E80. [Google Scholar] [CrossRef] [PubMed]
  18. Shore, J.; Johnson, R. Axiomatic derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37. [Google Scholar] [CrossRef]
 19. Schrödinger, E. Statistical Thermodynamics; Courier Corporation: North Chelmsford, MA, USA, 1989. [Google Scholar]
  20. Fisher, R.A.; Corbet, A.S.; Williams, C.B. The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 1943, 12, 42–58. [Google Scholar] [CrossRef]
  21. Tovo, A.; Suweis, S.; Formentin, M.; Favretti, M.; Volkov, I.; Banavar, J.R.; Maritan, A. Upscaling species richness and abundances in tropical forests. Sci. Adv. 2017, 3, e1701438. [Google Scholar] [CrossRef] [PubMed]
