1. Introduction
Directed distances—commonly known as divergences—
between probability vectors (i.e., vectors of probability frequencies)
are widely used in statistics as well as in the adjacent research fields of information theory, artificial intelligence and machine learning. Prominent examples are, e.g., the Kullback–Leibler information distance/divergence (also known as relative entropy), the Hellinger distance, the Jensen–Shannon divergence and Pearson’s chi-square distance/divergence; those are special cases of the often-used wider class of the so-called Csiszár–Ali–Silvey–Morimoto (CASM)
–divergences
(cf. [
1,
2,
3]). For some comprehensive overviews on CASM
–divergences, the reader is referred to, e.g., the insightful books [
4,
5,
6,
7,
8,
9], the survey articles [
10,
11,
12,
13], and the references therein. It is well known that the optimization of such CASM
–divergences plays an important role in obtaining estimators (e.g., the omnipresent maximum likelihood estimation method can be equivalently seen as a minimum Kullback–Leibler information distance estimation method), as well as in quantifying the model adequacy in the course of a model-search (model-selection) procedure; for the latter, see, e.g., [
14,
15,
16,
17].
In the literature, one can also find a substantial number of special cases of CASM
–divergences
between other prominent statistical quantities
(other than probability frequencies); see, e.g., [
18] for a corresponding recent survey. In contrast, there also exist special cases of CASM
–divergences
between other basic objects
B,
A for the quantification of uncertain/imprecise/inexact/vague information such as
fuzzy sets (cf. [
19]) and
basic belief assignments from Dempster–Shafer evidence theory (cf. [
20,
21]). Indeed, as far as the former is concerned, for instance, ref. [
22] employs (a variant of) the Kullback–Leibler information distance between two fuzzy sets
B and
A (which they call
fuzzy expected information for discrimination in favor of B against A), ref. [
23] investigates the Jensen–Shannon divergence between two intuitionistic fuzzy sets
B and
A (which they call
symmetric information measure between B and A), whereas [
24] deals with the Jensen–Shannon divergence between two extended representation type (i.e., hesitancy-degree including) Pythagorean fuzzy sets
B and
A. As far as CASM
–divergences
between basic belief assignments (BBAs)
B,
A are concerned, for instance, refs. [
25,
26] employ the Jensen–Shannon divergence for multi-sensor data fusion, whereas [
27] use the Hellinger distance for characterizing the degree of conflict between BBAs.
In view of the above-mentioned considerations, the main goals of this paper are as follows:
- (M1)
to define—dissimilarity-quantifying—
generalized CASM φ–divergences between fuzzy sets, between
ν–rung orthopair fuzzy sets in the sense of [
28] (including intuitionistic and Pythagorean fuzzy sets), between extended representation type
ν–rung orthopair fuzzy sets, between those fuzzy set types and vectors, between (rescaled) basic belief assignments as well as between (rescaled) basic belief assignments and vectors;
- (M2)
to present how one can tackle corresponding constrained minimization problems by appropriately applying our recently developed dimension-free
bare (pure) simulation method of [
29].
This agenda is achieved in the following way: in the next
Section 2, we recall the basic definitions and properties of
generalized CASM φ–divergences between vectors. The follow-up
Section 3 explains the basic principles of our above-mentioned bare-simulation optimization method of [
29] (where, for the sake of brevity, we focus on the minimal values and not on the corresponding minimizers). In
Section 4, we achieve the main goals (M1) and (M2) for the above-mentioned types of fuzzy sets, whereas
Section 5 is concerned with (M1) and (M2) for (rescaled) basic belief assignments. The conclusions are discussed in the final
Section 6.
3. Optimization of Generalized φ–Divergences
via the Bare Simulation Solution Method
In the following we deal with
Problem 1. For pregiven , positive-entries vector (or from some subset thereof), and subset with regularity properties (2), find the infimum in (3), provided that the finiteness property (4) holds. Remark 1. (a) When Ω is not closed but merely satisfies (2), then the infimum in (3) may not be reached in Ω although it is finite; if, additionally, Ω is a closed set, then a minimizer exists. In the subsetup where Ω is a closed convex set and , (2) is satisfied and the minimizer in (3) is attained and even unique. When Ω is open and satisfies (2), then the infimum in (3) exists, but is generally reached at some
of on Ω (see [33] for the Kullback–Leibler divergence case of probability measures, which extends to any generalized φ–divergence in our framework). However, in this paper, we only deal with finding the in (3) (rather than a corresponding ). (b) Our approach is predestined for non- or semiparametric models. For instance, (2) is valid for the appropriate tubular neighborhoods of parametric models or for more general non-parametric settings, such as, e.g., shape constraints. (c) Unless stated otherwise, the regularity condition (2) is supposed to hold in the full topology. According to our work [
29], the above-mentioned Problem 1 can be solved by a new dimension-free precise
bare simulation (BS) method to be explained in the following. We first suppose
Condition 1. With , the divergence generator in (3) is such that its multiple satisfies (G1) to (G4’) (i.e., ) and additionally there holds the representation (5) for some probability measure
on the real line such that the function (y) is finite on some open interval containing zero (notice that the latter implies that (y) = 1 and that has light tails). A detailed discussion on the representability (5) can be found in [
34]. By means of this
cornerstone Condition 1, we construct in [
29] a sequence
of
–valued random variables/random vectors (on an auxiliary probability space
)) as follows: for any
and any
, let
where
denotes the integer part of
x and
. Thus, one has
Moreover, we assume that
is large enough, namely,
, and decompose the set
of all integers from 1 to
n into the following disjoint blocks:
,
, and so on until the last block
, which, therefore, contains all integers from
to
n. Clearly,
has
elements (i.e.,
, where
denotes the number of elements in a set
A) for all
, and the last block
has
elements, which, anyhow, satisfies
. Furthermore, consider a vector
where the
’s are i.i.d. copies of the random variable
whose distribution is associated with the multiple divergence-generator
through (5), in the sense that
[·]. We group the
’s according to the above-mentioned blocks and sum them up blockwise, in order to build the following
K–component random vector
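The display (6) itself is not reproduced in this excerpt; nevertheless, the block decomposition and the blockwise summation just described can be sketched in a few lines of Python for orientation. The sketch assumes, purely for illustration, that Q is a probability vector, that the k-th block receives ⌊n q_k⌋ indices (the last block taking the remainder), and that the vector in (6) collects the blockwise sums of the W_i scaled by 1/n; the function names and the Poisson(1) choice for the law of W are hypothetical placeholders rather than the constructions of [29].

```python
import numpy as np

def blockwise_vector(q, n, sample_w, rng):
    """One replication of a K-component vector of blockwise sums (rough sketch of (6)).

    q        : vector (q_1, ..., q_K) with positive entries, here assumed to sum to one
    n        : sample size, assumed large enough so that every block is non-empty
    sample_w : function(size, rng) returning i.i.d. copies of W (cf. the representation (5))
    """
    q = np.asarray(q, dtype=float)
    sizes = np.floor(n * q[:-1]).astype(int)          # first K-1 block sizes
    sizes = np.append(sizes, n - sizes.sum())         # last block takes the remainder
    edges = np.concatenate(([0], np.cumsum(sizes)))
    w = sample_w(n, rng)                              # i.i.d. W_1, ..., W_n with E[W] = 1
    return np.array([w[edges[k]:edges[k + 1]].sum() for k in range(len(q))]) / n

# tiny demo: since E[W] = 1, the vector concentrates around Q for large n
rng = np.random.default_rng(0)
poisson_w = lambda size, rng: rng.poisson(1.0, size)  # illustrative choice of the W-law
print(blockwise_vector([0.2, 0.3, 0.5], n=10_000, sample_w=poisson_w, rng=rng))
```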
For such a context, in [
29], we obtain the following solution of Problem 1:
Theorem 1. Under Condition 1, there holds the “bare-simulation (BS) minimizability” for any with regularity properties (2) and finiteness property (4). Theorem 1 provides our principle for the
approximation of the solution of the deterministic optimization problem (
3). Indeed, by replacing the involved limit by its finite counterpart, we deduce for given large
n
it remains to estimate the probability on the right-hand side of (7). The latter can be performed either by a
naive estimator of the frequency of those replications of
which hit
, or, more efficiently, by some improved estimator; see [
29] for details, where we give concrete construction methods as well as numerous solved cases; for the latter, for the sake of brevity, we mention only two important special cases. The first one deals with the class of power-divergence generators
(with arbitrary multiplier
) defined by
which—by (
1)—generate (the vector-valued form of) the
generalized power divergences given by
for a corresponding detailed literature embedding (including applications and transformations), the reader is referred to [
29]. For any fixed
, Condition 1 is satisfied for
for all
and all
, and thus the BS-minimizability concerning Theorem 1 can be applied. Notice that the case
has to be left out for technical reasons. The corresponding crucial simulation distributions
[·] (cf. (5)) are given by the following:
- (DIS1)
a tilted stable distribution on for the case ;
- (DIS2)
the “ distribution” for ;
- (DIS3)
the “ – distribution” for ;
- (DIS4)
the “–fold of distribution” for ;
- (DIS5)
the “ distribution” for ;
- (DIS6)
a distorted stable distribution on for .
For details, see our paper [
29].
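To make the estimation recipe around (7) concrete, the following self-contained sketch implements the naive frequency estimator for the Kullback–Leibler case: the W_i are drawn from a Poisson(1) law, whose Cramér rate function (the Legendre transform of its cumulant generating function) is t ↦ t log t − t + 1, the vector is built blockwise as above, and the minimal divergence value over the constraint set Ω is approximated by −(1/n) log of the observed hit frequency. The Poisson choice, the 1/n scaling and this particular closed form of the approximation are assumptions made for illustration (and the toy constraint set, vectors whose first component is at least 0.3, is hypothetical); the precise statements, the full range of admissible generators and the improved estimators are developed in [29]. Note also that for a naive frequency estimator n cannot be taken too large, since otherwise the target probability becomes too small to be hit; this is exactly where the improved estimators of [29] come in.

```python
import numpy as np

def bs_min_divergence_kl(q, indicator_omega, n=100, replications=50_000, seed=1):
    """Naive bare-simulation estimate of the minimal (generalized) Kullback-Leibler
    divergence over a constraint set Omega (illustrative sketch only).

    q               : probability vector with positive entries
    indicator_omega : function(xi) -> bool, membership test for the constraint set Omega
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    sizes = np.floor(n * q[:-1]).astype(int)
    sizes = np.append(sizes, n - sizes.sum())         # last block takes the remainder
    edges = np.concatenate(([0], np.cumsum(sizes)))
    hits = 0
    for _ in range(replications):
        w = rng.poisson(1.0, n)                        # i.i.d. W_i with E[W_i] = 1 (KL case, illustrative)
        xi = np.add.reduceat(w, edges[:-1]) / n        # blockwise sums, scaled by 1/n (assumed)
        hits += indicator_omega(xi)
    freq = hits / replications
    return -np.log(freq) / n if freq > 0 else np.inf   # assumed finite-n counterpart of (7)

# hypothetical toy constraint set: all vectors whose first component is at least 0.3
print(bs_min_divergence_kl(np.array([0.2, 0.3, 0.5]), lambda xi: xi[0] >= 0.3))
```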
The second important special case to be mentioned here deals with
which leads to
For any fixed
, Condition 1 is satisfied for
for all
, and thus the BS-minimizability concerning Theorem 1 can be applied. The corresponding crucial simulation distribution
[·] (cf. (5)) is given by the “
–fold of
” (cf. [
29]). For the special subcase
we derive
which means that in such a situation the divergence (
11) can be rewritten as a sum of two generalized Kullback–Leibler divergences (cf. (
9)). For the important subsetup that
, and thus both
as well as
are probability vectors, the divergence
in (
12) is the well-known (cf. [
35,
36,
37,
38,
39,
40,
41])
Jensen–Shannon divergence (being also called symmetrized and normalized Kullback–Leibler divergence, symmetrized and normalized relative entropy, and capacitor discrimination).
For further examples the reader is referred to our paper [
29]. In the latter, we also derive bare-simulation optimization versions for constraint sets
in—a strictly positive multiple
of—the probability simplex, to be explained in the following. First, we denote by
the simplex of probability vectors (probability simplex), and
. For better emphasis (as already done above), for elements of these two sets we use the symbols
instead of
,
, etc., but for their components we still use our notation
,
. Moreover, subsets of
or
will be denoted by
instead of
, etc. As indicated above, in the following we deal with constraint sets of the form
for some arbitrary
, which automatically satisfy
in the
full topology and thus the regularity condition (
2) is violated (cf. Remark 1 (c)). Therefore, we need an adaption of the above-mentioned method. In more detail, we deal with
Problem 2. For pregiven , positive-components vector , and subset with regularity properties (13)—in the relative topology (!!)—find the infimum in (14), provided that the corresponding finiteness property (analogous to (4)) holds and that the divergence generator φ additionally satisfies Condition 1. For the directed distance minimization Problem 2, we proceed (with the same notations) as above and construct the following
K–component random vector (instead of
in (
6))
By construction, in case of
, the sum of the random
K vector components of (
15) is now automatically equal to one, but—as (depending on
) the
’s may take both positive and negative values— these random components may be negative with a probability strictly greater than zero (respectively, non-negative with a probability strictly less than one). However,
since all the (identically distributed) random variables
have an expectation of one (as a consequence of the assumed representability (5)); in case of
, one has even
. Summing up, the probability
is strictly positive and finite at least for large n, whenever
is finite.
As mentioned right after (
9) above, the required representability (5) is satisfied for all (multiples of) the generators
of (
8) with
and
(cf. [
29]). Within this context, for arbitrary constants
and
we define the auxiliary functions
we obtained (with a slight rescaling) in [
29] the following solution of Problem 2:
Theorem 2. Let with , , and be arbitrary but fixed. Moreover, let , and be a family of independent and identically distributed –valued random variables with probability distribution
[·]:= being connected with the divergence generator via the representability (5) (cf. (DIS1)–(DIS6)). Then there holds the “bare-simulation (BS) minimizability” for all sets satisfying the regularity properties (13) in the relative topology. Here, for , respectively, for . Analogously to (7), Theorem 2 provides our principle for the
approximation of the solution of the deterministic optimization problem (
14). Indeed, by replacing the involved limit by its finite counterpart, we deduce for given large
n
the probability in the latter can be estimated either by a
naive estimator of the frequency of those replications of
which hit
, or more efficiently by some improved estimator; see [
29] for details, where (for the case
, with straightforward adaption to
) we give concrete construction methods as well as numerous solved cases.
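For the simplex-constrained Problem 2, the only structural change on the simulation side is the normalization: instead of dividing the blockwise sums by n, one divides by the total sum of the W_i, so that the resulting K components automatically add up to one, exactly as described after (15). The sketch below (with the same illustrative Poisson(1) choice as before and a hypothetical toy constraint set) produces such normalized vectors and the naive estimate of the hit probability appearing on the right-hand side of (17); converting this probability into the minimal divergence value involves the additional correction terms of Theorem 2 (the auxiliary functions mentioned above), which are not reproduced in this excerpt and are therefore omitted here.

```python
import numpy as np

def normalized_blockwise_vector(q, n, rng):
    """One replication of the simplex-valued random vector used for Problem 2 (sketch):
    blockwise sums of i.i.d. W_i divided by their total sum, so that the K components
    add up to one by construction (the total sum is assumed nonzero here)."""
    q = np.asarray(q, dtype=float)
    sizes = np.floor(n * q[:-1]).astype(int)
    sizes = np.append(sizes, n - sizes.sum())
    edges = np.concatenate(([0], np.cumsum(sizes)))
    w = rng.poisson(1.0, n).astype(float)        # illustrative W-law with E[W] = 1
    block_sums = np.add.reduceat(w, edges[:-1])
    return block_sums / block_sums.sum()

# naive estimate of the hit probability on the right-hand side of (17)
rng = np.random.default_rng(2)
q = np.array([0.2, 0.3, 0.5])
omega = lambda p: p[0] >= 0.3                    # hypothetical toy constraint inside the simplex
print(np.mean([omega(normalized_blockwise_vector(q, 100, rng)) for _ in range(50_000)]))
```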
By means of straightforward deterministic transformations, Theorem 2 carries over to the BS optimizability of, e.g., the following important quantities
(provided that all involved power divergences are finite), which are (by monotonicity and continuity) thus BS-minimizable on
for all
:
for all sets
satisfying the regularity properties (
13)
in the relative topology; here, the simulation distribution
[·] (cf. (5)) is given by (DIS1), (DIS3), (DIS5) and (DIS6), respectively. The special subcase
,
in (
18) (and thus,
,
are probability vectors
) corresponds to the prominent
Renyi divergences/distances [
42] (in the scaling of, e.g., [
4] and in probability vector form), see, e.g., [
43] for a comprehensive study of their properties. Notice that
may become strictly negative, e.g., in the case that
; however, in this case the variant
always stays non-negative and leads basically to the “same” optimization
For the cases
and
, important transformations are the
modified Kullback–Leibler information (modified relative entropy)
and the
modified reverse Kullback–Leibler information (modified reverse relative entropy)
notice that
can become negative if
and
can become negative if
(see [
30] for counterexamples). Nevertheless, in general, we obtain immediately from
in Theorem 2 that
for all sets
satisfying the regularity properties (
13)
in the relative topology; here, the simulation distribution
[·] (cf. (5)) is given by (DIS4). Moreover, by employing
in Theorem 2 we deduce
for all sets
satisfying the regularity properties (
13)
in the relative topology; here, the simulation distribution
[·] (cf. (5)) is given by (DIS2).
Remark 2. By taking the special case P:= to be the probability vector of frequencies of the uniform distribution on (and thus ) in the above formulas (16) to (24), we can deduce many bare-simulation solutions of constrained optimization of various different versions of entropies ; for details, see Sections VIII and XII in [29]. From (19), (
21), (23) and (24) we deduce the approximations (for large
)
where for the involved
one can again employ either a
naive estimator of the frequency of those replications of
which hit
, or an improved estimator, see [
29] for details.
4. Minimization Problems with Fuzzy Sets
Our above-mentioned BS framework can be applied to the—imprecise/inexact/vague information describing—
fuzzy sets (cf. [
19]) and optimization problems on divergences between those. Indeed, let
be a finite set (called the
universe (of discourse)),
and
be a corresponding
membership function, where
represents the degree/grade of membership of the element
to the set
C; accordingly, the object
is called a
fuzzy set in (or fuzzy subset of
). Moreover, if
and
are two unequal sets, then the corresponding membership functions
and
should be unequal. Furthermore, we model the
vector of membership degrees to C by
, which satisfies the
key constraint for all
and, consequently, the
aggregated key constraint (as a side remark,
is called
power of the fuzzy set ). For divergence generators
in
with (say)
and for two sets
we can apply (
1) to the corresponding membership functions and define the
generalized φ–divergence between the fuzzy sets and (on the same universe
) as (cf. [
44])
(depending on
, zero degree values may have to be excluded for finiteness). For instance, we can take
for
(cf. (
8)) to end up with a generalized Kullback–Leibler divergence (generalized relative entropy) between
and
given by (cf. (
9))
this contrasts with the choice
of [
22]
which they call
fuzzy expected information for discrimination in favor of B against A, and which
may become negative (see the discussion after (
22) and [
30] in a more general context). Returning to the general case (
29), as a special case of the above-mentioned BS concepts, we can tackle optimization problems of the type
where
is a collection of fuzzy sets (on the same universe
) whose membership degree vectors form the set
satisfying (
2) and (
4). Because of the inequality type key constraint
which is incorporated into
, and which implies that
is contained in the
K–dimensional unit hypercube and in particular
, Theorem 1 (and thus, (7)) applies correspondingly (e.g., to the generalized power divergences
given by (
9) and the generalized Jensen–Shannon divergence
given by (
11))—unless there is a more restrictive constraint that violates (
2) such as, e.g.,
with
for which Theorem 2 (and hence, (17))—as well as its consequences (19), (
21), (23), (24) (and thus, (25)–(28)) can be employed.
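To illustrate the kind of objective appearing in such fuzzy-set minimization problems, the following small sketch evaluates a generalized Kullback–Leibler divergence between the membership-degree vectors of two fuzzy sets on a common universe. It uses the componentwise form Σ_k [μ_B(x_k) log(μ_B(x_k)/μ_A(x_k)) − μ_B(x_k) + μ_A(x_k)], which is one common convention for the generalized Kullback–Leibler divergence between positive vectors; whether (29) with this generator uses exactly this scaling and argument order cannot be verified from the excerpt, and the two fuzzy sets below are hypothetical.

```python
import numpy as np

def generalized_kl(mu_b, mu_a, eps=1e-12):
    """Generalized Kullback-Leibler divergence between two membership-degree vectors
    (assumed convention: sum of mu_b*log(mu_b/mu_a) - mu_b + mu_a); zero degrees are
    clipped away, since they would otherwise have to be excluded for finiteness."""
    mu_b = np.clip(np.asarray(mu_b, dtype=float), eps, None)
    mu_a = np.clip(np.asarray(mu_a, dtype=float), eps, None)
    return float(np.sum(mu_b * np.log(mu_b / mu_a) - mu_b + mu_a))

# two hypothetical fuzzy sets B and A on a universe of K = 4 elements
mu_B = [0.9, 0.4, 0.7, 0.1]   # membership degrees of B, each in [0, 1]
mu_A = [0.8, 0.5, 0.6, 0.2]   # membership degrees of A
print(generalized_kl(mu_B, mu_A))
```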
The above-mentioned considerations can be extended to the recent concept of
ν–rung orthopair fuzzy sets (cf. [
28]) and divergences between those. Indeed, for
, besides a membership function
one additionally models a
nonmembership function , where
represents the degree/grade of nonmembership of the element
to the set
C. Moreover, if
and
are unequal sets, then the corresponding nonmembership functions
and
should be unequal. For fixed
, the
key constraint
is required to be satisfied, too. Accordingly, the object
is called a
ν–rung orthopair fuzzy set in (or … subset of ). The object
is called
intuitionistic fuzzy set in (cf. [
45]) in case of
, and
Pythagorean fuzzy set in (cf. [
46,
47]) in the case of
. For the choice
together with
, the object
can be regarded as an extended representation of the fuzzy set
in
.
For any
–rung orthopair fuzzy set
in
, we model the corresponding
vector of concatenated membership and nonmembership degrees to C by
, which, due to (
30), satisfies the
aggregated key constraint
in other words,
lies (within the
–dimensional Euclidean space) in the intersection of the first/positive orthant with the
–norm ball centered at the origin and with radius
. Analogously to (
29), we can define the
generalized φ–divergence between the ν–rung orthopair fuzzy sets and (on the same universe
) as (cf. [
44])
respectively, as its variant (cf. [
44])
For the special choice
(cf. (
10)) and
, the definition (
32) leads to the
generalized Jensen–Shannon divergence between and given by
this coincides with the
symmetric information measure between and of [
23]. For the special choice
with
(cf. (
8)),
,
,
and thus (
31) turning into
, one can straightforwardly see that the outcoming generalized Kullback–Leibler divergence (generalized relative entropy) between
and
given by
coincides with
where
; the latter divergence was used, e.g., in [
22] under the name
average fuzzy information for discrimination in favor of B against A.
Returning to the general context, in terms of the divergences (
32) and (
33), we can tackle—as a special case of the above-mentioned BS concepts—optimization problems of the type
where
is a collection of
–rung orthopair fuzzy sets whose concatenated membership–nonmembership degree vectors form the set
satisfying (
2) and (
4) as well as (
30) for
. Because of the latter, Theorem 1 (and thus, (7)) applies correspondingly (e.g., to the generalized power divergences
given by (
9) and the generalized Jensen–Shannon divergence
given by (
11))—unless there is a more restrictive constraint that violates (
2) such as, e.g.,
with
for which Theorem 2 (and thus, (17)) — as well as its consequences (19), (
21), (23), (24) (and thus, (25)–(28)) can be employed; such a situation appears, e.g., in the case
together with
, which leads to
.
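For ν-rung orthopair fuzzy sets, the objects entering the divergences (32) and (33) are the concatenated membership/nonmembership degree vectors of length 2K, subject to the key constraint that the ν-th powers of the membership and nonmembership degrees add up to at most one at every element of the universe. The sketch below builds such concatenated vectors (writing eta for the nonmembership degrees, since the excerpt does not fix a symbol), checks the constraint, and reuses the generalized Kullback–Leibler form of the previous sketch as one possible divergence between them; the exact displays (31) to (33) are not reproduced in this excerpt, so this is an assumed instantiation rather than a verbatim implementation.

```python
import numpy as np

def orthopair_vector(mu, eta, nu):
    """Concatenated membership/nonmembership degree vector (length 2K) of a nu-rung
    orthopair fuzzy set, after checking the key constraint mu^nu + eta^nu <= 1."""
    mu, eta = np.asarray(mu, dtype=float), np.asarray(eta, dtype=float)
    if np.any(mu ** nu + eta ** nu > 1.0 + 1e-12):
        raise ValueError("key constraint mu^nu + eta^nu <= 1 is violated")
    return np.concatenate([mu, eta])

def generalized_kl(p, q, eps=1e-12):
    """Same assumed generalized Kullback-Leibler form as in the fuzzy-set sketch above."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    return float(np.sum(p * np.log(p / q) - p + q))

# two hypothetical Pythagorean (nu = 2) fuzzy sets B and A on a universe of K = 3 elements
vec_B = orthopair_vector(mu=[0.8, 0.5, 0.6], eta=[0.3, 0.6, 0.4], nu=2)
vec_A = orthopair_vector(mu=[0.7, 0.4, 0.7], eta=[0.4, 0.7, 0.3], nu=2)
print(generalized_kl(vec_B, vec_A))
```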
For the
–rung orthopair fuzzy sets
in
, we can also further “flexibilize” our divergences by additionally incorporating the
hesitancy degree of the element
to
C, which is defined as
(cf. [
28]), and which implies the
key constraint
Accordingly, the object
can be regarded as an extended representation of the
–rung orthopair fuzzy set
in
. For
, we model the corresponding
vector of concatenated membership, nonmembership and hesitancy degrees to C by
which, due to (
34), satisfies the
aggregated key constraint
in other words,
lies (within the
–dimensional Euclidean space) in the intersection of the first/positive orthant with the
–norm sphere centered at the origin and with radius
. Analogously to (
32) and (
33), we can define the
generalized φ–divergence between the extended representation type ν–rung orthopair fuzzy sets and (on the same universe
) as (cf. [
44])
for technical reasons, we do not deal with its variant
For instance, by taking the special choice
and
(cf. (
10)) in (
35), we arrive at the
Jensen–Shannon divergence between and of the form
, which—by virtue of (
35) and (
11) — coincides with the
(squared) Pythagorean fuzzy set Jensen–Shannon divergence measure between and of [
24].
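In the extended representation, each element of the universe additionally carries a hesitancy degree; in the usual convention for ν-rung orthopair fuzzy sets this is π_C(x) = (1 − μ_C(x)^ν − η_C(x)^ν)^{1/ν}, so that the ν-th powers of membership, nonmembership and hesitancy degree add up to exactly one, and the concatenated vector of length 3K satisfies the aggregated key constraint with right-hand side K. The short sketch below computes the hesitancy degrees and verifies this; since the displays (34) and (35) are not reproduced in the excerpt, the formula for the hesitancy degree should be read as an assumption, and the divergences (35) can then be evaluated on the resulting 3K-vectors exactly as in the previous sketches.

```python
import numpy as np

def extended_orthopair_vector(mu, eta, nu):
    """Concatenated membership/nonmembership/hesitancy degree vector (length 3K), with
    the hesitancy degree computed as (1 - mu^nu - eta^nu)**(1/nu) (assumed convention)."""
    mu, eta = np.asarray(mu, dtype=float), np.asarray(eta, dtype=float)
    slack = 1.0 - mu ** nu - eta ** nu
    if np.any(slack < -1e-12):
        raise ValueError("key constraint mu^nu + eta^nu <= 1 is violated")
    pi = np.clip(slack, 0.0, None) ** (1.0 / nu)
    return np.concatenate([mu, eta, pi])

# hypothetical Pythagorean (nu = 2) example on a universe of K = 3 elements
vec = extended_orthopair_vector(mu=[0.8, 0.5, 0.6], eta=[0.3, 0.6, 0.4], nu=2)
print(np.sum(vec ** 2))   # aggregated key constraint: should equal K = 3
```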
To continue with the general context, as a particular application of the above-mentioned BS concepts, we can tackle the general optimization problems of the generalized power divergence type
where
is a collection of extended representation type
–rung orthopair fuzzy sets whose concatenated membership–nonmembership-hesitancy degree vectors form the set
, satisfying (
34) for
(for each member) as well as the regularity properties (
13)
in the relative topology. Thus, for the minimization of (
36) we can apply our Theorem 2 (and, consequently, (17)) by choosing there (with a slight abuse of notation)
. Of course, we can also apply our BS optimization method to the corresponding Renyi divergences
(via (19)),
(via (19) and (
21)) as well as
(via (23)),
(via (24)), and employ the correspondingly applied approximations (25)–(28). For instance, by applying (
20) with
(with a slight abuse of notation) we arrive at the non-negative
γ–order Renyi divergence between ν–rung orthopair fuzzy sets given by
depending on
, zero degree values may have to be excluded for finiteness. As a side remark, let us mention that our divergence (
37) contrasts with the recent (first) divergence of [
48] who basically uses a different scaling, the product
instead of the sum
, as well as
instead of
. By appropriately applying (19) and (
21), we can tackle with our BS method for
the minimization problem
, where
is a collection of extended representation type
–rung orthopair fuzzy sets
whose concatenated membership–nonmembership-hesitancy degree vectors form the set
satisfying (
34) for
(for each member) as well as the regularity properties (
13)
in the relative topology.
We can also apply our BS optimization method to “crossover cases”
and
(instead of (
29),
and
(instead of (
32)),
and
(instead of (
33)),
and
(instead of (
35)),
and
,
and
,
and
,
and
,
and
, where
(respectively,
,
,
or
) is a general vector (not necessarily induced by a fuzzy set) having the same dimension
(namely,
K,
or
) as the fuzzy set induced vector to be compared with. For instance, if we apply Remark 2 to
P:=
and
and employ the corresponding straightforward application of the general results of Sections VIII and XII of [
29], then we end up (via the appropriately applied Theorem 2) with the BS optimization results of
),
),
) and
), which can be deterministically transformed into the BS optimization results of various different versions of entropies
of
–rung orthopair fuzzy sets
, where
is any entropy in Chapter VIII of [
29].
As a final remark of this section, let us mention that we can carry over the above-mentioned definitions and optimizations to (classical, intuitionistic, Pythagorean and –rung orthopair) L–fuzzy sets, where the range of the membership functions, nonmembership functions and hesitancy functions is an appropriately chosen lattice L (rather than ); for the sake of brevity, the details are omitted here.
5. Minimization Problems with Basic Belief Assignments
Our BS framework also covers—imprecise/inexact/vague information describing—
basic belief assignments from Dempster–Shafer evidence theory (cf. [
20,
21]) and optimization problems on the divergences between those. Indeed, let
be a finite set (called the
frame of discernment) of mutually exclusive and collectively exhaustive events
. The corresponding power set of
is denoted by
and has
elements; we enumerate this by
, where for convenience we set
. A mapping
is called a
basic belief assignment (BBA) (sometimes alternatively called basic probability assignment (BPA)) if it satisfies the two conditions
Here, the belief mass
reflects, e.g., the degree of trust that the evidence assigns to the proposition
. From this, one can build the belief function
by
and the plausibility function
by
. Moreover, we model the
–dimensional
vector of (M–based) BBA values (vector of (
M–based) belief masses) by
, which satisfies the
key constraint for all
and, by virtue of (
38), the
aggregated key constraint . Hence,
lies formally in the
–dimensional simplex
(but generally not in the corresponding probability vector-describing
).
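For concreteness, the following sketch enumerates the power set of a small frame of discernment, checks the two defining conditions (38) for a basic belief assignment (zero mass on the empty set and total mass one), and evaluates the induced belief and plausibility functions via the standard Dempster–Shafer formulas Bel(A) = Σ_{B⊆A, B≠∅} m(B) and Pl(A) = Σ_{B∩A≠∅} m(B), which the excerpt refers to but does not display. The concrete frame and the masses are hypothetical.

```python
from itertools import chain, combinations

def powerset(frame):
    """All subsets of the frame of discernment (including the empty set), as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(combinations(sorted(frame), r)
                                                      for r in range(len(frame) + 1))]

def is_bba(m):
    """Check the two defining conditions (38): m(empty set) = 0 and the masses sum to one."""
    return abs(m.get(frozenset(), 0.0)) < 1e-12 and abs(sum(m.values()) - 1.0) < 1e-12

def belief(m, a):
    return sum(mass for b, mass in m.items() if b and b <= a)   # non-empty B contained in A

def plausibility(m, a):
    return sum(mass for b, mass in m.items() if b & a)          # B intersecting A

# hypothetical frame of discernment and BBA
frame = {"e1", "e2", "e3"}
assert len(powerset(frame)) == 2 ** len(frame)
m = {frozenset({"e1"}): 0.5, frozenset({"e2", "e3"}): 0.3, frozenset(frame): 0.2}
assert is_bba(m)
a = frozenset({"e1", "e2"})
print(belief(m, a), plausibility(m, a))   # 0.5 and 1.0 for this example
```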
For divergence generators
in
with (say)
and for two BBAs
on the same frame of discernment
, we can apply (
1) to the corresponding vectors of BBA values and define the
generalized φ–divergence between the BBAs and (in short,
Belief–φ–divergence) as
This definition (
39) of the Belief–
–divergence was first given in our paper [
44]; later on, [
49] used formally the same definition under the different (but deficient) assumptions that
is only convex and satisfies
, which leads to a violation of the basic divergence properties (take, e.g.,
for all
t, which implies
even if
; such an effect is excluded in our setup due to our assumption (G4’) being part of
); as a technical remark let us mention that (as already indicated in
Section 2) depending on
, zero belief masses may have to be excluded for finiteness in (
39). For instance, we can take in (
39) the special case
(cf. (
10)) to end up with the
Belief-Jensen–Shannon divergence of [
25,
26] who applies this to multi-sensor data fusion. As another special case we can take
(cf. (
8)) to end up with four times the square of the
Hellinger distance of BBAs of [
27], who use this for characterizing the degree of conflict between BBAs. To continue with the general context, as a particular application of the above-mentioned BS concepts, we can tackle general optimization problems of the type
where
is a collection of BBAs whose vectors of BBA-values form the set
satisfying the regularity properties (
13)
in the relative topology; hence, we can apply our Theorem 2 (and thus, (17)) for the minimization of (
40), by taking
instead of
K as well as (with a slight abuse of notation)
. Of course, we can also apply our BS optimization method to the corresponding Renyi divergences
(via (19)),
(via (19) and (
21)) as well as
(via (23)),
(via (24)), and employ the correspondingly applied approximations (25)–(28). For instance, by applying (
18) and (19) we arrive at
and
where
is chosen according to Theorem 2 with
instead of
K as well as
.
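As an illustration of such Belief–φ–divergences, the sketch below evaluates two of the special cases mentioned above on vectors of BBA values: a Jensen–Shannon type divergence in the usual half-scaled, natural-logarithm form, and the power divergence with γ = 1/2, which for two vectors of total mass one equals four times the squared Hellinger distance (in the convention H² = 1 − Σ√(m_B m_A)). Whether these constants match the scalings used in (39) and in [25,26,27] cannot be verified from this excerpt, so the normalizations, as well as the two BBA-value vectors themselves, should be read as assumptions.

```python
import numpy as np

def belief_jensen_shannon(m_b, m_a, eps=1e-12):
    """Jensen-Shannon type divergence between two vectors of BBA values
    (usual 1/2-scaled form with natural logarithm; scaling convention assumed)."""
    m_b = np.clip(np.asarray(m_b, dtype=float), eps, None)
    m_a = np.clip(np.asarray(m_a, dtype=float), eps, None)
    mid = 0.5 * (m_b + m_a)
    return 0.5 * float(np.sum(m_b * np.log(m_b / mid) + m_a * np.log(m_a / mid)))

def four_times_squared_hellinger(m_b, m_a):
    """Power divergence with gamma = 1/2 between BBA-value vectors; for vectors of total
    mass one this equals 4 * (1 - sum(sqrt(m_b * m_a))), i.e., four times the squared
    Hellinger distance."""
    m_b, m_a = np.asarray(m_b, dtype=float), np.asarray(m_a, dtype=float)
    return float(np.sum(2.0 * m_b + 2.0 * m_a - 4.0 * np.sqrt(m_b * m_a)))

# hypothetical BBA-value vectors over the seven non-empty subsets of a three-element frame
m_B = np.array([0.5, 0.0, 0.0, 0.3, 0.0, 0.0, 0.2])
m_A = np.array([0.2, 0.1, 0.0, 0.4, 0.0, 0.1, 0.2])
print(belief_jensen_shannon(m_B, m_A), four_times_squared_hellinger(m_B, m_A))
```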
We can also apply our BS optimization method to the divergences between the rescaling of BBAs. For instance, let
(
) with the convention that
, and denote the corresponding vector
. Accordingly, we define the
generalized φ–divergence between the rescaled BBAs and (in short,
rescaled Belief–φ–divergence) as (cf. [
44])
where for
we have used the convention that
(depending on
, other zero rescaled belief masses may have to be excluded for finiteness). For the corresponding minimization problem
where
is a collection of rescaled BBAs whose vectors of rescaled BBA values form the set
satisfying (
2) and (
4), we can apply Theorem 1 and (7) (with
instead of
K)—unless there is a more restrictive constraint that violates (
2) such as, e.g.,
, for which Theorem 2 (and thus, (17)) can be employed.
We can also apply our BS optimization method to “crossover cases”
and
(instead of (
39)),
and
(instead of (
41)),
and
,
and
,
and
,
and
,
and
, where
(respectively,
,
,
or
) is a general vector (not necessarily induced by a (rescaled) BBA) having the same dimension (namely,
) as the (rescaled) BBA-induced vector to be compared with. For instance, if we apply Remark 2 to
P:=
and
and employ the corresponding straightforward application of the general results of Sections VIII and XII of [
29], then we end up (via the appropriately applied Theorem 2) with the BS optimization results of
),
),
) and
)
, which can be deterministically transformed into the BS optimization results of various different versions of entropies
of BBAs
, where
is any entropy in Chapter VIII of [
29].
To give a more explicit example concerning the preceding paragraph, our BS method of Theorem 2 and (17) (with
instead of
K as well as
) applies to the crossover case
where
is a vector of
M–based BBA values and
is as above, and
is a vector whose sum of components may not necessarily be one. For instance, for the special choice
, i.e.,
(cf. (
8)),
,
with
employing the cardinality
of
, and the usual convention
, we end up with (cf. (
9))
where
is nothing but (a multiple of) Deng’s entropy of the BBA
M (cf. [
50], see also, e.g., [
51]). Of course, we can also deal with the optimization of corresponding Renyi divergences
by applying (
18) and (19). For the “reverse-argument” crossover case
one can apply Theorem 1 (and thus, (7))—unless there is a more restrictive constraint, which violates (
2) such as, e.g.,
, for which Theorem 2 (and hence, (17))—as well as its consequences (19), (
21), (23), (24) (and thus, (25)–(28)) can be employed.
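As a companion to the crossover case leading to Deng's entropy that was discussed above, the following sketch computes Deng's entropy of a BBA in its common form E_d(M) = −Σ_{∅≠A⊆Θ} m(A) log(m(A)/(2^{|A|} − 1)). The logarithm base (natural here, base 2 in parts of the literature) and the exact multiplicative and additive constants linking this quantity to the crossover divergence above are not fixed by the excerpt and should be treated as assumptions; the BBA used in the demo is hypothetical.

```python
import math

def deng_entropy(m):
    """Deng's entropy of a BBA given as {frozenset: mass} (common form, natural logarithm);
    only non-empty focal sets with strictly positive mass contribute."""
    return -sum(mass * math.log(mass / (2 ** len(a) - 1))
                for a, mass in m.items() if a and mass > 0)

# hypothetical BBA on a three-element frame of discernment
m = {frozenset({"e1"}): 0.5, frozenset({"e2", "e3"}): 0.3, frozenset({"e1", "e2", "e3"}): 0.2}
print(deng_entropy(m))
```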