1. Introduction
Given
n discrete random variables, for a fixed joint distribution, all their
(joint) entropies define an entropic vector in the entropy space
. By varying over all possible joint distributions, the set of all entropic vectors defines the entropic region. The entropic region plays a fundamental role in information theory and network coding. The closure, known as the almost-entropic region, yields rate regions for multi-source coded networks [
1,
2]. However, the complete characterization of the entropic region encounters challenges even when
[
3,
4,
5,
6,
7], where only the closure, i.e., the almost-entropic region, is fully characterized [
3]. When
, the characterization of the almost-entropic region also becomes extremely difficult and remains open [
8].
In the pursuit of the characterization of the entropic region, one crucial problem is to
determine whether an arbitrary vector in the entropy space is entropic or not. If every vector in the entropy space can be so classified, the entropic region is fully characterized. Additionally, given a point in the rate region of a coded network, the existence of achievable codes relies on the underlying entropic vector [
9].
This problem has been tackled from different perspectives in the literature. Information inequalities, which fully characterize the almost-entropic region, are known to be infinite in number [
10] and extremely hard to characterize [
8,
10,
11,
12,
13,
14,
15,
16,
17]. The construction of entropic vectors, however, is more feasible and studied through probability distributions, e.g., [
6,
7,
16,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28], groups, e.g., [
29,
30,
31,
32,
33,
34,
35], and matroids, e.g., [
36,
37,
38,
39,
40]. Regarding the construction of entropic vectors from probability distributions, remarkable research has attempted to numerically obtain probability mass functions (PMFs) for entropic vectors [
16,
19,
20,
21,
22,
23,
24,
25,
26,
27], particularly when the alphabet sizes of the involved random variables are bounded. In this way, once the underlying joint PMF is obtained, the entropic nature of a given vector in the entropy space is immediately verified from the definition.
Among these approaches, some are interested in quasi-uniform PMFs [
19,
27] or PMFs supported on small atoms [
22,
23,
24,
25], while more general PMFs are considered in [
16,
20,
21,
26]. For example, an algorithm is given in [
20] to verify binary entropic vectors by recursively constructing PMFs. In [
16], the PMFs are numerically obtained with Newton’s method. In [
21], the study of the almost-entropic region when
is reduced to a three-dimensional tetrahedron and visualized by various maximization procedures with different methods and strategies [
21] (Section VII). Recently, a random search algorithm has been introduced [
26] to find the nearest entropic vector to a given target vector. This algorithm iteratively applies random perturbations to PMFs, striving to decrease the normalized distance between the associated entropic vector and the target vector.
However, the limitations of the above methods cannot be overlooked. The method in [
20] is only feasible for binary entropic vectors. In [
16], and [
21], the methods are described without formal algorithms, and their performance is not systematically analyzed. Although [
26] presents more analytical and empirical results, the limitations of the randomized algorithm persist, and its convergence is not empirically guaranteed. Furthermore, a critical issue in the above methods is that only PMFs with relatively small alphabet sizes can be handled. This limitation is critical, since the verification of an underlying entropic vector in the rate region of a coded network requires large alphabets when subpacketizations are necessary for achievable codes.
With these methods, which construct entropic vectors from PMFs, some interesting optimization problems over the entropic region when
can be explored experimentally. It is known that the Ingleton inequality [
41] completely characterizes the part of the almost-entropic region achieved by linearly representable entropic vectors [
13]; thus, the study is reduced to the entropic vectors violating it. To measure the degree to which an entropic vector violates the Ingleton inequality, the
Ingleton score [
16,
42] and
Ingleton violation index [
19] are proposed from different angles. Although remarkable efforts have been made to minimize the Ingleton score [
16,
21,
26,
35,
42] and maximize the Ingleton violation index [
19,
26], their optimal values remain open. In [
16], the Ingleton score is numerically optimized and conjectured to be
, known as the four-atom conjecture [
16] (Conjecture 1). In [
21], a technique is proposed to transform an entropic vector into another entropic vector, and the transformed vector is optimized to the Ingleton score
, which refutes the four-atom conjecture by [
16]. By experimenting with groups [
35], the best-known Ingleton score is currently
. The Ingleton violation index is optimized in [
19] and, subsequently, in [
43], and its best known value is currently
, according to [
26].
As the complete characterization of the almost-entropic region for
is difficult, inner bounds can be constructed by taking the convex hull of certain acquired entropic vectors, as previously investigated in [
16,
22,
26]. The quality of such an inner bound is measured by the volume ratio, i.e., the ratio of the inner-bound polytope volume to the outer-bound polytope volume. In [
16], several entropic vectors are optimized to form an inner bound with a volume ratio of
. Using distributions supported on small atoms, [
22] finds entropic vectors that yield an inner bound with the volume ratio
. In [
26], with their proposed grid method, the current best-known volume ratio is
, while the largest volume ratio, which requires the complete characterization of the almost-entropic region when
, is still open.
In this paper, we develop
a novel architecture for optimizing PMFs for associated entropic vectors via convolutional neural networks (CNNs). Recently, neural networks (NNs) have found applications in information theory [
44,
45,
46] and control theory [
47,
48]. Our motivation arises from the fruitful research on applying NNs to problems in information theory [
44,
45,
46]. More specifically, in [
45], a unique mapping that generates conditional PMFs is defined and approximated using NNs. In [
46], a model which generates a capacity-achieving input PMF for a given channel in discrete input spaces is proposed. Consequently, we are especially interested in approximating another mapping, which generates joint PMFs for multiple random variables. Sharing similar ideas with [
26], the problem of verifying an entropic vector can be cast as minimizing the normalized distance through NN training. Furthermore, we are motivated to adapt the NN architecture to the special structure and complexity of joint PMFs. Additionally, the developed method applies immediately to the optimization of the Ingleton score and Ingleton violation index, and to the construction of inner bounds for the almost-entropic region.
The major contributions of this paper are summarized as follows.
A novel algorithm is proposed to optimize distributions for entropic vectors with NN training. For each target vector, an NN is trained such that the output PMF yields an entropic vector as close to the target as possible. In practice, we implement the algorithm with CNNs, which accelerate training and enable the algorithm to generate PMFs over large alphabets.
The effectiveness of our proposed method is verified by empirical results. More specifically, smaller normalized distances and improved convergence performances are achieved by our proposed method, compared to [
26]. In addition, with derived theoretical guarantees, the proposed algorithm reconfirms the state-of-the-art Ingleton score and yields a new, tighter lower bound on the Ingleton violation index. Furthermore, building on the proposed algorithm, we develop another algorithm to construct a new inner bound of the almost-entropic region (
), yielding the current best inner bound measured by the volume ratio.
This paper is organized as follows.
Section 2 introduces preliminaries, notations, and the problem statement. The proposed method and algorithm with derived theoretical guarantees are presented in
Section 3, and the implementation with CNNs is demonstrated in
Section 4. In
Section 5, empirical results exploiting the proposed method for several problems when
are presented.
Section 6 concludes the paper and discusses the potential of the proposed method to construct achievable schemes for network coding problems.
2. Preliminaries and Problem Statement
In this section, we provide the preliminaries, notations, and the problem statement of this paper.
2.1. Convex Cones and Convex Polytopes
Given a set , C is a pointed cone if implies that for all real , and is a ray of C. Pointed cones are unbounded except , where is the origin of Euclidean space. In the rest of the paper, all cones are assumed to be pointed and unbounded. For all real , of C is an extreme ray if cannot be expressed as the positive linear combination of any . A cone C is convex if C is a convex set.
Given a convex set , C is a convex polyhedron if it is the intersection of finitely many halfspaces (i.e., linear inequalities). A convex cone C is a special polyhedron when the halfspaces that define C contain simultaneously, and, in this case, the convex cone C is called polyhedral. A convex polyhedron is a convex polytope if it is bounded. In the rest of the paper, all polytopes are assumed to be convex. For a convex cone (or convex polytope), if we specify one extreme ray (or vertex) as the top and others as the bases, then the convex cone (or convex polytope) is often called the pyramid.
For a convex polyhedron
C, there are two equivalent types of representations, i.e., the
H-representation and the
V-representation [
49] (Chapter 1). The H-representation is the set of all halfspaces defining
C, and the V-representation is the set of all vertices (if
C is a convex polytope) or extreme rays (if
C is a polyhedral convex cone) defining
C. The transformation between the H-representation and the V-representation for a given convex cone or convex polytope can be numerically performed with the implementation [
50,
51,
52] of the double description method [
53]. More details about convex cones and convex polytopes can be found in [
49].
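As a small illustration of the two representations (our own sketch, using SciPy's Qhull-based routines rather than the double description implementations cited above, and restricted to bounded, full-dimensional polytopes), the unit square can be converted from its V-representation to an H-representation and back:

```python
import numpy as np
from scipy.spatial import ConvexHull, HalfspaceIntersection

# V-representation of the unit square: its four vertices.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# V -> H: each row of `equations` is [a, b] with a . x + b <= 0 inside the polytope.
halfspaces = ConvexHull(vertices).equations

# H -> V: HalfspaceIntersection additionally needs a strictly interior point.
interior_point = vertices.mean(axis=0)
recovered = HalfspaceIntersection(halfspaces, interior_point).intersections

print(np.round(recovered, 6))  # the four corners of the square, in some order
```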
2.2. Entropic Vectors and Entropic Region
Let
, consider a discrete random vector
with a finite index set
, and
takes values in the finite alphabet
of size
. The realization of
is denoted as
. Let
denote the probability of
such that
. The joint probability mass function (PMF) of
is denoted as
, where
is the
m-dimensional probability simplex defined as
A vector in
is a
valid PMF if it belongs to the set
.
We consider a subset of components of the random vector
; the above quantities can be denoted accordingly. More specifically, we consider the set of random variables
, which takes values in the finite alphabet
. The realization of
is denoted as
. Let
denote the probability of
such that
. The marginal PMF of
is denoted as
. Then, the Shannon entropy of
is
where
. We define
, which is the
entropy function. The vector consisting of entropy functions for all
is the
entropic vector, as formally defined in the following definition.
Definition 1 (Entropic vectors [
54], Chapter 13)
. For the random vector and index set , given a joint PMF , the entropic vector is defined as , and is the associated entropic vector of , denoted as . We note that, by varying
, there are total
(joint) entropies for the random vector
. Thus, each
can be viewed as a vector in the
-dimensional Euclidean space, which is defined as the
entropy spaceFor example, given a random vector
, let each random variable in
be uniformly i.i.d.-distributed on alphabet
; then, the associated entropic vector in the entropy space
is
Allowing infinite alphabets, the region in
consisting of all entropic vectors is the
entropic region ([
54], Chapter 13)
The closure of
is the
almost-entropic region ([
54], Chapter 15).
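As a concrete illustration of Definition 1 (a minimal sketch of our own; the binary alphabet and base-2 logarithm are our choices here), the entropic vector of a given joint PMF can be computed by marginalizing over every nonempty subset of variables and evaluating the corresponding Shannon entropies:

```python
import itertools
import numpy as np

def entropic_vector(pmf: np.ndarray) -> dict:
    """All 2^n - 1 joint entropies (in bits) of an n-dimensional joint PMF,
    keyed by the tuple of variable indices in each subset."""
    n = pmf.ndim
    h = {}
    for r in range(1, n + 1):
        for subset in itertools.combinations(range(n), r):
            other = tuple(i for i in range(n) if i not in subset)
            marginal = pmf.sum(axis=other) if other else pmf
            p = marginal[marginal > 0]
            h[subset] = float(-(p * np.log2(p)).sum())
    return h

# Four i.i.d. random variables, each uniform on a binary alphabet.
pmf = np.full((2, 2, 2, 2), 1.0 / 16)
h = entropic_vector(pmf)
print(h[(0,)], h[(0, 1)], h[(0, 1, 2, 3)])  # 1.0, 2.0, 4.0 bits
```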
Given a set of random variables,
basic inequalities form the set of inequalities implied by the nonnegativity of all Shannon’s information measures. The nonredundant subset of basic inequalities consists of
elemental inequalities ([
54], Chapter 14), which are defined as the following two types of inequalities for the random vector
The inequalities implied by elemental inequalities are
Shannon-type inequalities. The region in
that consists of vectors satisfying all elemental inequalities is ([
54], Chapter 14)
Sometimes
is called the
polymatroidal region because of the equivalence between the elemental inequalities for vectors in the entropy space and the polymatroidal axioms for polymatroidal rank functions [
55].
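For concreteness, the following sketch (ours; the subset ordering matches the entropic_vector sketch above) assembles the coefficient rows of both types of elemental inequalities, so that a vector in the entropy space satisfies all Shannon-type inequalities exactly when every row yields a nonnegative inner product:

```python
import itertools
import numpy as np

def elemental_inequality_matrix(n: int) -> np.ndarray:
    """Rows are coefficient vectors of the elemental inequalities over the
    (2^n - 1)-dimensional entropy space, with coordinates enumerated by
    subset size and then lexicographically."""
    subsets = [frozenset(s) for r in range(1, n + 1)
               for s in itertools.combinations(range(n), r)]
    index = {s: i for i, s in enumerate(subsets)}

    def row(*terms):  # terms: (sign, subset) pairs; the empty set contributes 0
        v = np.zeros(len(subsets))
        for sign, s in terms:
            if s:
                v[index[frozenset(s)]] += sign
        return v

    rows = []
    ground = set(range(n))
    # Type 1: H(X_i | X_{N \ {i}}) >= 0, i.e., h(N) - h(N \ {i}) >= 0.
    for i in range(n):
        rows.append(row((+1, ground), (-1, ground - {i})))
    # Type 2: I(X_i; X_j | X_K) >= 0 for all K not containing i, j.
    for i, j in itertools.combinations(range(n), 2):
        rest = ground - {i, j}
        for r in range(len(rest) + 1):
            for K in itertools.combinations(rest, r):
                K = set(K)
                rows.append(row((+1, K | {i}), (+1, K | {j}),
                                (-1, K | {i, j}), (-1, K)))
    return np.vstack(rows)

A = elemental_inequality_matrix(4)
print(A.shape)  # (28, 15): n + C(n, 2) * 2^(n-2) = 28 elemental inequalities for n = 4
```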
Although both
and
are regions within
, the former consists of the entropic vectors associated with valid PMFs, while the latter is formed by vectors satisfying all Shannon-type inequalities. Hence, it is natural to ask whether there is equality between
and
. It was first discovered by [
3,
8] that
is a loose outer bound of
. More specifically, it is known that
,
but
, and
when
[
3,
8]. These relations reveal that there are inequalities tighter than Shannon-type inequalities as the outer bound of the entropic region, i.e., the inequalities hold for all entropic vectors but cannot be implied by elemental inequalities, and these inequalities are often referred to as the
non-Shannon-type inequalities ([
54], Chapter 15).
We briefly introduce some known structures of
,
and
. Both
and
are pointed convex cones in the nonnegative orthant of
[
54] (Chapter 13). The region
is polyhedral, while
is not when
[
10], i.e., the convex cone
(
) is represented with finitely many (infinitely many) extreme rays or finitely many (infinitely many) linear inequalities. For
, although the almost-entropic region
is a convex cone, the region
is not convex and
. The complete characterization of
for
is difficult and open. For more details of
,
, and
, please refer to [
54] (Chapter 13–15).
Given the difficulty of characterizing with infinite alphabets, and in order to optimize finite PMFs numerically, it is practical to consider the case in which the random variables have finite alphabet sizes.
Definition 2 (Alphabet-bounded entropic region [
54], Chapter 21)
. Given a random vector with an index set , taking values on the finite alphabet of size such that , the alphabet-bounded entropic region is defined as , where is the entropic vector associated with the PMF , and is the probability simplex defined on the finite alphabet of size m. The region is the collection of entropic vectors associated with PMFs defined on the finite alphabet , and is a compact and closed set.
To characterize the closeness of two given vectors, the most straightforward measurement is the angle between the rays determined by them. Here, we adopt the measure of
normalized distance, proposed in [
26].
Definition 3 (Normalized distance)
. Given vectors , the normalized distance between and is defined as , where is the norm, and . We note that the normalized distance is the tangent of the angle between the rays determined by
and
. Thus, we also have
2.3. Neural Networks
In general, an NN is a computational model with significant expressive capability for approximating desired functions [
56]. Following convention, we now formally define fully connected feedforward multilayer NNs as a family of functions.
Definition 4 (Neural networks [
56])
. With hidden layers of sizes , fixed input dimension and output dimension , where , let ∘ denote the composition of functions, a fully connected feedforward multilayer NN is defined as the following family of functions , where, for , is a linear operation with the weight matrices , the data vectors , and the bias vectors . The nonlinear activation function is performed on vectors element-wise, such that, for , . For an arbitrary number of hidden layers and sizes, fully connected feedforward multilayer NNs with the same fixed input and output dimensions are defined as . The set of all possible weights and biases of an NN is the
parameter space, which is denoted as
(here,
d is defined as the dimension of the parameter space). We sometimes denote an NN function
with parameters
as
. Exploiting NNs, desired functions can be
parameterized with
, and
approximated by optimizing
with gradient descent methods [
57]. Multilayer feedforward NNs are known to be universal approximators for any measurable function as long as the scale of the model is sufficiently large [
56].
Activation functions of NNs may vary in forms for different tasks. More specifically, let the input be
with elements
; the most common activation functions include the
sigmoid activation function
and the
ReLU activation function
. There are many variations of ReLU, including the
ELU activation function
A special activation function is the
softmax activation function or the
softmax layer, which is defined as
, where
We note that the output vector of the softmax layer has special properties, i.e.,
and
, which make the output vector a valid PMF in (
1).
For details on NNs, please refer to [
58] (Chapter 20).
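To illustrate how a softmax output layer yields valid joint PMFs (a minimal PyTorch sketch under our own choice of layer sizes and activation; the configuration actually used in this paper is described in Section 4 and Remark 1), the following network maps a fixed constant input to a nonnegative vector summing to one, reshaped into a joint PMF:

```python
import torch
import torch.nn as nn

class PMFGenerator(nn.Module):
    """Fully connected network whose softmax output is a valid joint PMF."""
    def __init__(self, alphabet_sizes=(4, 4, 4, 4), hidden_size=128):
        super().__init__()
        self.alphabet_sizes = alphabet_sizes
        out_dim = 1
        for k in alphabet_sizes:
            out_dim *= k
        self.net = nn.Sequential(
            nn.Linear(1, hidden_size),
            nn.ELU(),
            nn.Linear(hidden_size, out_dim),
            nn.Softmax(dim=-1),                # entries are nonnegative and sum to one
        )

    def forward(self, constant_input: torch.Tensor) -> torch.Tensor:
        return self.net(constant_input).reshape(self.alphabet_sizes)

pmf = PMFGenerator()(torch.ones(1))            # fixed constant input
print(pmf.shape, float(pmf.sum()))             # torch.Size([4, 4, 4, 4]) 1.0
```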
2.4. Problem Statement
In this paper, when a target vector is given, we are interested in finding an entropic vector in , such that is relatively small. If is entropic, we aim to give the underlying PMF to verify and realize . If is not entropic, we aim to give an entropic vector close to and, hopefully, on the boundary of .
We recall that is the tangent of the angle between the corresponding rays, but the tangent is only strictly increasing when the angle is limited in . Restricting the target vector to be in can satisfy this requirement, since belongs to the nonnegative orthant of and .
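For reference, the normalized distance can be evaluated directly from the tangent-of-angle interpretation of Definition 3 (our own sketch; projection-based formulations are equivalent):

```python
import numpy as np

def normalized_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    """Tangent of the angle between the rays through h1 and h2."""
    cos_angle = np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against round-off
    return float(np.tan(np.arccos(cos_angle)))

# Two proportional vectors determine the same ray, so their distance is zero.
print(normalized_distance(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])))  # 0.0
```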
3. Optimizing PMFs for Associated Entropic Vectors via Neural Networks
In this section, we propose the methodology that tackles the problem stated in
Section 2.4. More specifically, given a target vector in
, we propose to train a corresponding NN to identify the entropic vector closest to the target, and, at the same time, provide the corresponding underlying PMF.
In particular, the proposed NN is configured with a final softmax layer to output PMFs, with the input fixed to a specific constant (similar settings can be found in [
45,
46]). We claim that, for a given target vector, there always exists an NN such that the entropic vector associated with the produced PMF is arbitrarily close to the target in terms of normalized distance. This statement is formalized in the following theorem.
Theorem 1. For the random vector with a fixed and finite alphabet , given a target (entropic or not), we consider the entropic vector that yields the minimal normalized distance to . Then, for , there exists a corresponding NN with parameters (where is the parameter space) which outputs the PMF valid in from a specific constant input , such that , where is the associated entropic vector of .
The proof of Theorem 1 is straightforward, based on the observation that the softmax layer can produce any desired PMF, and that both the entropic vector and the normalized distance are continuous functions of the PMF. However, for the completeness of the paper, we provide the proof of Theorem 1 in
Appendix B.1. The proof can be described as follows. For a given target vector
, let
be one of the underlying PMFs that achieves
, i.e., the entropic vector with minimal normalized distance to
; then, we can find an NN which outputs
from a fixed constant, and
approximates
in
norm. By the continuity of the Shannon entropy ([
54],
Section 2.3), for fixed finite alphabets, we can prove that, as
approximates
in
norm, the associated entropic vector
approximates
in normalized distance as well.
Algorithm
In the rest of the section, we give a practical algorithm, i.e., Algorithm 1, to minimize the NN loss function, which is defined as the normalized distance interpreted in (
12), i.e.,
Algorithm 1 trains parameters
with a gradient descent method to find an NN that achieves
in Theorem 1, i.e., identify the entropic vector closest to the target and provide the corresponding underlying PMF. The overall architecture of the proposed method is depicted in
Figure 1. More specifically, at each iteration, the NN takes a constant as the input, and outputs a joint PMF
. By configuring the last layer of NN as the softmax layer in (
17), i.e.,
where
is the layer input, the resulting PMF
is valid in
. Subsequently, the associated entropic vector
is computed from
with (
2), (
3), and (
5). The normalized distance between the associated entropic vector and target vector is then evaluated with (
19). To optimize NN parameters
, a gradient descent step is then performed. The training procedure is iterated for a sufficiently large number of iterations N.
Algorithm 1: Optimize PMFs for the associated entropic vector closest to the target vector via NN training
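For illustration, the following PyTorch sketch conveys the training loop described above (an illustrative implementation of our own; the fully connected generator, hyperparameters, and the i.i.d.-uniform target vector are stand-ins rather than the exact CNN configuration of Section 4 and Remark 1):

```python
import itertools
import torch
import torch.nn as nn

def entropy_bits(p: torch.Tensor) -> torch.Tensor:
    p = p.clamp_min(1e-30)                     # avoid log(0) while keeping autograd intact
    return -(p * torch.log2(p)).sum()

def entropic_vector(pmf: torch.Tensor) -> torch.Tensor:
    """All 2^n - 1 joint entropies of an n-way joint PMF as one differentiable vector."""
    n = pmf.dim()
    h = []
    for r in range(1, n + 1):
        for subset in itertools.combinations(range(n), r):
            other = tuple(i for i in range(n) if i not in subset)
            h.append(entropy_bits(pmf.sum(dim=other) if other else pmf))
    return torch.stack(h)

def normalized_distance(h: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    cos = torch.dot(h, target) / (h.norm() * target.norm())
    return torch.tan(torch.arccos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7)))

# Generator: fixed constant input -> softmax -> joint PMF over a 4 x 4 x 4 x 4 alphabet.
alphabet = (4, 4, 4, 4)
net = nn.Sequential(nn.Linear(1, 128), nn.ELU(),
                    nn.Linear(128, 4 ** 4), nn.Softmax(dim=-1))

# Illustrative target: the entropic vector of four i.i.d. uniform bits (same coordinate order).
target = entropic_vector(torch.full((2, 2, 2, 2), 1.0 / 16))

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):                       # N training iterations
    pmf = net(torch.ones(1)).reshape(alphabet)
    loss = normalized_distance(entropic_vector(pmf), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(float(loss))                             # normalized distance achieved
```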
We further implement Algorithm 1 and the NN model in
Figure 1 with CNNs, and details are presented in
Section 4. With the proposed implementation techniques, the convergence performances of Algorithm 1 for different targets are demonstrated in
Section 5.1. Other empirical results on various problems related to the entropic region are demonstrated in
Section 5.2 and
Section 5.3.
Before ending this section, in the following remark, we discuss the connections between our work and the existing literature [
44,
45,
46], which focus on the neural optimization of distributions for problems in information theory as well.
Remark: Recently, the optimization of probability distributions with generative NNs has been studied in information theory and applied to a variety of problems in [
44,
45,
46]. In [
44,
46], NNs are designed to generate the input distributions of given channels, and optimized to produce a distribution that achieves channel capacity, serving as a part of the joint estimation–optimization architecture for channel capacity. More specifically, over continuous spaces [
44], a model named the neural distribution transformer (NDT) is proposed; the optimized NDT transforms uniformly distributed samples into data distributed according to the desired capacity-achieving distribution. Over discrete alphabets [
46], the PMF generator is proposed to optimize the PMF numerically, and capacity-achieving channel input data can be sampled from the optimized PMF. We note that [
46] focuses on channels with feedback and memory, and the PMF generator is based on a time-series deep reinforcement learning model. For channels without feedback and memory, the PMF generator in [
46] degenerates to a similar architecture as the one proposed in this paper. However, different from sampling data from generated PMFs, the method proposed in this paper directly computes entropic vectors from PMFs, and theoretical guarantees are provided in the context of this task. In [
45], NNs are also used to optimize conditional distributions that achieve the rate-distortion function for any given source distribution, in both continuous and discrete spaces. The PMF generator in [
45] takes source samples as input, and produces corresponding conditional PMFs. If the source in [
45] is deterministic, the conditional PMF generator degenerates to the model proposed in this paper as well.
5. Empirical Results
In this section, empirical results of the method proposed in
Section 3 are presented, focusing on various problems related to
and
. Firstly, we compare our method with the existing random search algorithm [
26]. Secondly, we tackle the problems of optimizing the Ingleton score and Ingleton violation index. Lastly, we exploit the proposed method to obtain many entropic vectors that approximate the boundary of
, and use the convex hull of these entropic vectors to form an inner bound of
.
Before presenting these results, we first provide a brief introduction of regions within
. When
, the Ingleton inequality [
41] plays an essential role in the characterization of
, and one permutation instance of the Ingleton inequality is defined as follows.
Definition 5 (Ingleton inequality)
. For a vector , one specific permutation instance of the Ingleton inequality [41] is denoted as the inner product between the coefficients and . Permuting , there exist six instances of the Ingleton inequality in total, which are denoted as .
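In its standard entropy form, the instance in which the pair {i, j} plays the role of the first two indices reads h(ij) + h(ik) + h(il) + h(jk) + h(jl) − h(i) − h(j) − h(kl) − h(ijk) − h(ijl) ≥ 0. The sketch below (ours, reusing the subset-keyed dictionary from the Section 2.2 sketch) evaluates all six instances:

```python
import itertools

def ingleton_expression(h: dict, i: int, j: int, k: int, l: int) -> float:
    """One permutation instance of the Ingleton expression; it is nonnegative
    exactly when that instance of the inequality holds for h."""
    s = lambda *idx: h[tuple(sorted(idx))]
    return (s(i, j) + s(i, k) + s(i, l) + s(j, k) + s(j, l)
            - s(i) - s(j) - s(k, l) - s(i, j, k) - s(i, j, l))

def violates_ingleton(h: dict) -> bool:
    """True if at least one of the six permutation instances is violated."""
    for i, j in itertools.combinations(range(4), 2):
        k, l = (x for x in range(4) if x not in (i, j))
        if ingleton_expression(h, i, j, k, l) < 0:
            return True
    return False
```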
, the
linear rank region is the intersection of
with
(all six instances of Ingleton inequality), i.e.,
We consider vectors that represent rank functions of linear subspaces (i.e., linearly representable entropic vectors, or entropic vectors that yield rate regions achieved by linear network codes); then, the linear rank region is the closure of the linear hull of such vectors. The region
is a polyhedral convex cone, and is a tight linear inner bound of
[
8,
13,
60]. However,
is only completely characterized for
[
15,
17]. Nevertheless, when
, the relation
holds.
Hence, one is interested in characterizing the part of
excluding
(i.e., the almost-entropic vectors that are not linearly representable or that yield rate regions that cannot be achieved by linear network codes). We recall that
represents all six instances of the Ingleton inequality. We define
and the entropic part of
, i.e.,
We denote the closure of
as
; then, it is clear that
. By [
60] (Lemma 4), any ray in
violates exactly one instance of the Ingleton inequality, i.e.,
is the union of six symmetric regions, each of which corresponds to and violates one instance of the Ingleton inequality, and the six regions are disjoint, sharing only boundary points. Thus, we may consider
and
when only one instance of the Ingleton inequality does not hold. More specifically, when
, i.e., the reversed version of (
21) holds, we denote
and
as
and
, respectively. Analogously, we can consider different instances of the Ingleton inequality to obtain other five outer regions symmetric to
and five regions symmetric to
.
We denote the closure of as ; then, in a word, the characterization of is reduced to .
The only extreme ray of
violating
(as well as the Zhang–Yeung inequality [
8]) is denoted as
, where
The region
is often called the
pyramid and is represented by 15 extreme rays, with
as the top and the other 14 extreme rays as bases (a complete list of these extreme rays is available in [
16] (Section XI)).
In the rest of this section, we focus on the bounded regions of Definition 2 with , i.e., each random variable has the same alphabet size k. The above discussion still holds, and we denote the bounded versions as , and .
In the following remark, we present the implementation details of Algorithm 1 for experiments in this section.
Remark 1. In this section, we utilized Algorithm 1 with hyperparameters . Overall, the NN model followed the configuration presented in Table 1 with and either one or three hidden layers. The NN parameter set ϕ was randomly initialized with the default method of PyTorch [59] (the initial parameters can be fixed by fixing the random seed), the optimizer was Adam [61] with a learning rate γ tuned in the range of to (based on the number of hidden layers), and we selected the constant . The number of iterations N was tuned to be (however, the algorithm converges within significantly fewer iterations in most tasks of this section).
5.1. Comparison with the Random Search Algorithm in [26]
When evaluating the performance of our method, the target was set as
in (
25), which is trivially
not entropic due to the Zhang–Yeung inequality [
8], i.e.,
in Theorem 1. We aimed to compare the returned normalized distance by Algorithm 1 with that obtained by the random search algorithm in [
26], which is the only existing method for verifying entropic vectors with general-form PMFs. Intuitively, if Algorithm 1 is capable of obtaining entropic vectors closer to
, we are able to obtain finer approximations of the boundary of
.
Figure 3 depicts the empirical results of our method. In addition to
(Target 1), two
entropic vectors in [
13] (Theorem 4, Target 2) and [
16] (Conjecture 1, Target 3), which belong to the region
, are set as targets, and the results are compared to [
26] (the numerical data of the results in [
26] are available in [
62]). The alphabet size of every random variable, i.e.,
k, is set to 4. In
Figure 3, the curves illustrate the convergence behavior as the number of iterations of Algorithm 1 grows, while the shaded regions represent the range of final results obtained by several experiments in [
26].
From
Figure 3, it is evident that our method outperforms the random search algorithm proposed by [
26]. More specifically, for the non-entropic target (Target 1 in
Figure 3), the returned normalized distance using our method (
) is smaller than the smallest value obtained by [
26] (
). For entropic targets (Targets 2 and 3 in
Figure 3), our method returns negligible normalized distance as expected, thus successfully verifying these entropic targets. In addition, the normalized distances obtained by [
26] vary across different experiments, and even increase as the number of iterations grows. For example, with Target 1 in
Figure 3, the method in [
26] returns a normalized distance of
with 1226 iterations; however, with 10,091 iterations, a larger normalized distance
is obtained [
26] (
Figure 1).
In contrast, the results obtained by our method are consistent across multiple experiments, empirically demonstrating better convergence than [
26].
5.2. Optimizing Ingleton Score and Ingleton Violation Index
When optimizing entropic vectors for
, it is of great interest to optimize the
Ingleton score and
Ingleton violation index. The Ingleton score was first proposed in [
42] (Conjecture 4.1) and later rigorously defined in [
16] (Definition 3).
Definition 6 (Ingleton score [
16] (Definition 3))
. Given a random vector , and an entropic vector , the Ingleton score induced by (21) is defined as . The Ingleton score is defined as due to permutation symmetry, and the optimization problem of is .
Similarly, the Ingleton violation index [
19] is defined as follows.
Definition 7 (Ingleton violation index [
19])
. Given a random vector , and an entropic vector , the Ingleton violation index induced by (21) is defined as . Due to permutation symmetry, the Ingleton violation index is defined as , and the optimization problem of ι is .
One is interested in minimizing
or maximizing
over
, because their optimal values reveal the largest violations of (
21) among entropic vectors in
, and, thus, are crucial characterizations of
and
.
There is a rich history of optimizing
[
16,
21,
26,
35,
42] and
[
19,
26]. More specifically, for
, [
21] proposes a decomposition and two linear transformations on
. If we denote the composition of these techniques on
as
, then it is shown that
, i.e.,
is still almost entropic for any almost-entropic
. In addition, the Ingleton score optimized with
in [
21] is smaller than the one optimized with
, and refutes the four-atom conjecture in [
16] (please refer to [
21] (Section VIII) for details). Since
plays an important role in optimizing the Ingleton score and Ingleton violation index in this paper, for both completeness and clarity, we state the details of the techniques that compose
in
Appendix A for interested readers.
Thus, we are motivated to exploit our method to obtain a better Ingleton score and Ingleton violation index, when the objective
in Algorithm 1 is replaced by
,
, and
,
. Although Theorem 1 does not directly imply that the proposed method can be exploited to optimize the Ingleton score and Ingleton violation index, we claim that there always exists an NN that optimizes the Ingleton score and Ingleton violation index to within any desired degree of accuracy of their optimal values, as indicated by Theorems 2 and 3. The proofs of Theorems 2 and 3 are provided in
Appendix B.2 and
Appendix B.3, respectively.
Theorem 2. For the random vector with a fixed and finite alphabet , we recall the Ingleton score for entropic vectors defined in Definition 6. We consider the entropic vector , such that yields the optimal Ingleton score . Then, there exists an NN with parameters , such that is the associated entropic vector of valid in , and yields the corresponding Ingleton score , which is consistent with within any desired degree of accuracy , i.e., , where is the absolute value for scalars.
Theorem 3. For the random vector with a fixed and finite alphabet , we recall the Ingleton violation index for entropic vectors defined in Definition 7. We consider the entropic vector , such that yields the optimal Ingleton violation index . Then, there exists an NN with parameters , such that is the associated entropic vector of valid in , and yields the corresponding Ingleton violation index , which is consistent with within any desired degree of accuracy , i.e., , where is the absolute value for scalars.
The optimized numerical results are compared with the state-of-the-art values in
Table 2, and the alphabet size of every random variable is set to 4. In
Table 2, our results are rounded to ten decimal digits, and the last four digits are significant since the perturbations for small probability values may lead to large absolute errors of entropy functions and score results.
As seen in
Table 2, firstly, for the Ingleton score
, the upper bound
, which is optimized with
, is returned (with negligible improvement compared to
due to numerical stability). After the first discovery of this value [
34,
35], the method in [
26] obtained a close value of
as well, and, thus, we reconfirm this upper bound for the third time. Secondly, for the Ingleton violation index
, we particularly optimize it with
as well, and a new lower bound
is obtained, which beats the current best value of
. We observe that
actually measures the sine of the angle between the hyperplane
and
; thus, the obtained new lower bound finds an entropic vector with a larger violation angle from the Ingleton hyperplane, bringing us closer to the optimal
and the complete characterization of
and
.
The corresponding convergence performances for results in
Table 2 are presented in
Figure 4. From
Figure 4, one can verify the obtained results listed in
Table 2, and observe that the proposed method successfully converges to the best known Ingleton score when optimizing
, and converges to an Ingleton violation index that exceeds the best-known result when optimizing
.
Remark 2. The pursuit of the Ingleton score appears to be difficult, with no alternative strategies in the literature. In [26], the search area is constrained using a strategy called hyperplane score reduction, combined with the minimization of the normalized distance over relatively many iterations. Additionally, [35] reports that most experiments yield the value , while returning only very occasionally, even with millions of searches using massive computation resources in parallel. Similarly, with our method, the direct minimization of yields the value . However, by selecting the entropic point from [13] as the initial point of Algorithm 1 (by setting it as the target and minimizing the normalized distance first), we reproduced the value in a single experiment, with fewer than 10,000 iterations (which take approximately 6 s).
5.3. Inner Bounding the Almost-Entropic Region
The complete characterization of
is difficult and open. However, as we find many entropic vectors, a linear inner bound is immediately obtained. For example, for an entropic vector
, if we regard
as the top vertex, then the convex hull of the top
and other 14 base vertices (refer to [
16] (Section XI) for a complete list of these extreme rays) yields an inner bound for
and
.
In this paper, following [
16] (Section XI), the quality of the inner bound is evaluated by the volume ratio. Let
denote the numerical inner bound obtained, and
denote the polytope volume computation operation; the volume ratio is defined as
Intuitively, needs to be closer to the boundary of to obtain a larger volume ratio R.
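For low-dimensional cases, the volume ratio can be computed directly from V-representations with Qhull, which SciPy exposes (an illustrative sketch of ours; as Remark 5 discusses, such computations become intractable in the 15-dimensional setting with many vertices):

```python
import numpy as np
from scipy.spatial import ConvexHull

def volume_ratio(inner_vertices: np.ndarray, outer_vertices: np.ndarray) -> float:
    """Ratio of polytope volumes, each polytope given by its vertices (V-representation)."""
    return ConvexHull(inner_vertices).volume / ConvexHull(outer_vertices).volume

# Toy 3-D example: a corner tetrahedron occupies 1/6 of the unit cube.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)
tetra = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
print(round(volume_ratio(tetra, cube), 4))  # 0.1667
```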
Because Algorithm 1 easily produces entropic vectors with large
and better
, we are inspired to exploit it to develop a new algorithm, i.e., Algorithm 2, to construct an inner bound with a larger volume ratio, by optimizing entropic vectors that violate other specific hyperplanes.
Algorithm 2: Construct inner bounds for the almost-entropic region with Algorithm 1
Algorithm 2 can be interpreted as follows. First, starting from the 14 base entropic vectors of , i.e., , we construct an initial inner bound with (an) optimized entropic vector(s), i.e., ; the initial inner bound can be represented by the set of vertices . By transforming the V-representation of the initial inner bound into the H-representation, we obtain many hyperplanes with the set of coefficients . Then, for each of these hyperplanes, Algorithm 2 minimizes a violation score, defined analogously to the Ingleton score, by invoking Algorithm 1, i.e., we find entropic vectors with large violations of the initial inner bound (violating as well). Now, we have extended the vertices of the initial inner bound from to , and expanded the initial inner bound to an inner bound with a larger volume ratio. This procedure can be repeated to obtain inner bounds with larger volume ratios.
Remark 3. Since , by Theorem 2 and its proof in Appendix B.2, the theoretical guarantee for Algorithm 2, i.e., minimizing the hyperplane score by exploiting Algorithm 1, immediately follows.
We present the volume ratio of the inner bound obtained with Algorithm 2 in
Table 3, comparing it with all existing results. For reference, the volume ratio of the outer bound obtained with the non-Shannon-type inequalities in [
16] (Section VII) (not tight) was
, and the volume of the trivial inner bound obtained with
was 0. From
Table 3, one can see that the volume ratio of the inner bound obtained by our method is larger than that of all existing results. Compared with the state-of-the-art result [
26], the ratio
is improved to
, leading to the current best inner bound of
measured by volume ratio. Thus, our proposed method takes another step forward in the complete characterization of
.
Remark 4. Here, we give details on how we obtained the two results listed in Table 3. For the result, the set consisted of three entropic vectors that yielded the Ingleton score , the Ingleton violation indices and in Table 2, respectively, and the entropic vector that yielded the Ingleton score , as discussed in Remark 2. Furthermore, after iteration of Algorithm 2, we obtained 152 vertices. After the convex hull operation, there were 121 entropic vectors, which gave us an inner bound with a volume ratio of . For the result, the set consisted of one entropic vector that yielded the Ingleton score in Table 2. Furthermore, after iterations of Algorithm 2, we obtained 2585 vertices, and this inner bound yielded an estimated (as illustrated in Remark 5) volume ratio of .
Remark 5. There are several algorithms available for polytope volume computation, as listed in [63]; we chose "Qhull" [64,65] and "lrs" [66,67], which only require the V-representation of a polytope, while providing the convex hull operation as well. However, all existing numerical methods become computationally intractable in high dimension and with a large number of vertices (details of their computational complexities can be found in [65,67]). In our case, "Qhull" was able to compute the result from 152 entropic vectors in , but became intractable when computing the other result, where the number of entropic vectors was 2585. Nevertheless, "lrs" provides a function called "Estimate", which allowed us to estimate the computation time and volume result without carrying out the complete computation. In this manner, it is estimated that the volume ratio computed from 2585 entropic vectors is . The estimation process took several days, and it suggests that the complete computation would take more than one year.
6. Conclusions and Discussion
This paper introduced a novel architecture to parameterize and optimize PMFs for verifying entropic vectors with NNs. Given a target vector, an algorithm was proposed to identify the entropic vector closest to it via NN training. Empirical results demonstrated smaller normalized distances and improved convergence performance. Optimized Ingleton scores were presented, and a new lower bound on the Ingleton violation index was obtained. A state-of-the-art inner bound of
was constructed, as measured by the volume ratio. However, the proposed algorithms carry computational burdens. Although Algorithm 1 achieves a larger feasible alphabet size than previous works, its efficiency is still constrained by the alphabet size, as
Figure 2 shows. Algorithm 2, which requires auxiliary algorithms to calculate the polytope volume, is limited by the auxiliary algorithms due to high dimensionality and the large number of entropic vectors, as Remark 5 discusses.
Future work includes developing
a computer-aided approach to construct achievable schemes for network coding problems using the proposed method. The
Linear Programming bound [
68], which yields Shannon outer bounds for rate regions of network coding problems, is not tight in general due to
. However, with Algorithm 1, the corner points obtained by the Linear Programming bound can be verified to be entropic or not. Furthermore, if a corner point is verified to be entropic, the returned PMF indicates the coding scheme. Similar arguments have been raised in [
26] as well, where the network instances are simple, and the largest alphabet size is small. Nevertheless, due to smaller and more consistent normalized distances depicted in
Figure 3 and the feasibility for large alphabets discussed in
Section 4.2, we believe that our proposed method can be applied to network coding problems with larger alphabet sizes, and/or more random variables.