1. Introduction
This paper concerns the union-closed sets conjecture, which can be described in information-theoretic language as follows. For that purpose, every set $S \subseteq [n] := \{1, 2, \ldots, n\}$ is uniquely described by an $n$-length sequence $x^n = (x_1, \ldots, x_n) \in \{0,1\}^n$ such that $x_i = 1$ if $i \in S$ and $x_i = 0$ otherwise. So, a family $\mathcal{F}$ of subsets of $[n]$ uniquely corresponds to a subset $A \subseteq \{0,1\}^n$. Denote the (element-wise) OR operation for two finite $\{0,1\}$-valued sequences $x^n, y^n$ as $x^n \vee y^n$ with $(x^n \vee y^n)_i := x_i \vee y_i$, where $\vee$ is the OR operation. The family $\mathcal{F}$ is closed under the union operation (i.e., $S \cup T \in \mathcal{F}$ for all $S, T \in \mathcal{F}$) if and only if the corresponding set $A$ is closed under the OR operation (i.e., $x^n \vee y^n \in A$ for all $x^n, y^n \in A$).
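This correspondence is easy to make concrete. The following Python sketch (illustrative only; the function names are ours and not from the literature) encodes sets as 0/1 tuples and checks OR-closedness, which is exactly union-closedness of the corresponding family.

```python
# Subsets of [n] as 0/1 tuples; union of sets = element-wise OR of tuples.
# (Illustrative helper code; function names are ours.)

def or_seq(x, y):
    """Element-wise OR of two 0/1 tuples."""
    return tuple(xi | yi for xi, yi in zip(x, y))

def is_or_closed(A):
    """Check that x v y lies in A for all x, y in A."""
    return all(or_seq(x, y) in A for x in A for y in A)

# F = {{1}, {2}, {1,2}} over [2] corresponds to A = {(1,0), (0,1), (1,1)}:
assert is_or_closed({(1, 0), (0, 1), (1, 1)})
assert not is_or_closed({(1, 0), (0, 1)})   # the union {1,2} is missing
```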
Let $A \subseteq \{0,1\}^n$ be closed under the OR operation. Let $X^n$ be a random vector uniformly distributed on $A$, and denote $P_{X^n}$ as its distribution (or probability mass function, PMF). We are interested in estimating $\max_{i \in [n]} P_{X_i}(1)$, where $P_{X_i}$ is the distribution of $X_i$ and, hence, $P_{X_i}(1)$ is the proportion of the sets containing the element $i$ among all sets in $\mathcal{F}$. Frankl made the following conjecture.
Conjecture 1 (Frankl Union-Closed Sets Conjecture). $\max_{i \in [n]} P_{X_i}(1) \ge 1/2$ for any OR-closed set $A \subseteq \{0,1\}^n$ with $A \neq \emptyset, \{0^n\}$.
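For a small example, the proportion in Conjecture 1 can be computed directly; the sketch below (again illustrative, reusing the encoding above) evaluates $\max_{i \in [n]} P_{X_i}(1)$ for $X^n$ uniform on an OR-closed $A$.

```python
from fractions import Fraction

# max_i P_{X_i}(1) for X^n uniform on an OR-closed A. (Illustrative only;
# reuses the 0/1-tuple encoding above.)

def max_element_frequency(A):
    n = len(next(iter(A)))
    return max(Fraction(sum(x[i] for x in A), len(A)) for i in range(n))

A = {(1, 0), (0, 1), (1, 1)}
print(max_element_frequency(A))   # 2/3 >= 1/2, consistent with Conjecture 1
```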
This conjecture equivalently states that, for any union-closed family $\mathcal{F} \neq \emptyset, \{\emptyset\}$, there exists an element contained in at least a proportion $1/2$ of the sets of $\mathcal{F}$. Since the union-closed sets conjecture was posed by Peter Frankl in 1979, it has attracted a great deal of research interest; see, e.g., [1,2,3,4,5]. We refer readers to the survey paper [6] for more details. Gilmer [7] made a breakthrough recently, showing that this conjecture holds with the constant $0.01$ in place of $1/2$. His method used a clever idea from information theory in which two independent random vectors were constructed. It was conjectured by Gilmer that his method can improve the constant to $\frac{3-\sqrt{5}}{2}$, which has now been confirmed by several groups of researchers [8,9,10,11]. This constant is shown to be the best possible for an approximate version of the union-closed sets problem [9]. Moreover, Sawin [8] further developed Gilmer's idea by allowing the two random vectors to depend on each other. In fact, the same idea was previously used by the present author in several works [12,13,14]. By this technique, Sawin [8] showed that the constant can be improved to a value that is strictly larger than $\frac{3-\sqrt{5}}{2}$. However, without cardinality bounds on the auxiliary random variables, Sawin's constant is difficult to compute; hence, the accurate value of this improved constant is not explicitly given in [8].
The present paper further develops Gilmer's (or Sawin's) technique to derive new constants (or bounds) in the optimization form for the union-closed sets conjecture. These bounds include Sawin's improvement as a special case. By providing cardinality bounds on the auxiliary random variables, we make Sawin's improvement computable and then evaluate it numerically, which yields a bound of approximately $0.38234$, slightly better than $\frac{3-\sqrt{5}}{2} \approx 0.38197$.
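For orientation, $\frac{3-\sqrt{5}}{2}$ is the root in $(0, 1/2]$ of $(1-t)^2 = t$: at $a = b = t$, the OR of two independent $\mathrm{Bern}(t)$ inputs is $\mathrm{Bern}(1-(1-t)^2) = \mathrm{Bern}(1-t)$, whose entropy equals that of each input. A quick numeric check (illustrative only):

```python
import numpy as np

# (3 - sqrt(5))/2 is the root in (0, 1/2] of (1 - t)^2 = t: at a = b = t the
# OR of two independent Bern(t) inputs is Bern(1 - (1-t)^2) = Bern(1 - t),
# whose entropy equals that of each input. (Numeric illustration only.)

t = (3 - np.sqrt(5)) / 2
h = lambda p: -p*np.log(p) - (1 - p)*np.log(1 - p)
print(t)                                    # 0.38196601...
print(np.isclose((1 - t)**2, t))            # True
print(np.isclose(h(1 - (1 - t)**2), h(t)))  # True: h(1 - t) == h(t)
```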
2. Main Results
To state our results, we need to introduce some notation. Since we only consider distributions on finite alphabets, we do not distinguish between the terms “distributions” and “probability mass functions”. For a pair of distributions $(P, Q)$, a coupling of $(P, Q)$ is a joint distribution $P_{XY}$ whose marginals are, respectively, $P_X = P$ and $P_Y = Q$; denote $\mathcal{C}(P, Q)$ as the set of such couplings. For a distribution $P$ defined on a finite alphabet $\mathcal{X}$, a coupling $P_{XY}$ of $(P, P)$ is called symmetric if $P_{XY}(x, y) = P_{XY}(y, x)$ for all $x, y \in \mathcal{X}$. Denote $\mathcal{C}_{\mathrm{sym}}(P)$ as the set of symmetric couplings of $(P, P)$. Denote $\delta_x$ as the Dirac measure with a single atom at $x$; that is, the PMF of this measure takes the value 1 at $x$ and takes the value 0 at other points.
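These notions can be checked mechanically for finite alphabets; the sketch below (function names ours, illustrative only) verifies the marginal and symmetry conditions for two standard couplings of a fixed $P$.

```python
import numpy as np

# Mechanical checks of the coupling notions above for finite alphabets.
# (Function names ours; illustrative only.)

def is_coupling(P_XY, P, Q, tol=1e-12):
    return (np.allclose(P_XY.sum(axis=1), P, atol=tol) and
            np.allclose(P_XY.sum(axis=0), Q, atol=tol))

def is_symmetric_coupling(P_XY, P, tol=1e-12):
    return is_coupling(P_XY, P, P, tol) and np.allclose(P_XY, P_XY.T, atol=tol)

P = np.array([0.3, 0.7])
assert is_symmetric_coupling(np.outer(P, P), P)   # independent coupling of (P, P)
assert is_symmetric_coupling(np.diag(P), P)       # "diagonal" coupling: X = Y a.s.
```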
For a joint distribution $P_{XY}$, the (Pearson) correlation coefficient between $(X, Y)$ is defined by
$$\rho(X; Y) := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
The maximal correlation between $(X, Y)$ is defined by
$$\rho_{\mathrm{m}}(X; Y) := \sup_{f, g} \rho\bigl(f(X); g(Y)\bigr),$$
where the supremum is taken over all pairs of real-valued functions $f, g$ such that $\mathrm{Var}(f(X)), \mathrm{Var}(g(Y)) \in (0, \infty)$. Note that $0 \le \rho_{\mathrm{m}}(X; Y) \le 1$ and, moreover, $\rho_{\mathrm{m}}(X; Y) = 0$ if and only if $X, Y$ are independent. Moreover, $\rho_{\mathrm{m}}(X; Y)$ is equal to the second largest singular value of the matrix $\bigl[P_{XY}(x, y)/\sqrt{P_X(x) P_Y(y)}\bigr]_{x, y}$; see, e.g., [15]. Clearly, the largest singular value of this matrix is equal to 1, with corresponding left and right singular vectors $\bigl[\sqrt{P_X(x)}\bigr]_x$ and $\bigl[\sqrt{P_Y(y)}\bigr]_y$.
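The singular-value characterization gives a direct way to compute $\rho_{\mathrm{m}}$ numerically on finite alphabets; a minimal sketch (function name ours, illustrative only):

```python
import numpy as np

# Maximal correlation via the SVD characterization above: rho_m(X;Y) is the
# second-largest singular value of Q(x,y) = P_XY(x,y)/sqrt(P_X(x) P_Y(y)).
# (Function name ours; assumes strictly positive marginals.)

def maximal_correlation(P_XY):
    P_X = P_XY.sum(axis=1)
    P_Y = P_XY.sum(axis=0)
    Q = P_XY / np.sqrt(np.outer(P_X, P_Y))
    return np.linalg.svd(Q, compute_uv=False)[1]   # largest is 1; take the second

P = np.array([0.3, 0.7])
print(maximal_correlation(np.outer(P, P)))   # ~0.0: independent
print(maximal_correlation(np.diag(P)))       # ~1.0: X = Y a.s.
```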
Denote for $(a, b) \in [0, 1]^2$,
$$\sigma(a, b) := \sqrt{a(1-a)b(1-b)}$$
and, for $\rho \in [0, 1]$,
$$\Delta_\rho(a, b) := h\Bigl(\mathrm{med}\bigl\{\max\{a,\ b,\ a+b-ab-\rho\,\sigma(a,b)\},\ 1/2,\ \min\{a+b,\ 1,\ a+b-ab+\rho\,\sigma(a,b)\}\bigr\}\Bigr), \quad (1)$$
where $\mathrm{med}(A)$ denotes the median value of the elements in a multiset $A$. We regard the set in (1) as a multiset, which means, e.g., $\mathrm{med}\{x, 1/2, x\} = x$ even when the listed values are not distinct. Denote $h(a) := -a \log a - (1-a) \log(1-a)$ for $a \in [0, 1]$ as the binary entropy function. Define, for $t \in (0, 1/2]$,
$$\Gamma(t) := \sup_{P_V}\ \inf_{P_A}\ \inf_{P_{AB|V}}\ \frac{\mathbb{E}\bigl[\Delta_V(A, B)\bigr]}{\mathbb{E}[h(A)]}, \quad (2)$$
where the supremum over $P_V$ and the outer infimum over $P_A$ are both taken over all finitely supported probability distributions on $[0, 1]$, with $P_A$ additionally required to satisfy $\mathbb{E}[A] \le t$ and $\mathbb{E}[h(A)] > 0$, and where the inner infimum is taken over all conditional distributions $P_{AB|V}$ such that, for each $\rho$ in the support of $P_V$, $P_{AB|V=\rho}$ is a symmetric coupling of $(P_A, P_A)$ with $\rho_{\mathrm{m}}(A; B \mid V = \rho) \le \rho$.
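A small numeric sketch of these definitions (illustrative; it follows the form of (1) literally, and checks the independent case $\rho = 0$ against $h(a+b-ab)$, cf. the evaluation after Theorem 1):

```python
import numpy as np

# The binary entropy h and the function Delta_rho of (1), via the multiset
# median. (Illustrative; follows (1) literally.)

def h(p):
    """Binary entropy (natural log); h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log(p) - (1 - p)*np.log(1 - p)

def sigma(a, b):
    return np.sqrt(a*(1 - a)*b*(1 - b))

def Delta(rho, a, b):
    lo = max(a, b, a + b - a*b - rho*sigma(a, b))
    hi = min(a + b, 1.0, a + b - a*b + rho*sigma(a, b))
    return h(np.median([lo, 0.5, hi]))    # median of the multiset in (1)

a, b = 0.2, 0.35
assert np.isclose(Delta(0.0, a, b), h(a + b - a*b))  # independent (Gilmer) case
print(Delta(1.0, a, b))   # = h(med{max(a,b), 1/2, min(a+b,1)}) = h(1/2) here
```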
Our main results are as follows.
Theorem 1. If $\Gamma(t) > 1$ for some $t \in (0, 1/2]$, then $\max_{i \in [n]} P_{X_i}(1) \ge t$ for any OR-closed $A \neq \emptyset, \{0^n\}$ (i.e., for any union-closed family $\mathcal{F} \neq \emptyset, \{\emptyset\}$, there exists an element contained in at least a proportion $t$ of the sets of $\mathcal{F}$).
The proof of Theorem 1 is given in Section 3 by using a technique based on coupling and entropy. It is essentially the same as the technique used by Sawin [8]. Prior to Sawin's work, such a technique was used by the present author in several works; see [12,13,14].
Equivalently, Theorem 1 states that $\max_{i \in [n]} P_{X_i}(1) \ge t^*$ for any OR-closed $A$, where
$$t^* := \sup\bigl\{t \in (0, 1/2] : \Gamma(t) > 1\bigr\}.$$
To compute $t^*$ numerically, it is required to upper bound the cardinality of the support of $P_A$ in the outer infimum in (2) since, otherwise, infinitely many parameters are needed to optimize. This is left to be done in a future work. We next provide a computable bound, which is a lower bound on $\Gamma(t)$, instead of $\Gamma(t)$ itself.
If we choose $P_V = \delta_0$, then Theorem 1 implies Gilmer's bound in [7] since, for this case, the couplings constructed in the proof of Theorem 1 (given in the next section) turn out to be independent, coinciding with Gilmer's construction. On the other hand, if we choose $P_V = \delta_1$, then the couplings constructed in our proof are arbitrary. In fact, we can make a choice of $P_V$ better than these two special cases. As suggested by Sawin [8], we can choose $P_V = (1-\lambda)\delta_0 + \lambda\delta_1$ for $\lambda \in [0, 1]$, which in fact leads to an optimization over mixtures of independent couplings and arbitrary couplings. This final choice yields the following bound.
Substituting $\rho = 0$ and $1$, respectively, into (1) yields
$$\Delta_0(a, b) = h(a + b - ab) \quad (3)$$
and
$$\Delta_1(a, b) = h\bigl(\mathrm{med}\{\max\{a, b\},\ 1/2,\ \min\{a + b,\ 1\}\}\bigr), \quad (4)$$
where, in the evaluation of $\Delta_1$, the following facts were used: (1) $\sigma(a, b) \ge \min\{a, b\} - ab$ for all $(a, b) \in [0, 1]^2$; (2) if $a + b \le 1$, then $\sigma(a, b) \ge ab$ and, otherwise, $\sigma(a, b) \ge (1-a)(1-b)$.
By defining $\Gamma_\lambda(t)$ as the value of (2) with $P_V$ fixed to be $(1-\lambda)\delta_0 + \lambda\delta_1$ and substituting this choice of $P_V$ into Theorem 1, one obtains the following simpler bound.
Proposition 1. For $t \in (0, 1/2]$,
$$\Gamma(t) \ \ge\ \sup_{\lambda \in [0,1]}\ \inf\ \frac{(1-\lambda)\,\mathbb{E}\bigl[\Delta_0(A, B')\bigr] + \lambda\,\mathbb{E}\bigl[\Delta_1(A, B)\bigr]}{\mathbb{E}[h(A)]}, \quad (5)$$
where $B'$ denotes an independent copy of $A$, and the infimum is taken over all distributions of the form
$$P_{AB} = \alpha\,\delta_{(x_1, x_1)} + \beta\,\delta_{(x_1, x_2)} + \beta\,\delta_{(x_2, x_1)} + \gamma\,\delta_{(x_2, x_2)}$$
with $\alpha + 2\beta + \gamma = 1$, $\alpha, \beta, \gamma \ge 0$, and $x_1, x_2 \in [0, 1]$ such that $\mathbb{E}[A] \le t$ and $\mathbb{E}[h(A)] > 0$. (Note that a symmetric coupling $P_{AB}$ is supported on $\{x_1, x_2\}^2$ if and only if $P_{AB}$ is a convex combination of $\delta_{(x_1, x_1)}$, $\delta_{(x_1, x_2)}$, $\delta_{(x_2, x_1)}$, and $\delta_{(x_2, x_2)}$ that puts equal weights on $\delta_{(x_1, x_2)}$ and $\delta_{(x_2, x_1)}$.) Here, $\delta_{(x, y)}$ denotes the Dirac measure at $(x, y)$ (whose PMF takes the value 1 at $(x, y)$ and takes the value 0 at other points). As a consequence of the two results above, we have the following corollary.
Corollary 1. If the right-hand side of (5) is strictly larger than 1 for some $t \in (0, 1/2]$, then $\max_{i \in [n]} P_{X_i}(1) \ge t$ for any OR-closed $A \neq \emptyset, \{0^n\}$.
The proof of Corollary 1 is given in Section 4.
The lower bound in (5) without the cardinality bound on the support of $P_A$ was given by Sawin [8], which was used to show that the resulting constant is strictly larger than $\frac{3-\sqrt{5}}{2}$. However, thanks to the cardinality bound, we can numerically compute the best bound on $\max_{i \in [n]} P_{X_i}(1)$ that can be derived using (5). That is, $\max_{i \in [n]} P_{X_i}(1) \ge t^{**}$ for any OR-closed $A$, where
$$t^{**} := \sup\bigl\{t \in (0, 1/2] : \text{the right-hand side of (5) is strictly larger than } 1\bigr\}.$$
Numerical results show that if we set $t = 0.38234$, then the optimal choices of $\lambda$ and of the distribution $P_{AB}$ in (5) make the right-hand side of (5) strictly larger than 1, which leads to the lower bound $t^{**} \ge 0.38234$. Hence, $\max_{i \in [n]} P_{X_i}(1) \ge 0.38234$ for any OR-closed $A$. This is slightly better than the previous bound $\frac{3-\sqrt{5}}{2} \approx 0.38197$. The choice of parameters in our evaluation is nearly optimal. Our code can be found on the author's homepage https://leiyudotscholar.wordpress.com/ (accessed on 1 May 2023). More decimal places of Sawin's bound (or, equivalently, of $t^{**}$) were computed by Cambie in a concurrent work [16], together with the nearly optimal choices attaining them. This more precise evaluation can also be verified using our code above.
3. Proof of Theorem 1
Denote $H(X)$ as the Shannon entropy of a random variable $X$. Let $A \subseteq \{0,1\}^n$ be closed under the OR operation. We assume $|A| \ge 2$. This is because Theorem 1 holds obviously for singletons $A$, since for this case, either $A = \{0^n\}$, which is excluded, or $\max_{i \in [n]} P_{X_i}(1) = 1 \ge t$. Let $X^n \sim \mathrm{Unif}(A)$. So, $H(X^n) = \log|A| > 0$ and, by the chain rule, $H(X^n) = \sum_{i=1}^n H(X_i \mid X^{i-1})$.
If $P_{X^nY^n}$ is a coupling of $(P_{X^n}, P_{X^n})$ with $P_{X^n} = \mathrm{Unif}(A)$, then $Z^n := X^n \vee Y^n \in A$ a.s., where we use that $A$ is OR-closed. So, we have $H(X^n \vee Y^n) \le \log|A| = H(X^n)$. We hence have
$$\sup_{P_{X^nY^n} \in \mathcal{C}(P_{X^n}, P_{X^n})} \frac{H(X^n \vee Y^n)}{H(X^n)} \ \le\ 1.$$
If $\max_{i \in [n]} P_{X_i}(1) < t$, then $P_{X^n} = \mathrm{Unif}(A)$ satisfies $\max_{i \in [n]} P_{X_i}(1) \le t$ and $H(X^n) > 0$. Relaxing $P_{X^n} = \mathrm{Unif}(A)$ to arbitrary distributions such that these two constraints hold, we obtain $\Lambda_n(t) \le 1$, where
$$\Lambda_n(t) := \inf_{P_{X^n}:\ \max_{i} P_{X_i}(1) \le t,\ H(X^n) > 0}\ \ \sup_{P_{X^nY^n} \in \mathcal{C}(P_{X^n}, P_{X^n})} \frac{H(X^n \vee Y^n)}{H(X^n)}. \quad (8)$$
In other words, if, given $t$, $\Lambda_n(t) > 1$, then, by contradiction, $\max_{i \in [n]} P_{X_i}(1) \ge t$.
We next show that $\Lambda_n(t) \ge \Gamma(t)$, which implies Theorem 1. To this end, we need the following lemmas.
For two conditional distributions $P_{X|U}, P_{Y|U}$, denote $\mathcal{C}(P_{X|U}, P_{Y|U})$ as the set of conditional distributions $P_{XY|U}$ such that their marginals satisfy $P_{X|U}$ and $P_{Y|U}$, respectively. The conditional (Pearson) correlation coefficient of $X$ and $Y$ given $U$ is defined by
$$\rho(X; Y \mid U) := \frac{\mathbb{E}\bigl[\mathrm{Cov}(X, Y \mid U)\bigr]}{\sqrt{\mathbb{E}\bigl[\mathrm{Var}(X \mid U)\bigr]\,\mathbb{E}\bigl[\mathrm{Var}(Y \mid U)\bigr]}}.$$
The conditional maximal correlation coefficient of $X$ and $Y$ given $U$ is defined by
$$\rho_{\mathrm{m}}(X; Y \mid U) := \sup_{f, g} \rho\bigl(f(X, U); g(Y, U) \mid U\bigr),$$
where the supremum is taken over all real-valued functions $f, g$ such that $\mathbb{E}[f(X, U) \mid U] = \mathbb{E}[g(Y, U) \mid U] = 0$ a.s., $\mathbb{E}[f^2(X, U)], \mathbb{E}[g^2(Y, U)] \in (0, \infty)$. It has been shown in [17] that
$$\rho_{\mathrm{m}}(X; Y \mid U) = \max_{u:\, P_U(u) > 0} \rho_{\mathrm{m}}(X; Y \mid U = u),$$
where $\rho_{\mathrm{m}}(X; Y \mid U = u)$ denotes the (unconditional) maximal correlation computed with respect to the joint distribution $P_{XY|U=u}$.
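Given the identity above, the conditional maximal correlation can be computed from the per-$u$ joints; a sketch (reusing `maximal_correlation` from the Section 2 snippet; illustrative only):

```python
import numpy as np

# Conditional maximal correlation via the identity above: the maximum of the
# per-u maximal correlations over u with P_U(u) > 0. Reuses
# maximal_correlation() from the Section 2 snippet. (Sketch.)

def cond_maximal_correlation(P_U, joints):
    """joints[u] is the joint PMF matrix P_{XY|U=u}."""
    return max(maximal_correlation(P) for p, P in zip(P_U, joints) if p > 0)

ind = np.outer([0.3, 0.7], [0.3, 0.7])   # rho_m = 0 given u = 0
eq = np.diag([0.3, 0.7])                 # rho_m = 1 given u = 1
print(cond_maximal_correlation([0.5, 0.5], [ind, eq]))   # 1.0
```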
Lemma 1 (Product Construction of Couplings; Lemma 9 in [12], Corollary 3 in [17], and Lemma 6 in [18]). For any conditional distributions $P_{X_iY_i|X^{i-1}Y^{i-1}} \in \mathcal{C}(P_{X_i|X^{i-1}}, P_{Y_i|Y^{i-1}})$, $i \in [n]$, it holds that
$$\prod_{i=1}^n P_{X_iY_i|X^{i-1}Y^{i-1}} \in \mathcal{C}(P_{X^n}, P_{Y^n}).$$
Moreover, for $P_{X^nY^n} = \prod_{i=1}^n P_{X_iY_i|X^{i-1}Y^{i-1}}$, it holds that
$$\rho_{\mathrm{m}}(X^n; Y^n) = \max_{i \in [n]} \rho_{\mathrm{m}}\bigl(X_i; Y_i \mid X^{i-1}, Y^{i-1}\bigr). \quad (10)$$
For a conditional distribution $P_{X|U}$ defined on finite alphabets, a conditional coupling $P_{XY|U}$ of $(P_{X|U}, P_{X|U})$ is called symmetric if $P_{XY|U}(x, y \mid u) = P_{XY|U}(y, x \mid u)$ for all $x, y, u$. Denote $\mathcal{C}_{\mathrm{sym}}(P_{X|U})$ as the set of symmetric conditional couplings of $P_{X|U}$. Applying the lemma above to symmetric couplings, we have that if the couplings $P_{X_iY_i|X^{i-1}Y^{i-1}}$, $i \in [n]$, are symmetric and satisfy $\rho_{\mathrm{m}}(X_i; Y_i \mid X^{i-1}, Y^{i-1}) \le \rho$ for some $\rho \in [0, 1]$, then
$$P_{X^nY^n} := \prod_{i=1}^n P_{X_iY_i|X^{i-1}Y^{i-1}} \in \mathcal{C}(P_{X^n}, P_{X^n}) \quad\text{with}\quad \rho_{\mathrm{m}}(X^n; Y^n) \le \rho. \quad (9)$$
We hence have that, for any $\rho \in [0, 1]$,
$$\sup_{P_{X^nY^n} \in \mathcal{C}(P_{X^n}, P_{X^n})} H(X^n \vee Y^n) \ \ge\ \sum_{i=1}^n H\bigl(X_i \vee Y_i \mid X^{i-1}, Y^{i-1}\bigr) \ =\ \sum_{i=1}^n \mathbb{E}\Bigl[\sup H\bigl(X_i \vee Y_i \mid X^{i-1} = x^{i-1}, Y^{i-1} = y^{i-1}\bigr)\Bigr], \quad (11)$$
where the inner supremum is taken over all symmetric conditional couplings of $\bigl(P_{X_i|X^{i-1}}(\cdot \mid x^{i-1}), P_{X_i|X^{i-1}}(\cdot \mid y^{i-1})\bigr)$ whose maximal correlation is at most $\rho$,
and where the first inequality above follows by Lemma 1 and the chain rule for entropies (note that $Z^{i-1} := X^{i-1} \vee Y^{i-1}$ is a function of $(X^{i-1}, Y^{i-1})$). In fact, in the derivation above, the $i$-th distribution $P_{X_iY_i|X^{i-1}Y^{i-1}}$ is chosen as a greedy coupling in the sense that it only maximizes the $i$-th objective function $H(X_i \vee Y_i \mid X^{i-1}, Y^{i-1})$, regardless of the other terms $H(X_j \vee Y_j \mid X^{j-1}, Y^{j-1})$ with $j \neq i$ (although it indeed affects their values).
By the fact that conditioning reduces entropy, it holds that
$$H(X^n \vee Y^n) \ \ge\ H(X^n \vee Y^n \mid V)$$
for the mixed coupling induced by an auxiliary random variable $V \sim P_V$ on $[0, 1]$ that selects, for each realization $V = \rho$, the greedy product coupling of level $\rho$ constructed above (a mixture of couplings of $(P_{X^n}, P_{X^n})$ is still such a coupling). Denote
$$a_i := P_{X_i|X^{i-1}}(1 \mid X^{i-1}), \qquad b_i := P_{X_i|X^{i-1}}(1 \mid Y^{i-1}),$$
and denote $\theta_\rho(a_i, b_i)$ as the value of the inner supremum in (11) at level $\rho$, which depends on $(x^{i-1}, y^{i-1})$ only through $(a_i, b_i)$ since $X_i \vee Y_i$ is binary. Then, the expression at the right-hand side of (11), averaged over $V$, is further lower bounded by $\sum_{i=1}^n \mathbb{E}[\theta_V(a_i, b_i)]$. Combining this with (8) and (11), and by noting that $P_V$ is arbitrary, we obtain that
$$\Lambda_n(t) \ \ge\ \sup_{P_V}\ \inf_{P_{X^n}}\ \frac{\sum_{i=1}^n \mathbb{E}\bigl[\theta_V(a_i, b_i)\bigr]}{\sum_{i=1}^n \mathbb{E}[h(a_i)]} \quad (12)$$
$$\ \ge\ \sup_{P_V}\ \inf_{P_{X^n}}\ \min_{i \in [n]:\ \mathbb{E}[h(a_i)] > 0}\ \frac{\mathbb{E}\bigl[\theta_V(a_i, b_i)\bigr]}{\mathbb{E}[h(a_i)]} \quad (13)$$
$$\ =\ \sup_{P_V}\ \inf_{P_{X^n}}\ \frac{\mathbb{E}\bigl[\theta_V(a_j, b_j)\bigr]}{\mathbb{E}[h(a_j)]},$$
where the infima are taken over all $P_{X^n}$ such that $\max_{i \in [n]} P_{X_i}(1) \le t$ and $H(X^n) > 0$;
(13) follows since $H(X_i \mid X^{i-1}) = \mathbb{E}[h(a_i)]$ for every $i \in [n]$ (so that $\sum_{i} \mathbb{E}[h(a_i)] = H(X^n)$), and since $\mathbb{E}[h(a_i)] = 0$ implies that $X_i$ is a deterministic function of $X^{i-1}$ and, hence, such an index contributes zero to the denominator and a nonnegative amount to the numerator;
the index $j$ in the last line is the optimal $i$ attaining the minimum in (13).
Denote $a := a_j$, and $b := b_j$. Then,
$$\Lambda_n(t) \ \ge\ \sup_{P_V}\ \inf\ \frac{\mathbb{E}\bigl[\theta_V(a, b)\bigr]}{\mathbb{E}[h(a)]}. \quad (14)$$
We next further simplify the lower bound in (14). Denote
$$a = P_{X_j|X^{j-1}}(1 \mid X^{j-1}), \qquad b = P_{X_j|X^{j-1}}(1 \mid Y^{j-1}), \qquad q := P_{X_jY_j|X^{j-1}Y^{j-1}}(1, 1 \mid X^{j-1}, Y^{j-1}). \quad (15)$$
So, $P(X_j \vee Y_j = 1 \mid X^{j-1}, Y^{j-1}) = a + b - q$ with $\max\{a + b - 1, 0\} \le q \le \min\{a, b\}$. Note that
$$\rho_{\mathrm{m}}\bigl(X_j; Y_j \mid X^{j-1} = x^{j-1}, Y^{j-1} = y^{j-1}\bigr) = \bigl|\rho\bigl(X_j; Y_j \mid X^{j-1} = x^{j-1}, Y^{j-1} = y^{j-1}\bigr)\bigr| = \frac{|q - ab|}{\sigma(a, b)}, \quad (16)$$
where $\rho$ denotes the Pearson correlation coefficient, and (16) follows since the maximal correlation coefficient between two binary random variables is equal to the absolute value of the Pearson correlation coefficient between them; see, e.g., [19]. So, $\rho_{\mathrm{m}}(X_j; Y_j \mid X^{j-1}, Y^{j-1}) \le \rho$ is equivalent to $|q - ab| \le \rho\,\sigma(a, b)$ a.s. and also equivalent to $a + b - ab - \rho\,\sigma(a, b) \le a + b - q \le a + b - ab + \rho\,\sigma(a, b)$ a.s.
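The binary fact used in (16) is easy to verify numerically (sketch, reusing `maximal_correlation` from Section 2): for random $2 \times 2$ joint PMFs, the second singular value coincides with the absolute Pearson correlation.

```python
import numpy as np

# For binary (X, Y), the maximal correlation equals the absolute Pearson
# correlation, as used in (16). Reuses maximal_correlation(). (Sketch.)

rng = np.random.default_rng(0)
for _ in range(5):
    P = rng.random((2, 2)); P /= P.sum()
    a, b = P.sum(axis=1)[1], P.sum(axis=0)[1]    # P(X=1), P(Y=1)
    pearson = (P[1, 1] - a*b) / np.sqrt(a*(1 - a)*b*(1 - b))
    print(np.isclose(maximal_correlation(P), abs(pearson)))   # True
```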
The inner supremum in (14), i.e., $\theta_\rho(a, b)$, can be rewritten as
$$\theta_\rho(a, b) = \sup_{r \in [L_\rho(a, b),\ U_\rho(a, b)]} h(r), \quad\text{where}\quad L_\rho(a, b) := \max\{a,\ b,\ a+b-ab-\rho\,\sigma(a, b)\},\quad U_\rho(a, b) := \min\{a+b,\ 1,\ a+b-ab+\rho\,\sigma(a, b)\}.$$
By the fact that $h$ is increasing on $[0, 1/2]$ and decreasing on $[1/2, 1]$, it holds that the optimal $r$ attaining the supremum in the last line above, denoted by $r^*$, is the median of $L_\rho(a, b)$, $1/2$, and $U_\rho(a, b)$, which implies
$$\theta_\rho(a, b) = h\bigl(\mathrm{med}\{L_\rho(a, b),\ 1/2,\ U_\rho(a, b)\}\bigr).$$
Recall the definition of $\Delta_\rho$ in (1). So, the inner supremum in (14) is equal to $\Delta_\rho(a, b)$.
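The median step can be cross-checked by brute force (sketch, reusing `h`, `sigma`, and `Delta` from the Section 2 snippet): maximizing $h(a + b - q)$ over a fine grid of feasible $q$ reproduces $\Delta_\rho(a, b)$ up to the grid resolution.

```python
import numpy as np

# Brute-force check of the median step: maximize h(a+b-q) over the feasible
# range of q derived from (15) and (16), and compare with Delta(rho, a, b).
# Reuses h, sigma, Delta from the Section 2 snippet. (Illustrative only.)

rng = np.random.default_rng(1)
for _ in range(5):
    a, b, rho = rng.random(3)
    q_lo = max(a + b - 1, 0.0, a*b - rho*sigma(a, b))
    q_hi = min(a, b, a*b + rho*sigma(a, b))
    best = max(h(a + b - q) for q in np.linspace(q_lo, q_hi, 20001))
    print(f"{best:.6f}  {Delta(rho, a, b):.6f}")   # columns agree up to grid error
```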
We make the following observations. Firstly, $\mathbb{E}[a] = P(X_j = 1) \le t$, and the distribution of $a$ is finitely supported on $[0, 1]$. Secondly, by the definition of maximal correlation, $\rho_{\mathrm{m}}(a; b \mid V) \le \rho_{\mathrm{m}}(X^{j-1}; Y^{j-1} \mid V)$ holds (which is known as the data processing inequality) since $a, b$ are, respectively, functions of $X^{j-1}, Y^{j-1}$; see (15). Moreover, $\rho_{\mathrm{m}}(X^{j-1}; Y^{j-1} \mid V = \rho) \le \rho$ by (9) and (10). Lastly, observe that $P_{X^{j-1}Y^{j-1}|V}$ is symmetric and $a, b$ are obtained from $X^{j-1}, Y^{j-1}$ via the same function $P_{X_j|X^{j-1}}(1 \mid \cdot)$ (since $P_{Y_j|Y^{j-1}} = P_{X_j|X^{j-1}}$ holds by the symmetry of $P_{X^nY^n}$). Hence, $P_{ab|V}$ is symmetric as well. Substituting all of these into (14) yields $\Lambda_n(t) \ge \Gamma(t)$. □
5. Discussion
The breakthrough made by Gilmer [7] shows the power of information-theoretic techniques in tackling problems in related fields. In fact, the union-closed sets conjecture has a natural interpretation in the information-theoretic (or coding-theoretic) sense. Consider the memoryless OR multi-access channel $(x^n, y^n) \mapsto x^n \vee y^n$. We would like to find a nonempty code $A \subseteq \{0,1\}^n$ with $A \neq \{0^n\}$ to generate two independent inputs $X^n, Y^n$, with each following $\mathrm{Unif}(A)$, such that the input constraint $\max_{i \in [n]} P_{X_i}(1) < t$ is satisfied and the output $X^n \vee Y^n$ is still in $A$ a.s. The union-closed sets conjecture states that such a code exists if and only if $t > 1/2$. Based on this information-theoretic interpretation, it is reasonable to see that information-theoretic techniques work for this conjecture. It is well known that information-theoretic techniques usually work very well for problems with “approximate” constraints, e.g., the channel coding problem with the asymptotically vanishing error probability constraint (or the approximate version of the union-closed sets problem introduced in [9]). It is still unclear whether information-theoretic techniques are sufficient to prove sharp bounds for problems with “exact” constraints, e.g., the zero-error coding problem (or the original version of the union-closed sets conjecture).
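A toy simulation of this interpretation (illustrative only): two independent uniform inputs on an OR-closed code pass through the OR multi-access channel, and every output is again a codeword.

```python
import numpy as np

# Two independent uniform inputs on an OR-closed code A; the OR multi-access
# channel output stays inside A. (Toy simulation; illustrative only.)

A = [(0, 0), (1, 0), (0, 1), (1, 1)]     # OR-closed code for n = 2
rng = np.random.default_rng(2)
X = [A[i] for i in rng.integers(len(A), size=1000)]
Y = [A[i] for i in rng.integers(len(A), size=1000)]
Z = [tuple(xi | yi for xi, yi in zip(x, y)) for x, y in zip(X, Y)]
assert all(z in A for z in Z)            # every output is again a codeword
print(np.mean([x[0] for x in X]))        # per-element input frequency, ~1/2 here
```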
Furthermore, as an intermediate result, it has been shown that $\Lambda_n(t) > 1$ implies $\max_{i \in [n]} P_{X_i}(1) \ge t$ for any OR-closed $A$. Here, $\Lambda_n(t)$ is given in (8), expressed in the multi-letter form (i.e., the dimension-dependent form). By the super-block coding argument, it is verified that, given $t$, the limit $\lim_{n \to \infty} \Lambda_n(t)$ exists. It is interesting to investigate this limit and prove a single-letter (dimension-independent) expression for it.
For simplicity, in this paper, we only consider the maximal correlation coefficient as the constraint function. In fact, the maximal correlation coefficient used here can be replaced by other functionals. The key property of the maximal correlation coefficient that we used in this paper is the “tensorization” property, i.e., (10) (in fact, only the “≤” part of (10) was used in our proof). In the literature, there is a class of measures of correlation satisfying this property, e.g., the hypercontractivity constant, the strong data processing inequality constant, or, more generally, $\Phi$-ribbons; see [20,21,22]. (Although the tensorization property in the literature is only defined and proven for independent random variables, this property can be extended to the coupling constructed in (9).) Following the same proof steps given in this paper, one can obtain various variants of Theorem 1 with the maximal correlation coefficient replaced by other quantities, as long as these quantities satisfy the tensorization property. Another potential direction is to replace the Shannon entropy with a class of more general quantities, the Rényi entropies. However, unfortunately, Rényi entropies do not satisfy the chain rule (unlike the Shannon entropy), which leads to a serious difficulty in single-letterizing the corresponding multi-letter bound, such as $\Lambda_n(t)$ in (8) (i.e., in making the multi-letter bound dimension-independent).