1. Introduction
This paper concerns the union-closed sets conjecture, which can be described in information-theoretic language as follows. For that purpose, every set $S \subseteq [n] := \{1, 2, \ldots, n\}$ is uniquely described by an $n$-length sequence $x^n = (x_1, \ldots, x_n) \in \{0,1\}^n$ such that $x_i = 1$ if $i \in S$ and $x_i = 0$ otherwise. So, a family $\mathcal{F}$ of subsets of $[n]$ uniquely corresponds to a subset $A \subseteq \{0,1\}^n$. Denote the (element-wise) OR operation for two finite $\{0,1\}$-valued sequences $x^n, y^n$ as $x^n \vee y^n$ with $(x^n \vee y^n)_i := x_i \vee y_i$, where $\vee$ is the OR operation. The family $\mathcal{F}$ is closed under the union operation (i.e., $S \cup T \in \mathcal{F}$ for all $S, T \in \mathcal{F}$) if and only if the corresponding set $A$ is closed under the OR operation (i.e., $x^n \vee y^n \in A$ for all $x^n, y^n \in A$).
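This correspondence is easy to make concrete. The following Python sketch (illustrative only; the function names are ours and not from the literature) encodes sets as 0/1 tuples and checks OR-closedness, which is exactly union-closedness of the corresponding family.

```python
# Subsets of [n] as 0/1 tuples; union of sets = element-wise OR of tuples.
# (Illustrative helper code; function names are ours.)

def or_seq(x, y):
    """Element-wise OR of two 0/1 tuples."""
    return tuple(xi | yi for xi, yi in zip(x, y))

def is_or_closed(A):
    """Check that x v y lies in A for all x, y in A."""
    return all(or_seq(x, y) in A for x in A for y in A)

# F = {{1}, {2}, {1,2}} over [2] corresponds to A = {(1,0), (0,1), (1,1)}:
assert is_or_closed({(1, 0), (0, 1), (1, 1)})
assert not is_or_closed({(1, 0), (0, 1)})   # the union {1,2} is missing
```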
Let $A \subseteq \{0,1\}^n$ be closed under the OR operation. Let $X^n$ be a random vector uniformly distributed on $A$, and denote $P_{X^n}$ as its distribution (or probability mass function, PMF). We are interested in estimating $\max_{i \in [n]} P_{X_i}(1)$, where $P_{X_i}$ is the distribution of $X_i$ and, hence, $P_{X_i}(1)$ is the proportion of the sets containing the element $i$ among all sets in $\mathcal{F}$. Frankl made the following conjecture.
Conjecture 1 (Frankl Union-Closed Sets Conjecture). $\max_{i \in [n]} P_{X_i}(1) \ge 1/2$ for any OR-closed set $A \subseteq \{0,1\}^n$ with $A \neq \emptyset, \{0^n\}$.
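For a small example, the proportion in Conjecture 1 can be computed directly; the sketch below (again illustrative, reusing the encoding above) evaluates $\max_{i \in [n]} P_{X_i}(1)$ for $X^n$ uniform on an OR-closed $A$.

```python
from fractions import Fraction

# max_i P_{X_i}(1) for X^n uniform on an OR-closed A. (Illustrative only;
# reuses the 0/1-tuple encoding above.)

def max_element_frequency(A):
    n = len(next(iter(A)))
    return max(Fraction(sum(x[i] for x in A), len(A)) for i in range(n))

A = {(1, 0), (0, 1), (1, 1)}
print(max_element_frequency(A))   # 2/3 >= 1/2, consistent with Conjecture 1
```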
This conjecture equivalently states that, for any union-closed family $\mathcal{F} \neq \emptyset, \{\emptyset\}$, there exists an element contained in at least a proportion $1/2$ of the sets of $\mathcal{F}$. Since the union-closed sets conjecture was posed by Peter Frankl in 1979, it has attracted a great deal of research interest; see, e.g., [1,2,3,4,5]. We refer readers to the survey paper [6] for more details. Gilmer [7] made a breakthrough recently, showing that this conjecture holds with the constant $0.01$ in place of $1/2$. His method used a clever idea from information theory in which two independent random vectors were constructed. It was conjectured by Gilmer that his method can improve the constant to $\frac{3-\sqrt{5}}{2}$, which has now been confirmed by several groups of researchers [8,9,10,11]. This constant is shown to be the best possible for an approximate version of the union-closed sets problem [9]. Moreover, Sawin [8] further developed Gilmer's idea by allowing the two random vectors to depend on each other. In fact, the same idea was previously used by the present author in several works [12,13,14]. By this technique, Sawin [8] showed that the constant can be improved to a value that is strictly larger than $\frac{3-\sqrt{5}}{2}$. However, without cardinality bounds on the auxiliary random variables, Sawin's constant is difficult to compute; hence, the accurate value of this improved constant is not explicitly given in [8].
The present paper further develops Gilmer's (or Sawin's) technique to derive new constants (or bounds) in the optimization form for the union-closed sets conjecture. These bounds include Sawin's improvement as a special case. By providing cardinality bounds on the auxiliary random variables, we make Sawin's improvement computable and then evaluate it numerically, which yields a bound of approximately $0.38234$, slightly better than $\frac{3-\sqrt{5}}{2} \approx 0.38197$.
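For orientation, $\frac{3-\sqrt{5}}{2}$ is the root in $(0, 1/2]$ of $(1-t)^2 = t$: at $a = b = t$, the OR of two independent $\mathrm{Bern}(t)$ inputs is $\mathrm{Bern}(1-(1-t)^2) = \mathrm{Bern}(1-t)$, whose entropy equals that of each input. A quick numeric check (illustrative only):

```python
import numpy as np

# (3 - sqrt(5))/2 is the root in (0, 1/2] of (1 - t)^2 = t: at a = b = t the
# OR of two independent Bern(t) inputs is Bern(1 - (1-t)^2) = Bern(1 - t),
# whose entropy equals that of each input. (Numeric illustration only.)

t = (3 - np.sqrt(5)) / 2
h = lambda p: -p*np.log(p) - (1 - p)*np.log(1 - p)
print(t)                                    # 0.38196601...
print(np.isclose((1 - t)**2, t))            # True
print(np.isclose(h(1 - (1 - t)**2), h(t)))  # True: h(1 - t) == h(t)
```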
2. Main Results
To state our results, we need to introduce some notation. Since we only consider distributions on finite alphabets, we do not distinguish between the terms “distributions” and “probability mass functions”. For a pair of distributions $(P, Q)$, a coupling of $(P, Q)$ is a joint distribution $P_{XY}$ whose marginals are, respectively, $P_X = P$ and $P_Y = Q$; denote $\mathcal{C}(P, Q)$ as the set of such couplings. For a distribution $P$ defined on a finite alphabet $\mathcal{X}$, a coupling $P_{XY}$ of $(P, P)$ is called symmetric if $P_{XY}(x, y) = P_{XY}(y, x)$ for all $x, y \in \mathcal{X}$. Denote $\mathcal{C}_{\mathrm{sym}}(P)$ as the set of symmetric couplings of $(P, P)$. Denote $\delta_x$ as the Dirac measure with a single atom at $x$; that is, the PMF of this measure takes the value 1 at $x$ and takes the value 0 at other points.
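These notions can be checked mechanically for finite alphabets; the sketch below (function names ours, illustrative only) verifies the marginal and symmetry conditions for two standard couplings of a fixed $P$.

```python
import numpy as np

# Mechanical checks of the coupling notions above for finite alphabets.
# (Function names ours; illustrative only.)

def is_coupling(P_XY, P, Q, tol=1e-12):
    return (np.allclose(P_XY.sum(axis=1), P, atol=tol) and
            np.allclose(P_XY.sum(axis=0), Q, atol=tol))

def is_symmetric_coupling(P_XY, P, tol=1e-12):
    return is_coupling(P_XY, P, P, tol) and np.allclose(P_XY, P_XY.T, atol=tol)

P = np.array([0.3, 0.7])
assert is_symmetric_coupling(np.outer(P, P), P)   # independent coupling of (P, P)
assert is_symmetric_coupling(np.diag(P), P)       # "diagonal" coupling: X = Y a.s.
```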
For a joint distribution $P_{XY}$, the (Pearson) correlation coefficient between $(X, Y)$ is defined by
$$\rho(X; Y) := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
The maximal correlation between $(X, Y)$ is defined by
$$\rho_{\mathrm{m}}(X; Y) := \sup_{f, g} \rho\bigl(f(X); g(Y)\bigr),$$
where the supremum is taken over all pairs of real-valued functions $f, g$ such that $\mathrm{Var}(f(X)), \mathrm{Var}(g(Y)) \in (0, \infty)$. Note that $0 \le \rho_{\mathrm{m}}(X; Y) \le 1$ and, moreover, $\rho_{\mathrm{m}}(X; Y) = 0$ if and only if $X, Y$ are independent. Moreover, $\rho_{\mathrm{m}}(X; Y)$ is equal to the second largest singular value of the matrix $\bigl[P_{XY}(x, y)/\sqrt{P_X(x) P_Y(y)}\bigr]_{x, y}$; see, e.g., [15]. Clearly, the largest singular value of this matrix is equal to 1, with corresponding left and right singular vectors $\bigl[\sqrt{P_X(x)}\bigr]_x$ and $\bigl[\sqrt{P_Y(y)}\bigr]_y$.
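The singular-value characterization gives a direct way to compute $\rho_{\mathrm{m}}$ numerically on finite alphabets; a minimal sketch (function name ours, illustrative only):

```python
import numpy as np

# Maximal correlation via the SVD characterization above: rho_m(X;Y) is the
# second-largest singular value of Q(x,y) = P_XY(x,y)/sqrt(P_X(x) P_Y(y)).
# (Function name ours; assumes strictly positive marginals.)

def maximal_correlation(P_XY):
    P_X = P_XY.sum(axis=1)
    P_Y = P_XY.sum(axis=0)
    Q = P_XY / np.sqrt(np.outer(P_X, P_Y))
    return np.linalg.svd(Q, compute_uv=False)[1]   # largest is 1; take the second

P = np.array([0.3, 0.7])
print(maximal_correlation(np.outer(P, P)))   # ~0.0: independent
print(maximal_correlation(np.diag(P)))       # ~1.0: X = Y a.s.
```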
Denote for $(a, b) \in [0, 1]^2$,
$$\sigma(a, b) := \sqrt{a(1-a)b(1-b)}$$
and, for $\rho \in [0, 1]$,
$$\Delta_\rho(a, b) := h\Bigl(\mathrm{med}\bigl\{\max\{a,\ b,\ a+b-ab-\rho\,\sigma(a,b)\},\ 1/2,\ \min\{a+b,\ 1,\ a+b-ab+\rho\,\sigma(a,b)\}\bigr\}\Bigr), \quad (1)$$
where $\mathrm{med}(A)$ denotes the median value of the elements in a multiset $A$. We regard the set in (1) as a multiset, which means, e.g., $\mathrm{med}\{x, 1/2, x\} = x$ even when the listed values are not distinct. Denote $h(a) := -a \log a - (1-a) \log(1-a)$ for $a \in [0, 1]$ as the binary entropy function. Define, for $t \in (0, 1/2]$,
$$\Gamma(t) := \sup_{P_V}\ \inf_{P_A}\ \inf_{P_{AB|V}}\ \frac{\mathbb{E}\bigl[\Delta_V(A, B)\bigr]}{\mathbb{E}[h(A)]}, \quad (2)$$
where the supremum over $P_V$ and the outer infimum over $P_A$ are both taken over all finitely supported probability distributions on $[0, 1]$, with $P_A$ additionally required to satisfy $\mathbb{E}[A] \le t$ and $\mathbb{E}[h(A)] > 0$, and where the inner infimum is taken over all conditional distributions $P_{AB|V}$ such that, for each $\rho$ in the support of $P_V$, $P_{AB|V=\rho}$ is a symmetric coupling of $(P_A, P_A)$ with $\rho_{\mathrm{m}}(A; B \mid V = \rho) \le \rho$.
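A small numeric sketch of these definitions (illustrative; it follows the form of (1) literally, and checks the independent case $\rho = 0$ against $h(a+b-ab)$, cf. the evaluation after Theorem 1):

```python
import numpy as np

# The binary entropy h and the function Delta_rho of (1), via the multiset
# median. (Illustrative; follows (1) literally.)

def h(p):
    """Binary entropy (natural log); h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log(p) - (1 - p)*np.log(1 - p)

def sigma(a, b):
    return np.sqrt(a*(1 - a)*b*(1 - b))

def Delta(rho, a, b):
    lo = max(a, b, a + b - a*b - rho*sigma(a, b))
    hi = min(a + b, 1.0, a + b - a*b + rho*sigma(a, b))
    return h(np.median([lo, 0.5, hi]))    # median of the multiset in (1)

a, b = 0.2, 0.35
assert np.isclose(Delta(0.0, a, b), h(a + b - a*b))  # independent (Gilmer) case
print(Delta(1.0, a, b))   # = h(med{max(a,b), 1/2, min(a+b,1)}) = h(1/2) here
```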
Our main results are as follows.
Theorem 1. If $\Gamma(t) > 1$ for some $t \in (0, 1/2]$, then $\max_{i \in [n]} P_{X_i}(1) \ge t$ for any OR-closed $A \neq \emptyset, \{0^n\}$ (i.e., for any union-closed family $\mathcal{F} \neq \emptyset, \{\emptyset\}$, there exists an element contained in at least a proportion $t$ of the sets of $\mathcal{F}$).
The proof of Theorem 1 is given in Section 3 by using a technique based on coupling and entropy. It is essentially the same as the technique used by Sawin [8]. Prior to Sawin's work, such a technique was used by the present author in several works; see [12,13,14].
Equivalently, Theorem 1 states that $\max_{i \in [n]} P_{X_i}(1) \ge t^*$ for any OR-closed $A$, where
$$t^* := \sup\bigl\{t \in (0, 1/2] : \Gamma(t) > 1\bigr\}.$$
To compute $t^*$ numerically, it is required to upper bound the cardinality of the support of $P_A$ in the outer infimum in (2) since, otherwise, infinitely many parameters are needed to optimize. This is left to be done in a future work. We next provide a computable bound, which is a lower bound on $\Gamma(t)$, instead of $\Gamma(t)$ itself.
If we choose $P_V = \delta_0$, then Theorem 1 implies Gilmer's bound in [7] since, for this case, the couplings constructed in the proof of Theorem 1 (given in the next section) turn out to be independent, coinciding with Gilmer's construction. On the other hand, if we choose $P_V = \delta_1$, then the couplings constructed in our proof are arbitrary. In fact, we can make a choice of $P_V$ better than these two special cases. As suggested by Sawin [8], we can choose $P_V = (1-\lambda)\delta_0 + \lambda\delta_1$ for $\lambda \in [0, 1]$, which in fact leads to an optimization over mixtures of independent couplings and arbitrary couplings. This final choice yields the following bound.
Substituting $\rho = 0$ and $1$, respectively, into (1) yields
$$\Delta_0(a, b) = h(a + b - ab) \quad (3)$$
and
$$\Delta_1(a, b) = h\bigl(\mathrm{med}\{\max\{a, b\},\ 1/2,\ \min\{a + b,\ 1\}\}\bigr), \quad (4)$$
where, in the evaluation of $\Delta_1$, the following facts were used: (1) $\sigma(a, b) \ge \min\{a, b\} - ab$ for all $(a, b) \in [0, 1]^2$; (2) if $a + b \le 1$, then $\sigma(a, b) \ge ab$ and, otherwise, $\sigma(a, b) \ge (1-a)(1-b)$.
By defining $\Gamma_\lambda(t)$ as the value of (2) with $P_V$ fixed to be $(1-\lambda)\delta_0 + \lambda\delta_1$ and substituting this choice of $P_V$ into Theorem 1, one obtains the following simpler bound.
Proposition 1. For $t \in (0, 1/2]$,
$$\Gamma(t) \ \ge\ \sup_{\lambda \in [0,1]}\ \inf\ \frac{(1-\lambda)\,\mathbb{E}\bigl[\Delta_0(A, B')\bigr] + \lambda\,\mathbb{E}\bigl[\Delta_1(A, B)\bigr]}{\mathbb{E}[h(A)]}, \quad (5)$$
where $B'$ denotes an independent copy of $A$, and the infimum is taken over all distributions of the form
$$P_{AB} = \alpha\,\delta_{(x_1, x_1)} + \beta\,\delta_{(x_1, x_2)} + \beta\,\delta_{(x_2, x_1)} + \gamma\,\delta_{(x_2, x_2)}$$
with $\alpha + 2\beta + \gamma = 1$, $\alpha, \beta, \gamma \ge 0$, and $x_1, x_2 \in [0, 1]$ such that $\mathbb{E}[A] \le t$ and $\mathbb{E}[h(A)] > 0$. (Note that a symmetric coupling $P_{AB}$ is supported on $\{x_1, x_2\}^2$ if and only if $P_{AB}$ is a convex combination of $\delta_{(x_1, x_1)}$, $\delta_{(x_1, x_2)}$, $\delta_{(x_2, x_1)}$, and $\delta_{(x_2, x_2)}$ that puts equal weights on $\delta_{(x_1, x_2)}$ and $\delta_{(x_2, x_1)}$.) Here, $\delta_{(x, y)}$ denotes the Dirac measure at $(x, y)$ (whose PMF takes the value 1 at $(x, y)$ and takes the value 0 at other points). As a consequence of the two results above, we have the following corollary.
Corollary 1. If the right-hand side of (5) is strictly larger than 1 for some $t \in (0, 1/2]$, then $\max_{i \in [n]} P_{X_i}(1) \ge t$ for any OR-closed $A \neq \emptyset, \{0^n\}$.
The proof of Corollary 1 is given in Section 4.
The lower bound in (5) without the cardinality bound on the support of $P_A$ was given by Sawin [8], which was used to show that the resulting constant is strictly larger than $\frac{3-\sqrt{5}}{2}$. However, thanks to the cardinality bound, we can numerically compute the best bound on $\max_{i \in [n]} P_{X_i}(1)$ that can be derived using (5). That is, $\max_{i \in [n]} P_{X_i}(1) \ge t^{**}$ for any OR-closed $A$, where
$$t^{**} := \sup\bigl\{t \in (0, 1/2] : \text{the right-hand side of (5) is strictly larger than } 1\bigr\}.$$
Numerical results show that if we set $t = 0.38234$, then the optimal choices of $\lambda$ and of the distribution $P_{AB}$ in (5) make the right-hand side of (5) strictly larger than 1, which leads to the lower bound $t^{**} \ge 0.38234$. Hence, $\max_{i \in [n]} P_{X_i}(1) \ge 0.38234$ for any OR-closed $A$. This is slightly better than the previous bound $\frac{3-\sqrt{5}}{2} \approx 0.38197$. The choice of parameters in our evaluation is nearly optimal. Our code can be found on the author's homepage https://leiyudotscholar.wordpress.com/ (accessed on 1 May 2023). More decimal places of Sawin's bound (or, equivalently, of $t^{**}$) were computed by Cambie in a concurrent work [16], together with the nearly optimal choices attaining them. This more precise evaluation can also be verified using our code above.
3. Proof of Theorem 1
Denote $H(X)$ as the Shannon entropy of a random variable $X$. Let $A \subseteq \{0,1\}^n$ be closed under the OR operation. We assume $|A| \ge 2$. This is because Theorem 1 holds obviously for singletons $A$, since for this case, either $A = \{0^n\}$, which is excluded, or $\max_{i \in [n]} P_{X_i}(1) = 1 \ge t$. Let $X^n \sim \mathrm{Unif}(A)$. So, $H(X^n) = \log|A| > 0$ and, by the chain rule, $H(X^n) = \sum_{i=1}^n H(X_i \mid X^{i-1})$.
If $P_{X^nY^n}$ is a coupling of $(P_{X^n}, P_{X^n})$ with $P_{X^n} = \mathrm{Unif}(A)$, then $Z^n := X^n \vee Y^n \in A$ a.s., where we use that $A$ is OR-closed. So, we have $H(X^n \vee Y^n) \le \log|A| = H(X^n)$. We hence have
$$\sup_{P_{X^nY^n} \in \mathcal{C}(P_{X^n}, P_{X^n})} \frac{H(X^n \vee Y^n)}{H(X^n)} \ \le\ 1.$$
If $\max_{i \in [n]} P_{X_i}(1) < t$, then $P_{X^n} = \mathrm{Unif}(A)$ satisfies $\max_{i \in [n]} P_{X_i}(1) \le t$ and $H(X^n) > 0$. Relaxing $P_{X^n} = \mathrm{Unif}(A)$ to arbitrary distributions such that these two constraints hold, we obtain $\Lambda_n(t) \le 1$, where
$$\Lambda_n(t) := \inf_{P_{X^n}:\ \max_{i} P_{X_i}(1) \le t,\ H(X^n) > 0}\ \ \sup_{P_{X^nY^n} \in \mathcal{C}(P_{X^n}, P_{X^n})} \frac{H(X^n \vee Y^n)}{H(X^n)}. \quad (8)$$
In other words, if, given $t$, $\Lambda_n(t) > 1$, then, by contradiction, $\max_{i \in [n]} P_{X_i}(1) \ge t$.
We next show that $\Lambda_n(t) \ge \Gamma(t)$, which implies Theorem 1. To this end, we need the following lemmas.
For two conditional distributions $P_{X|U}, P_{Y|U}$, denote $\mathcal{C}(P_{X|U}, P_{Y|U})$ as the set of conditional distributions $P_{XY|U}$ such that their marginals satisfy $P_{X|U}$ and $P_{Y|U}$, respectively. The conditional (Pearson) correlation coefficient of $X$ and $Y$ given $U$ is defined by
$$\rho(X; Y \mid U) := \frac{\mathbb{E}\bigl[\mathrm{Cov}(X, Y \mid U)\bigr]}{\sqrt{\mathbb{E}\bigl[\mathrm{Var}(X \mid U)\bigr]\,\mathbb{E}\bigl[\mathrm{Var}(Y \mid U)\bigr]}}.$$
The conditional maximal correlation coefficient of $X$ and $Y$ given $U$ is defined by
$$\rho_{\mathrm{m}}(X; Y \mid U) := \sup_{f, g} \rho\bigl(f(X, U); g(Y, U) \mid U\bigr),$$
where the supremum is taken over all real-valued functions $f, g$ such that $\mathbb{E}[f(X, U) \mid U] = \mathbb{E}[g(Y, U) \mid U] = 0$ a.s., $\mathbb{E}[f^2(X, U)], \mathbb{E}[g^2(Y, U)] \in (0, \infty)$. It has been shown in [17] that
$$\rho_{\mathrm{m}}(X; Y \mid U) = \max_{u:\, P_U(u) > 0} \rho_{\mathrm{m}}(X; Y \mid U = u),$$
where $\rho_{\mathrm{m}}(X; Y \mid U = u)$ denotes the (unconditional) maximal correlation computed with respect to the joint distribution $P_{XY|U=u}$.
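Given the identity above, the conditional maximal correlation can be computed from the per-$u$ joints; a sketch (reusing `maximal_correlation` from the Section 2 snippet; illustrative only):

```python
import numpy as np

# Conditional maximal correlation via the identity above: the maximum of the
# per-u maximal correlations over u with P_U(u) > 0. Reuses
# maximal_correlation() from the Section 2 snippet. (Sketch.)

def cond_maximal_correlation(P_U, joints):
    """joints[u] is the joint PMF matrix P_{XY|U=u}."""
    return max(maximal_correlation(P) for p, P in zip(P_U, joints) if p > 0)

ind = np.outer([0.3, 0.7], [0.3, 0.7])   # rho_m = 0 given u = 0
eq = np.diag([0.3, 0.7])                 # rho_m = 1 given u = 1
print(cond_maximal_correlation([0.5, 0.5], [ind, eq]))   # 1.0
```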
Lemma 1 (Product Construction of Couplings; Lemma 9 in [12], Corollary 3 in [17], and Lemma 6 in [18]). For any conditional distributions $P_{X_iY_i|X^{i-1}Y^{i-1}} \in \mathcal{C}(P_{X_i|X^{i-1}}, P_{Y_i|Y^{i-1}})$, $i \in [n]$, it holds that
$$\prod_{i=1}^n P_{X_iY_i|X^{i-1}Y^{i-1}} \in \mathcal{C}(P_{X^n}, P_{Y^n}).$$
Moreover, for $P_{X^nY^n} = \prod_{i=1}^n P_{X_iY_i|X^{i-1}Y^{i-1}}$, it holds that
$$\rho_{\mathrm{m}}(X^n; Y^n) = \max_{i \in [n]} \rho_{\mathrm{m}}\bigl(X_i; Y_i \mid X^{i-1}, Y^{i-1}\bigr). \quad (10)$$
For a conditional distribution $P_{X|U}$ defined on finite alphabets, a conditional coupling $P_{XY|U}$ of $(P_{X|U}, P_{X|U})$ is called symmetric if $P_{XY|U}(x, y \mid u) = P_{XY|U}(y, x \mid u)$ for all $x, y, u$. Denote $\mathcal{C}_{\mathrm{sym}}(P_{X|U})$ as the set of symmetric conditional couplings of $P_{X|U}$. Applying the lemma above to symmetric couplings, we have that if the couplings $P_{X_iY_i|X^{i-1}Y^{i-1}}$, $i \in [n]$, are symmetric and satisfy $\rho_{\mathrm{m}}(X_i; Y_i \mid X^{i-1}, Y^{i-1}) \le \rho$ for some $\rho \in [0, 1]$, then
$$P_{X^nY^n} := \prod_{i=1}^n P_{X_iY_i|X^{i-1}Y^{i-1}} \in \mathcal{C}(P_{X^n}, P_{X^n}) \quad\text{with}\quad \rho_{\mathrm{m}}(X^n; Y^n) \le \rho. \quad (9)$$
We hence have that, for any $\rho \in [0, 1]$,
$$\sup_{P_{X^nY^n} \in \mathcal{C}(P_{X^n}, P_{X^n})} H(X^n \vee Y^n) \ \ge\ \sum_{i=1}^n H\bigl(X_i \vee Y_i \mid X^{i-1}, Y^{i-1}\bigr) \ =\ \sum_{i=1}^n \mathbb{E}\Bigl[\sup H\bigl(X_i \vee Y_i \mid X^{i-1} = x^{i-1}, Y^{i-1} = y^{i-1}\bigr)\Bigr], \quad (11)$$
where the inner supremum is taken over all symmetric conditional couplings of $\bigl(P_{X_i|X^{i-1}}(\cdot \mid x^{i-1}), P_{X_i|X^{i-1}}(\cdot \mid y^{i-1})\bigr)$ whose maximal correlation is at most $\rho$,
and where the first inequality above follows by Lemma 1 and the chain rule for entropies (note that $Z^{i-1} := X^{i-1} \vee Y^{i-1}$ is a function of $(X^{i-1}, Y^{i-1})$). In fact, in the derivation above, the $i$-th distribution $P_{X_iY_i|X^{i-1}Y^{i-1}}$ is chosen as a greedy coupling in the sense that it only maximizes the $i$-th objective function $H(X_i \vee Y_i \mid X^{i-1}, Y^{i-1})$, regardless of the other terms $H(X_j \vee Y_j \mid X^{j-1}, Y^{j-1})$ with $j \neq i$ (although it indeed affects their values).
By the fact that conditioning reduces entropy, it holds that
$$H(X^n \vee Y^n) \ \ge\ H(X^n \vee Y^n \mid V)$$
for the mixed coupling induced by an auxiliary random variable $V \sim P_V$ on $[0, 1]$ that selects, for each realization $V = \rho$, the greedy product coupling of level $\rho$ constructed above (a mixture of couplings of $(P_{X^n}, P_{X^n})$ is still such a coupling). Denote
$$a_i := P_{X_i|X^{i-1}}(1 \mid X^{i-1}), \qquad b_i := P_{X_i|X^{i-1}}(1 \mid Y^{i-1}),$$
and denote $\theta_\rho(a_i, b_i)$ as the value of the inner supremum in (11) at level $\rho$, which depends on $(x^{i-1}, y^{i-1})$ only through $(a_i, b_i)$ since $X_i \vee Y_i$ is binary. Then, the expression at the right-hand side of (11), averaged over $V$, is further lower bounded by $\sum_{i=1}^n \mathbb{E}[\theta_V(a_i, b_i)]$. Combining this with (8) and (11), and by noting that $P_V$ is arbitrary, we obtain that
$$\Lambda_n(t) \ \ge\ \sup_{P_V}\ \inf_{P_{X^n}}\ \frac{\sum_{i=1}^n \mathbb{E}\bigl[\theta_V(a_i, b_i)\bigr]}{\sum_{i=1}^n \mathbb{E}[h(a_i)]} \quad (12)$$
$$\ \ge\ \sup_{P_V}\ \inf_{P_{X^n}}\ \min_{i \in [n]:\ \mathbb{E}[h(a_i)] > 0}\ \frac{\mathbb{E}\bigl[\theta_V(a_i, b_i)\bigr]}{\mathbb{E}[h(a_i)]} \quad (13)$$
$$\ =\ \sup_{P_V}\ \inf_{P_{X^n}}\ \frac{\mathbb{E}\bigl[\theta_V(a_j, b_j)\bigr]}{\mathbb{E}[h(a_j)]},$$
where the infima are taken over all $P_{X^n}$ such that $\max_{i \in [n]} P_{X_i}(1) \le t$ and $H(X^n) > 0$;
(13) follows since $H(X_i \mid X^{i-1}) = \mathbb{E}[h(a_i)]$ for every $i \in [n]$ (so that $\sum_{i} \mathbb{E}[h(a_i)] = H(X^n)$), and since $\mathbb{E}[h(a_i)] = 0$ implies that $X_i$ is a deterministic function of $X^{i-1}$ and, hence, such an index contributes zero to the denominator and a nonnegative amount to the numerator;
the index $j$ in the last line is the optimal $i$ attaining the minimum in (13).
Denote $a := a_j$, and $b := b_j$. Then,
$$\Lambda_n(t) \ \ge\ \sup_{P_V}\ \inf\ \frac{\mathbb{E}\bigl[\theta_V(a, b)\bigr]}{\mathbb{E}[h(a)]}. \quad (14)$$
We next further simplify the lower bound in (14). Denote
$$a = P_{X_j|X^{j-1}}(1 \mid X^{j-1}), \qquad b = P_{X_j|X^{j-1}}(1 \mid Y^{j-1}), \qquad q := P_{X_jY_j|X^{j-1}Y^{j-1}}(1, 1 \mid X^{j-1}, Y^{j-1}). \quad (15)$$
So, $P(X_j \vee Y_j = 1 \mid X^{j-1}, Y^{j-1}) = a + b - q$ with $\max\{a + b - 1, 0\} \le q \le \min\{a, b\}$. Note that
$$\rho_{\mathrm{m}}\bigl(X_j; Y_j \mid X^{j-1} = x^{j-1}, Y^{j-1} = y^{j-1}\bigr) = \bigl|\rho\bigl(X_j; Y_j \mid X^{j-1} = x^{j-1}, Y^{j-1} = y^{j-1}\bigr)\bigr| = \frac{|q - ab|}{\sigma(a, b)}, \quad (16)$$
where $\rho$ denotes the Pearson correlation coefficient, and (16) follows since the maximal correlation coefficient between two binary random variables is equal to the absolute value of the Pearson correlation coefficient between them; see, e.g., [19]. So, $\rho_{\mathrm{m}}(X_j; Y_j \mid X^{j-1}, Y^{j-1}) \le \rho$ is equivalent to $|q - ab| \le \rho\,\sigma(a, b)$ a.s. and also equivalent to $a + b - ab - \rho\,\sigma(a, b) \le a + b - q \le a + b - ab + \rho\,\sigma(a, b)$ a.s.
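The binary fact used in (16) is easy to verify numerically (sketch, reusing `maximal_correlation` from Section 2): for random $2 \times 2$ joint PMFs, the second singular value coincides with the absolute Pearson correlation.

```python
import numpy as np

# For binary (X, Y), the maximal correlation equals the absolute Pearson
# correlation, as used in (16). Reuses maximal_correlation(). (Sketch.)

rng = np.random.default_rng(0)
for _ in range(5):
    P = rng.random((2, 2)); P /= P.sum()
    a, b = P.sum(axis=1)[1], P.sum(axis=0)[1]    # P(X=1), P(Y=1)
    pearson = (P[1, 1] - a*b) / np.sqrt(a*(1 - a)*b*(1 - b))
    print(np.isclose(maximal_correlation(P), abs(pearson)))   # True
```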
The inner supremum in (14), i.e., $\theta_\rho(a, b)$, can be rewritten as
$$\theta_\rho(a, b) = \sup_{r \in [L_\rho(a, b),\ U_\rho(a, b)]} h(r), \quad\text{where}\quad L_\rho(a, b) := \max\{a,\ b,\ a+b-ab-\rho\,\sigma(a, b)\},\quad U_\rho(a, b) := \min\{a+b,\ 1,\ a+b-ab+\rho\,\sigma(a, b)\}.$$
By the fact that $h$ is increasing on $[0, 1/2]$ and decreasing on $[1/2, 1]$, it holds that the optimal $r$ attaining the supremum in the last line above, denoted by $r^*$, is the median of $L_\rho(a, b)$, $1/2$, and $U_\rho(a, b)$, which implies
$$\theta_\rho(a, b) = h\bigl(\mathrm{med}\{L_\rho(a, b),\ 1/2,\ U_\rho(a, b)\}\bigr).$$
Recall the definition of $\Delta_\rho$ in (1). So, the inner supremum in (14) is equal to $\Delta_\rho(a, b)$.
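The median step can be cross-checked by brute force (sketch, reusing `h`, `sigma`, and `Delta` from the Section 2 snippet): maximizing $h(a + b - q)$ over a fine grid of feasible $q$ reproduces $\Delta_\rho(a, b)$ up to the grid resolution.

```python
import numpy as np

# Brute-force check of the median step: maximize h(a+b-q) over the feasible
# range of q derived from (15) and (16), and compare with Delta(rho, a, b).
# Reuses h, sigma, Delta from the Section 2 snippet. (Illustrative only.)

rng = np.random.default_rng(1)
for _ in range(5):
    a, b, rho = rng.random(3)
    q_lo = max(a + b - 1, 0.0, a*b - rho*sigma(a, b))
    q_hi = min(a, b, a*b + rho*sigma(a, b))
    best = max(h(a + b - q) for q in np.linspace(q_lo, q_hi, 20001))
    print(f"{best:.6f}  {Delta(rho, a, b):.6f}")   # columns agree up to grid error
```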
We make the following observations. Firstly, $\mathbb{E}[a] = P(X_j = 1) \le t$, and the distribution of $a$ is finitely supported on $[0, 1]$. Secondly, by the definition of maximal correlation, $\rho_{\mathrm{m}}(a; b \mid V) \le \rho_{\mathrm{m}}(X^{j-1}; Y^{j-1} \mid V)$ holds (which is known as the data processing inequality) since $a, b$ are, respectively, functions of $X^{j-1}, Y^{j-1}$; see (15). Moreover, $\rho_{\mathrm{m}}(X^{j-1}; Y^{j-1} \mid V = \rho) \le \rho$ by (9) and (10). Lastly, observe that $P_{X^{j-1}Y^{j-1}|V}$ is symmetric and $a, b$ are obtained from $X^{j-1}, Y^{j-1}$ via the same function $P_{X_j|X^{j-1}}(1 \mid \cdot)$ (since $P_{Y_j|Y^{j-1}} = P_{X_j|X^{j-1}}$ holds by the symmetry of $P_{X^nY^n}$). Hence, $P_{ab|V}$ is symmetric as well. Substituting all of these into (14) yields $\Lambda_n(t) \ge \Gamma(t)$. □
5. Discussion
The breakthrough made by Gilmer [7] shows the power of information-theoretic techniques in tackling problems in related fields. In fact, the union-closed sets conjecture has a natural interpretation in the information-theoretic (or coding-theoretic) sense. Consider the memoryless OR multi-access channel $(x^n, y^n) \mapsto x^n \vee y^n$. We would like to find a nonempty code $A \subseteq \{0,1\}^n$ with $A \neq \{0^n\}$ to generate two independent inputs $X^n, Y^n$, with each following $\mathrm{Unif}(A)$, such that the input constraint $\max_{i \in [n]} P_{X_i}(1) < t$ is satisfied and the output $X^n \vee Y^n$ is still in $A$ a.s. The union-closed sets conjecture states that such a code exists if and only if $t > 1/2$. Based on this information-theoretic interpretation, it is reasonable to see that information-theoretic techniques work for this conjecture. It is well known that information-theoretic techniques usually work very well for problems with “approximate” constraints, e.g., the channel coding problem with the asymptotically vanishing error probability constraint (or the approximate version of the union-closed sets problem introduced in [9]). It is still unclear whether information-theoretic techniques are sufficient to prove sharp bounds for problems with “exact” constraints, e.g., the zero-error coding problem (or the original version of the union-closed sets conjecture).
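A toy simulation of this interpretation (illustrative only): two independent uniform inputs on an OR-closed code pass through the OR multi-access channel, and every output is again a codeword.

```python
import numpy as np

# Two independent uniform inputs on an OR-closed code A; the OR multi-access
# channel output stays inside A. (Toy simulation; illustrative only.)

A = [(0, 0), (1, 0), (0, 1), (1, 1)]     # OR-closed code for n = 2
rng = np.random.default_rng(2)
X = [A[i] for i in rng.integers(len(A), size=1000)]
Y = [A[i] for i in rng.integers(len(A), size=1000)]
Z = [tuple(xi | yi for xi, yi in zip(x, y)) for x, y in zip(X, Y)]
assert all(z in A for z in Z)            # every output is again a codeword
print(np.mean([x[0] for x in X]))        # per-element input frequency, ~1/2 here
```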
Furthermore, as an intermediate result, it has been shown that $\Lambda_n(t) > 1$ implies $\max_{i \in [n]} P_{X_i}(1) \ge t$ for any OR-closed $A$. Here, $\Lambda_n(t)$ is given in (8), expressed in the multi-letter form (i.e., the dimension-dependent form). By the super-block coding argument, it is verified that, given $t$, the limit $\lim_{n \to \infty} \Lambda_n(t)$ exists. It is interesting to investigate this limit and prove a single-letter (dimension-independent) expression for it.
For simplicity, in this paper, we only consider the maximal correlation coefficient as the constraint function. In fact, the maximal correlation coefficient used here can be replaced by other functionals. The key property of the maximal correlation coefficient that we used in this paper is the “tensorization” property, i.e., (10) (in fact, only the “≤” part of (10) was used in our proof). In the literature, there is a class of measures of correlation satisfying this property, e.g., the hypercontractivity constant, the strong data processing inequality constant, or, more generally, $\Phi$-ribbons; see [20,21,22]. (Although the tensorization property in the literature is only defined and proven for independent random variables, this property can be extended to the coupling constructed in (9).) Following the same proof steps given in this paper, one can obtain various variants of Theorem 1 with the maximal correlation coefficient replaced by other quantities, as long as these quantities satisfy the tensorization property. Another potential direction is to replace the Shannon entropy with a class of more general quantities, the Rényi entropies. However, unfortunately, Rényi entropies do not satisfy the chain rule (unlike the Shannon entropy), which leads to a serious difficulty in single-letterizing the corresponding multi-letter bound, such as $\Lambda_n(t)$ in (8) (i.e., in making the multi-letter bound dimension-independent).