Article

Blahut–Arimoto Algorithms for Inner and Outer Bounds on Capacity Regions of Broadcast Channels †

1 State Key Laboratory of ISN, Xidian University, Xi’an 710071, China
2 IT Operation Center, Bank of China, Beijing 100094, China
3 Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd., Shatin, N.T., Hong Kong SAR, China
* Author to whom correspondence should be addressed.
† This paper is an extended version of our papers presented at the IEEE International Symposium on Information Theory (ISIT) in 2022 (Espoo, Finland, 26 June–1 July) and 2023 (Taipei, Taiwan, 25–30 June).
Entropy 2024, 26(3), 178; https://doi.org/10.3390/e26030178
Submission received: 9 January 2024 / Revised: 15 February 2024 / Accepted: 15 February 2024 / Published: 20 February 2024
(This article belongs to the Special Issue Advances in Multiuser Information Theory)

Abstract:
The celebrated Blahut–Arimoto algorithm computes the capacity of a discrete memoryless point-to-point channel by alternating maximization of a reformulated objective function. This algorithm has been applied to degraded broadcast channels, for which the supporting hyperplanes of the capacity region are again cast as maximization problems. In this work, we consider general broadcast channels and extend this algorithm to compute inner and outer bounds on their capacity regions. Our main contributions are as follows: first, we show that the optimization problems are max–min problems and that the exchange of minimum and maximum holds; second, we design Blahut–Arimoto algorithms for the maximization part and gradient descent algorithms for the minimization part; third, we provide convergence analysis for both parts. Numerical experiments validate the effectiveness of our algorithms.

1. Introduction

In 1972, Cover [1] introduced the two-receiver discrete memoryless broadcast channel p ( y , z | x ) to model a system of downlink communication in which X is the sender and ( Y , Z ) are the receivers. In the same paper, he proposed a coding scheme which resulted in the superposition coding inner bound (SCIB). It turns out that the SCIB is indeed the capacity region for two-receiver broadcast channels in which the receivers are comparable in the following partial orders: degraded [2], less noisy [3], and more capable [3]. However, for a general broadcast channel the single-letter capacity region remains open.
To characterize the capacity region of a broadcast channel, a standard approach is to show that an inner bound matches an outer bound. Currently, the best inner bound for general broadcast channels is Marton’s inner bound (MIB) [4], while the UV outer bound (UVOB) [5] was long the best outer bound; recently, a better one, called the J version outer bound, was proposed in [6].
The evaluation of inner and outer bounds is critical in the following aspects: (1) the evaluation of an inner bound usually results in an optimal input distribution which can help in the design of practical coding schemes; (2) the identification of the capacity region of a particular broadcast channel through the comparison of one inner bound and one outer bound relies on the evaluation of these two bounds; and (3) when claiming to establish a new bound, it is necessary to show that the new bound strictly improves on old ones through the evaluation of bounds on a particular channel.
Remark 1.
This is the full version of conference papers accepted by ISIT 2022 and 2023 [7,8].
However, this evaluation is usually difficult due to its non-convexity [9]. To alleviate this issue, there exist a number of generic optimization algorithms, such as interior point [10], active set [10], and sequential quadratic programming [11]. However, efficient algorithms should use the domain knowledge of information theory as well; from this viewpoint, we consider the Blahut–Arimoto (BA) algorithm, which is specially customized for information theory.
The original BA algorithm was independently developed by Blahut [12] and Arimoto [13] to calculate the channel capacity C = max q ( x ) I ( X ; Y ) for a general point-to-point channel p ( y | x ) . The algorithm transforms the original maximization problem into an alternating maximization problem:
$$\max_{q(x)} \sum_{x,y} q(x)\, p(y|x) \ln \frac{q(x|y)}{q(x)}, \qquad \max_{q(x),\, Q(x|y)} \sum_{x,y} q(x)\, p(y|x) \ln \frac{Q(x|y)}{q(x)},$$
where the updating formulae are explicit within each iteration.
There have been numerous extensions of the BA algorithm to various scenarios in information theory. For example, [14] applied the BA algorithm to compute the sum rate of the multiple-access channel. Later, using the idea from the BA algorithm, the whole capacity region of the multiple-access channel was formulated in [15] as a rank-one constrained problem and solved by relaxation methods. It is beyond the scope of this paper to list all of these references. Instead, we discuss those papers closely related to computing the bounds on capacity regions of broadcast channels.
In [16], the authors considered the capacity region of a degraded broadcast channel p ( y , z | x ) , where receiver Z is a degraded version of Y.
In this scenario, the capacity region of the rate pairs ( R Y , R Z ) is known, and can be achieved by the simplified version of superposition coding. The supporting hyperplanes can be characterized as
$$\theta R_Y + (1-\theta) R_Z = \max_{q(u,x)} \big[ \theta I(X;Y|U) + (1-\theta) I(U;Z) \big].$$
Using an idea similar to that of the BA algorithm, the authors designed an algorithm that alternately maximizes the objective function.
The method in [16] is directly applicable to less noisy broadcast channels, as the characterization of the capacity region is the same as in the degraded case. However, this equivalence no longer holds for the more capable case, where the value of the supporting hyperplane θR_Y + (1−θ)R_Z is characterized as a max–min optimization problem (e.g., see Equation (14)). As a matter of fact, the supporting hyperplanes of the above-mentioned bounds, that is, SCIB, MIB, and UVOB, are all of the max–min form. The main issue is that the minimization part sits inside the maximization part, which prevents the application of the BA algorithm to the whole problem.
The algorithms for calculating inner and outer bounds for general broadcast channels are very limited. The authors of [17] considered MIB (see Section 3.2) and designed a BA algorithm to compute the sum rate R_Y + R_Z of the simplified version in which the auxiliary random variable W = ∅. The objective function,
$$\sum_{u,v,x,y,z} q(u,v)\, q(x|u,v)\, p(y,z|x) \ln \frac{Q(u|y)\, Q(v|z)}{q(u,v)},$$
is convex in q(x|u,v), which means that the maximum is attained at an extreme point, i.e., when the input X is a deterministic function of (U, V). Noticing this, the authors performed optimization over all fixed mappings q(x|u,v). However, discarding W can result in a strictly smaller sum rate [18], making it necessary to consider the complete version of MIB.
In this paper, we seek to design BA algorithms for general broadcast channels in order to compute the following inner and outer bounds: SCIB, MIB, and UVOB. The key difference here is that the optimization problems are max–min problems, rather than only containing a maximization part. In Table 1, we provide an intuitive comparison of related references.
The notation we use is as follows. p denotes a fixed (conditional) probability distribution such as p(y,z|x), while q and Q are used for (conditional) probabilities that are changeable. Calligraphic letters such as S are used to denote sets. The use of square brackets in f[g] means that f is specified by the variable g; θ̄ denotes θ̄ := 1 − θ; and unless otherwise specified, we use the natural logarithm. To make the mathematical expressions more concise, we use the following abbreviations for the Kullback–Leibler divergences:
$$D_Y(p \| q) := \sum_y p(y) \ln \frac{p(y)}{q(y)}, \qquad D_{Y|x}(p \| q) := \sum_y p(y|x) \ln \frac{p(y|x)}{q(y|x)}, \qquad D_{Y|X}(p \| q) := \sum_x p(x)\, D_{Y|x}(p \| q).$$
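For readers who prefer code, the following is a minimal NumPy sketch of these three divergences (the function names and array layout are our own illustration, not from the paper):

```python
import numpy as np

def D_Y(p, q):
    """Unconditional KL divergence D_Y(p||q); p, q are 1-D distributions over y."""
    mask = p > 0  # terms with p(y) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def D_Y_given_x(p, q, x):
    """D_{Y|x}(p||q): KL between the rows p(.|x) and q(.|x) of two conditional matrices."""
    return D_Y(p[x], q[x])

def D_Y_given_X(p, q, px):
    """D_{Y|X}(p||q) = sum_x p(x) D_{Y|x}(p||q), averaged over the input law px."""
    return float(sum(px[x] * D_Y_given_x(p, q, x) for x in range(len(px))))
```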
The organization is as follows. First, in Section 2 we introduce the necessary background on the BA algorithm and its extension in [16]. Then, in Section 3 we extend the BA algorithm to the evaluation of SCIB, MIB, and UVOB. Convergence analyses of these algorithms are presented in Section 4. Finally, in Section 5 we perform numerical experiments to validate the effectiveness and efficiency of our algorithms.

2. Mathematical Background of Blahut–Arimoto Algorithms

We first introduce the standard BA algorithm in Section 2.1, as we will rely on several of its properties later. Then, in Section 2.2 we discuss why the method in [16] cannot be applied to general broadcast channels.

2.1. Blahut–Arimoto Algorithm for Point-to-Point Channel

For a point-to-point channel p ( y | x ) , the capacity C is the maximum of the mutual information C = max q ( x ) I ( X ; Y ) = max q ( x ) C ( q ) , where
$$C(q) = H(X) - H(X|Y) = \sum_{x,y} q(x)\, p(y|x) \ln \frac{q(x|y)}{q(x)}.$$
By replacing q ( x | y ) with a free variable Q ( x | y ) , the BA algorithm performs alternating maximization max q , Q C ( q , Q ) , where
$$C(q, Q) = \sum_{x,y} q(x)\, p(y|x) \ln \frac{Q(x|y)}{q(x)}.$$
Notice here that we abuse the notation of C ( · ) , which should not cause confusion in general.
The above objective function can be reformulated as follows:
$$C(q, Q) = \sum_x q(x) \big( d[Q](x) - \ln q(x) \big), \qquad \text{where } d[Q](x) = \sum_y p(y|x) \ln Q(x|y).$$
We call this the basic form. For different scenarios, BA algorithms mainly differ in the distribution q ( · ) and function d [ Q ] ( · ) .
The following theorem (see proof in Appendix A) provides the explicit formulae for the maximum Q given q (denoted as Q [ q ] ) and maximum q given Q (denoted as q [ Q ] ).
Theorem 1.
The following properties hold for the problem max q , Q C ( q , Q ) .
  • Given a fixed q, C(q, Q) is concave in Q, and the maximum point Q[q] is induced by the input and the channel,
    $$Q[q](x|y) = q(x|y) = \frac{q(x)\, p(y|x)}{\sum_{x'} q(x')\, p(y|x')}.$$
    Further, the function values satisfy
    $$C(q) = C(q, Q[q]) \ge C(q, Q).$$
  • Given a fixed Q, C(q, Q) is concave in q, and the maximum point q[Q] is obtained via the Lagrangian,
    $$q[Q](x) = \frac{\exp\{ d[Q](x) \}}{\sum_{x'} \exp\{ d[Q](x') \}}.$$
    Further, evaluation of the function value results in
    $$C(q[Q], Q) = \ln \sum_x \exp\{ d[Q](x) \} = d[Q](x) - \ln q[Q](x) \quad \text{for every } x.$$
Starting from an initial q 0 ( x ) > 0 , the BA algorithm performs alternating maximization and produces a sequence of points
$$q_0 \to Q_1 \to q_1 \to Q_2 \to \cdots \to q_n \to Q_{n+1} \to \cdots,$$
where
$$Q_n = Q[q_{n-1}], \qquad q_n = q[Q_n],$$
according to Equations (5) and (7), respectively.
The criterion for stopping the iterations is based on the following result [12,13].
Proposition 1
(Proposition 1 in [13], Theorem 2 in [12]). It is the case that q maximizes C ( q ) if and only if the following holds for Q [ q ] and some scalar D:
$$d[Q[q]](x) - \ln q(x) \;\begin{cases} = D, & q(x) > 0, \\ \le D, & q(x) = 0. \end{cases}$$
Remark 2.
It should be mentioned that in order to avoid infinity minus infinity for those q ( x ) = 0 in the above proposition, the following equivalent formulae can be used:
$$d[Q[q]](x) - \ln q(x) = \sum_y p(y|x) \ln \frac{q(x|y)}{q(x)} = \sum_y p(y|x) \ln \frac{p(y|x)}{q(y)}.$$
This is not an issue in the BA algorithm itself, as q_n(x) > 0 according to Equation (7).
Thus, at the end of the n-th step, if the difference
$$\max_x \big\{ d[Q[q_{n-1}]](x) - \ln q_{n-1}(x) \big\} - C(q_n, Q_n) = \max_x \big\{ d[Q_n](x) - \ln q_{n-1}(x) \big\} - \ln \sum_x \exp\{ d[Q_n](x) \}$$
is small enough then the iteration is stopped.
Summarizing the above details, we arrive at the BA algorithm depicted in Algorithm 1. The convergence of the resulting sequence C(q_n, Q_n) is characterized in the following theorem (see proof in Appendix A).
Theorem 2
(Theorem 1 in [13], Theorem 3 in [12]). If q_0 > 0, then the value C(q_n, Q_n) converges monotonically from below to the capacity C.
Algorithm 1: Computing channel capacity
Input: p(y|x), maximum iterations N, threshold ϵ > 0;
Initialization: q_0(x) > 0, ϵ_0 > ϵ, n = 0;
while n < N and ϵ_n > ϵ do
    n ← n + 1;
    Q_n = Q[q_{n−1}] using Equation (5);
    q_n = q[Q_n] using Equation (7);
    C(q_n, Q_n) = ln Σ_x exp{d[Q_n](x)} using Equations (8) and (4);
    ϵ_n = max_x {d[Q_n](x) − ln q_{n−1}(x)} − C(q_n, Q_n) using Equation (4);
end
Output: q_n(x), Q_n(x|y), C(q_n, Q_n)
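To make the iteration concrete, the following is a minimal NumPy sketch of Algorithm 1 (our own illustration, not the authors’ implementation). It alternates the closed-form updates of Equations (5) and (7) and uses the stopping criterion above; the clipping constants guard against log(0) at sketch level only.

```python
import numpy as np

def ba_capacity(P, max_iter=500, tol=1e-9):
    """Blahut-Arimoto capacity of a point-to-point channel (a sketch).
    P: |X| x |Y| row-stochastic matrix with rows p(.|x). Returns (C in nats, q(x))."""
    nx, _ = P.shape
    q = np.full(nx, 1.0 / nx)                      # initial q_0(x) > 0
    for _ in range(max_iter):
        # Backward channel Q[q](x|y) = q(x) p(y|x) / q(y)   (Equation (5))
        joint = q[:, None] * P
        Q = joint / np.clip(joint.sum(axis=0), 1e-300, None)
        # d[Q](x) = sum_y p(y|x) ln Q(x|y); 0*log(0) treated as 0
        d = np.sum(P * np.log(np.clip(Q, 1e-300, None)), axis=1)
        lse = np.log(np.sum(np.exp(d)))            # C(q_n, Q_n) via Equation (8)
        eps = np.max(d - np.log(q)) - lse          # stopping criterion above
        q = np.exp(d - lse)                        # q_n(x), Equation (7)
        if eps < tol:
            break
    return lse, q
```

For instance, `ba_capacity(np.array([[1.0, 0.0], [0.5, 0.5]]))` computes the capacity of the Z-channel that reappears as a marginal channel in Section 5 (approximately 0.2231 nats).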

2.2. Blahut–Arimoto Algorithm for Degraded Broadcast Channel

In [16], the authors considered the capacity region of the degraded broadcast channel. The original objective function for the value of the supporting hyperplane θR_Y + θ̄R_Z (where θ < 1/2) is:
$$\begin{aligned} F(q(u,x)) &= \theta I(X;Y|U) + \bar{\theta} I(U;Z) = \bar{\theta} \big( I(X;Y|U) + I(U;Z) \big) - (\bar{\theta} - \theta) I(X;Y|U) \\ &= \bar{\theta} \big( H(U,X) - H(X|Y,U) - H(U|Z) \big) - (\bar{\theta} - \theta) \big( H(Y|U) - H(Y|X) \big). \end{aligned}$$
Similar to the BA algorithm, the new objective function is
$$F(q, Q) = \bar{\theta} \sum_{u,x} q(u,x) \big( d[Q](u,x) - \ln q(u,x) \big),$$
where
$$d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln \big( Q(x|y,u)\, Q(u|z) \big) + \frac{\bar{\theta} - \theta}{\bar{\theta}} \ln \frac{Q(y|u)}{p(y|x)} \right].$$
Then, the authors designed an extended BA algorithm that alternately maximizes F ( q , Q ) and analysed the convergence.
However, the above method does not generalize to allow for evaluating the capacity bounds of general broadcast channels. The main reason can be summarized in just one sentence: the minimum of the expectation is, in general, greater than the expectation of the minimum. Taking SCIB as an example, using the representation A ( U , X ) in Lemma 1, the supporting hyperplane is
$$\begin{aligned} \theta R_Y + \bar{\theta} R_Z &= \max_{q(u,x)} \theta I(X;Y|U) + \bar{\theta} \min\{ I(U;Z),\, I(U;Y) \} \\ &= \max_{q(u,x)} \Big( \bar{\theta} \big( H(U,X) - H(X|Y,U) + \min\{ -H(U|Z),\, -H(U|Y) \} \big) - (\bar{\theta} - \theta) \big( H(Y|U) - H(Y|X) \big) \Big). \end{aligned}$$
A direct extension might try to reformulate the last expression in the form of Equation (12) by letting
$$d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln Q(x|y,u) + \min\{ \ln Q(u|z),\, \ln Q(u|y) \} + \frac{\bar{\theta} - \theta}{\bar{\theta}} \ln \frac{Q(y|u)}{p(y|x)} \right];$$
however, this is not an equivalent reformulation, as
$$\min\{ -H(U|Z),\, -H(U|Y) \} \;\ge\; \sum_{u,x} q(u,x) \sum_{y,z} p(y,z|x) \min\{ \ln Q(u|z),\, \ln Q(u|y) \}.$$
In the following section, we first use the fact that min{f, g} = min_{α∈[0,1]} {αf + ᾱg} (the right-hand side is affine in α, so its minimum over [0,1] is attained at an endpoint) to parameterize the minimum; we then show that the maximum and minimum can be exchanged, so that the BA algorithm can be applied to the maximization part.

3. Blahut–Arimoto Algorithms for Capacity Bounds of Broadcast Channels

In this section, we first introduce two inner bounds and one outer bound on the capacity region of the broadcast channel. We characterize their supporting hyperplanes as max–min problems and show that the maximum and minimum can be exchanged. Then, we design BA algorithms for the maximization parts and gradient descent algorithms for the minimization parts.

3.1. Superposition Coding Inner Bound

The superposition coding inner bound was proposed by Cover in [1]; it corresponds to region B in the following lemma (see proof in Appendix B). This region has three equivalent characterizations.
Lemma 1
(folklore). The following regions A , B , and C are equivalent characterizations of the superposition coding inner bound:
$$\begin{aligned} \mathcal{A} &:= \bigcup_{q(u,x)} \mathcal{A}(U,X), & \mathcal{A}(U,X) &:= \big\{ R_Y \le I(X;Y|U),\; R_Z \le \min\{ I(U;Z),\, I(U;Y) \} \big\}, \\ \mathcal{B} &:= \bigcup_{q(u,x)} \mathcal{B}(U,X), & \mathcal{B}(U,X) &:= \big\{ R_Y \le I(X;Y|U),\; R_Z \le I(U;Z),\; R_Y + R_Z \le I(X;Y) \big\}, \\ \mathcal{C} &:= \bigcup_{q(u,x)} \mathcal{C}(U,X), & \mathcal{C}(U,X) &:= \big\{ R_Z \le I(U;Z),\; R_Y + R_Z \le \min\{ I(X;Y|U) + I(U;Z),\, I(X;Y) \} \big\}. \end{aligned}$$
To characterize the supporting hyperplane θ R Y + θ ¯ R Z = F , we choose to use the representation A ( U , X ) . It is clear that
$$\begin{aligned} F &= \max_{q(u,x)} \theta I(X;Y|U) + \bar{\theta} \min\{ I(U;Z),\, I(U;Y) \} \\ &= \max_{q(u,x)} \theta I(X;Y|U) + \bar{\theta} \min_{\alpha \in [0,1]} \big[ \alpha I(U;Z) + \bar{\alpha} I(U;Y) \big] \\ &= \max_{q(u,x)} \min_{\alpha \in [0,1]} \theta I(X;Y|U) + \bar{\theta} \alpha I(U;Z) + \bar{\theta} \bar{\alpha} I(U;Y). \end{aligned}$$
Notice here that we cannot use the BA algorithm directly, as there is a minimum inside the maximum. If we are able to swap the orders of the maximum and minimum, then we can adopt the BA algorithm in the maximization part.
To show this kind of exchange, we first introduce a Terkelsen-type min–max result in the following lemma.
Lemma 2
(Corollary 2 in Appendix A of [19]). Let Λ_d be the d-dimensional simplex, i.e., λ_i ≥ 0 and Σ_{i=1}^d λ_i = 1, let P be a set of probability distributions p(u), and let T_i(p(u)), i = 1, ..., d, be a set of functions such that the set T defined by
$$\mathcal{T} = \big\{ (a_1, \ldots, a_d) \in \mathbb{R}^d : a_i \le T_i(p(u)) \text{ for some } p(u) \in \mathcal{P} \big\}$$
is a convex set; then,
$$\sup_{p(u) \in \mathcal{P}} \min_{\lambda \in \Lambda_d} \sum_i \lambda_i T_i(p(u)) = \min_{\lambda \in \Lambda_d} \sup_{p(u) \in \mathcal{P}} \sum_i \lambda_i T_i(p(u)).$$
With this lemma, we can establish the following theorem (proof in Appendix B).
Theorem 3.
The supporting hyperplane θR_Y + θ̄R_Z = F of the superposition coding inner bound is as follows: if θ ∈ [1/2, 1], then F = max_{q(x)} θ I(X;Y); otherwise, if θ ∈ [0, 1/2), then
$$F = \min\left\{ \min_{\alpha \le \frac{1-2\theta}{1-\theta}}\; \max_{q(x)} \big[ \bar{\theta}\bar{\alpha} I(X;Y) + \bar{\theta}\alpha I(X;Z) \big],\;\; \min_{\alpha > \frac{1-2\theta}{1-\theta}}\; \max_{q(u,x)} \big[ \theta I(X;Y|U) + \bar{\theta}\bar{\alpha} I(U;Y) + \bar{\theta}\alpha I(U;Z) \big] \right\}.$$
Further, it suffices to consider the cardinality size |U| ≤ |X|.
For the maximization part and the nontrivial case where θ ∈ (0, 1/2), following the above theorem, two types of BA algorithms can be designed according to the value of α.
When α ∈ ((1−2θ)/(1−θ), 1], the original objective function is
$$\begin{aligned} F(\alpha, q) &= \theta I(X;Y|U) + \bar{\theta}\bar{\alpha} I(U;Y) + \bar{\theta}\alpha I(U;Z) \\ &= \bar{\theta} \big( I(X;Y|U) + \bar{\alpha} I(U;Y) + \alpha I(U;Z) \big) - (\bar{\theta} - \theta) I(X;Y|U) \\ &= \bar{\theta} \big( H(U,X) - H(X|Y,U) - \bar{\alpha} H(U|Y) - \alpha H(U|Z) \big) + (\bar{\theta} - \theta) \big( H(Y|X) - H(Y|U) \big) \\ &= \bar{\theta} \sum_{u,x,y,z} q(u,x)\, p(y,z|x) \left[ \ln \frac{q(x|y,u)\, q(u|y)^{\bar{\alpha}}\, q(u|z)^{\alpha}}{q(u,x)} + \frac{\bar{\theta} - \theta}{\bar{\theta}} \ln \frac{q(y|u)}{p(y|x)} \right]. \end{aligned}$$
By replacing the conditional qs with free variables Qs, we have the new objective function
$$F(\alpha, q, Q) = \bar{\theta} \sum_{u,x} q(u,x) \big( d[Q](u,x) - \ln q(u,x) \big),$$
where
$$d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln \big( Q(x|y,u)\, Q(u|y)^{\bar{\alpha}}\, Q(u|z)^{\alpha} \big) + \frac{\bar{\theta} - \theta}{\bar{\theta}} \ln \frac{Q(y|u)}{p(y|x)} \right].$$
When α ∈ [0, (1−2θ)/(1−θ)], similar to Equation (4), the new objective function is
$$F(\alpha, q, Q) = \bar{\theta} \sum_{x} q(x) \big( d[Q](x) - \ln q(x) \big),$$
where
$$d[Q](x) = \sum_{y,z} p(y,z|x) \big( \bar{\alpha} \ln Q(x|y) + \alpha \ln Q(x|z) \big).$$
For the minimization part, it is possible to use the optimal q and Qs obtained in the maximization part to update α . Because the values of the optimal q and Qs may vary greatly when α changes, we propose changing α locally in the neighbourhood. A candidate approach is to use the gradient descent method, as follows:
$$\alpha_{k+1} = \alpha_k - \tau_k \cdot \nabla_{\alpha} F(\alpha_k, q),$$
where
$$\frac{\partial F(\alpha, q)}{\partial \alpha} = \begin{cases} \bar{\theta} \big( I(X;Z) - I(X;Y) \big), & 0 \le \alpha \le \frac{1-2\theta}{1-\theta}, \\[4pt] \bar{\theta} \big( I(U;Z) - I(U;Y) \big), & \frac{1-2\theta}{1-\theta} < \alpha \le 1. \end{cases}$$
If the change in α is sufficiently small, the optimization with respect to α can be considered to have converged, and the iteration is then stopped.
We summarize the above procedures in Algorithm 2. Note that the updating rules for the q and Qs depend on the interval in which the value of α falls.
Algorithm 2: Computing the superposition coding inner bound for θ ∈ (0, 1/2)
Input: p(y,z|x), maximum iterations K, N, thresholds η, ϵ > 0, step size τ > 0;
Initialization: α_0 ∈ (0, 1), q_0(u,x) > 0, η_α > η, k = 0;
while k < K and η_α > η do
    initialize ϵ_q > ϵ, n = 0;
    while n < N and ϵ_q > ϵ do
        n ← n + 1;
        Q_n = Q[q_{n−1}] using Equation (5) similarly;
        q_n = q[Q_n] using Equation (7) similarly;
        F(α_k, q_n, Q_n) = θ̄ · ln Σ_{u,x} exp{d[Q_n]} using Equation (20) or (18);
        ϵ_q = θ̄ · max{d[Q_n] − ln q_{n−1}} − F(α_k, q_n, Q_n);
    end
    k ← k + 1;
    calculate α_k using Equations (21) and (22);
    α_k ← min{1, max{0, α_k}};
    η_α = |α_k − α_{k−1}|;
    q_0 ← q_n;
end
Output: α_k, q_n(u,x), Q_n, F(α_{k−1}, q_n, Q_n)
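The double-loop structure of Algorithm 2 can be sketched in a few lines of Python; here ba_inner and grad_alpha are hypothetical helpers standing in for the inner BA updates (Equations (5) and (7)) and the gradient of Equation (22), respectively:

```python
def sc_hyperplane(ba_inner, grad_alpha, K=200, tau=0.05, eta=1e-6):
    """Skeleton of Algorithm 2: gradient descent on alpha around an inner BA solver.
    ba_inner(alpha, q0) -> (q, Q, F_val) and grad_alpha(alpha, q) -> dF/dalpha
    are assumed helpers, not part of the paper."""
    alpha, q = 0.7, None                  # alpha_0 in (0, 1); inner solver picks q_0 > 0
    for _ in range(K):
        q, Q, F_val = ba_inner(alpha, q)                  # maximization part
        alpha_new = alpha - tau * grad_alpha(alpha, q)    # minimization part, Eq. (21)
        alpha_new = min(1.0, max(0.0, alpha_new))         # keep alpha in [0, 1]
        if abs(alpha_new - alpha) < eta:                  # eta_alpha stopping rule
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, q, Q, F_val
```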
Remark 3.
Given α k , according to Equation (8),
$$F(\alpha_k, q_n, Q_n) = \bar{\theta} \ln \sum_{u,x} \exp\{ d[Q_n](u,x) \}, \qquad \exp\Big\{ \tfrac{1}{\bar{\theta}} F(\alpha_k, q_n, Q_n) \Big\} = \sum_{u,x} \exp\{ d[Q_n](u,x) \}.$$
According to Equation (18) or (20), Σ_{u,x} exp{d[Q_n]} is a sum of exponentials when viewed as a function of α_k. This is a simple function; thus, we might wonder whether we can update α_k by minimizing F(α_k, q_n, Q_n) over it globally. It turns out that this kind of global updating rule can result in an oscillating effect, as can be observed from Figure 1 in [7]. The main reason for this is that q[Q] depends locally on α; therefore, it is not suitable to update α globally.

3.2. Marton’s Inner Bound

Marton’s inner bound [4] refers to the union over q ( u , v , w , x ) (such that ( U , V , W ) X ( Y , Z ) is Markov) of the non-negative rate pairs ( R Y , R Z ) satisfying
$$\begin{aligned} R_Y &\le I(U,W;Y), \\ R_Z &\le I(V,W;Z), \\ R_Y + R_Z &\le \min\{ I(W;Y),\, I(W;Z) \} + I(U;Y|W) + I(V;Z|W) - I(U;V|W). \end{aligned}$$
For general broadcast channels, this is the best known inner bound.
In the following, we characterize the supporting hyperplane θR_Y + θ̄R_Z = M of MIB. Because the expressions in MIB are symmetric in Y and Z, without loss of generality we can assume that θ ≤ θ̄, i.e., θ ∈ [0, 1/2]. According to [20], the supporting hyperplane is stated in the following lemma.
Lemma 3.
(Equations (2) and (5) in [20]). The supporting hyperplane θR_Y + θ̄R_Z = M of Marton’s inner bound, where θ ∈ [0, 1/2], is
$$M = \max_{q(u,v,w,x)} \min_{\alpha \in [0,1]} M(\alpha, q(u,v,w,x)) = \min_{\alpha \in [0,1]} \max_{q(u,v,w,x)} M(\alpha, q(u,v,w,x)),$$
where
$$M(\alpha, q) = (\bar{\theta} - \alpha\theta) I(W;Z) + \alpha\theta I(W;Y) + \bar{\theta} I(V;Z|W) + \theta I(U;Y|W) - \theta I(U;V|W).$$
Further, it suffices to consider the following cardinalities: |U|, |V| ≤ |X| and |W| ≤ |X| + 4.
To compute the value of this supporting hyperplane, we can reformulate M ( α , q ) as follows:
$$\begin{aligned} M(\alpha, q) = \bar{\theta} H(U,V,W,X) &- (\bar{\theta} - \alpha\theta) H(W|Z) - \alpha\theta H(W|Y) - \bar{\theta} H(V|W,Z) \\ &- \theta H(U|W,Y) - (\bar{\theta} - \theta) H(U|V,W) - \bar{\theta} H(X|U,V,W). \end{aligned}$$
Then, the objective function M ( α , q , Q ) can be expressed as
$$M(\alpha, q, Q) = \bar{\theta} \sum_{u,v,w,x} q(u,v,w,x) \big( d[Q](u,v,w,x) - \ln q(u,v,w,x) \big),$$
where
$$\begin{aligned} d[Q] = \sum_{y,z} p(y,z|x) \bigg( &\frac{\bar{\theta} - \alpha\theta}{\bar{\theta}} \ln Q(w|z) + \frac{\alpha\theta}{\bar{\theta}} \ln Q(w|y) + \ln Q(v|w,z) \\ &+ \frac{\theta}{\bar{\theta}} \ln Q(u|w,y) + \frac{\bar{\theta} - \theta}{\bar{\theta}} \ln Q(u|v,w) + \ln Q(x|u,v,w) \bigg). \end{aligned}$$
For minimization over α , similar to Section 3.1, we update α along the gradient
$$\alpha_{k+1} = \alpha_k - \tau_k \cdot \nabla_\alpha M(\alpha_k, q) = \alpha_k - \tau_k \cdot \theta \big( I(W;Y) - I(W;Z) \big).$$
Similar to Algorithm 2, we summarize the algorithm for MIB in Algorithm 3.
Algorithm 3: Computing Marton’s inner bound for θ ∈ (0, 1/2]
Input: p(y,z|x), maximum iterations K, N, thresholds η, ϵ > 0, step size τ > 0;
Initialization: α_0 ∈ (0, 1), q_0(u,v,w,x) > 0, η_α > η, k = 0;
while k < K and η_α > η do
    initialize ϵ_q > ϵ, n = 0;
    while n < N and ϵ_q > ϵ do
        n ← n + 1;
        Q_n = Q[q_{n−1}] using Equation (5) similarly;
        q_n = q[Q_n] using Equation (7) similarly;
        M(α_k, q_n, Q_n) = θ̄ · ln Σ_{u,v,w,x} exp{d[Q_n]} using Equation (26);
        ϵ_q = θ̄ · max{d[Q_n] − ln q_{n−1}} − M(α_k, q_n, Q_n);
    end
    k ← k + 1;
    calculate α_k using Equation (27);
    α_k ← min{1, max{0, α_k}};
    η_α = |α_k − α_{k−1}|;
    q_0 ← q_n;
end
Output: α_k, q_n(u,v,w,x), Q_n, M(α_{k−1}, q_n, Q_n)

3.3. UV Outer Bound

The UV outer bound [5] refers to the union over q ( u , v , x ) of non-negative rate pairs ( R Y , R Z ) satisfying
$$\begin{aligned} R_Y &\le I(U;Y), \\ R_Z &\le I(V;Z), \\ R_Y + R_Z &\le I(U;Y) + I(X;Z|U), \\ R_Y + R_Z &\le I(V;Z) + I(X;Y|V). \end{aligned}$$
For general broadcast channels, this was the best outer bound until [6] strictly improved upon it over an erasure Blackwell channel. The following theorem (proof in Appendix B) characterizes the supporting hyperplanes.
Theorem 4
(Claim 2 and Remark 1 in [21]). The supporting hyperplane of the UV outer bound is
$$\theta R_Y + \bar{\theta} R_Z = \min_{\alpha, \beta} \max_{q(u,v,x)} \bar{\theta}\alpha I(V;Z) + \bar{\theta}\bar{\alpha} I(X;Z|U) + \theta\beta I(U;Y) + \theta\bar{\beta} I(X;Y|V),$$
where α, β ∈ [0, 1] satisfy θ̄α + θβ ≥ max{θ, θ̄}. Further, it suffices to consider the cardinality sizes |U|, |V| ≤ |X|.
The original objective function can be reformulated as
$$\begin{aligned} G(\alpha, \beta, q) = {}& \bar{\theta}\alpha \big( I(X;Y|V) + I(V;Z) \big) - (\bar{\theta}\alpha - \theta\bar{\beta}) I(X;Y|V) \\ &+ \theta\beta \big( I(X;Z|U) + I(U;Y) \big) - (\theta\beta - \bar{\theta}\bar{\alpha}) I(X;Z|U). \end{aligned}$$
The right-hand side contains two parts, both of which are similar to Equation (10), i.e., the objective function of the degraded broadcast channel. It seems workable to apply the BA algorithm twice; however, these two parts are coupled through the same q(x).
Observe that the first part depends only on q ( v , x ) , while the other depends on q ( u , x ) . It suffices to consider the subset of distributions such that q ( u , v , x ) = q ( x ) q ( v | x ) q ( u | x ) . Thus, it is natural to decouple these two parts by fixing q ( x ) and applying the BA algorithm separately to q ( v | x ) and q ( u | x ) . After some manipulations, we have
$$\begin{aligned} G(\alpha, \beta, q, Q) = {}& (\bar{\theta}\alpha + \theta\beta) H(X) + \bar{\theta}\alpha \big( H(V|X) - H(X|Y,V) - H(V|Z) \big) - (\bar{\theta}\alpha - \theta\bar{\beta}) \big( H(Y|V) - H(Y|X) \big) \\ &+ \theta\beta \big( H(U|X) - H(X|Z,U) - H(U|Y) \big) - (\theta\beta - \bar{\theta}\bar{\alpha}) \big( H(Z|U) - H(Z|X) \big) \\ = {}& -(\bar{\theta}\alpha + \theta\beta) \sum_x q(x) \ln q(x) + \bar{\theta}\alpha \sum_{x,v} q(x)\, q(v|x) \big( d_1[Q](v,x) - \ln q(v|x) \big) \\ &+ \theta\beta \sum_{x,u} q(x)\, q(u|x) \big( d_2[Q](u,x) - \ln q(u|x) \big). \end{aligned}$$
The functions d 1 and d 2 in the above are
$$d_1[Q](v,x) = \sum_{y,z} p(y,z|x) \left[ \ln \big( Q(x|y,v)\, Q(v|z) \big) + \frac{\bar{\theta}\alpha - \theta\bar{\beta}}{\bar{\theta}\alpha} \ln \frac{Q(y|v)}{p(y|x)} \right],$$
$$d_2[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln \big( Q(x|z,u)\, Q(u|y) \big) + \frac{\theta\beta - \bar{\theta}\bar{\alpha}}{\theta\beta} \ln \frac{Q(z|u)}{p(z|x)} \right].$$
For fixed q(x) q(v|x) q(u|x), according to Equation (5), the optimal Qs are the induced Q[q]s. For fixed Qs, according to Equation (7), for each x we have
$$q[Q](v|x) = \frac{\exp\{ d_1[Q](v,x) \}}{\sum_{v'} \exp\{ d_1[Q](v',x) \}}, \qquad q[Q](u|x) = \frac{\exp\{ d_2[Q](u,x) \}}{\sum_{u'} \exp\{ d_2[Q](u',x) \}}.$$
The value of the objective function is
$$G\big( \alpha, \beta, q(x), q[Q](v|x), q[Q](u|x), Q \big) = (\bar{\theta}\alpha + \theta\beta) \sum_x q(x) \big( d[Q](x) - \ln q(x) \big),$$
where
$$d[Q](x) = \frac{\bar{\theta}\alpha}{\bar{\theta}\alpha + \theta\beta} \ln \sum_v \exp\{ d_1[Q](v,x) \} + \frac{\theta\beta}{\bar{\theta}\alpha + \theta\beta} \ln \sum_u \exp\{ d_2[Q](u,x) \}.$$
Again, according to Equation (7) the optimal q [ Q ] ( x ) and corresponding function value are
$$q[Q](x) = \frac{\exp\{ d[Q](x) \}}{\sum_{x'} \exp\{ d[Q](x') \}}, \qquad G(\alpha, \beta, q[Q], Q) = (\bar{\theta}\alpha + \theta\beta) \ln \sum_x \exp\{ d[Q](x) \}.$$
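A minimal sketch of this nested update, assuming the arrays d1 and d2 holding d_1[Q](v, x) and d_2[Q](u, x) have already been computed from the current Qs (our own illustration, not the authors’ code):

```python
import numpy as np
from scipy.special import logsumexp

def uvob_q_update(d1, d2, theta, alpha, beta):
    """One maximization step for the UV outer bound, following Equations (33)-(38).
    d1: |V| x |X| array of d_1[Q](v, x); d2: |U| x |X| array of d_2[Q](u, x)."""
    w = (1 - theta) * alpha + theta * beta            # theta_bar*alpha + theta*beta
    q_v_x = np.exp(d1 - logsumexp(d1, axis=0))        # q[Q](v|x), Equation (33)
    q_u_x = np.exp(d2 - logsumexp(d2, axis=0))        # q[Q](u|x), Equation (34)
    d = ((1 - theta) * alpha * logsumexp(d1, axis=0)
         + theta * beta * logsumexp(d2, axis=0)) / w  # d[Q](x) as in the text
    q_x = np.exp(d - logsumexp(d))                    # q[Q](x), Equation (37)
    G = w * logsumexp(d)                              # objective value, Equation (36)
    return q_x, q_v_x, q_u_x, G
```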
For minimization over ( α , β ) , similar to Section 3.1, we update ( α , β ) along the gradient:
$$\alpha_{k+1} = \alpha_k - \tau_k \cdot \nabla_\alpha G(\alpha_k, \beta_k, q) = \alpha_k - \tau_k \cdot \bar{\theta} \big( I(V;Z) - I(X;Z|U) \big),$$
$$\beta_{k+1} = \beta_k - \tau_k \cdot \nabla_\beta G(\alpha_k, \beta_k, q) = \beta_k - \tau_k \cdot \theta \big( I(U;Y) - I(X;Y|V) \big).$$
Here, it should be mentioned that (α, β) must satisfy the constraint θ̄α + θβ ≥ max{θ, θ̄}. Thus, if the resulting (α_{k+1}, β_{k+1}) violates this constraint, then we need to scale θ̄α_{k+1} + θβ_{k+1} up to be (at least) equal to max{θ, θ̄}. One way to accomplish this is to use the equality to make β dependent on α, in which case the gradient descent update becomes α_{k+1} = α_k − τ_k d_k, β_{k+1} = β_k + τ_k (θ̄/θ) d_k, where
$$d_k = \nabla_\alpha G(\alpha_k, \beta_k, q) - \frac{\bar{\theta}}{\theta}\, \nabla_\beta G(\alpha_k, \beta_k, q).$$
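In code, the clip-then-rescale feasibility step used in Algorithm 4 below might look as follows; this is a simple heuristic sketch (the final clip can leave the constraint slightly slack in corner cases), not the authors’ implementation:

```python
def project_alpha_beta(alpha, beta, theta):
    """Clip to [0,1]; if theta_bar*alpha + theta*beta has fallen below
    max(theta, theta_bar), scale (alpha, beta) back up (heuristic sketch)."""
    tb = 1.0 - theta
    alpha = min(1.0, max(0.0, alpha))
    beta = min(1.0, max(0.0, beta))
    s, target = tb * alpha + theta * beta, max(theta, tb)
    if 0.0 < s < target:
        alpha = min(1.0, alpha * target / s)
        beta = min(1.0, beta * target / s)
    return alpha, beta
```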
Similar to Algorithm 2, we summarize the algorithm for UVOB in Algorithm 4.
Algorithm 4: Computing the UV outer bound
Input: p(y,z|x), maximum iterations K, N, thresholds η, ϵ > 0, step size τ > 0;
Initialization: α_0, β_0 ∈ (0, 1) with θ̄α_0 + θβ_0 ≥ max{θ, θ̄}, q_0(u,v,x) > 0, η_α, η_β > η, k = 0;
while k < K and max{η_α, η_β} > η do
    initialize ϵ_q > ϵ, n = 0;
    while n < N and ϵ_q > ϵ do
        n ← n + 1;
        Q_n = Q[q_{n−1}] using Equation (5) similarly;
        q_n = q[Q_n] using Equations (33), (34) and (37);
        G(α_k, β_k, q_n, Q_n) = (θ̄α + θβ) · ln Σ_x exp{d[Q_n]} using Equation (36);
        ϵ_q = (θ̄α + θβ) · max{d[Q_n](x) − ln q_{n−1}(x)} − G(α_k, β_k, q_n, Q_n);
    end
    k ← k + 1;
    calculate α_k and β_k using Equations (39) and (40);
    α_k ← min{1, max{0, α_k}};
    β_k ← min{1, max{0, β_k}};
    if θ̄α_k + θβ_k < max{θ, θ̄}, scale up to equality;
    η_α = |α_k − α_{k−1}|;
    η_β = |β_k − β_{k−1}|;
    q_0 ← q_n;
end
Output: α_k, β_k, q_n(u,v,x), Q_n, G(α_{k−1}, β_{k−1}, q_n, Q_n)

4. Convergence Analysis

Here, we aim to show that certain convergence results hold if q_n lies in a proper convex set containing the global maximizer q^∗. For this purpose, we first introduce the first-order characterization of a concave function.
Lemma 4
(Lemma 3 in [22]). Given a convex set S , a differentiable function f is concave in S if and only if, for all x , y S ,
$$f(y) - f(x) - (y - x)^T \nabla f \big|_x \le 0.$$
Similar to [22], we use the superlevel set to construct the convex set S. Let S_F(α, k) be the superlevel set of the objective function F(α, q) of SCIB:
$$S_F(\alpha, k) := \{ q \mid F(\alpha, q) \ge k \}.$$
For a fixed k, it is possible for S_F(α, k) to contain more than one connected set. For q ∈ S_F(α, k), we denote the connected set that contains q by T_F(α, k, q).
Similarly, for MIB and UVOB we define the corresponding (connected) superlevel sets: S M ( α , k ) , T M ( α , k , q ) , S G ( α , β , k ) , T G ( α , β , k , q ) . Note that k here should not be confused with the notation indicating the number of iterations in the algorithms.

4.1. Superposition Coding Inner Bound

According to Theorem 3, the expression of the objective function F ( α , q ) of SCIB depends on the value of α . Without loss of generality, we can consider the objective function depicted in Equation (16). An equivalent condition for F ( α , q ) to be concave is provided in the following lemma.
Lemma 5.
Given a convex set S with a distribution q(u,x), F(α, q) as depicted in Equation (16) is concave in S if and only if, for all q_1, q_2 ∈ S, we have
$$\bar{\theta} \big( -D_{UX} + D_{X|YU} + \bar{\alpha} D_{U|Y} + \alpha D_{U|Z} \big) + (\bar{\theta} - \theta) D_{Y|U} \le 0,$$
where D_{A|B} denotes D_{A|B}(q_2 ∥ q_1).
The following lemma shows that q n + 1 lies in the same connected superlevel set as that of q n . The proof (see Appendix C) is similar to that for Lemma 4 in [16].
Lemma 6.
In Algorithm 2, if q_n(u,x) ∈ S_F(α, k), then q_{n+1} ∈ T_F(α, k, q_n).
Fixing α and letting q^∗(u,x) be the maximizer, the following theorem states that the function values F(α, q_n, Q_n) converge. The proof (see Appendix C) is similar to that of Theorem 2.
Theorem 5.
If q^∗, q_0 ∈ T_F(α, k, q̃) for some k and q̃, and if F(α, q) is concave in T_F(α, k, q̃), then the sequence F(α, q_n, Q_n) generated by Algorithm 2 converges monotonically from below to F(α, q^∗).
The following corollary is implied by the proof of Theorem 5.
Corollary 1.
If q^∗, q_0 ∈ T_F(α, k, q̃) for some k and q̃, and if F(α, q) is concave in T_F(α, k, q̃), then
$$F(\alpha, q^*) - F(\alpha, q_N, Q_N) \le \frac{\bar{\theta}}{N}\, D_{UX}(q^* \| q_0).$$
The above analyses deal with F ( α , q n , Q n ) for a fixed α . When α m changes to α m + 1 , the estimation for the one-step change in the function value is presented in the following proposition (see proof in Appendix C).
Proposition 2.
Given α m , suppose that Algorithm 2 converges to the optimal variables q ˜ and Q ˜ such that q ˜ = q [ Q ˜ ] and Q ˜ = Q [ q ˜ ] . Letting α m + 1 be updated using Equation (22) and letting q 0 = q ˜ be the initial point for the next round, we have
$$F(\alpha_{m+1}, q_1, Q_1) \ge F(\alpha_m, \tilde{q}) - \frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m}.$$

4.2. Marton’s Inner Bound

Next, we present the convergence results of the BA algorithm for MIB. The proofs are omitted, as they are similar to those for SCIB.
Lemma 7.
Given a convex set S with a distribution q(u,v,w,x), M(α, q) as depicted in Equation (24) is concave in S if and only if, for all q_1, q_2 ∈ S, we have
$$-\bar{\theta} D_{UVWX} + (\bar{\theta} - \alpha\theta) D_{W|Z} + \alpha\theta D_{W|Y} + \bar{\theta} D_{V|WZ} + \theta D_{U|WY} + (\bar{\theta} - \theta) D_{U|VW} + \bar{\theta} D_{X|UVW} \le 0,$$
where D_{A|B} denotes D_{A|B}(q_2 ∥ q_1).
Lemma 8.
In Algorithm 3, if q_n(u,v,w,x) ∈ S_M(α, k), then q_{n+1} ∈ T_M(α, k, q_n).
Fixing α and letting q^∗(u,v,w,x) be the maximizer, the following theorem states that the function values M(α, q_n, Q_n) converge.
Theorem 6.
If q^∗, q_0 ∈ T_M(α, k, q̃) for some k and q̃, and if M(α, q) is concave in T_M(α, k, q̃), then the sequence M(α, q_n, Q_n) generated by Algorithm 3 converges monotonically from below to M(α, q^∗).
The following corollary is implied by the proof of Theorem 6.
Corollary 2.
If q^∗, q_0 ∈ T_M(α, k, q̃) for some k and q̃, and if M(α, q) is concave in T_M(α, k, q̃), then
$$M(\alpha, q^*) - M(\alpha, q_N, Q_N) \le \frac{\bar{\theta}}{N}\, D_{UVWX}(q^* \| q_0).$$
The estimation for the one-step change in the function value for MIB is presented in the following proposition.
Proposition 3.
Given α m , suppose that Algorithm 3 converges to the optimal variables q ˜ and Q ˜ such that q ˜ = q [ Q ˜ ] and Q ˜ = Q [ q ˜ ] . Letting α m + 1 be updated using Equation (27) and letting q 0 = q ˜ be the initial point for the next round, we have
$$M(\alpha_{m+1}, q_1, Q_1) \ge M(\alpha_m, \tilde{q}) - \frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m}.$$

4.3. UV Outer Bound

Now, we present the convergence results of the BA algorithm for UVOB. The proofs are again omitted, as they are similar to those for SCIB.
Lemma 9.
Given a convex set S of distributions q(u,v,x), G(α, β, q) as depicted in Equation (29) is concave in S if and only if, for all q_1, q_2 ∈ S, we have
$$-(\bar{\theta}\alpha + \theta\beta) D_X + \bar{\theta}\alpha \big( -D_{V|X} + D_{X|YV} + D_{V|Z} \big) + (\bar{\theta}\alpha - \theta\bar{\beta}) D_{Y|V} + \theta\beta \big( -D_{U|X} + D_{X|ZU} + D_{U|Y} \big) + (\theta\beta - \bar{\theta}\bar{\alpha}) D_{Z|U} \le 0,$$
where D_{A|B} denotes D_{A|B}(q_2 ∥ q_1).
Lemma 10.
In Algorithm 4, if q_n(u,v,x) ∈ S_G(α, β, k), then q_{n+1} ∈ T_G(α, β, k, q_n).
Fixing (α, β) and letting q^∗(u,v,x) be the maximizer, the following theorem states that the function values G(α, β, q_n, Q_n) converge.
Theorem 7.
If q^∗, q_0 ∈ T_G(α, β, k, q̃) for some k and q̃, and if G(α, β, q) is concave in T_G(α, β, k, q̃), then the sequence G(α, β, q_n, Q_n) generated by Algorithm 4 converges monotonically from below to G(α, β, q^∗).
The following corollary is implied by the proof of Theorem 7.
Corollary 3.
If q^∗, q_0 ∈ T_G(α, β, k, q̃) for some k and q̃, and if G(α, β, q) is concave in T_G(α, β, k, q̃), then
$$G(\alpha, \beta, q^*) - G(\alpha, \beta, q_N, Q_N) \le \frac{\bar{\theta}\alpha + \theta\beta}{N}\, D_{UVX}(q^* \| q_0).$$
The estimation for the one-step change in the function value for UVOB is presented in the following proposition.
Proposition 4.
Given ( α m , β m ) , suppose that Algorithm 4 converges to the optimal variables q ˜ and Q ˜ such that q ˜ = q [ Q ˜ ] and Q ˜ = Q [ q ˜ ] . Letting ( α m + 1 , β m + 1 ) be updated using Equations (39) and (40) and letting q 0 = q ˜ be the initial point for the next round, we have
$$G(\alpha_{m+1}, \beta_{m+1}, q_1, Q_1) \ge G(\alpha_m, \beta_m, \tilde{q}) - \frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m} - \frac{(\beta_{m+1} - \beta_m)^2}{\tau_m}.$$

5. Numerical Results

We take the binary skew-symmetric broadcast channel p ( y , z | x ) as the test channel. The conditional probability matrices are
$$P_{Y|X} = \begin{pmatrix} 1 & 0 \\ 0.5 & 0.5 \end{pmatrix}, \qquad P_{Z|X} = \begin{pmatrix} 0.5 & 0.5 \\ 0 & 1 \end{pmatrix}.$$
This is perhaps the simplest broadcast channel for which the capacity region is still unknown.
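As a quick sanity check, the two marginal channels can be fed to the point-to-point BA sketch from Section 2.1 (the hypothetical ba_capacity helper); both marginals are Z-channels with crossover probability 1/2, whose known capacity is log2(5) − 2 ≈ 0.3219 bits ≈ 0.2231 nats.

```python
import numpy as np

P_Y_given_X = np.array([[1.0, 0.0], [0.5, 0.5]])   # rows are p(y|x)
P_Z_given_X = np.array([[0.5, 0.5], [0.0, 1.0]])   # rows are p(z|x)

# Capacities of the marginal channels (in nats); by symmetry the two coincide.
C_Y, _ = ba_capacity(P_Y_given_X)
C_Z, _ = ba_capacity(P_Z_given_X)
print(C_Y, C_Z)   # both approximately 0.2231
```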
This broadcast channel plays a very important role in research on capacity bounds. It was first studied in [23] to show that the time-sharing random variable is useful for the Cover–van der Meulen inner bound [24,25]. Later, [26,27,28] demonstrated that the sum rate of the UVOB for this broadcast channel is strictly larger than that of the MIB, showing for the first time that at least one of these two bounds is suboptimal.
Our algorithms are useful in at least the following sense: suppose that it is not known whether the MIB matches the UVOB (or whether two other bounds match in a new scenario) and we want to check this; we can then perform an exhaustive search over channel matrices of size 2 × 2 (or of higher dimensions) to check whether the bounds match. According to the results shown below in Section 5.5, this does not take very much time compared with generic algorithms.
In the following, we apply the algorithms to compute the value of the supporting hyperplane θ R Y + θ ¯ R Z , where θ = 0.4 . The initial values of α and β are α 0 = β 0 = 0.7 . This set of parameters is feasible for UVOB, as θ ¯ α + θ β = 0.7 > max { θ , θ ¯ } .
We demonstrate the algorithms in the following aspects: (1) the maximization part; (2) the minimization part; (3) the change from the maximization part to the minimization part; (4) the superlevel set; and (5) comparison with generic non-convex algorithms.

5.1. Maximization Part

In this part, we fix α and β at their initial values and let the BA algorithms run for N = 200 iterations. The results are presented in Figure 1. Because this is the maximization part, the function values increase as the iterations proceed. It is clear that the function values behave properly for fixed α and β.

5.2. Minimization Part

In this part, we start with the initial α_0 and β_0, then let the algorithms run for K = 200 iterations. The results for (α_k, β_k) are presented in Figure 2. Because this is the minimization part, the function values decrease as the iterations proceed. It is clear that α_k in SCIB and MIB changes gradually as k grows. For UVOB, it is necessary to ensure that θ̄α + θβ ≥ max{θ, θ̄}. When the updated (α_{k+1}, β_{k+1}) makes θ̄α + θβ fall below this value, it becomes necessary to scale it back up. This happens approximately starting from k = 5.

5.3. Change from Maximization to Minimization

In this part, we consider UVOB and let K in the algorithm be K = 100 . Figure 3 plots the following three values in Equation (43):
$$G(\alpha_k, \beta_k, \tilde{q}), \qquad \mathrm{LHS} := G(\alpha_{k+1}, \beta_{k+1}, q_1, Q_1), \qquad \mathrm{RHS} := G(\alpha_k, \beta_k, \tilde{q}) - \frac{(\alpha_{k+1} - \alpha_k)^2}{\tau_k} - \frac{(\beta_{k+1} - \beta_k)^2}{\tau_k}.$$
As the algorithm iterates, the estimate in Equation (43) becomes more and more accurate, as exp{x} ≈ 1 + x and ln(1 + x) ≈ x for small x.

5.4. Superlevel Set

To visualize the convergence of q n and its relation with the superlevel set, we take SCIB as an example and fix q ( u ) such that q ( x | u ) has two free variables. We reformulate the objective function of SCIB depicted in Equation (16) as follows:
$$\begin{aligned} \tilde{F}(\alpha, q(x|u), Q) &= \bar{\theta} H(U) + \bar{\theta} \big( H(X|U) - H(X|Y,U) - \bar{\alpha} H(U|Y) - \alpha H(U|Z) \big) - (\bar{\theta} - \theta) \big( H(Y|U) - H(Y|X) \big) \\ &= \bar{\theta} H(U) + \bar{\theta} \sum_{x,u} q(u)\, q(x|u) \big( d[Q](u,x) - \ln q(x|u) \big), \end{aligned}$$
where
$$d[Q](u,x) = \sum_{y,z} p(y,z|x) \left[ \ln \big( Q(x|y,u)\, Q(u|y)^{\bar{\alpha}}\, Q(u|z)^{\alpha} \big) + \frac{\bar{\theta} - \theta}{\bar{\theta}} \ln \frac{Q(y|u)}{p(y|x)} \right].$$
In particular, we fix α_0 = 0.7 and P_U = (0.3, 0.7), then use the algorithm to find the values of P(X = 0|U = 0) and P(X = 0|U = 1). The results are shown in Figure 4. In this case, q_n for large enough n lies in the concave part of the superlevel set, meaning that the algorithm converges. It should be mentioned that the algorithm may fail to converge to the optimal point for initial q_0 s that do not lie in the concave part.

5.5. Comparison with Generic Non-Convex Algorithms

Here, we compare our algorithms with the following generic algorithms implemented using the “fmincon” MATLAB function: interior-point, active-set, and sequential quadratic programming (sqp). For simplicity, we only compare the sum rate of MIB, for which the optimal value is 0.2506717… nats (0.3616428… bits). The optimization problem for computing the sum rate is
$$\max_{q(u,v,w,x)} \min\{ I(W;Y),\, I(W;Z) \} + I(U;Y|W) + I(V;Z|W) - I(U;V|W).$$
According to Lemma 3, the cardinality size is |U| · |V| · |W| · |X| = |X|³(|X| + 4) = 48.
Notice that we do not carry out a comparison with the method in [16], as it cannot be applied to cases where there is a minimum. For scenarios in which [16] can be used, our algorithms degenerate to the method in [16].
The initial point of q ( u , v , w , x ) is randomly generated for all the algorithms. Table 2 lists the experimental results. For the first three algorithms, a randomly picked starting point usually does not provide a good enough result. Thus, we ran the first three algorithms multiple times until the best function value hit 0.2506 in order to test their effectiveness. It is clear from the table that only sqp can be considered comparable to our algorithms.
For further comparison with sqp, we randomly generated broadcast channels with cardinalities |X| = 3, 4, 5, 6 and |Y| = |Z| = |X|. The corresponding dimensions are |X|³(|X| + 4) = 189, 512, 1125, 2160. Because the optimal sum rate is not known in these cases, we ran sqp once to record the running time. The results in Table 3 suggest that our algorithms are highly scalable. This meets our expectation, as the updating formulae in Equations (5) and (7) are all explicit and can be computed rapidly.

6. Discussion and Conclusions

6.1. Initial Points of Algorithms

Taking MIB as an example, we next discuss how to choose the initial points. When there is no prior knowledge of the optimization problem, the initial point is usually generated randomly. In this paper, Theorem 6 and Lemma 7 provide some guidance on the choice of the initial point q_0(u,v,w,x). A possibly workable method is to randomly generate an initial point and slightly perturb it to check whether these two points satisfy the inequality in Lemma 7. If the answer is no, then it is possible that the objective function is not concave in the neighbourhood of this point, and we continue to generate new initial points.
For the initial point α_0, because it lies in [0, 1], it is affordable to perform a grid search, especially when |X| is small. For example, we can use a grid spacing of 0.1 and try each α_0 ∈ {0, 0.1, 0.2, …, 1}. This approach can, to some extent, help us avoid becoming stuck at local extreme points.

6.2. J Version Outer Bound

As mentioned earlier, the best general outer bound is the J version outer bound proposed in [6]. However, the evaluation of this outer bound turns out to be even harder, as there are additional constraints on the free variables and the auxiliary channel with the joint distribution
$$q(x)\, q(u,v,w|x)\, q(\tilde{u},\tilde{v},\tilde{w}|x)\, q(\hat{u},\hat{v},\hat{w}|x)\, p(y,z|x)\, T(j|x,y,z).$$
These constraints are presented in Equations (18a)–(18c) and (19a)–(19c) in [6]. Taking Equations (18a) and (19a) as an example,
$$\begin{aligned} \text{(18a)}:\;& I(\tilde{W};Z) - I(\tilde{W};J) + I(\hat{W};J) - I(\hat{W};Y) = I(W;Z) - I(W;Y), \\ \text{(19a)}:\;& 0 \le I(X;Z|\tilde{U},\tilde{W}) - I(X;J|\tilde{U},\tilde{W}) \le I(\tilde{V};Z|\tilde{W}) - I(\tilde{V};J|\tilde{W}). \end{aligned}$$
Direct application of Equation (7) does not yield an updated q [ Q ] guaranteed to satisfy these constraints; thus, the design of BA algorithms for the J version outer bound should carefully address this kind of problem. We leave this for future research.
Finally, to conclude the paper: extending the BA algorithm to inner and outer bounds for general broadcast channels leads to max–min problems. We have shown that the max–min order can be changed to min–max. Based on this observation, we have designed BA algorithms for the maximization parts and gradient descent algorithms for the minimization parts, then performed convergence analysis and numerical experiments to support our analysis. We have compared our algorithms with the following generic non-convex algorithms: interior-point, active-set, and sequential quadratic programming. The results show that our algorithms are both effective and efficient.

Author Contributions

Methodology and analysis, Y.G.; software and visualization, Y.D. and Y.L.; validation, X.N., B.B. and W.H.; writing—original draft preparation, Y.G.; writing—review and editing, X.N., B.B. and W.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Key R&D Program of China (Grant No. 2021YFA1000500) and in part by Huawei Tech. Co., Ltd.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Yanqing Liu is employed by Bank of China. Xueyan Niu, Bo Bai, and Wei Han are employed by Huawei Tech. Co., Ltd. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Proofs of Results in Section 1

Proof of Theorem 1.
For the first property, for fixed q, the concavity is clear, as ln Q is concave in Q. The equality in Equation (6) is easy to show: if we take Q[q](x|y) to be q(x|y), then the function C(q, Q[q]) in Equation (2) reduces to C(q) in Equation (1). The maximality can be proved using the Kullback–Leibler divergence:
$$C(q, Q[q]) - C(q, Q) = C(q) - C(q, Q) = \sum_{x,y} q(x)\, p(y|x) \ln \frac{q(x|y)}{Q(x|y)} = \sum_{x,y} q(y)\, q(x|y) \ln \frac{q(x|y)}{Q(x|y)} = D_{X|Y}(q \| Q) \ge 0.$$
For the second property, we can reformulate C ( q , Q ) as
$$C(q, Q) = H(X) + \sum_x q(x) \Big( \sum_y p(y|x) \ln Q(x|y) \Big).$$
The concavity holds, as the first term H(X) is concave and the second term is linear. To find the maximum q[Q], because there is a constraint Σ_x q(x) = 1, we consider the derivative of the Lagrangian:
$$0 = \frac{\partial}{\partial q(x)} L(\lambda, q, Q) = \frac{\partial}{\partial q(x)} \left[ \sum_x q(x) \big( d[Q](x) - \ln q(x) \big) + \lambda \Big( 1 - \sum_x q(x) \Big) \right] = d[Q](x) - \ln q(x) - 1 - \lambda.$$
This implies that q(x) = exp{d[Q](x) − 1 − λ} for all x. The common factor exp{−(1 + λ)} is eliminated by normalization, as depicted in Equation (7). We can then verify the function value as follows:
$$C(q[Q], Q) = \sum_x q[Q](x) \big( d[Q](x) - \ln q[Q](x) \big) \overset{(a)}{=} \sum_x q[Q](x) \Big( \ln q[Q](x) + \ln \sum_{x'} \exp\{ d[Q](x') \} - \ln q[Q](x) \Big) = \ln \sum_x \exp\{ d[Q](x) \},$$
where ( a ) is due to Equation (7). The second equality is again per Equation (7). □
Proof of Theorem 2.
The basic idea is to show that the sum of the decreasing, positive numbers C − C(q_n, Q_n) is bounded.
The monotonicity C(q_{n+1}, Q_{n+1}) ≥ C(q_n, Q_n) is clear, as we are performing alternating maximization; thus,
$$C(q_{n+1}, Q_{n+1}) = C(q[Q_{n+1}], Q_{n+1}) \ge C(q_n, Q_{n+1}) = C(q_n, Q[q_n]) \ge C(q_n, Q_n).$$
The positiveness of C − C(q_n, Q_n) follows because C ≥ C(q_n) ≥ C(q_n, Q_n).
Let q^∗ achieve the maximum of C(q); then,
$$\begin{aligned} C - C(q_n, Q_n) - \sum_x q^*(x) \ln \frac{q_n(x)}{q_{n-1}(x)} &= C(q^*, Q[q^*]) - C(q_n, Q_n) - \sum_x q^*(x) \ln \frac{q_n(x)}{q_{n-1}(x)} \\ &\overset{(a)}{=} \sum_x q^*(x) \Big( d[Q[q^*]](x) - \ln q^*(x) - d[Q_n](x) + \ln q_n(x) - \ln \frac{q_n(x)}{q_{n-1}(x)} \Big) \\ &= \sum_x q^*(x) \Big( d[Q[q^*]](x) - \ln q^*(x) - d[Q[q_{n-1}]](x) + \ln q_{n-1}(x) \Big) \\ &\overset{(b)}{=} \sum_x q^*(x) \sum_y p(y|x) \Big( \ln \frac{p(y|x)}{q^*(y)} - \ln \frac{p(y|x)}{q_{n-1}(y)} \Big) \\ &= \sum_{x,y} q^*(x)\, p(y|x) \ln \frac{q_{n-1}(y)}{q^*(y)} = -D_Y(q^* \| q_{n-1}) \le 0, \end{aligned}$$
where ( a ) is due to Equation (8) and ( b ) holds according to Equation (9). Now, we can bound the sum as follows:
$$\sum_{n=1}^{N} \big( C - C(q_n, Q_n) \big) \le \sum_{n=1}^{N} \sum_x q^*(x) \ln \frac{q_n(x)}{q_{n-1}(x)} = \sum_x q^*(x) \ln \frac{q_N(x)}{q_0(x)} = \sum_x q^*(x) \ln \Big( \frac{q_N(x)}{q^*(x)} \cdot \frac{q^*(x)}{q_0(x)} \Big) = D_X(q^* \| q_0) - D_X(q^* \| q_N) \le D_X(q^* \| q_0).$$
The last term is finite, as q_0 > 0. This implies C(q_n, Q_n) → C. □

Appendix B. Proofs of Results in Section 3

Proof of Lemma 1.
The equivalence B = C is already stated without proof in Chapter 5.3 and Chapter 5.6 of [29]. We present the proof here for completeness.
It is clear that A ⊆ B ⊆ C. Thus, we only need to prove C ⊆ A. It suffices to show that the corner points of C(U,X) lie inside A.
Because C ( U , X ) is a trapezoid with one corner point ( 0 , 0 ) , there are at most three nontrivial corner points, as follows:
  • Lower right (r_1, 0), where r_1 = min{I(X;Y), I(X;Y|U) + I(U;Z)}. Because r_1 ≤ I(X;Y), we have (r_1, 0) ∈ A(∅, X).
  • Upper left (0, r_2), where r_2 = min{I(U;Z), I(X;Y)}. Clearly, (0, r_2) ∈ A(X, X).
  • Upper right (min{I(X;Y|U), I(X;Y) − I(U;Z)}, I(U;Z)); this corner point exists when I(U;Z) ≤ I(X;Y). We can consider two cases:
    (a) I(U;Y) ≥ I(U;Z): the corner point is (I(X;Y|U), I(U;Z)), which is inside A(U, X).
    (b) I(U;Y) ≤ I(U;Z): the corner point is (I(X;Y) − I(U;Z), I(U;Z)). It suffices to consider I(U;Y) ≤ I(U;Z) ≤ I(X;Y), as otherwise this point does not exist. Now, let α be such that (1 − α) I(X;Y) + α I(U;Y) = I(U;Z). We can construct a pair (Û, X̂) such that this corner point is inside A(Û, X̂). Considering the random variables Q ∼ Bernoulli(α), Ũ = X when Q = 0 and Ũ = U when Q = 1, we can let Û = (Ũ, Q) and X̂ = X; then, I(X̂;Y|Û) = (1 − α) · 0 + α I(X;Y|U) = I(X;Y) − I(U;Z). Now, we have I(Û;Z) ≥ I(Ũ;Z|Q) ≥ I(U;Z) and I(Û;Y) ≥ I(Ũ;Y|Q) = I(U;Z); hence, min{I(Û;Z), I(Û;Y)} ≥ I(U;Z).
The proof is finished. □
Proof of Theorem 3.
We can use A ( U , X ) in Lemma 1 to compute the supporting hyperplanes.
When θ ∈ [1/2, 1], θR_Y + θ̄R_Z is bounded from above by
$$\max_{q(u,x)} \theta I(X;Y|U) + \bar{\theta} I(U;Y) = \max_{q(u,x)} \theta \big( I(X;Y) - I(U;Y) \big) + \bar{\theta} I(U;Y) = \max_{q(u,x)} \theta I(X;Y) + (1 - 2\theta) I(U;Y) \le \max_{q(x)} \theta I(X;Y).$$
On the other hand, this upper bound is achieved by setting U = ∅ in A(U, X).
When θ ∈ [0, 1/2), we first have
$$F = \max_{q(u,x)} \min_{\alpha \in [0,1]} \theta I(X;Y|U) + \bar{\theta}\alpha I(U;Z) + \bar{\theta}\bar{\alpha} I(U;Y).$$
To show that the maximum and minimum can be exchanged, we use Lemma 2; in particular, letting d = 2, λ_1 = α, and λ_2 = ᾱ, we have
$$T_1(q(u,x)) = \theta I(X;Y|U) + \bar{\theta} I(U;Z), \qquad T_2(q(u,x)) = \theta I(X;Y|U) + \bar{\theta} I(U;Y).$$
Then, the objective function F(α, q) equals λ_1 T_1 + λ_2 T_2. It remains to prove that T is convex. Assume that (a_1, a_2) ∈ T for some q_a(u,x) and that (b_1, b_2) ∈ T for some q_b(u,x). Letting Ũ = (U, Q), where Q ∼ Bernoulli(β) and (U, X) ∼ q_a(u,x) if Q = 1 and (U, X) ∼ q_b(u,x) if Q = 0, we then have
$$T_1(\beta q_a + \bar{\beta} q_b) = T_1(q(\tilde{u}, x)) = \theta I(X;Y|U,Q) + \bar{\theta} I(U,Q;Z) \ge \theta I(X;Y|U,Q) + \bar{\theta} I(U;Z|Q) = \beta T_1(q_a) + \bar{\beta} T_1(q_b).$$
Similar inequalities hold for T 2 . This proves the convexity, and hence the exchange.
Now, we can show that the expression of F can be simplified for θ ∈ [0, 1/2) and α ≤ (1−2θ)/(1−θ). Noting that I(U;Y) = I(X;Y) − I(X;Y|U) and I(U;Z) = I(X;Z) − I(X;Z|U), we have
$$\theta I(X;Y|U) + \bar{\theta}\bar{\alpha} I(U;Y) + \bar{\theta}\alpha I(U;Z) = \bar{\theta}\bar{\alpha} I(X;Y) + \bar{\theta}\alpha I(X;Z) + (\theta - \bar{\theta}\bar{\alpha}) I(X;Y|U) - \bar{\theta}\alpha I(X;Z|U).$$
When α ≤ (1−2θ)/(1−θ), i.e., θ − θ̄ᾱ ≤ 0, the last two terms above have a non-positive sum. Their maximum value equals zero and is achieved by taking U = X. This finishes the proof of the expression.
The cardinality can be proved as follows:
$$\theta I(X;Y|U) + \bar{\theta}\bar{\alpha} I(U;Y) + \bar{\theta}\alpha I(U;Z) = \bar{\theta}\bar{\alpha} I(X;Y) + \bar{\theta}\alpha I(X;Z) + (\theta - \bar{\theta}\bar{\alpha}) I(X;Y|U) - \bar{\theta}\alpha I(X;Z|U) = f(q(x)) + \sum_u q(u)\, g(q(x|u)),$$
where f and g are continuous functions corresponding to the mutual information terms. Subject to a fixed marginal q(x), the maximum of Σ_u q(u) g(q(x|u)) over all feasible q(u) and q(x|u) is the upper concave envelope of the function g evaluated at q(x). Notice that as the degree of freedom of the distribution q(x) is |X| − 1, it suffices to consider |U| ≤ |X| − 1 + 1 = |X| for evaluating the envelope. □
Proof of Theorem 4.
For fixed q(u,v,x), the pentagon of the UVOB has at most two corner points in the first quadrant, namely, the upper left and lower right ones. The line connecting these two points has slope −1. We need to compute the supporting hyperplane value G = max_{(R_Y, R_Z) ∈ UVOB} θR_Y + θ̄R_Z, where θ ∈ (0, 1).
For the case θ ∈ (0, 1/2], note that the slope of the line θR_Y + θ̄R_Z = G is −θ/θ̄ ≥ −1; thus, it suffices to consider the upper left corner point. The expression of this point differs according to which of the following two sets q(u,v,x) falls into:
$$\mathcal{S}_1 = \big\{ q(u,v,x) : I(V;Z) \le I(U;Y) + I(X;Z|U) \big\}, \qquad \mathcal{S}_2 = \big\{ q(u,v,x) : I(V;Z) \ge I(U;Y) + I(X;Z|U) \big\}.$$
When q ∈ S_1, this corner point and the corresponding expression of l_1(q) := θR_Y + θ̄R_Z are
$$\big( \min\{ I(U;Y),\; I(U;Y) + I(X;Z|U) - I(V;Z),\; I(X;Y|V) \},\; I(V;Z) \big),$$
$$l_1(q) = \theta \min\{ I(U;Y),\; I(U;Y) + I(X;Z|U) - I(V;Z),\; I(X;Y|V) \} + \bar{\theta} I(V;Z).$$
Otherwise, when q ∈ S_2, this corner point and the corresponding expression of l_2(q) := θR_Y + θ̄R_Z are
$$\big( 0,\; I(U;Y) + I(X;Z|U) \big), \qquad l_2(q) = \bar{\theta} \big( I(U;Y) + I(X;Z|U) \big).$$
Now, the supporting hyperplane value is
$$G = \max\Big\{ \max_{q \in \mathcal{S}_1} l_1(q),\; \max_{q \in \mathcal{S}_2} l_2(q) \Big\}.$$
We want to show that
$$\max_{q \in \mathcal{S}_2} l_2(q) = \max_{q \in \mathcal{S}_2} l_1(q).$$
For the left-hand-side term, we have
$$\max_{q \in \mathcal{S}_2} l_2(q) = \max_{q \in \mathcal{S}_2} \bar{\theta} \big( I(U;Y) + I(X;Z|U) \big) \le \max_{q \in \mathcal{S}_2} \bar{\theta} I(V;Z) \le \max_{q \in \mathcal{S}_2} \bar{\theta} I(X;Z),$$
where the equalities hold with U = ∅, V = X, and this choice is in S_2. For the other term,
$$\max_{q \in \mathcal{S}_2} l_1(q) = \max_{q \in \mathcal{S}_2} \Big[ \theta \big( I(U;Y) + I(X;Z|U) - I(V;Z) \big) + \bar{\theta} I(V;Z) \Big] \le \max_{q \in \mathcal{S}_2} \bar{\theta} I(V;Z) \le \max_{q \in \mathcal{S}_2} \bar{\theta} I(X;Z),$$
where the equalities hold using the same settings as above. Hence, we have the supporting hyperplane value
$$G = \max\Big\{ \max_{q \in \mathcal{S}_1} l_1(q),\; \max_{q \in \mathcal{S}_2} l_1(q) \Big\} = \max_q l_1(q).$$
We can simplify this expression as follows:
$$\begin{aligned} \max_q l_1(q) &= \max_q \min_{a, b \in [0,1]} \theta (1-a) I(U;Y) + \theta a (1-b) \big[ I(U;Y) + I(X;Z|U) - I(V;Z) \big] + \theta a b\, I(X;Y|V) + \bar{\theta} I(V;Z) \\ &= \max_q \min_{a, b \in [0,1]} \theta (1 - ab) I(U;Y) + \theta a (1-b) I(X;Z|U) + \big( \bar{\theta} - \theta a (1-b) \big) I(V;Z) + \theta a b\, I(X;Y|V). \end{aligned}$$
Letting α = 1 − θa(1−b)/θ̄ and β = 1 − ab, we have α, β ∈ [0, 1], θ̄α + θβ = θ̄ + θ(1−a), and the supporting hyperplane value is
$$\theta R_Y + \bar{\theta} R_Z = \max_q \min_{\alpha, \beta \in [0,1]} \bar{\theta}\alpha I(V;Z) + \bar{\theta}\bar{\alpha} I(X;Z|U) + \theta\beta I(U;Y) + \theta\bar{\beta} I(X;Y|V).$$
Notice that the range of θ̄α + θβ is [max{θ, θ̄}, 1]. Within this range, the reverse mapping from (α, β) to (a, b) is
$$a = \frac{1 - (\bar{\theta}\alpha + \theta\beta)}{\theta}, \qquad b = \frac{\theta\bar{\beta}}{\theta\bar{\beta} + \bar{\theta}\bar{\alpha}}.$$
Notice that when θ ∈ [1/2, 1), we have θ̄ ∈ (0, 1/2]; we can use similar reasoning as above (by swapping Y and Z, R_Y and R_Z, U and V, θ and θ̄, and finally α and β) to obtain
$$\bar{\theta} R_Z + \theta R_Y = \max_q \min_{\alpha, \beta \in [0,1]} \theta\beta I(U;Y) + \theta\bar{\beta} I(X;Y|V) + \bar{\theta}\alpha I(V;Z) + \bar{\theta}\bar{\alpha} I(X;Z|U).$$
The constraint then becomes θβ + θ̄α = θ + θ̄(1−a), for which the range is again [max{θ̄, θ}, 1]. Thus, the expression and the constraints are the same as for the case where θ ∈ (0, 1/2]. Putting these two cases together, we have the characterization of the supporting hyperplanes.
To exchange the max–min, we again use Lemma 2. The proof is similar to that of Theorem 3, and as such we omit the details, providing only the setting for $\theta \in (0, \tfrac{1}{2}]$, where $G = \max_q l_1(q)$: $d = 3$ and the functions are
$$T_1(q) = \theta\, I(U;Y) + \bar\theta\, I(V;Z), \quad T_2(q) = \theta \big( I(U;Y) + I(X;Z|U) - I(V;Z) \big) + \bar\theta\, I(V;Z), \quad T_3(q) = \theta\, I(X;Y|V) + \bar\theta\, I(V;Z).$$
The proof of the cardinality bounds is similar to that of Theorem 3. □

Appendix C. Proofs of Results in Section 4

Proof of Lemma 5.
Let A represent a generic random variable. The gradient of H ( A ) can be calculated as follows:
$$\frac{\partial H(A)}{\partial q(u,x)} = -\frac{\partial}{\partial q(u,x)} \sum_a q(a) \ln q(a) = -\sum_a \frac{\partial q(a)}{\partial q(u,x)} \ln q(a) - \sum_a q(a) \cdot \frac{1}{q(a)} \cdot \frac{\partial q(a)}{\partial q(u,x)} = -\sum_a \frac{\partial q(a)}{\partial q(u,x)} \big( 1 + \ln q(a) \big).$$
For the particular term H ( U | Y ) in F ( α , q ) , the gradient is
$$\frac{\partial \big[ H(U,Y) - H(Y) \big]}{\partial q(u,x)} = -\sum_{u',y} \frac{\partial q(u',y)}{\partial q(u,x)} \big( 1 + \ln q(u',y) \big) + \sum_y \frac{\partial q(y)}{\partial q(u,x)} \big( 1 + \ln q(y) \big) = -\sum_y p(y|x) \big( 1 + \ln q(u,y) \big) + \sum_y p(y|x) \big( 1 + \ln q(y) \big) = -\sum_y p(y|x) \ln q(u|y).$$
Now, we can calculate the first-order term in Equation (41):
$$(q_2 - q_1)^T \nabla H_{q_1}(U|Y) = -\sum_{u,x,y} \big( q_2(u,x) - q_1(u,x) \big)\, p(y|x) \ln q_1(u|y) = -\Big( \sum_{u,y} q_2(u,y) \ln q_1(u|y) + H_{q_1}(U|Y) \Big).$$
Finally, the left-hand side of Equation (41) equals
$$H_{q_2}(U|Y) - H_{q_1}(U|Y) - (q_2 - q_1)^T \nabla H_{q_1}(U|Y) = H_{q_2}(U|Y) + \sum_{u,y} q_2(u,y) \ln q_1(u|y) = -D_{U|Y}(q_2 \,\|\, q_1).$$
For the other terms in Equation (16), we can perform similar calculations to obtain the desired inequality. □
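As a sanity check on the gradient formula derived above, the following sketch (with a randomly generated joint $q(u,x)$ and channel $p(y|x)$ of our own choosing, not from the paper) compares the closed form $-\sum_y p(y|x) \ln q(u|y)$ against central finite differences of $H(U|Y)$:

```python
import numpy as np

rng = np.random.default_rng(0)
U, X, Y = 2, 3, 3
q = rng.random((U, X)); q /= q.sum()                        # joint q(u,x)
p = rng.random((X, Y)); p /= p.sum(axis=1, keepdims=True)   # channel p(y|x)

def H_U_given_Y(q):
    quy = np.einsum('ux,xy->uy', q, p)      # q(u,y)
    qy = quy.sum(axis=0, keepdims=True)     # q(y)
    return -np.sum(quy * np.log(quy / qy))

# closed form: grad[u,x] = -sum_y p(y|x) * ln q(u|y)
quy = np.einsum('ux,xy->uy', q, p)
grad = -np.einsum('xy,uy->ux', p, np.log(quy / quy.sum(axis=0, keepdims=True)))

# central finite differences over the unconstrained coordinates q(u,x)
eps, num = 1e-6, np.zeros_like(q)
for u in range(U):
    for x in range(X):
        e = np.zeros_like(q); e[u, x] = eps
        num[u, x] = (H_U_given_Y(q + e) - H_U_given_Y(q - e)) / (2 * eps)
print(np.abs(grad - num).max())             # small, e.g. ~1e-8
```

Note that the derivation treats $q(u,x)$ as unconstrained coordinates, so the finite-difference perturbation deliberately does not renormalize q.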
Proof of Lemma 6.
Consider the superlevel set $\tilde S_F(\alpha, k)$ of the function $F(\alpha, q, Q[q^n])$. According to Equation (6), $F(\alpha, q, Q[q^n]) \le F(\alpha, q)$ for all q; thus, $\tilde S_F(\alpha, k) \subseteq S_F(\alpha, k)$. From Equation (6), $F(\alpha, q^n, Q[q^n]) = F(\alpha, q^n)$, we have $q^n \in \tilde S_F(\alpha, k)$; further, because $F(\alpha, q, Q[q^n])$ is concave in q, $\tilde S_F(\alpha, k)$ is a convex set, and as such is connected. This implies that $\tilde S_F(\alpha, k) \subseteq T_F(\alpha, k, q^n)$. Because $q^{n+1}$ makes the function value $F(\alpha, q, Q[q^n])$ larger than that at $q^n$, it must lie in $\tilde S_F(\alpha, k)$, and as such in $T_F(\alpha, k, q^n)$. □
Proof of Theorem 5.
The proof is similar to that of Theorem 2. We perform the following manipulations:
$$\begin{aligned}
F(\alpha, q) &- F(\alpha, q^n, Q^n) - \bar\theta \sum_{u,x} q(u,x) \ln \frac{q^n(u,x)}{q^{n-1}(u,x)} \\
&= F(\alpha, q, Q[q]) - F(\alpha, q^n, Q^n) - \bar\theta \sum_{u,x} q(u,x) \ln \frac{q^n(u,x)}{q^{n-1}(u,x)} \\
&\overset{(a)}{=} \bar\theta \sum_{u,x} q(u,x) \Big( d[Q[q]] - \ln q - d[Q^n] + \ln q^n - \ln \frac{q^n}{q^{n-1}} \Big) \\
&= \bar\theta \sum_{u,x} q(u,x) \Big( d[Q[q]] - \ln q - d[Q[q^{n-1}]] + \ln q^{n-1} \Big) \\
&\overset{(b)}{=} \bar\theta \Big( -D_{UX}(q \,\|\, q^{n-1}) + D_{X|YU} + \bar\alpha D_{U|Y} + \alpha D_{U|Z} \Big) + (\bar\theta - \theta) D_{Y|U} \overset{(c)}{\le} 0,
\end{aligned}$$
where ( a ) holds from Equation (8), ( b ) is due to
$$\sum_{u,x} q \big( d[Q[q]] - d[Q[q^{n-1}]] \big) = \sum_{u,x,y,z} q(u,x)\, p(y,z|x) \left( \ln \frac{q(x|y,u)}{q^{n-1}(x|y,u)} \left( \frac{q(u|y)}{q^{n-1}(u|y)} \right)^{\bar\alpha} \left( \frac{q(u|z)}{q^{n-1}(u|z)} \right)^{\alpha} + \frac{\bar\theta - \theta}{\bar\theta} \ln \frac{q(y|u)}{q^{n-1}(y|u)} \right) = D_{X|YU}(q \,\|\, q^{n-1}) + \bar\alpha D_{U|Y} + \alpha D_{U|Z} + \frac{\bar\theta - \theta}{\bar\theta} D_{Y|U},$$
and ( c ) is from Lemma 5. This implies that
$$\sum_{n=1}^{N} \big[ F(\alpha, q) - F(\alpha, q^n, Q^n) \big] \le \bar\theta \sum_{u,x} q(u,x) \ln \frac{q^N(u,x)}{q^0(u,x)} = \bar\theta \sum_{u,x} q(u,x) \ln \left( \frac{q^N(u,x)}{q(u,x)} \cdot \frac{q(u,x)}{q^0(u,x)} \right) = \bar\theta \big( D_{UX}(q \,\|\, q^0) - D_{UX}(q \,\|\, q^N) \big) \le \bar\theta\, D_{UX}(q \,\|\, q^0).$$
The last term is finite and positive, as $q^0 > 0$. Finally, taking q to be a maximizer of $F(\alpha, \cdot)$ makes every summand nonnegative; a series of nonnegative terms with bounded partial sums converges, so the summands tend to zero, i.e., $F(\alpha, q^n, Q^n) \to F(\alpha, q)$. □
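This telescoping argument mirrors the convergence proof of the classic point-to-point BA algorithm, whose update has the same exponential form $q^{n+1} \propto \exp\{d[Q[q^n]]\}$. For orientation, here is that classic iteration (a textbook sketch for a discrete memoryless channel, not this paper's broadcast-channel algorithm):

```python
import numpy as np

def ba_capacity(p, iters=500):
    """Classic Blahut-Arimoto for a DMC with transition matrix p[x, y] = p(y|x).
    Returns the capacity in nats. Assumes p > 0 entrywise for simplicity."""
    nx = p.shape[0]
    q = np.full(nx, 1.0 / nx)                  # input distribution q(x)
    for _ in range(iters):
        qy = q @ p                             # output marginal q(y)
        Q = q[:, None] * p / qy[None, :]       # posterior Q(x|y)
        d = (p * np.log(Q)).sum(axis=1)        # d[Q](x) = sum_y p(y|x) ln Q(x|y)
        q = np.exp(d); q /= q.sum()            # q^{n+1} proportional to exp{d[Q[q^n]]}
    qy = q @ p
    return float(np.sum(q[:, None] * p * np.log(p / qy[None, :])))

# binary symmetric channel, crossover 0.1: capacity = ln 2 - H_b(0.1) ~ 0.3681 nats
print(ba_capacity(np.array([[0.9, 0.1], [0.1, 0.9]])))
```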
Proof of Proposition 2.
Let C m and Δ d ( u , x ) be
$$C_m = \sum_{u,x} \exp\big\{ d_m[\tilde Q](u,x) \big\}, \qquad \Delta d(u,x) = d_{m+1}[\tilde Q] - d_m[\tilde Q],$$
where d k [ Q ] is as in Equation (18) with α = α k . According to Equations (7) and (8),
$$q^0 = \tilde q = \frac{\exp\{ d_m[\tilde Q] \}}{\sum_{u,x} \exp\{ d_m[\tilde Q] \}} = \frac{\exp\{ d_m[\tilde Q] \}}{C_m}, \qquad F(\alpha_m, \tilde q) = \bar\theta \ln C_m.$$
Noting that Q 1 = Q [ q 0 ] = Q [ q ˜ ] = Q ˜ , we estimate the difference as follows:
$$\begin{aligned}
F(\alpha_{m+1}, q^1, Q^1) - F(\alpha_m, \tilde q) &\overset{(a)}{=} \bar\theta \ln \sum_{u,x} \exp\{ d_{m+1}[\tilde Q] \} - \bar\theta \ln C_m = \bar\theta \ln \sum_{u,x} \exp\{ d_m[\tilde Q] \} \exp\{ \Delta d \} - \bar\theta \ln C_m \\
&= \bar\theta \ln \sum_{u,x} C_m\, \tilde q \exp\{ \Delta d \} - \bar\theta \ln C_m = \bar\theta \ln \sum_{u,x} \tilde q \exp\{ \Delta d \} \ge \bar\theta \sum_{u,x} \tilde q\, \Delta d,
\end{aligned}$$
where $(a)$ holds from Equation (8) and from the fact that $Q^1 = \tilde Q$, and the last step is Jensen's inequality, $\ln \sum_{u,x} \tilde q \exp\{\Delta d\} \ge \sum_{u,x} \tilde q\, \Delta d$ (equivalently, $\sum_{u,x} \tilde q \exp\{\Delta d\} \ge \exp\{\sum_{u,x} \tilde q\, \Delta d\} \ge 1 + \sum_{u,x} \tilde q\, \Delta d$).
According to the definition of d [ Q ] in Equation (19),
$$\Delta d = (\alpha_{m+1} - \alpha_m) \sum_{y,z} p(y,z|x) \ln \frac{\tilde Q(u|z)}{\tilde Q(u|y)}.$$
The expectation of this difference is
$$\sum_{u,x} \tilde q\, \Delta d = (\alpha_{m+1} - \alpha_m) \big( I(U;Z) - I(U;Y) \big) \overset{(b)}{=} -(\alpha_{m+1} - \alpha_m) \cdot \frac{\alpha_{m+1} - \alpha_m}{\tau_m \bar\theta} = -\frac{(\alpha_{m+1} - \alpha_m)^2}{\tau_m \bar\theta},$$
where ( b ) is due to Equation (22). The proof is finished. □
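The Jensen step in the estimate above is easy to confirm numerically (with random stand-ins of our own choosing for $\tilde q$ and $\Delta d$):

```python
import numpy as np

rng = np.random.default_rng(1)
qt = rng.random(6); qt /= qt.sum()          # stand-in for q~(u,x), flattened
dd = rng.normal(scale=0.3, size=6)          # stand-in for Delta d(u,x)
assert np.log(qt @ np.exp(dd)) >= qt @ dd   # ln E[exp(D)] >= E[D]
```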

References

  1. Cover, T. Broadcast Channels. IEEE Trans. Inform. Theory 1972, 18, 2–14. [Google Scholar] [CrossRef]
  2. Bergmans, P. Random coding theorem for broadcast channels with degraded components. IEEE Trans. Inform. Theory 1973, 19, 197–207. [Google Scholar] [CrossRef]
  3. Körner, J.; Marton, K. Comparison of two noisy channels. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; North-Holland: Amsterdam, The Netherlands, 1977; pp. 411–423. [Google Scholar]
  4. Marton, K. A coding theorem for the discrete memoryless broadcast channel. IEEE Trans. Inform. Theory 1979, 25, 306–311. [Google Scholar] [CrossRef]
  5. Nair, C.; El Gamal, A. An outer bound to the capacity region of the broadcast channel. IEEE Trans. Inform. Theory 2007, 53, 350–355. [Google Scholar] [CrossRef]
  6. Gohari, A.; Nair, C. Outer bounds for multiuser settings: The auxiliary receiver approach. IEEE Trans. Inform. Theory 2022, 68, 701–736. [Google Scholar] [CrossRef]
  7. Liu, Y.; Geng, Y. Blahut-Arimoto Algorithms for Computing Capacity Bounds of Broadcast Channels. In Proceedings of the IEEE International Symposium on Information Theory, Espoo, Finland, 26 June–1 July 2022; pp. 1145–1150. [Google Scholar]
  8. Liu, Y.; Dou, Y.; Geng, Y. Blahut-Arimoto Algorithm for Marton’s Inner Bound. In Proceedings of the IEEE International Symposium on Information Theory, Taipei, Taiwan, 25–30 June 2023; pp. 2159–2164. [Google Scholar]
  9. Calvo, E.; Palomar, D.P.; Fonollosa, J.R.; Vidal, J. The computation of the capacity region of the discrete degraded BC is a nonconvex DC problem. In Proceedings of the IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008; pp. 1721–1725. [Google Scholar]
  10. Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer: New York, NY, USA, 2006. [Google Scholar]
11. Wilson, R.B. A Simplicial Algorithm for Concave Programming. Ph.D. Thesis, Graduate School of Business Administration, Harvard University, Cambridge, MA, USA, 1963. [Google Scholar]
  12. Blahut, R. Computation of channel capacity and rate distortion functions. IEEE Trans. Inform. Theory 1972, 18, 460–473. [Google Scholar] [CrossRef]
  13. Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inform. Theory 1972, 18, 14–20. [Google Scholar] [CrossRef]
  14. Rezaeian, M.; Grant, A. Computation of Total Capacity for Discrete Memoryless Multiple-Access Channels. IEEE Trans. Inform. Theory 2004, 50, 2779–2784. [Google Scholar] [CrossRef]
  15. Calvo, E.; Palomar, D.P.; Fonollosa, J.R.; Vidal, J. On the Computation of the Capacity Region of the Discrete MAC. IEEE Trans. Commun. 2010, 58, 3512–3525. [Google Scholar] [CrossRef]
  16. Yasui, K.; Matsushima, T. Toward Computing the Capacity Region of Degraded Broadcast Channel. In Proceedings of the IEEE International Symposium on Information Theory, Austin, TX, USA, 13–18 June 2010; pp. 570–574. [Google Scholar]
  17. Dupuis, F.; Yu, W.; Willems, F.M.J. Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information. In Proceedings of the IEEE International Symposium on Information Theory, Chicago, IL, USA, 27 June–2 July 2004; p. 179. Available online: https://www.comm.utoronto.ca/~weiyu/ab_isit04.pdf (accessed on 8 January 2024).
  18. Gohari, A.; El Gamal, A.; Anantharam, V. On Marton’s Inner Bound for the General Broadcast Channel. IEEE Trans. Inform. Theory 2014, 60, 3748–3762. [Google Scholar] [CrossRef]
  19. Geng, Y.; Gohari, A.; Nair, C.; Yu, Y. On Marton’s Inner Bound and Its Optimality for Classes of Product Broadcast Channels. IEEE Trans. Inform. Theory 2014, 60, 22–41. [Google Scholar] [CrossRef]
  20. Anantharam, V.; Gohari, A.; Nair, C. On the Evaluation of Marton’s Inner Bound for Two-Receiver Broadcast Channels. IEEE Trans. Inform. Theory 2019, 65, 1361–1371. [Google Scholar] [CrossRef]
  21. Geng, Y. Single-Letterization of Supporting Hyperplanes to Outer Bounds for Broadcast Channels. In Proceedings of the IEEE/CIC International Conference on Communications in China, Changchun, China, 11–13 August 2019; pp. 70–74. [Google Scholar]
  22. Gowtham, K.R.; Thangaraj, A. Computation of secrecy capacity for more-capable channel pairs. In Proceedings of the IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008; pp. 529–533. [Google Scholar]
  23. Hajek, B.E.; Pursley, M.B. Evaluation of an achievable rate region for the broadcast channel. IEEE Trans. Inform. Theory 1979, 25, 36–46. [Google Scholar] [CrossRef]
  24. Cover, T.M. An achievable rate region for the broadcast channel. IEEE Trans. Inform. Theory 1975, 21, 399–404. [Google Scholar] [CrossRef]
  25. van der Meulen, E.C. Random coding theorems for the general discrete memoryless broadcast channel. IEEE Trans. Inform. Theory 1975, 21, 180–190. [Google Scholar] [CrossRef]
  26. Nair, C.; Wang, Z.V. On the inner and outer bounds for 2-receiver discrete memoryless broadcast channels. In Proceedings of the Information Theory and Applications Workshop, San Diego, CA, USA, 27 January–1 February 2008; pp. 226–229. [Google Scholar]
27. Gohari, A.; Anantharam, V. Evaluation of Marton’s inner bound for the general broadcast channel. In Proceedings of the IEEE International Symposium on Information Theory, Seoul, Republic of Korea, 28 June–3 July 2009; pp. 2462–2466. [Google Scholar]
  28. Jog, V.; Nair, C. An information inequality for the BSSC channel. In Proceedings of the Information Theory and Applications Workshop, La Jolla, CA, USA, 31 January–5 February 2010; pp. 1–8. [Google Scholar]
  29. El Gamal, A.; Kim, Y.-H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Figure 1. The maximization parts in the algorithms for BSSC with fixed values α0 = β0 = 0.7: (a) the objective function values and (b) P(X = 0).
Figure 2. The minimization parts in the algorithms for BSSC with initial values α0 = β0 = 0.7: (a) the objective function values and (b) (αk, βk).
Figure 3. Function values of UVOB for BSSC with initial values α0 = β0 = 0.7.
Figure 4. Function values of SCIB for BSSC with the initial value α0 = 0.7 and fixed probability vector PU = (0.3, 0.7): (a) 3D view and (b) contour view.
Table 1. Comparison of typical scenarios related to the BA algorithm.

Channel             Reference    Objective            Form      Algorithm
point-to-point      [12,13]      capacity             max       BA
multiple access     [14]         sum-rate             max       BA
multiple access     [15]         inner/outer bounds   max       relaxation
degraded broadcast  [16]         capacity region      max       BA
general broadcast   this paper   inner/outer bounds   max–min   BA + gradient
Table 2. Comparison with generic non-convex algorithms on BSSC.

Method          Time (Seconds)   Sum-Rate of MIB (Nats)
interior-point          513.82   0.25060…
active-set             2438.57   0.25061…
sqp                     1.7621   0.25067…
this paper              0.0629   0.25067…
Table 3. Comparison with sqp on random channels with alphabet sizes |X| = 3, 4, 5, 6.

                          Time (Seconds)                   Sum-Rate of MIB (Nats)
Method       |X|=3     |X|=4     |X|=5     |X|=6     |X|=3    |X|=4    |X|=5    |X|=6
sqp           2.63   4222.44    168.12   1065.51    0.1840   0.2348   0.1983   0.2351
this paper  0.0771    0.1031    0.1450    0.2086    0.1863   0.2375   0.2019   0.2423