Computational Techniques for Investigating Information Theoretic Limits of Information Systems
Abstract
:1. Introduction
2. Literature Review
3. Information Inequalities and Entropic LP
3.1. Information Inequalities
3.2. The Entropic LP Formulation
4. Symmetry and Dependence Relations
4.1. Two Examples
- The regenerating code problem [34,35] is depicted in Figure 1. It considers the situation that a message is stored in a distributed manner in n nodes, each having capacity (Figure 1a). Two coding requirements need to be satisfied: 1) the message can be recovered from any k nodes (Figure 1b), and any single node can be repaired by downloading amount of information from any d of the other nodes (Figure 1c). The fundamental limit of interest is the optimal tradeoff between the storage cost and the download cost . We will use the case as our working example. In this setting, the stored contents are , and the repair message sent from node i to repair j is denoted as . In this case, the set of the random variables in the problem areSome readers may notice that we do not include a random variable to represent the original message stored in the system. This is because it can be equivalently viewed as the collection of and can, thus, be omitted in this formulation.
- The coded caching problem [36] considers the situation that a server, which holds a total N mutually independent files of unit size each, serves a set of K users, each with a local cache of size M. The users can prefetch some content (Figure 2a), but when they reveal their requests (Figure 2b), the server must calculate and multicast a common message of size R (Figure 2c). The requests are not revealed to the server beforehand, and the prefetching must be designed to handle all cases. The fundamental limit of interest is the optimal tradeoff between the cache capacity M and the transmission size R. In this setting, the messages are denoted as , the prefetched contents as , and the transmission when the users requests is written as . We will use the case as our second running example in the sequel, and, in this case, the random variables in the problem are
4.2. The Dependency Relation
- The regenerating code problem: the relations are the following:The first equality implies that:Other dependence relations can be converted similarly. This dependence structure can also be represented as a graph, as shown in Figure 3. In this graph, a given node (random variable) is a function of others random variables with an incoming edge.
- The caching problem: the relations are the following:
4.3. The Symmetry Relation
- The regenerating code problem: exchanging the coding functions for different storage nodes. For example, if we simply let node 2 store the content for node 1, and also exchange other coding functions, the result is another code that can fulfill the same task as before this exchange. Mathematically, we can represent the symmetry relation using permutations of all the random variables, where each row indicates a permutation as follows:Note that, when we permute the storage contents , the corresponding repair information needs to be permuted accordingly. The 24 permutations clearly form a permutation group. With this representation, we can take any subset of the columns, and the collections of the random variables in each row in these columns will have the same entropy in the corresponding symmetric code. For example, if we take columns , then we have that
- Similarly, in the caching problem, there are two types of symmetry relations. The first is to exchange the coding functions for each users, and the second is to exchange the operation on different files. Intuitively, the first one is due to a permutation of the users, and the second due to the permutation of the files. As a consequence, we have the following permutations that form a group:
- For the purpose of deriving outer bounds, it is valid to ignore the symmetry relation altogether, or consider only part of the symmetry relation, as long as the remaining permutations still form a group. For example, in the caching problem, if we only consider the symmetry induced by exchanging the two messages, then we have the first 2 rows instead of the full 12 rows of permutations. Omitting some permutations means less reduction in the LP scale but does not invalidate the computed bounds.
- Admittedly, representing the symmetry relation using the above permutation representation is not the most concise approach, and there exists mathematically precise and concise language to specify such structure. We choose this permutation approach because of its simplicity and universality and, perhaps more importantly, due to its suitability for software implementation.
5. Reducing the Problem Algorithmically via the Disjoint-Set Data Structure
5.1. Equivalence Relation and Reduction
5.2. Difficulty in Identifying the Reduction
5.3. Disjoint-Set Data Structure and Algorithmic Reduction
- Symmetry step: For each singleton set (which corresponds to a subset ) in the disjoint-set structure, consider each permutation in the symmetry relation: if the permutation maps into another element (which corresponds to another subset of random variables ) not already in the same set in the disjoint-set structure, then we combine the two sets by forming their union.
- Dependence step: For each existing set in the disjoint-set structure, consider each dependence relation: if the set leader (which corresponds to a subset ) is equivalent to another element due to the given dependence (which corresponds to another subset of random variables ) not already in the same set, then we combine the two sets by forming their union.
6. Four Investigative Techniques
6.1. Bounding Plane Optimization and Queries
- If the simple sum of the storage cost and repair cost , i.e., , needs to be lower-bounded in the regenerating code problem, we can let the objective function be given as
- If we wish to lower bound the simple sum of memory and rate in the coded caching problem, the situation is somewhat subtle. Note that the rate R is a lower bound on the entropy and ; however, the symmetry relation does not imply that . For this case, we can introduce an additional LP variable R and add the constraints thatWe then set the objective function to be
6.2. Tradeoff and Convex Hull Computation
6.3. Duality and Computer-generated Proof
6.4. Sensitivity Analysis
7. JSON Problem Descriptions
7.1. Keys in PD JSON
7.2. An Example Problem Description File
7.3. Example Computation Result
8. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Yeung, R.W. A framework for linear information inequalities. IEEE Trans. Inf. Theory 1997, 43, 1924–1934. [Google Scholar] [CrossRef] [Green Version]
- Tian, C. Characterizing the rate region of the (4, 3, 3) exact-repair regenerating codes. IEEE J. Sel. Areas Commun. 2014, 32, 967–975. [Google Scholar] [CrossRef] [Green Version]
- Tian, C.; Liu, T. Multilevel diversity coding with regeneration. IEEE Trans. Inf. Theory 2016, 62, 4833–4847. [Google Scholar] [CrossRef]
- Tian, C. Symmetry, outer bounds, and code constructions: A computer-aided investigation on the fundamental limits of caching. Entropy 2018, 20, 603. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Tian, C.; Chen, J. Caching and delivery via interference elimination. IEEE Trans. Inf. Theory 2018, 64, 1548–1560. [Google Scholar] [CrossRef]
- Tian, C.; Sun, H.; Chen, J. A Shannon-theoretic approach to the storage-retrieval tradeoff in PIR systems. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 1904–1908. [Google Scholar]
- Tian, C. On the storage cost of private information retrieval. IEEE Trans. Inf. Theory 2020, 66, 7539–7549. [Google Scholar] [CrossRef]
- Galler, B.A.; Fisher, M.J. An improved equivalence algorithm. Commun. ACM 1964, 7, 301–303. [Google Scholar] [CrossRef]
- IBM. IBM ILOG CPLEX Optimizer. Available online: https://www.ibm.com/analytics/cplex-optimizer (accessed on 15 February 2021).
- Gurobi. Gurobi Optimizer. Available online: https://www.gurobi.com/ (accessed on 15 February 2021).
- An Open-Source Toolbox for Computer-Aided Investigation on the Fundamental Limits of Information Systems. Available online: https://github.com/ct2641/CAI (accessed on 15 February 2021).
- Information Theoretic Inequality Prover (ITIP). Available online: http://user-www.ie.cuhk.edu.hk/~ITIP/ (accessed on 15 December 2020).
- Xitip: Information Theoretic Inequalities Prover. Available online: http://xitip.epfl.ch/ (accessed on 15 December 2020).
- Yeung, R.W.; Lee, T.T.; Ye, Z. Information-theoretic characterizations of conditional mutual independence and Markov random fields. IEEE Trans. Inf. Theory 2002, 48, 1996–2011. [Google Scholar] [CrossRef]
- Dougherty, R.; Freiling, C.; Zeger, K. Six new non-Shannon information inequalities. In Proceedings of the 2006 IEEE International Symposium on Information Theory (ISIT), Seattle, WA, USA, 9–14 July 2006; pp. 233–236. [Google Scholar]
- Dougherty, R.; Freiling, C.; Zeger, K. Non-Shannon information inequalities in four random variables. arXiv 2011, arXiv:1104.3602. [Google Scholar]
- Li, C.; Weber, S.; Walsh, J.M. Multilevel diversity coding systems: Rate regions, codes, computation, & forbidden minors. IEEE Trans. Inf. Theory 2016, 63, 230–251. [Google Scholar]
- Ho, S.W.; Ling, L.; Tan, C.W.; Yeung, R.W. Proving and disproving information inequalities: Theory and scalable algorithms. IEEE Trans. Inf. Theory 2020, 66, 5522–5536. [Google Scholar] [CrossRef]
- Chan, T.; Thakor, S.; Grant, A. Minimal characterization of Shannon-type inequalities under functional dependence and full conditional independence structures. IEEE Trans. Inf. Theory 2019, 65, 4041–4051. [Google Scholar] [CrossRef]
- Zhang, K.; Tian, C. On the symmetry reduction of information inequalities. IEEE Trans. Commun. 2018, 66, 2396–2408. [Google Scholar] [CrossRef]
- Li, C.; Weber, S.; Walsh, J.M. On multi-source networks: Enumeration, rate region computation, and hierarchy. IEEE Trans. Inf. Theory 2017, 63, 7283–7303. [Google Scholar] [CrossRef]
- Apte, J.; Walsh, J.M. Exploiting symmetry in computing polyhedral bounds on network coding rate regions. In Proceedings of the 2015 International symposium on network coding (NetCod), Sydney, Australia, 22–24 June 2015; pp. 76–80. [Google Scholar]
- Li, C.; Apte, J.; Walsh, J.M.; Weber, S. A new computational approach for determining rate regions and optimal codes for coded networks. In Proceedings of the 2013 International Symposium on Network Coding (NetCod), Calgary, AB, Canada, 7–9 June 2013; pp. 1–6. [Google Scholar]
- Apte, J.; Li, C.; Walsh, J.M. Algorithms for computing network coding rate regions via single element extensions of matroids. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 2306–2310. [Google Scholar]
- Gattegno, I.B.; Goldfeld, Z.; Permuter, H.H. Fourier-Motzkin elimination software for information theoretic inequalities. IEEE Inf. Theory Newsl. 2015, 65, 25–29. [Google Scholar]
- Gürpınar, E.; Romashchenko, A. How to use undiscovered information inequalities: Direct applications of the copy lemma. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1377–1381. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 1st ed.; Wiley: New York, NY, USA, 1991. [Google Scholar]
- Yeung, R. A First Course in Information Theory; Kluwer Academic Publishers: New York, NY, USA, 2002. [Google Scholar]
- Yeung, R.W. Information Theory and Network Coding; Springer Science & Business Media: New York, NY, USA, 2008. [Google Scholar]
- Zhang, Z.; Yeung, R.W. A non-Shannon-type conditional inequality of information quantities. IEEE Trans. Inf. Theory 1997, 43, 1982–1986. [Google Scholar] [CrossRef]
- Matus, F. Infinitely many information inequalities. In Proceedings of the 2007 IEEE International Symposium on Information Theory (ISIT), Nice, France, 24–29 June 2007; pp. 41–44. [Google Scholar]
- Dougherty, R.; Freiling, C.; Zeger, K. Insufficiency of linear coding in network information flow. IEEE Trans. Inf. Theory 2005, 51, 2745–2759. [Google Scholar] [CrossRef] [Green Version]
- Dougherty, R.; Freiling, C.; Zeger, K. Networks, matroids, and non-Shannon information inequalities. IEEE Trans. Inf. Theory 2007, 53, 1949–1969. [Google Scholar] [CrossRef] [Green Version]
- Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K. Network coding for distributed storage systems. IEEE Trans. Inf. Theory 2010, 56, 4539–4551. [Google Scholar] [CrossRef] [Green Version]
- Dimakis, A.G.; Ramchandran, K.; Wu, Y.; Suh, C. A survey on network codes for distributed storage. Proc. IEEE 2011, 99, 476–489. [Google Scholar] [CrossRef] [Green Version]
- Maddah-Ali, M.A.; Niesen, U. Fundamental limits of caching. IEEE Trans. Inf. Theory 2014, 60, 2856–2867. [Google Scholar] [CrossRef] [Green Version]
- Lassez, C.; Lassez, J.L. Symbolic and Numerical Computation for Artificial Intelligence; Donald, B.R., Kapur, D., Mundy, J.L., Eds.; Academic Press: San Diego, CA, USA, 1992; Chapter 4 Quantifier Elimination for Conjunctions of Linear Constraints via a Convex Hull Algorithm; pp. 103–1199. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tian, C.; Plank, J.S.; Hurst, B.; Zhou, R. Computational Techniques for Investigating Information Theoretic Limits of Information Systems. Information 2021, 12, 82. https://doi.org/10.3390/info12020082
Tian C, Plank JS, Hurst B, Zhou R. Computational Techniques for Investigating Information Theoretic Limits of Information Systems. Information. 2021; 12(2):82. https://doi.org/10.3390/info12020082
Chicago/Turabian StyleTian, Chao, James S. Plank, Brent Hurst, and Ruida Zhou. 2021. "Computational Techniques for Investigating Information Theoretic Limits of Information Systems" Information 12, no. 2: 82. https://doi.org/10.3390/info12020082
APA StyleTian, C., Plank, J. S., Hurst, B., & Zhou, R. (2021). Computational Techniques for Investigating Information Theoretic Limits of Information Systems. Information, 12(2), 82. https://doi.org/10.3390/info12020082