Reviewing Evolution of Learning Functions and Semantic Information Measures for Understanding Deep Learning
Abstract
1. Introduction
- How are EMI and ShMI related in supervised, semi-supervised, and unsupervised learning?
- Is similarity a probability? If it is, why is it not normalized? If it is not, why can it be used in Bayes’ formula (see Equation (9))?
- Can we get similarity functions, distortion functions, truth functions, or membership functions directly from samples or sampling distributions?
- Reviewing the evolutionary histories of semantic information measures and learning functions;
- Clarifying the relationship between SeMI and ShMI;
- Promoting the integration and development of the G theory and deep learning.
- Reviewing the evolution of semantic information measures;
- Reviewing the evolution of learning functions;
- Introducing the G theory and its applications to machine learning;
- Discussing some questions related to SeMI and ShMI maximization and deep learning;
- Discussing some new methods worth exploring and the limitations of the G theory;
- Conclusions with opportunities and challenges.
2. The Evolution of Semantic Information Measures
2.1. Using the Truth or Similarity Function to Approximate Shannon’s Mutual Information
2.2. Historical Events Related to Semantic Information Measures
3. The Evolution of Learning Functions
3.1. From Likelihood Functions to Similarity and Truth Functions
- P(x) is the prior distribution of instance x, representing the source. We use Pθ(x) to approximate it.
- P(y) is the prior distribution of label y, representing the destination. We use Pθ(yj) to approximate it.
- P(x|yj) is the posterior distribution of x. We use the likelihood function P(x|θj) = P(x|yj, θ) to approximate it.
- P(y|xi) is the posterior distribution of y. Since P(y|xi) = P(y)P(xi|y)/P(xi), Bayesian Inference uses P(θ)P(x|y, θ)/Pθ(x) (the Bayesian posterior) [57] to approximate it.
- P(x, y) is the joint probability distribution. We use P(x, y|θ) to approximate it.
- m(x, yj) = P(x, yj)/[P(x)P(yj)] is the relatedness function. We use mθ(x, yj), which we call the truthlikeness function, to approximate it; it varies between 0 and ∞.
- m(x, yj)/max[m(x, yj)] = P(yj|x)/max[P(yj|x)] is the relative relatedness function. We use the truth function T(θj|x) or the similarity function S(x, yj) to approximate it (a numerical sketch follows this list).
- When the number of labels is n > 2, it is difficult to construct a set of inverse probability functions because P(θj|x) must be normalized for every xi: ∑j P(θj|xi) = 1.
- Although P(yj|xi) with i = j indicates the accuracy for binary communication, it may not do so when n > 2, especially for semantic communication. For example, let x represent an age and y denote one of three labels: y0 = “Non-adult”, y1 = “Adult”, and y2 = “Youth”. If y2 is rarely used, both P(y2) and P(y2|x) are tiny; however, the accuracy of using y2 for x = 20 should be 1.
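To make these definitions concrete, here is a minimal numerical sketch (my own illustration, not taken from the paper) that estimates the distributions, the relatedness function, and the relative relatedness from a hypothetical sampling distribution; the age/label setting mirrors the example above, and all counts are invented.

```python
import numpy as np

# Hypothetical joint sample counts N(x, y): rows = instances x (age bins),
# columns = labels y0 = "Non-adult", y1 = "Adult", y2 = "Youth".
N = np.array([
    [80.,  2.,  1.],   # x = 10
    [10., 60., 30.],   # x = 20
    [ 2., 90.,  5.],   # x = 40
])

P_xy = N / N.sum()                      # joint distribution P(x, y)
P_x  = P_xy.sum(axis=1, keepdims=True)  # prior of instances P(x)
P_y  = P_xy.sum(axis=0, keepdims=True)  # prior of labels P(y)

P_x_given_y = P_xy / P_y                # columns: P(x|y_j), what likelihoods approximate
P_y_given_x = P_xy / P_x                # rows: P(y|x_i), the inverse probabilities

# Relatedness function m(x, y_j) = P(x, y_j) / [P(x) P(y_j)]; it varies in [0, inf).
m = P_xy / (P_x * P_y)

# Relative relatedness m(x, y_j) / max_x m(x, y_j) = P(y_j|x) / max_x P(y_j|x);
# this is what a truth or similarity function approximates, and its maximum is 1.
rel = m / m.max(axis=0, keepdims=True)
assert np.allclose(rel, P_y_given_x / P_y_given_x.max(axis=0, keepdims=True))

# Although P(y2|x=20) is only 0.3 here, the relative relatedness of ("Youth", x=20)
# is 1, matching the accuracy argument above.
print(rel[1, 2])                 # -> 1.0

# The inverse probabilities must sum to 1 over labels for every x_i (the constraint
# that makes constructing P(theta_j|x) hard when n > 2), whereas truth or similarity
# functions need not.
print(P_y_given_x.sum(axis=1))   # -> [1. 1. 1.]
print(rel.sum(axis=1))           # generally != 1
```

The last two prints show why inverse probability functions are hard to construct for n > 2 while truth and similarity functions are not constrained in the same way.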
3.2. The Definitions of Semantic Similarity
3.3. Similarity Functions and Semantic Information Measures Used for Deep Learning
- A function proportional to m(x, yj) is used as the learning function (denoted as S(x, yj)); its maximum is generally 1, and its average is the partition function Zj.
- The semantic or estimated information between x and yj is log[S(x, yj)/Zj] (see the numerical sketch after this list).
- The statistical probability distribution P(x, y) is used for the average.
- The semantic or estimated mutual information can be expressed as the coverage entropy minus the fuzzy entropy, and the fuzzy entropy is equal to the average distortion.
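The following sketch (a minimal illustration under the stated assumptions, not the paper’s implementation) puts these four points together: S(x, yj) is a similarity with maximum 1, Zj is taken as the average of S(x, yj) over P(x), the pointwise information is log[S(x, yj)/Zj], and averaging under P(x, y) gives the same value as the coverage entropy minus the fuzzy entropy. All distributions and the similarity matrix are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical statistical distribution P(x, y) and similarity matrix S(x, y_j)
# with maximum 1 (e.g., produced by a truth function or a DNN similarity head).
P_xy = rng.dirichlet(np.ones(12)).reshape(4, 3)   # 4 instances x, 3 labels y_j
S = rng.uniform(0.05, 1.0, size=(4, 3))
S /= S.max(axis=0, keepdims=True)                 # max_x S(x, y_j) = 1

P_x = P_xy.sum(axis=1, keepdims=True)             # P(x)
P_y = P_xy.sum(axis=0)                            # P(y_j)

# Partition function (logical probability): Z_j = sum_x P(x) S(x, y_j).
Z = (P_x * S).sum(axis=0)

# Semantic (estimated) information of one (x, y_j) pair: log[S(x, y_j)/Z_j].
info = np.log(S / Z)

# Semantic (estimated) mutual information: average under P(x, y).
semi = (P_xy * info).sum()

# Equivalent decomposition: coverage entropy minus fuzzy entropy,
# where the fuzzy entropy equals the average distortion if S = exp(-d).
coverage_entropy = -(P_y * np.log(Z)).sum()
fuzzy_entropy = -(P_xy * np.log(S)).sum()
assert np.isclose(semi, coverage_entropy - fuzzy_entropy)
print(semi)
```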
4. The Semantic Information G Theory and Its Applications to Machine Learning
4.1. The P-T Probability Framework and the Semantic Information G Measure
4.2. Optimizing Truth Functions and Making Probability Predictions
4.3. The Information Rate-Fidelity Function R(G)
4.4. Channels Matching Algorithms for Machine Learning
4.4.1. For Multi-Label Learning
4.4.2. For the MaxMI Classification of Unseen Instances
4.4.3. Explaining and Improving the EM Algorithm for Mixture Models
5. Discussion 1: Clarifying Some Questions
5.1. Is Mutual Information Maximization a Good Objective?
5.2. Interpreting DNNs: The R(G) Function vs. the Information Bottleneck
- DNNs often need pre-training and fine-tuning. In the pre-training stage, the RBM is used for every latent layer.
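For readers who have not seen RBM pre-training, the snippet below sketches a single contrastive-divergence (CD-1) weight update for one latent layer, in the spirit of Hinton’s practical guide cited in the references; the layer sizes, learning rate, and mini-batch are hypothetical, and a real pre-training run would stack such layers and iterate over many batches.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Minimal CD-1 update for a binary RBM, the kind of layer-wise pre-training
# referred to above. Sizes, learning rate, and data are hypothetical.
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)          # visible biases
b_h = np.zeros(n_hidden)           # hidden biases

v0 = rng.integers(0, 2, size=(8, n_visible)).astype(float)   # a mini-batch

# Positive phase: hidden probabilities given the data.
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

# Negative phase: one Gibbs step back to the visible layer and up again.
p_v1 = sigmoid(h0 @ W.T + b_v)
v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
p_h1 = sigmoid(v1 @ W + b_h)

# Gradient approximation: <v h>_data - <v h>_model.
W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]
b_v += lr * (v0 - v1).mean(axis=0)
b_h += lr * (p_h0 - p_h1).mean(axis=0)
```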
5.3. Understanding Gibbs Distributions, Partition Functions, MinMI Matching, and RBMs
5.4. Understanding Fuzzy Entropy, Coverage Entropy, Distortion, and Loss
5.5. Evaluating Learning Methods: The Information Criterion or the Accuracy Criterion?
6. Discussion 2: Exploring New Methods for Machine Learning
6.1. Optimizing Gaussian Semantic Channels with Shannon’s Channels
6.2. The Gaussian Channel Mixture Model and the Channel Mixture Model Machine
6.3. Calculating the Similarity between Any Two Words with Sampling Distributions
- It is simple and does not require a semantic structure such as that in WordNet;
- This similarity is similar to the improved PMI similarity [65], which varies between 0 and 1 (a corpus-based sketch follows this list);
- This similarity function is suitable for probability predictions.
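As a hedged illustration of computing word similarity directly from sampling distributions, the toy sketch below counts per-sentence co-occurrences and squeezes pointwise mutual information into [0, 1] by normalization and clipping; this is a common normalization, not necessarily the exact improved PMI similarity of [65], and the corpus and window definition are invented.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus; co-occurrence is counted once per sentence.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the mouse ate cheese",
    "the dog ate the bone",
]

word_count = Counter()
pair_count = Counter()
for sentence in corpus:
    words = set(sentence.split())
    word_count.update(words)
    pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))

n_sent = len(corpus)

def similarity(w1, w2):
    """PMI-style similarity squeezed into [0, 1]: normalized PMI, clipped at 0."""
    p1 = word_count[w1] / n_sent
    p2 = word_count[w2] / n_sent
    p12 = pair_count[frozenset((w1, w2))] / n_sent
    if p12 == 0.0:
        return 0.0
    if p12 >= 1.0:          # the pair co-occurs in every sentence
        return 1.0
    npmi = math.log(p12 / (p1 * p2)) / (-math.log(p12))   # in [-1, 1]
    return max(0.0, npmi)

print(similarity("cat", "chased"), similarity("cheese", "bone"))
```

Because the result is a sampling-distribution ratio rather than a taxonomy distance, it can also be fed back into probability predictions, as the last bullet above suggests.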
6.4. Purposive Information and the Information Value as Reward Functions for Reinforcement Learning
- Choosing an action a to reach the destination;
- Learning the system state’s change from P(x) to P(x|a);
- Setting the reward function, which is a function of the goal, P(x), and P(x|a) (a toy sketch follows the questions below).
- How to get P(x|a) or P(x|a, h) (where h denotes the history)?
- How to choose an action a according to the system state and the reward function?
- How to achieve the goal economically, that is, to balance the reward maximization and the control-efficiency maximization?
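The toy sketch below shows one way a reward can be built from the goal, P(x), and P(x|a): the goal is treated as a fuzzy set with a Gaussian membership (truth) function, and an action is scored by the average log-ratio between the goal’s truth value under P(x|a) and its logical probability under the prior P(x). This is my own simplified stand-in for purposive information, not the G theory’s exact definition (see Appendix B); the 1-D state space, the Gaussian membership, and all parameters are assumptions for illustration.

```python
import numpy as np

# States x are 1-D positions; the goal is a fuzzy set over x with a Gaussian
# membership (truth) function centered at the target.
x = np.linspace(-5.0, 5.0, 201)
dx = x[1] - x[0]

def gaussian(center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

T_goal = gaussian(center=3.0, width=0.8)       # membership of the goal set

P_x = gaussian(center=0.0, width=2.0)          # prior state distribution P(x)
P_x /= P_x.sum() * dx

def purposive_reward(P_x_given_a):
    """Average log-ratio between the goal's truth value under P(x|a) and its
    logical probability under the prior P(x): higher when action a concentrates
    probability on states where the goal is (nearly) true."""
    prior_logical = (P_x * T_goal).sum() * dx  # T'(goal) under the prior
    log_ratio = np.log(np.maximum(T_goal, 1e-300) / prior_logical)
    return (P_x_given_a * log_ratio).sum() * dx

# Two candidate actions, each with a predicted state distribution P(x|a).
P_good = gaussian(center=2.8, width=0.9)
P_good /= P_good.sum() * dx
P_bad = gaussian(center=-2.0, width=0.9)
P_bad /= P_bad.sum() * dx

print(purposive_reward(P_good), purposive_reward(P_bad))   # good > bad
```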
6.5. The Limitations of the Semantic Information G Theory
7. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
BYOL | Bootstrap Your Own Latent
CMMM | Channel Mixture Model Machine
DIM | Deep InfoMax (Information Maximization)
DNN | Deep Neural Network
DT | De Luca-Termini
DV | Donsker-Varadhan
EM | Expectation-Maximization
EMI | Estimated Mutual Information
EnM | Expectation-n-Maximization
GCMM | Gaussian Channel Mixture Model
GPS | Global Positioning System
G theory | Semantic information G theory (G means generalization)
InfoNCE | Information Noise-Contrastive Estimation
ITL | Information-Theoretic Learning
KL | Kullback–Leibler
LSA | Latent Semantic Analysis
MaxMI | Maximum Mutual Information
MinMI | Minimum Mutual Information
MINE | Mutual Information Neural Estimation
MoCo | Momentum Contrast
PMI | Pointwise Mutual Information
RBM | Restricted Boltzmann Machine
SeMI | Semantic Mutual Information
ShMI | Shannon’s Mutual Information
SimCLR | A simple framework for contrastive learning of visual representations
Appendix A. About Formal Semantic Meaning
Appendix B. The Definitions of the Value-Added Entropy and the Information Value
References
- Belghazi, M.I.; Baratin, A.; Rajeswar, S.; Ozair, S.; Bengio, Y.; Courville, A.; Hjelm, R.D. MINE: Mutual information neural estimation. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1–44.
- Oord, A.V.D.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–429, 623–656.
- Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Trischler, A.; Bengio, Y. Learning Deep Representations by Mutual Information Estimation and Maximization. arXiv 2018, arXiv:1808.06670.
- Bachman, P.; Hjelm, R.D.; Buchwalter, W. Learning Representations by Maximizing Mutual Information Across Views. arXiv 2019, arXiv:1906.00910.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML, PMLR 119, Virtual Event, 13–18 July 2020; pp. 1575–1585.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent: A new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284.
- Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; The University of Illinois Press: Urbana, IL, USA, 1963.
- Bao, J.; Basu, P.; Dean, M.; Partridge, C.; Swami, A.; Leland, W.; Hendler, J.A. Towards a theory of semantic communication. In Proceedings of the 2011 IEEE 1st International Network Science Workshop, West Point, NY, USA, 22–24 June 2011; pp. 110–117.
- Strinati, E.C.; Barbarossa, S. 6G networks: Beyond Shannon towards semantic and goal-oriented communications. Comput. Netw. 2021, 190, 107930.
- Lu, C. Channels’ matching algorithm for mixture models. In Intelligence Science I, Proceedings of the ICIS 2017, Beijing, China, 27 September 2017; Shi, Z.Z., Goertzel, B., Feng, J.L., Eds.; Springer: Cham, Switzerland, 2017; pp. 321–332.
- Lu, C. Semantic information G theory and logical Bayesian inference for machine learning. Information 2019, 10, 261.
- Lu, C. Shannon equations reform and applications. BUSEFAL 1990, 44, 45–52. Available online: https://www.listic.univ-smb.fr/production-scientifique/revue-busefal/version-electronique/ebusefal-44/ (accessed on 5 March 2019).
- Lu, C. A Generalized Information Theory; China Science and Technology University Press: Hefei, China, 1993; ISBN 7-312-00501-2. (In Chinese)
- Lu, C. A generalization of Shannon’s information theory. Int. J. Gen. Syst. 1999, 28, 453–490.
- Lu, C. The P–T probability framework for semantic communication, falsification, confirmation, and Bayesian reasoning. Philosophies 2020, 5, 25.
- Lu, C. Using the Semantic Information G Measure to Explain and Extend Rate-Distortion Functions and Maximum Entropy Distributions. Entropy 2021, 23, 1050.
- Floridi, L. Semantic conceptions of information. In Stanford Encyclopedia of Philosophy; Stanford University: Stanford, CA, USA, 2005; Available online: http://seop.illc.uva.nl/entries/information-semantic/ (accessed on 1 March 2023).
- Tarski, A. The semantic conception of truth: And the foundations of semantics. Philos. Phenomenol. Res. 1944, 4, 341–376.
- Davidson, D. Truth and meaning. Synthese 1967, 17, 304–323.
- Semantic Similarity. In Wikipedia: The Free Encyclopedia. Available online: https://en.wikipedia.org/wiki/Semantic_similarity (accessed on 10 February 2023).
- Resnik, P. Using information content to evaluate semantic similarity in a taxonomy. arXiv 1995, arXiv:cmp-lg/9511007.
- Poole, B.; Ozair, S.; Oord, A.V.D.; Alemi, A.; Tucker, G. On Variational Bounds of Mutual Information. arXiv 2019, arXiv:1905.06922.
- Tschannen, M.; Djolonga, J.; Rubenstein, P.K.; Gelly, S.; Lucic, M. On Mutual Information Maximization for Representation Learning. arXiv 2019, arXiv:1907.13625.
- Tishby, N.; Pereira, F.; Bialek, W. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 22–24 September 1999; pp. 368–377.
- Tishby, N.; Zaslavsky, N. Deep learning and the information bottleneck principle. In Proceedings of the Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5.
- Hu, B.G. Information theory and its relation to machine learning. In Proceedings of the 2015 Chinese Intelligent Automation Conference; Lecture Notes in Electrical Engineering; Deng, Z., Li, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; Volume 336.
- Xu, X.; Huang, S.-L.; Zheng, L.; Wornell, G.W. An information-theoretic interpretation to deep neural networks. Entropy 2022, 24, 135.
- Rényi, A. On measures of information and entropy. Proc. Fourth Berkeley Symp. Math. Stat. Probab. 1960, 4, 547–561.
- Principe, J.C. Information-Theoretic Learning: Renyi’s Entropy and Kernel Perspectives; Springer Publishing Company: New York, NY, USA, 2010.
- Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Irshad, M.R.; Maya, R.; Buono, F.; Longobardi, M. Kernel estimation of cumulative residual Tsallis entropy and its dynamic version under ρ-mixing dependent data. Entropy 2022, 24, 9.
- Liu, W.; Pokharel, P.P.; Principe, J.C. Correntropy: A localized similarity measure. In Proceedings of the 2006 IEEE International Joint Conference on Neural Network Proceedings, Vancouver, BC, Canada, 16–21 July 2006; IEEE: Piscataway, NJ, USA, 2006.
- Yu, S.; Giraldo, L.S.; Principe, J. Information-Theoretic Methods in Deep Neural Networks: Recent Advances and Emerging Opportunities. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Survey Track, Montreal, QC, Canada, 19–27 August 2021; pp. 4669–4678.
- Oddie, G. Truthlikeness. In The Stanford Encyclopedia of Philosophy, Winter 2016 ed.; Zalta, E.N., Ed.; Available online: https://plato.stanford.edu/archives/win2016/entries/truthlikeness/ (accessed on 18 May 2020).
- Floridi, L. Outline of a theory of strongly semantic information. Minds Mach. 2004, 14, 197–221.
- Zhong, Y. A theory of semantic information. Proceedings 2017, 1, 129.
- Popper, K. Logik der Forschung: Zur Erkenntnistheorie der Modernen Naturwissenschaft; Springer: Vienna, Austria, 1935; English translation: The Logic of Scientific Discovery, 1st ed.; Hutchinson: London, UK, 1959.
- Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Carnap, R.; Bar-Hillel, Y. An Outline of a Theory of Semantic Information; Technical Report No. 247; Research Laboratory of Electronics, MIT: Cambridge, MA, USA, 1952.
- Shepard, R.N. Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika 1957, 22, 325–345.
- Shannon, C.E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. 1959, 4, 142–163.
- Theil, H. Economics and Information Theory; North-Holland Pub. Co.: Amsterdam, The Netherlands; Rand McNally: Chicago, IL, USA, 1967.
- Zadeh, L.A. Fuzzy Sets. Inf. Control. 1965, 8, 338–353.
- De Luca, A.; Termini, S. A definition of a non-probabilistic entropy in the setting of fuzzy sets. Inf. Control. 1972, 20, 301–312.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–723.
- Thomas, S.F. Possibilistic uncertainty and statistical inference. In Proceedings of the ORSA/TIMS Meeting, Houston, TX, USA, 12–14 October 1981.
- Dubois, D.; Prade, H. Fuzzy sets and probability: Misunderstandings, bridges and gaps. In Proceedings of the 1993 Second IEEE International Conference on Fuzzy Systems, San Francisco, CA, USA, 28 March 1993.
- Donsker, M.; Varadhan, S. Asymptotic evaluation of certain Markov process expectations for large time IV. Commun. Pure Appl. Math. 1983, 36, 183–212.
- Wang, P.Z. From the fuzzy statistics to the falling random subsets. In Advances in Fuzzy Sets, Possibility Theory and Applications; Wang, P.P., Ed.; Plenum Press: New York, NY, USA, 1983; pp. 81–96.
- Aczel, J.; Forte, B. Generalized entropies and the maximum entropy principle. In Maximum Entropy and Bayesian Methods in Applied Statistics; Justice, J.H., Ed.; Cambridge University Press: Cambridge, UK, 1986; pp. 95–100.
- Zadeh, L.A. Probability measures of fuzzy events. J. Math. Anal. Appl. 1968, 23, 421–427.
- Lu, C. Decoding model of color vision and verifications. Acta Opt. Sin. 1989, 9, 158–163. (In Chinese)
- Lu, C. Explaining color evolution, color blindness, and color recognition by the decoding model of color vision. In Proceedings of the 11th IFIP TC 12 International Conference, IIP 2020, Hangzhou, China; Shi, Z., Vadera, S., Chang, E., Eds.; Springer Nature: Cham, Switzerland, 2020; pp. 287–298. Available online: https://www.springer.com/gp/book/9783030469306 (accessed on 18 May 2020).
- Ohlan, A.; Ohlan, R. Fundamentals of fuzzy information measures. In Generalizations of Fuzzy Information Measures; Springer: Cham, Switzerland, 2016.
- Fisher, R.A. On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. 1922, 222, 309–368.
- Fienberg, S.E. When did Bayesian Inference become “Bayesian”? Bayesian Anal. 2006, 1, 1–40.
- Zhang, M.L.; Li, Y.K.; Liu, X.Y.; Geng, X. Binary Relevance for multi-label learning: An overview. Front. Comput. Sci. 2018, 12, 191–202.
- Hinton, G.E. A practical guide to training Restricted Boltzmann Machines. In Neural Networks: Tricks of the Trade; Lecture Notes in Computer Science; Montavon, G., Orr, G.B., Müller, K.R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7700, pp. 599–619.
- Ashby, F.G.; Perrin, N.A. Toward a unified theory of similarity and recognition. Psychol. Rev. 1988, 95, 124–150.
- Banu, A.; Fatima, S.S.; Khan, K.U.R. Information content based semantic similarity measure for concepts subsumed by multiple concepts. Int. J. Web Appl. 2015, 7, 85–94.
- Dumais, S.T. Latent semantic analysis. Annu. Rev. Inf. Sci. Technol. 2005, 38, 188–230.
- Church, K.W.; Hanks, P. Word association norms, mutual information, and lexicography. Comput. Linguist. 1990, 16, 22–29.
- Islam, A.; Inkpen, D. Semantic text similarity using corpus-based word similarity and string similarity. ACM Trans. Knowl. Discov. Data 2008, 2, 1–25.
- Chandrasekaran, D.; Mago, V. Evolution of Semantic Similarity—A Survey. arXiv 2021, arXiv:2004.13820.
- Costa, T.; Leal, J.P. Semantic measures: How similar? How related? In Web Engineering, Proceedings of the ICWE 2016, Lugano, Switzerland, 6–9 June 2016; Bozzon, A., Cudre-Maroux, P., Pautasso, C., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9671.
- Ackley, D.H.; Hinton, G.E.; Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cogn. Sci. 1985, 9, 147–169.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
- Gutmann, M.U.; Hyvärinen, A. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 2012, 13, 307–361.
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 1857–1865.
- Lu, C. Understanding and Accelerating EM Algorithm’s Convergence by Fair Competition Principle and Rate-Verisimilitude Function. arXiv 2021, arXiv:2104.12592.
- Lu, C. Channels’ Confirmation and Predictions’ Confirmation: From the Medical Test to the Raven Paradox. Entropy 2020, 22, 384.
- Lu, C. Causal Confirmation Measures: From Simpson’s Paradox to COVID-19. Entropy 2023, 25, 143.
- Lu, C. Semantic channel and Shannon channel mutually match and iterate for tests and estimations with maximum mutual information and maximum likelihood. In Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing, Shanghai, China, 15 January 2018; IEEE Computer Society: Piscataway, NJ, USA, 2018; pp. 15–18.
- Nair, V.; Hinton, G. Implicit mixtures of Restricted Boltzmann Machines. In Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS’08), Red Hook, NY, USA, 8–10 December 2008; pp. 1145–1152.
- Song, J.; Yuan, C. Learning Boltzmann Machine with EM-like Method. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016.
- Sow, D.M.; Eleftheriadis, A. Complexity distortion theory. IEEE Trans. Inf. Theory 2003, 49, 604–608.
- Lu, C. How Semantic Information G Measure Relates to Distortion, Freshness, Purposiveness, and Efficiency. arXiv 2023, arXiv:2304.13502.
- Still, S. Information-theoretic approach to interactive learning. Europhys. Lett. 2009, 85, 28005.
- Eysenbach, B.; Salakhutdinov, R.; Levine, S. The Information Geometry of Unsupervised Reinforcement Learning. arXiv 2021, arXiv:2110.02719.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2006.
- Lu, C. The Entropy Theory of Portfolio and Information Value: On the Risk Control of Stocks and Futures; Science and Technology University Press: Hefei, China, 1997; ISBN 7-312-00952-2/F.36. (In Chinese)