
Information-Theoretic Methods in Deep Learning: Theory and Applications

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 May 2024) | Viewed by 27057

Special Issue Editors


Guest Editor
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Interests: information theoretic learning; information bottleneck; deep learning; artificial general intelligence; correntropy

Guest Editor
Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
Interests: information theory of deep neural networks; explainable/interpretable AI; machine learning in non-stationary environments; time series analysis; brain network analysis

Guest Editor
Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY 40506, USA
Interests: machine learning for signal processing; information theoretic learning; representation learning; computer vision; computational neuroscience

Guest Editor
Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an 710049, China
Interests: information theoretic learning; artificial intelligence; cognitive science; adaptive filtering; brain machine learning; robotics

Special Issue Information

Dear Colleagues,

Information theory is the mathematical infrastructure for describing and manipulating information, and it has had a significant influence on the design of efficient and reliable communication systems. Information theoretic learning (ITL) has attracted increasing attention in the field of deep learning in recent years: it provides useful descriptions of the underlying behavior of random variables or processes with which to develop and analyze deep models. Novel ITL estimators and principles have been applied to a range of deep learning problems, such as the mutual information neural estimator for representation learning under the information maximization principle, and the principle of relevant information for redundancy compression and graph sparsification. As a vital approach to describing performance constraints and designing mappings, ITL has essential applications in supervised, unsupervised, and reinforcement learning problems, such as classification, clustering, and sequential decision making. In this field, the information bottleneck (IB) seeks the right balance between data fit and generalization by using mutual information as both a regularizer and a cost function. IB theory helps to better understand the basic limits of learning problems, such as the learning performance of deep neural networks, geometric clustering, and extraction of the Gaussian part of a signal.
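
For readers less familiar with the IB principle, one standard formulation of this trade-off (stated here for orientation; notation and the specific form may vary across the papers in this issue) compresses the input X into a representation T while keeping T predictive of the target Y:

```latex
% Information bottleneck objective: compress X into T while keeping T predictive of Y.
% The Lagrange multiplier beta > 0 trades compression (first term) against prediction (second term).
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```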

In recent years, researchers have shown that ITL provides a powerful paradigm for analyzing neural networks, shedding light on their layered structure, generalization capabilities, and learning dynamics. For example, IB theory has demonstrated great potential for solving critical problems in deep learning, including understanding and analyzing black-box neural networks and serving as an optimization criterion for training deep neural networks. Divergence estimation is another approach with a broad range of applications, including domain shift detection, domain adaptation, generative modeling, and model regularization.
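
As an illustrative sketch of the mutual information estimation referred to above (an assumption-laden example, not code from any contribution in this issue), the Donsker–Varadhan lower bound behind mutual information neural estimation can be maximized over a small critic network. The snippet below uses PyTorch; the Critic class and mine_lower_bound function are hypothetical names introduced only for illustration.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Small MLP critic T_theta(x, z) used in the Donsker-Varadhan bound."""

    def __init__(self, x_dim: int, z_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, z], dim=-1))


def mine_lower_bound(critic: Critic, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan estimate: I(X;Z) >= E_joint[T] - log E_marginal[exp(T)].

    x and z are paired samples (same batch index = same joint draw); the
    marginal expectation is approximated by shuffling z within the batch.
    Maximizing this bound with respect to the critic parameters yields a
    neural estimate of the mutual information between X and Z.
    """
    batch = x.size(0)
    t_joint = critic(x, z).mean()                 # expectation over the joint
    z_shuffled = z[torch.randperm(batch)]         # break the pairing
    t_marginal = torch.logsumexp(critic(x, z_shuffled), dim=0) \
        - torch.log(torch.tensor(float(batch)))   # log-mean-exp over the product of marginals
    return t_joint - t_marginal.squeeze()
```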

With the development of ITL theory, we believe that ITL can provide new perspectives, theories, and algorithms for the challenging problems of deep learning. Therefore, this Special Issue aims to report the latest developments in ITL methods and their applications. Topics of interest include, but are not limited to:

  • Information-theoretic quantities and estimators;
  • Information-theoretic principles and regularization in deep neural networks;
  • Interpretation and explanation of deep learning models with information-theoretic methods;
  • Information-theoretic methods for distributed deep learning;
  • Information-theoretic methods for brain-inspired neural networks;
  • Information bottleneck in deep representation learning;
  • Representation learning beyond the information bottleneck, such as total correlation explanation and the principle of relevant information.

Dr. Shuangming Yang
Dr. Shujian Yu
Dr. Luis Gonzalo Sánchez Giraldo
Prof. Dr. Badong Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information theoretic learning
  • information bottleneck
  • deep learning
  • neural networks

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)


Research


23 pages, 927 KiB  
Article
PyDTS: A Python Toolkit for Deep Learning Time Series Modelling
by Pascal A. Schirmer and Iosif Mporas
Entropy 2024, 26(4), 311; https://doi.org/10.3390/e26040311 - 31 Mar 2024
Cited by 1 | Viewed by 2069
Abstract
In this article, the topic of time series modelling is discussed. It highlights the criticality of analysing and forecasting time series data across various sectors, identifying five primary application areas: denoising, forecasting, nonlinear transient modelling, anomaly detection, and degradation modelling. It further outlines the mathematical frameworks employed in a time series modelling task, categorizing them into statistical, linear algebra, and machine- or deep-learning-based approaches, with each category serving distinct dimensions and complexities of time series problems. Additionally, the article reviews the extensive literature on time series modelling, covering statistical processes, state space representations, and machine and deep learning applications in various fields. The unique contribution of this work lies in its presentation of a Python-based toolkit for time series modelling (PyDTS) that integrates popular methodologies and offers practical examples and benchmarking across diverse datasets. Full article

24 pages, 1055 KiB  
Article
A Unifying Generator Loss Function for Generative Adversarial Networks
by Justin Veiner, Fady Alajaji and Bahman Gharesifard
Entropy 2024, 26(4), 290; https://doi.org/10.3390/e26040290 - 27 Mar 2024
Cited by 1 | Viewed by 1264
Abstract
A unifying α-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN) that uses a canonical (or classical) discriminator loss function such as the one in the original GAN (VanillaGAN) system. The generator loss function is based on a symmetric class probability estimation type function, Lα, and the resulting GAN system is termed Lα-GAN. Under an optimal discriminator, it is shown that the generator’s optimization problem consists of minimizing a Jensen-fα-divergence, a natural generalization of the Jensen-Shannon divergence, where fα is a convex function expressed in terms of the loss function Lα. It is also demonstrated that this Lα-GAN problem recovers as special cases a number of GAN problems in the literature, including VanillaGAN, least squares GAN (LSGAN), least kth-order GAN (LkGAN), and the recently introduced (αD,αG)-GAN with αD=1. Finally, experimental results are provided for three datasets—MNIST, CIFAR-10, and Stacked MNIST—to illustrate the performance of various examples of the Lα-GAN system. Full article
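
For orientation, the classical VanillaGAN special case recovered by this framework (a standard result from the GAN literature, stated here for context rather than drawn from the paper itself) is that, with the discriminator held at its optimum, the generator objective reduces to a Jensen–Shannon divergence between the data and generator distributions:

```latex
% VanillaGAN under the optimal discriminator D*:
C(G) \;=\; 2\,\mathrm{JSD}\!\left(p_{\mathrm{data}} \,\|\, p_{G}\right) \;-\; \log 4
```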

22 pages, 1728 KiB  
Article
Ensemble Transductive Propagation Network for Semi-Supervised Few-Shot Learning
by Xueling Pan, Guohe Li and Yifeng Zheng
Entropy 2024, 26(2), 135; https://doi.org/10.3390/e26020135 - 31 Jan 2024
Cited by 1 | Viewed by 1275
Abstract
Few-shot learning aims to solve the difficulty in obtaining training samples, leading to high variance, high bias, and over-fitting. Recently, graph-based transductive few-shot learning approaches supplement the deficiency of label information via unlabeled data to make a joint prediction, which has become a new research hotspot. Therefore, in this paper, we propose a novel ensemble semi-supervised few-shot learning strategy via transductive network and Dempster–Shafer (D-S) evidence fusion, named ensemble transductive propagation networks (ETPN). First, we present homogeneity and heterogeneity ensemble transductive propagation networks to better use the unlabeled data, which introduce a preset weight coefficient and provide the process of iterative inferences during transductive propagation learning. Then, we combine the information entropy to improve the D-S evidence fusion method, which improves the stability of multi-model results fusion from the pre-processing of the evidence source. Third, we combine the L2 norm to improve an ensemble pruning approach to select individual learners with higher accuracy to participate in the integration of the few-shot model results. Moreover, interference sets are introduced to semi-supervised training to improve the anti-disturbance ability of the model. Eventually, experiments indicate that the proposed approaches outperform the state-of-the-art few-shot model. The best accuracy of ETPN increases by 0.3% and 0.28% in the 5-way 5-shot, and by 3.43% and 7.6% in the 5-way 1-shot on miniImageNet and tieredImageNet, respectively. Full article

17 pages, 857 KiB  
Article
Deep Individual Active Learning: Safeguarding against Out-of-Distribution Challenges in Neural Networks
by Shachar Shayovitz, Koby Bibas and Meir Feder
Entropy 2024, 26(2), 129; https://doi.org/10.3390/e26020129 - 31 Jan 2024
Viewed by 1040
Abstract
Active learning (AL) is a paradigm focused on purposefully selecting training data to enhance a model’s performance by minimizing the need for annotated samples. Typically, strategies assume that the training pool shares the same distribution as the test set, which is not always valid in privacy-sensitive applications where annotating user data is challenging. In this study, we operate within an individual setting and leverage an active learning criterion which selects data points for labeling based on minimizing the min-max regret on a small unlabeled test set sample. Our key contribution lies in the development of an efficient algorithm, addressing the challenging computational complexity associated with approximating this criterion for neural networks. Notably, our results show that, especially in the presence of out-of-distribution data, the proposed algorithm substantially reduces the required training set size by up to 15.4%, 11%, and 35.1% for CIFAR10, EMNIST, and MNIST datasets, respectively. Full article

16 pages, 2785 KiB  
Article
Continual Reinforcement Learning for Quadruped Robot Locomotion
by Sibo Gai, Shangke Lyu, Hongyin Zhang and Donglin Wang
Entropy 2024, 26(1), 93; https://doi.org/10.3390/e26010093 - 22 Jan 2024
Cited by 1 | Viewed by 2588
Abstract
The ability to learn continuously is crucial for a robot to achieve a high level of intelligence and autonomy. In this paper, we consider continual reinforcement learning (RL) for quadruped robots, which includes the ability to continuously learn sub-sequential tasks (plasticity) and maintain performance on previous tasks (stability). The policy obtained by the proposed method enables robots to learn multiple tasks sequentially, while overcoming both catastrophic forgetting and loss of plasticity. At the same time, it achieves the above goals with as little modification to the original RL learning process as possible. The proposed method uses the Piggyback algorithm to select protected parameters for each task, and reinitializes the unused parameters to increase plasticity. Meanwhile, we encourage the policy network to explore by increasing the entropy of its soft network. Our experiments show that traditional continual learning algorithms cannot perform well on robot locomotion problems, and our algorithm is more stable and less disruptive to the RL training progress. Several robot locomotion experiments validate the effectiveness of our method. Full article

15 pages, 656 KiB  
Article
A Deep Neural Network Regularization Measure: The Class-Based Decorrelation Method
by Chenguang Zhang, Tian Liu and Xuejiao Du
Entropy 2024, 26(1), 7; https://doi.org/10.3390/e26010007 - 20 Dec 2023
Cited by 1 | Viewed by 1571
Abstract
In response to the challenge of overfitting, which may lead to a decline in network generalization performance, this paper proposes a new regularization technique, called the class-based decorrelation method (CDM). Specifically, this method views the neurons in a specific hidden layer as base learners, and aims to boost network generalization as well as model accuracy by minimizing the correlation among individual base learners while simultaneously maximizing their class-conditional correlation. Intuitively, CDM not only promotes diversity among the hidden neurons, but also enhances their cohesiveness when processing samples from the same class. Comparative experiments conducted on various datasets using deep models demonstrate that CDM effectively reduces overfitting and improves classification performance. Full article

21 pages, 913 KiB  
Article
Analysis of Deep Convolutional Neural Networks Using Tensor Kernels and Matrix-Based Entropy
by Kristoffer K. Wickstrøm, Sigurd Løkse, Michael C. Kampffmeyer, Shujian Yu, José C. Príncipe and Robert Jenssen
Entropy 2023, 25(6), 899; https://doi.org/10.3390/e25060899 - 3 Jun 2023
Viewed by 1867
Abstract
Analyzing deep neural networks (DNNs) via information plane (IP) theory has gained tremendous attention recently to gain insight into, among others, DNNs’ generalization ability. However, it is by no means obvious how to estimate the mutual information (MI) between each hidden layer and the input/desired output to construct the IP. For instance, hidden layers with many neurons require MI estimators with robustness toward the high dimensionality associated with such layers. MI estimators should also be able to handle convolutional layers while at the same time being computationally tractable to scale to large networks. Existing IP methods have not been able to study truly deep convolutional neural networks (CNNs). We propose an IP analysis using the new matrix-based Rényi’s entropy coupled with tensor kernels, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. Our results shed new light on previous studies concerning small-scale DNNs using a completely new approach. We provide a comprehensive IP analysis of large-scale CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks. Full article
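
For reference, the matrix-based Rényi's α-order entropy underlying this analysis (the standard definition from the information-theoretic learning literature, quoted here for orientation rather than taken from the paper) is computed from the eigenvalues of a trace-normalized kernel Gram matrix A built from the layer representations:

```latex
% Matrix-based Renyi's alpha-order entropy of a Gram matrix A with tr(A) = 1
S_{\alpha}(A) \;=\; \frac{1}{1-\alpha} \log_{2}\!\big[\operatorname{tr}(A^{\alpha})\big]
\;=\; \frac{1}{1-\alpha} \log_{2}\!\Big[\sum_{i=1}^{n} \lambda_{i}(A)^{\alpha}\Big]
```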

49 pages, 10680 KiB  
Article
Multivariate Time Series Information Bottleneck
by Denis Ullmann, Olga Taran and Slava Voloshynovskiy
Entropy 2023, 25(5), 831; https://doi.org/10.3390/e25050831 - 22 May 2023
Cited by 2 | Viewed by 2872
Abstract
Time series (TS) and multiple time series (MTS) predictions have historically paved the way for distinct families of deep learning models. The temporal dimension, distinguished by its evolutionary sequential aspect, is usually modeled by decomposition into the trio of “trend, seasonality, noise”, by attempts to copy the functioning of human synapses, and more recently, by transformer models with self-attention on the temporal dimension. These models may find applications in finance and e-commerce, where even an increase in performance of less than 1% has large monetary repercussions; they also have potential applications in natural language processing (NLP), medicine, and physics. To the best of our knowledge, the information bottleneck (IB) framework has not received significant attention in the context of TS or MTS analyses. One can demonstrate that a compression of the temporal dimension is key in the context of MTS. We propose a new approach with partial convolution, where a time sequence is encoded into a two-dimensional representation resembling images. Accordingly, we use the recent advances made in image extension to predict an unseen part of an image from a given one. We show that our model compares well with traditional TS models, has information-theoretical foundations, and can be easily extended to more dimensions than only time and space. An evaluation of our multiple time series–information bottleneck (MTS-IB) model proves its efficiency in electricity production, road traffic, and astronomical data representing solar activity, as recorded by NASA’s interface region imaging spectrograph (IRIS) satellite. Full article

19 pages, 3673 KiB  
Article
Position-Wise Gated Res2Net-Based Convolutional Network with Selective Fusing for Sentiment Analysis
by Jinfeng Zhou, Xiaoqin Zeng, Yang Zou and Haoran Zhu
Entropy 2023, 25(5), 740; https://doi.org/10.3390/e25050740 - 30 Apr 2023
Viewed by 1537
Abstract
Sentiment analysis (SA) is an important task in natural language processing in which convolutional neural networks (CNNs) have been successfully applied. However, most existing CNNs can only extract predefined, fixed-scale sentiment features and cannot synthesize flexible, multi-scale sentiment features. Moreover, these models’ convolutional and pooling layers gradually lose local detailed information. In this study, a new CNN model based on residual network technology and attention mechanisms is proposed. This model exploits more abundant multi-scale sentiment features and addresses the loss of locally detailed information to enhance the accuracy of sentiment classification. It is primarily composed of a position-wise gated Res2Net (PG-Res2Net) module and a selective fusing module. The PG-Res2Net module can adaptively learn multi-scale sentiment features over a large range using multi-way convolution, residual-like connections, and position-wise gates. The selective fusing module is developed to fully reuse and selectively fuse these features for prediction. The proposed model was evaluated using five baseline datasets. The experimental results demonstrate that the proposed model surpassed the other models in performance. In the best case, the model outperforms the other models by up to 1.2%. Ablation studies and visualizations further revealed the model’s ability to extract and fuse multi-scale sentiment features. Full article

Review


28 pages, 570 KiB  
Review
To Compress or Not to Compress—Self-Supervised Learning and Information Theory: A Review
by Ravid Shwartz Ziv and Yann LeCun
Entropy 2024, 26(3), 252; https://doi.org/10.3390/e26030252 - 12 Mar 2024
Cited by 33 | Viewed by 9086
Abstract
Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory has shaped deep neural networks, particularly the information bottleneck principle. This principle optimizes the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. This framework includes multiple encoders and decoders, suggesting that all existing work on self-supervised learning can be seen as specific instances. We aim to unify these approaches to understand their underlying principles better and address the main challenge: many works present different frameworks with differing theories that may seem contradictory. By weaving existing research into a cohesive narrative, we delve into contemporary self-supervised methodologies, spotlight potential research areas, and highlight inherent challenges. Moreover, we discuss how to estimate information-theoretic quantities and their associated empirical problems. Overall, this paper provides a comprehensive review of the intersection of information theory, self-supervised learning, and deep neural networks, aiming for a better understanding through our proposed unified approach. Full article
