
Information Theory in Machine Learning and Data Science II

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 June 2021) | Viewed by 16683

Special Issue Editor


Dr. Kevin R. Moon
Guest Editor
Department of Mathematics & Statistics, Utah State University, Logan, UT 84322, USA
Interests: machine learning; data science; information theory; deep learning; manifold learning; nonparametric estimation; biomedical applications; financial applications; engineering applications

Special Issue Information

Dear Colleagues,

We are in a data revolution in which nearly every field is acquiring thousands of samples with many dimensions. Machine learning and data science have grown immensely in popularity due to their many successes in analyzing complex, large data sets. Information-theoretic measures such as entropy, mutual information, and information divergence are useful in many machine learning and data science applications, including model selection, structure learning, clustering, regression, classification, causality analysis, regularization, and extending machine learning algorithms to distributional features. In this Special Issue, we seek papers that discuss advances in the application of information theory to machine learning and data science problems. Possible topics of interest include, but are not limited to:

  • estimation of information-theoretic measures
  • deep learning approaches that incorporate information theory
  • fundamental limits of machine learning algorithms
  • optimization and learning with information-theoretic constraints
  • information bottleneck methods
  • information-theoretic approaches to adaptive data analysis
  • extending machine learning algorithms to distributional features
  • Bayes error rate estimation
  • applications of information theory in reinforcement learning

Dr. Kevin R. Moon
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • information theory
  • data science
  • big data
  • model selection
  • regularization
  • distributional features
  • Bayes error
  • information bottleneck
  • entropy
  • mutual information
  • information divergence
  • deep learning
  • reinforcement learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (5 papers)


Research

24 pages, 445 KiB  
Article
A Factor Analysis Perspective on Linear Regression in the ‘More Predictors than Samples’ Case
by Sebastian Ciobanu and Liviu Ciortuz
Entropy 2021, 23(8), 1012; https://doi.org/10.3390/e23081012 - 3 Aug 2021
Cited by 1 | Viewed by 2157
Abstract
Linear regression (LR) is a core supervised machine learning model for regression tasks. It can be fitted using either an analytic/closed-form formula or an iterative algorithm. Fitting via the analytic formula becomes a problem when the number of predictors is greater than the number of samples, because the closed-form solution contains a matrix inverse that is not defined in that case. The standard remedies are the Moore–Penrose pseudoinverse and L2 regularization. We propose another solution starting from factor analysis (FA) with a one-dimensional latent space, a model used in unsupervised learning for dimensionality reduction or for plain density estimation. The density estimation task is our focus since, in that setting, FA can fit a Gaussian distribution even if the dimensionality of the data is greater than the number of samples; we retain this advantage when creating the supervised counterpart of factor analysis, which is linked to linear regression. We also create its semisupervised counterpart and then extend it to handle missing data. We prove an equivalence to linear regression and report experiments for each extension of the factor analysis model. The resulting algorithms are either closed-form solutions or expectation–maximization (EM) algorithms, the latter of which are linked to information theory through optimizing a function containing a Kullback–Leibler (KL) divergence or the entropy of a random variable. Full article
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science II)
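
The closed-form issue described in the abstract is easy to reproduce. The sketch below is an illustrative example only, not the paper's FA-based method; the toy data and regularization strength are made up. It shows that the normal-equations inverse is unavailable when predictors outnumber samples, and that the two standard workarounds the authors mention, the Moore–Penrose pseudoinverse and L2 (ridge) regularization, still produce a fit:

```python
import numpy as np

# Hypothetical toy data: more predictors (p) than samples (n).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + 0.1 * rng.normal(size=n)

# The normal equations fail here: X'X is p x p but has rank at most n < p,
# so it is singular and has no ordinary inverse.
gram = X.T @ X
print("rank of X'X:", np.linalg.matrix_rank(gram), "out of", p)

# Standard workaround 1: Moore-Penrose pseudoinverse (minimum-norm least squares).
w_pinv = np.linalg.pinv(X) @ y

# Standard workaround 2: L2 (ridge) regularization makes the matrix invertible.
lam = 1.0  # made-up regularization strength
w_ridge = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)

print("pinv  training residual:", np.linalg.norm(X @ w_pinv - y))
print("ridge training residual:", np.linalg.norm(X @ w_ridge - y))
```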

15 pages, 918 KiB  
Article
Word2vec Skip-Gram Dimensionality Selection via Sequential Normalized Maximum Likelihood
by Pham Thuc Hung and Kenji Yamanishi
Entropy 2021, 23(8), 997; https://doi.org/10.3390/e23080997 - 31 Jul 2021
Cited by 18 | Viewed by 3000
Abstract
In this paper, we propose a novel information criterion-based approach to select the dimensionality of the word2vec Skip-gram (SG) model. From the perspective of probability theory, SG can be viewed as an implicit probability distribution estimator, under the assumption that there exists a true contextual distribution among words. We therefore apply information criteria with the aim of selecting the dimensionality whose corresponding model is as close as possible to the true distribution. We examine the following information criteria for the dimensionality selection problem: Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Sequential Normalized Maximum Likelihood (SNML) criterion. SNML is the total codelength required for the sequential encoding of a data sequence on the basis of the minimum description length principle. The proposed approach is applied to both the original SG model and the SG Negative Sampling model to clarify the idea of using information criteria. Additionally, as the original SNML suffers from computational disadvantages, we introduce novel heuristics for its efficient computation. Moreover, we empirically demonstrate that SNML outperforms both BIC and AIC. In comparison with other evaluation methods for word embeddings, the dimensionality selected by SNML is significantly closer to the optimal dimensionality obtained by word analogy or word similarity tasks. Full article
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science II)
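
As a rough illustration of criterion-based dimensionality selection (not the paper's SNML procedure, whose sequential codelength computation is more involved), the sketch below selects the dimensionality that minimizes AIC or BIC from precomputed log-likelihoods. The vocabulary size, parameter count, and likelihood values are hypothetical placeholders:

```python
import math

def select_dimension(candidate_dims, log_likelihoods, n_samples, n_params, criterion="BIC"):
    """Return the dimensionality minimizing an information criterion.

    log_likelihoods[d] is the maximized log-likelihood of the skip-gram model
    fitted at dimensionality d (assumed computed elsewhere); n_params(d) is
    the parameter count of that model.
    """
    best_dim, best_score = None, math.inf
    for d in candidate_dims:
        k, ll = n_params(d), log_likelihoods[d]
        if criterion == "AIC":
            score = 2 * k - 2 * ll
        else:  # BIC
            score = k * math.log(n_samples) - 2 * ll
        if score < best_score:
            best_dim, best_score = d, score
    return best_dim

# Hypothetical usage: vocabulary of V words; skip-gram has roughly 2 * V * d
# parameters (input and output embedding matrices). Likelihoods are placeholders.
V = 10_000
dims = [50, 100, 200, 300]
fake_ll = {50: -1.310e6, 100: -1.292e6, 200: -1.286e6, 300: -1.284e6}
print(select_dimension(dims, fake_ll, n_samples=5_000_000,
                       n_params=lambda d: 2 * V * d, criterion="BIC"))
```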

29 pages, 456 KiB  
Article
Benchmarking Analysis of the Accuracy of Classification Methods Related to Entropy
by Yolanda Orenes, Alejandro Rabasa, Jesus Javier Rodriguez-Sala and Joaquin Sanchez-Soriano
Entropy 2021, 23(7), 850; https://doi.org/10.3390/e23070850 - 1 Jul 2021
Cited by 5 | Viewed by 3141
Abstract
In the machine learning literature, we can find numerous methods for solving classification problems. We propose two new performance measures for analyzing such methods. These measures are defined using the concept of proportional reduction of classification error with respect to three benchmark classifiers: the random classifier and two intuitive classifiers based on how a non-expert could perform classification simply by applying a frequentist approach. We show that these three simple methods are closely related to different aspects of the entropy of the dataset. The measures therefore account, to some extent, for the entropy of the dataset when evaluating the performance of classifiers. This allows us to measure the improvement in classification results over simple methods and, at the same time, how entropy affects classification capacity. To illustrate how these new performance measures can be used to analyze classifiers while taking the entropy of the dataset into account, we carry out an intensive experiment using the well-known J48 algorithm and a UCI repository dataset for which we had previously selected a subset of the most relevant attributes. We then carry out an extensive experiment considering four heuristic classifiers and 11 datasets. Full article
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science II)
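
To make the flavor of such entropy-related baselines concrete, here is a small sketch. It is an illustration only; the paper's exact benchmark classifiers and measures may be defined differently. It computes the label entropy of a dataset, the expected error of two frequentist baselines, and the proportional reduction of error achieved by a classifier relative to one of them:

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical class distribution."""
    n, counts = len(labels), Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def baseline_errors(labels):
    """Expected error rates of two simple frequentist baselines."""
    n, counts = len(labels), Counter(labels)
    probs = [c / n for c in counts.values()]
    majority = 1.0 - max(probs)                           # always predict the most frequent class
    frequency_matching = 1.0 - sum(p * p for p in probs)  # predict each class with its observed frequency
    return majority, frequency_matching

def proportional_error_reduction(classifier_error, baseline_error):
    """Fraction of the baseline error removed by the classifier (1.0 = perfect)."""
    return (baseline_error - classifier_error) / baseline_error

labels = ["a"] * 70 + ["b"] * 20 + ["c"] * 10
maj, freq = baseline_errors(labels)
print("label entropy (bits):", round(label_entropy(labels), 3))
print("majority baseline error:", round(maj, 3),
      "| frequency-matching baseline error:", round(freq, 3))
print("reduction for a classifier with 20% error vs. majority baseline:",
      round(proportional_error_reduction(0.20, maj), 3))
```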

21 pages, 488 KiB  
Article
A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning
by Ivan Lopez-Arevalo, Edwin Aldana-Bobadilla, Alejandro Molina-Villegas, Hiram Galeana-Zapién, Victor Muñiz-Sanchez and Saul Gausin-Valle
Entropy 2020, 22(12), 1391; https://doi.org/10.3390/e22121391 - 9 Dec 2020
Cited by 29 | Viewed by 4249
Abstract
The most common machine-learning methods solve supervised and unsupervised problems using datasets whose features belong to a numerical space. However, many problems involve data in which numerical and categorical values coexist, which makes them challenging to handle. Preprocessing is required to transform categorical data into numeric form. Methods such as one-hot and feature-hashing encoding have been the most widely used approaches, at the expense of a significant increase in the dimensionality of the dataset. This effect introduces unexpected challenges in dealing with an overabundance of variables and/or noisy data. In this paper, we therefore propose a novel encoding approach that maps mixed-type data into an information space, using Shannon's theory to model the amount of information contained in the original data. We evaluated our proposal on ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, the proposal was applied to prepare these datasets for classification, regression, and clustering tasks. We show that our encoding is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency, while preserving the information conveyed by the original data. Full article
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science II)
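
As a toy illustration of the general idea of information-based encoding (an assumed mapping chosen for illustration, not the authors' actual method), the snippet below replaces each categorical value with its self-information under the column's empirical distribution, so a categorical column remains a single numeric feature instead of expanding into one column per category as one-hot encoding would:

```python
import math
from collections import Counter

def self_information_encode(column):
    """Replace each categorical value with its self-information, -log2 p(value),
    under the column's empirical frequency distribution. The column remains a
    single numeric feature instead of expanding into one column per category."""
    n, freq = len(column), Counter(column)
    return [-math.log2(freq[v] / n) for v in column]

colors = ["red", "red", "blue", "green", "red", "blue"]
print(self_information_encode(colors))  # rarer categories map to larger values
# One-hot encoding would have added 3 columns here; this mapping keeps a single one.
```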

19 pages, 361 KiB  
Article
On Gap-Based Lower Bounding Techniques for Best-Arm Identification
by Lan V. Truong and Jonathan Scarlett
Entropy 2020, 22(7), 788; https://doi.org/10.3390/e22070788 - 20 Jul 2020
Viewed by 2856
Abstract
In this paper, we consider techniques for establishing lower bounds on the number of arm pulls for best-arm identification in the multi-armed bandit problem. While a recent divergence-based approach was shown to provide improvements over an older gap-based approach, we show that the latter can be refined to match the former (up to constant factors) in many cases of interest under Bernoulli rewards, including the case that the rewards are bounded away from zero and one. Together with existing upper bounds, this indicates that the divergence-based and gap-based approaches are both effective for establishing sample complexity lower bounds for best-arm identification. Full article
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science II)
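
The quantities behind the two lower-bounding styles can be written down directly. The sketch below is an illustrative computation under assumed Bernoulli means, not the paper's refined bounds. It compares the per-arm hardness terms: the gap-based term 1/gap^2 and the divergence-based term 1/d(mu_i, mu*), where d is the Bernoulli KL divergence. By Pinsker's inequality, d(p, q) >= 2(p - q)^2, so the two terms agree up to constant factors when the means are bounded away from zero and one:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def hardness_terms(means):
    """Per-arm terms that drive the two styles of lower bound:
    the gap-based term 1 / gap^2 and the divergence-based term 1 / d(mu_i, mu*)."""
    best = max(means)
    terms = []
    for mu in means:
        if mu == best:
            continue
        gap = best - mu
        terms.append({"arm_mean": mu,
                      "gap_term": 1.0 / gap ** 2,
                      "kl_term": 1.0 / kl_bernoulli(mu, best)})
    return terms

# Assumed Bernoulli means, bounded away from 0 and 1 as in the regime above.
for term in hardness_terms([0.5, 0.45, 0.3]):
    print(term)
```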