A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets
Abstract
1. Introduction
1.1. Formalization
- Variable Y represents a partially hidden source of information;
- The random vector $\overline{X}$ represents an encoding of that partially inaccessible information in the form favoured by an (unknown) observation process;
- The recovered $\hat{Y}$ is the result of decoding the information in $\overline{X}$.
1.2. Some Fundamental Issues in MLC
1.2.1. Classifier Design for MLC
- Binary relevance (BR) [14], a problem transformation method that learns L binary classifiers—one for each different label in the label set—and then transforms the original data set into L data sets that contain all examples of the original data set, labelled positively if the label set of the original example contained the label in question and negatively otherwise. To classify a new instance, BR outputs the union of the labels that are positively predicted by the L classifiers.
- Classifier Chains (CC) [15,16], a transformation method that orders the labels by their decreasing predictive power on later labels and trains a classifier for each of them in order: all previous labels are used as inputs to predict later labels. Other hierarchical approaches use lattice-based methods to define the labelset hierarchy, for example [17].
- Label Powerset (LP) [1], a simple but effective problem transformation method that considers each unique set of labels in a multilabel training set as one of the classes of a new single-label classification task. Given a new instance, the single-label classifier of LP outputs the most probable class, which is actually a set of labels. Poor initial performance results motivated the RAkEL [13] variant. (A minimal sketch of the BR and LP transformations follows this list.)
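To make the two problem transformations concrete, here is a minimal, dependency-free sketch in Python. It is not the paper's code; the function names and the encoding of Y as an n x L 0/1 matrix are our own assumptions.

```python
def binary_relevance(X, Y):
    """BR: derive one binary classification dataset per label.

    X is a list of n instances; Y is an n x L 0/1 matrix (list of lists).
    Returns L pairs (X, y_j), where y_j is the binary target for label j.
    """
    L = len(Y[0])
    return [(X, [row[j] for row in Y]) for j in range(L)]

def br_predict(classifiers, x):
    """BR decoding: the union of the positively predicted labels for x."""
    return {j for j, clf in enumerate(classifiers) if clf(x) == 1}

def label_powerset(Y):
    """LP: map each distinct labelset (row of Y) to a single class id."""
    classes, y_lp = {}, []
    for row in Y:
        y_lp.append(classes.setdefault(tuple(row), len(classes)))
    return y_lp, classes
```

A classifier trained on the LP targets predicts a class id that decodes back to a whole labelset, which is why LP can capture label co-occurrences that BR ignores.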
1.2.2. Modelling Label Dependencies
1.2.3. Label Imbalance in MLC Datasets
1.2.4. Types of MLC Datasets
1.3. Research Goals
- What is an “easy” or “hard” dataset to carry out MLC on? This in turn involves answering two questions:
- (a) How “difficult” is the set of labels to learn on its own?
- (b) How “difficult” is it to predict the labels from the observations?
- Given the answers to the previous question, what is the most appropriate way to address the MLC problem?
1.4. Reading Guide
2. Theoretical Methods
2.1. The Classification is Information Transmission Metaphor
- Y serves as a source of information in the form of classes;
- $\overline{X}$ is a type of encoding of that (hidden, inaccessible) information in the form of observations;
- The transformed $\overline{X}'$ are the result of conformed, noisy transmission of the observation vectors;
- The classified $\hat{Y}$ is the result of decoding the received information through the classifier, as depicted in Figure 3.
2.2. The Source Multivariate Entropy Triangle
- Uniformity, $P_{\overline{X}} = U_{\overline{X}}$, whence the entropy is maximal with $H_{P_{\overline{X}}} = H_{U_{\overline{X}}}$. The opposite of this property is determinacy, whereby $H_{P_{\overline{X}}} = 0$, in which case there is no uncertainty about the outcome of $\overline{X}$, and whence we may conclude that all of the uniform entropy is taken up by the divergence from uniformity.
- Orthogonality, or independence of the marginals, defined by $P_{\overline{X}} = \prod_i P_{X_i}$, whence the bound information vanishes. In such a case, since the joint entropy equals the sum of the marginal entropies, we conclude that $M_{P_{\overline{X}}} = 0$ by definition.
- Redundancy, if the value of each variable is completely determined by the values of the rest. This entails that the residual entropy vanishes, $VI_{P_{\overline{X}}} = 0$.
- The following split balance equation holds for each variable $X_i$ individually: $H_{U_{X_i}} = \Delta H_{P_{X_i}} + M_{P_{X_i}} + VI_{P_{X_i}}$.
- The aggregate balance equation holds: $H_{U_{\overline{X}}} = \Delta H_{P_{\overline{X}}} + M_{P_{\overline{X}}} + VI_{P_{\overline{X}}}$.
- If $M_{P_{\overline{X}}} = 0$ then the corresponding side of the triangle is the geometric locus of distributions with independent marginals and, in general, high residual entropy.
- If $\Delta H_{P_{\overline{X}}} = 0$ then the corresponding side is the geometric locus of distributions with uniform marginals.
- If $VI_{P_{\overline{X}}} = 0$ then the corresponding side is the locus of distributions with identical marginals and, in general, high bound information.
- The multivariate residual entropy is actually the sum of the amounts of information singularly captured by each variable. Nowhere else can it be found, and any later processing that ignores this quantity will incur the deletion of that information, e.g., for transmission purposes.
- Likewise, the total bound information is highly redundant in that every portion of it resides in (at least two) different variables. Once the entropy of one feature has been processed, the part of the bound information that lies in it is redundant for further processing.
- Somewhat similar to the original interpretation, the divergence from uniformity is not available for processing. It is a potentiality—maximal randomness—of the source of information that has not been realized and therefore is not available for later processing, unlike the other entropies.
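These balance terms can be estimated empirically from a Boolean labelling matrix. The following is a minimal sketch, assuming the split-balance reading given above: per label, the divergence from uniformity is the uniform entropy minus the label's entropy, the residual entropy is its conditional entropy given the remaining labels, and the bound information is the remainder. The function names are ours, and plug-in (count-based) entropy estimation is assumed.

```python
import numpy as np
from collections import Counter

def entropy_bits(outcomes):
    """Plug-in Shannon entropy (bits) of a sequence of hashable outcomes."""
    counts = np.array(list(Counter(outcomes).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def split_balance(Y):
    """Per-label balance terms (dH_i, M_i, VI_i) for an n x L 0/1 matrix Y.

    For each label X_i: dH_i = 1 - H(X_i) (a binary uniform variable has
    1 bit of entropy), VI_i = H(X_i | rest) = H(all) - H(rest), and
    M_i = H(X_i) - VI_i. The three terms add up to 1 bit for every label.
    """
    Y = np.asarray(Y)
    H_joint = entropy_bits([tuple(r) for r in Y])
    coords = []
    for i in range(Y.shape[1]):
        H_rest = entropy_bits([tuple(np.delete(r, i)) for r in Y])
        H_i = entropy_bits([r[i] for r in Y])
        VI_i = H_joint - H_rest   # residual entropy of label i
        M_i = H_i - VI_i          # bound information of label i
        coords.append((1.0 - H_i, M_i, VI_i))
    return coords
```

Under this reading, normalizing each triple by its sum gives a label's coordinates in the SMET, and summing the triples over all labels yields the aggregate balance.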
2.3. A Brief Introduction to Formal Concept Analysis
2.3.1. Formal Contextualization
2.3.2. Analysing a Formal Context into Its Formal Concepts
2.3.3. Interpreting Concept Lattices
2.3.4. Synthesising a Context for a Complete Lattice
3. Results
- lattice theory in the form of Formal Concept Analysis (FCA [34,42]), as described in Section 2.3, to extract the qualitative information in MLC data.
- Compositional Data Analysis (CoDa [49,50]) specifically as it applies to the entropic compositions of joint distributions [33,35] described in Section 2.2, to measure the quantitative information in MLC data.
3.1. An Analysis of Information Content of MLC Task Data
- The formal context $(G, L, I)$ is the labelling context (of samples) of the task, built using the set of labels L as formal attributes, each sample index i as a formal object $g_i \in G$, and each bitvector-encoded sample labelset as the i-indexed row of the incidence matrix I.
- The formal context $(G, F, J)$ is the observation context (of samples), built with a set of features F as formal attributes, the same set of formal objects G, and each observation vector as the i-indexed row of the incidence J.
- The labelling lattice $\mathfrak{B}(G, L, I)$, short for “the concept lattice of the labelling context”;
- The observation lattice $\mathfrak{B}(G, F, J)$, analogously.
3.1.1. Information Content of MLC Sources: A First Theoretical Analysis
- Labelsets are object intents of the labelling context and they can be found through the polars of observations. As a consequence we have:

Corollary 3. The labels in L are hierarchically ordered in exactly the order of the system of intents prescribed by the labelling lattice, that is, the dual order, and the object concepts of observations are a set of join-dense elements of the lattice, and they generate the lattice of intents by means of intent (labelset) intersection.

Proof. Recall that for an observation its labelset is precisely its intent, so the intents of the labelling context are the labelsets in the task. By the synthesis Theorem 2 the object concepts are a set of join-dense elements of the lattice and, after Equation (13), their intents generate the system of intents by intersection. □
- FCA is capable of providing previously unknown information on the set of labels through the concept lattice construction. As an example, recall that the set of intents of the labelling context is the set of labelsets in the task. Then we have:

Proposition 3. The LP transformation and its derivatives only need to provide classifiers for the intents of the join-irreducibles of the labelling lattice.

Proof. We know that only labelsets are used by the LP transformation and its derivatives, so the general setup for this task is addressed by Corollary 3. But, due to Proposition 2, to reconstruct the information we only need one of the representatives of each block of the partition. Finally, due to Corollary 2, we only need the labelsets of the join-irreducible blocks in order to reconstruct the lattice. □

Several remarks are in order here. First, depending on the dataset, this may or may not be a good reduction in the modelling effort. Also, note that the information about occurrence counts is lost, therefore:

Guideline 1. Naive information fusion strategies would only work in the 100% accuracy case—e.g., for a given observation use the classifiers for the intents of the meet-irreducibles to obtain individual characterizations and then intersect them.
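A brute-force sketch of the reduction claimed in Proposition 3, under our reading of it: close the observed labelsets under intersection to obtain the system of intents, then keep only those intents that are not recovered as the intersection of their strict supersets (the join-irreducible ones). The helper names are ours; the quadratic closure is only meant for label counts of the size seen in the smaller datasets of Table 1.

```python
from itertools import combinations

def intent_system(labelsets, all_labels):
    """Close the observed labelsets under intersection and add the full
    label set (the intent of the bottom concept)."""
    intents = set(map(frozenset, labelsets)) | {frozenset(all_labels)}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            c = a & b
            if c not in intents:
                intents.add(c)
                changed = True
    return intents

def join_irreducible_intents(intents):
    """An intent is join-irreducible when it is not the intersection of its
    strict supersets (the intents of the concepts strictly below it)."""
    result = []
    for t in intents:
        above = [u for u in intents if u > t]
        if above and frozenset.intersection(*above) != t:
            result.append(t)
    return result
```

If this reading is right, an LP-style scheme needs one classifier per element of join_irreducible_intents(...) instead of one per distinct labelset, which may or may not be a substantial saving depending on the dataset, as the remark above points out.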
3.1.2. Qualitative Information Content of MLC Sources: An Exploration
- The set of labels it represents is the union of all labels in the order filter of the concept, that is, looking upwards in the lattice.
- The set of instances covered is the union of all instances in the order ideal of the concept, that is, looking downwards in the lattice.
- We would expect BR-like transformations to be good for a nominal labelling context.
- We would expect CC-based strategies to be good for ordinal labelling contexts, provided the implication order between labels, as manifested in the concept lattice, was known at training time and, somehow, profited from.
- It is difficult to know what strategy could be good for a contra-nominal labelling context. As a first intuition, considering that it is the contrary context to the nominal scale of the same order, we would expect BR also to be effective.
3.1.3. Quantitative Information Content of Boolean Contexts: A Theoretical Analysis
- For instance, we expect the sampling to be good enough that it is safe to suppose that no two labels are predicated of exactly the same set of objects.

Guideline 2. MLC datasets should be label-clarified, that is, no two labels should describe the instances in the same way.
- Regarding the equivalence in , in [55] we introduced a general framework to interpret the structure of the set of labels in terms of FCA and used it to improve a standard resampling technique in ML: n-fold validation. The rationale of this technique and an experiment demonstrating it can be found in Section 3.2.
- Finally, the existence of the partition of samples into blocks of identical labelsets and the probability measure introduced in (15) on its blocks warrant the validity of the source multivariate entropy decompositions of labelling contexts and their Source Multivariate Entropy Triangles (SMET) of Section 2.2.
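Guideline 2 is mechanical to enforce. A minimal sketch (our own helper, not the paper's): keep one representative of every group of identical label columns.

```python
import numpy as np

def label_clarify(Y, names):
    """Drop label columns that are predicated of exactly the same objects
    as an earlier column, keeping the first occurrence of each pattern."""
    seen, keep = set(), []
    for j in range(Y.shape[1]):
        key = Y[:, j].tobytes()
        if key not in seen:
            seen.add(key)
            keep.append(j)
    return Y[:, keep], [names[j] for j in keep]
```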
- As expected, nominal and contra-nominal scales have the same, totally redundant, average information content—since they lie on the $VI_{P_{\overline{X}}} = 0$ line in Figure 9—and both show a tendency to a decreasing average information content as the order of the scale increases, starting from an initial high, but still redundant, average information content.
- However, ordinal scales start from an intermediate level of irredundant information and randomness and slowly climb towards higher but more correlated average information contents. Beyond a certain order the information is totally redundant with a high degree of randomness.
- For nominal and contra-nominal scales, all the labels have exactly the average information content. This is immediate for nominal labels, and would be expected to follow from the relation between nominal and contra-nominal scales and the symmetry properties of entropy. Note that any single label can, in principle, be perfectly predicted from the rest, since each is completely redundant, that is, they lie on the $VI_{P_{\overline{X}}} = 0$ line. Note also that labels belonging to high-order scales have very little information content: they resemble detection phenomena—one majority vs. one minority class.
- For ordinal scales of the same order, there is a rough line for the label information parallel to the left-hand side of the triangle, ending in the bottom vertex. The information is the more correlated the higher the order. Note that some pairs of labels have the same information content—e.g., those with complementary distributions of 0 and 1. Clearly, the higher the proportion of 1 s (respectively 0 s), the less information a label bears, and this reaches the bottom apex, since the last label is a deterministic signal (always on).
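The three Boolean scale families discussed above are easy to generate as incidence matrices, so the behaviour just described can be probed numerically, e.g., by feeding these incidences to the split_balance sketch given after Section 2.2's list. The generators below are a sketch with our own names and orientation conventions.

```python
import numpy as np

def nominal(n):
    """Nominal scale: each object carries exactly its own label."""
    return np.eye(n, dtype=int)

def contranominal(n):
    """Contra-nominal scale: each object carries every label but its own."""
    return 1 - np.eye(n, dtype=int)

def ordinal(n):
    """Ordinal scale: nested labelsets induced by a linear order
    (object i carries labels 0..i)."""
    return np.tril(np.ones((n, n), dtype=int))
```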
3.2. FCA-Induced Stratified Sampling
- Since the samples are supposed to be independent and identically distributed, the order of these contexts in the subposition, as indeed the reordering of the rows in the incidence, is irrelevant.
- The resampling of the labelset context is tied to the resampling of the observation context : we decide on the labelset information and this carries over to the observations.
- (Qualitative intuition) A necessary condition for the resampling of the data into a training part and a testing part to be meaningful for the MLC task is that the concept lattices of the induced labelling subcontexts, training and testing alike, be isomorphic to one another.
- (Quantitative intuition) The frequencies of occurrence of the different labelsets in the blocks of are also important.
- The relative frequency of the hapaxes will be distorted (overrepresented) with respect to other labelsets.
- We will be using some data (the hapaxes) both for training and testing, which is known to yield overly optimistic performance results in whatever measure is used.
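The intuitions above suggest stratifying by labelset blocks rather than by individual labels. The following is a minimal sketch of that idea, with our own names; it spreads every block of identical labelsets as evenly as possible over the folds and makes the hapax problem visible, since a block of size one necessarily lands in a single fold. It does not implement the lattice-isomorphism check of the qualitative intuition.

```python
import random
from collections import defaultdict

def labelset_stratified_folds(labelsets, n_folds=5, seed=0):
    """Assign sample indices to folds, dealing each block of identical
    labelsets round-robin, starting with the currently smallest folds."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for i, ls in enumerate(labelsets):
        blocks[frozenset(ls)].append(i)
    folds = [[] for _ in range(n_folds)]
    for block in blocks.values():
        rng.shuffle(block)
        order = sorted(range(n_folds), key=lambda k: len(folds[k]))
        for j, idx in enumerate(block):
            folds[order[j % n_folds]].append(idx)
    return folds
```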
3.3. Experimental Validation
3.3.1. Exploring a Clustering Proposal on MLC Datasets
- Limited clustering: except for cluster D7—and perhaps D3—the rest of the clusters show great entropic dispersion.
- Overlapping: sometimes, exemplars of one cluster lie beside those of an immediate neighbour or of another cluster—e.g., instances of D1 and D2.
- Extreme dispersion: it does not seem justified to call D5—or perhaps even D8—a cluster from the entropic point of view.
3.3.2. Exploring the Clustering Hypothesis at the Dataset Level
- eurlexev is an extreme case of a dataset with many redundant features, most of which are heavily imbalanced. This is a dataset of multilabel detection, not classification. Furthermore, its average and the coordinates of the individual labels suggest that it resembles either a nominal or a contra-nominal scale, that is, labels appear in any possible combination (contra-nominal scale) or mutually exclusively (nominal scale, cf. Figure 9).
- To a certain extent, this is also the classification for rcv1sub1, although the slight separation of many values may suggest that there are substructures in the form of ordinal scales.
- birds, enron and slashdot are eminently label detection tasks with a minority of labels—the ones with higher bound information—which might be subject to classification. The distinction between them is in the amount of bound information overall: the more bound information the farther to the right the cloud of points is.
- Specifically, the birds task clearly has mostly detection labels. Not only is the empty labelset the majority class, but there are also many hapaxes for the individual labels. Some labelsets may be distilled into poorly balanced detection tasks disguised as binary classification tasks.
- flags and emotions [57] seem to be purely MLC tasks with fairly uniform label distributions and some degree of bound information between them. As per the previous discussion on the whole set of labels, they might even be considered in the same cluster.
3.3.3. Stratified Sampling in MLC Tasks
- As applied to the estimation of the entropies, the n-fold validation yields the same result in train and test, which is the sought-for result.
- We can see the general drift towards increased correlation in all labels, but much more in, say, ‘angry-aggressive’ than in ‘quiet-still’.
- For this particular dataset, a threshold of with 5-fold validation seems to be a good compromise for attaining statistical validity vs. dataset fidelity.
3.4. Extending the Classification is Information Transmission Metaphor to MLC Tasks
- $\overline{Y}$ is a Source of information in the form of a partially accessible random vector of binary variables.
- $\overline{X}$ is the encoding of that information in the form of vectors of observations.
- The transformed $\overline{X}'$ are the result of conformed, noisy transmission of the observation vectors.
- The classified $\hat{\overline{Y}}$ is a random vector, the result of decoding the received information through the classifier, considered as a Presentation of information for downstream use.
3.5. Discussion
- A purely MLC dataset cluster with flags and emotions, with stochastic labels of high irredundancy.
- A cluster of datasets of mixed detection- and classification-oriented features with varying degrees of redundancy, as in birds, enron and slashdot, and
- A cluster of datasets of (almost purely) detection tasks with detection-oriented features, viz. eurlexdc and rcv1sub1.
4. Conclusions
- A refinement of a meta-model for MLC tasks: the information channel model that includes joint but distinct characterizations of qualitative and quantitative aspects of information sources (see Figure 18), including:
  - A methodology for the modelling and exploration of MLC labelling contexts based on FCA.
  - Novel measures and exploratory techniques for MLC dataset characterization from first principles based on information theory—the aggregated and multisplit SMETs—which are representations of the balance equation in three variables (divergence from uniformity, bound information, and residual entropy).
- This joint quantitative and qualitative model has allowed us to state:
  - Several Propositions and Corollaries about the characterization of MLC tasks with FCA- and entropic decomposition-related tools.
  - Several Hypotheses on the inner workings of MLC tasks—e.g., Hypotheses 1–4.
  - Several Guidelines for the development of “good” datasets for MLC—e.g., as in Guidelines 1–5.
- A challenge to previous results on clustering MLC datasets, on the grounds of the data analysis carried out with the newly introduced qualitative and quantitative techniques.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
BR | Binary Relevance
CC | Classifier Chains
CDA | Confirmatory Data Analysis
CMET | Channel Multivariate Entropy Triangle
CoDa | Compositional Data (Analysis)
EDA | Exploratory Data Analysis
LP | Label Powerset
MI | Mutual Information
MLC | Multilabel Classification
P | Presentation (in Figures)
PCC | Probabilistic Classifier Chains
S | Source (in Figures)
SMET | Source Multivariate Entropy Triangle
References
- Boutell, M.R.; Luo, J.; Shen, X.; Brown, C.M. Learning multi-label scene classification. Pattern Recognit. 2004, 37, 1757–1771. [Google Scholar] [CrossRef]
- Hafeez, A.; Ali, T.; Nawaz, A.; Rehman, S.U.; Mudasir, A.I.; Alsulami, A.A.; Alqahtani, A. Addressing Imbalance Problem for Multi Label Classification of Scholarly Articles. IEEE Access 2023, 11, 74500–74516. [Google Scholar] [CrossRef]
- Priyadharshini, M.; Banu, A.F.; Sharma, B.; Chowdhury, S.; Rabie, K.; Shongwe, T. Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning. Sensors 2023, 23, 6836. [Google Scholar] [CrossRef] [PubMed]
- Stoimchev, M.; Kocev, D.; Džeroski, S. Deep Network Architectures as Feature Extractors for Multi-Label Classification of Remote Sensing Images. Remote Sens. 2023, 15, 538. [Google Scholar] [CrossRef]
- Bogatinovski, J.; Todorovski, L.; Džeroski, S.; Kocev, D. Comprehensive Comparative Study of Multi-Label Classification Methods. Expert Syst. Appl. 2022, 203, 117215. [Google Scholar] [CrossRef]
- Zhang, M.L.; Zhou, Z.H. A Review On Multi-Label Learning Algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837. [Google Scholar] [CrossRef]
- Gibaja, E.; Ventura, S. A Tutorial on Multilabel Learning. ACM Comput. Surv. 2015, 47, 38–52. [Google Scholar] [CrossRef]
- Herrera, F.; Charte, F.; Rivera, A.J.; del Jesus, M.J. Multilabel Classification; Problem Analysis, Metrics and Techniques; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Waegeman, W.; Dembczynski, K.; Hüllermeier, E. Multi-Target Prediction: A Unifying View on Problems and Methods. Data Min. Knowl. Discov. 2019, 33, 293–324. [Google Scholar] [CrossRef]
- Murphy, K.P. Machine Learning; A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
- Lakoff, G.; Johnson, M. Metaphors We Live by; University of Chicago Press: Chicago, IL, USA, 1996. [Google Scholar]
- Núñez, R.; Lakoff, G. The Cognitive Foundations of Mathematics: The Role of Conceptual Metaphor. In The Handbook of Mathematical Cognition; Campbell, J.I., Ed.; Psychology Press: New York, NY, USA, 2005; pp. 127–142. [Google Scholar]
- Tsoumakas, G.; Katakis, I.; Vlahavas, I. Random K-Labelsets for Multi-Label Classification. IEEE Trans. Knowl. Data Eng. 2011, 23, 1079–1089. [Google Scholar] [CrossRef]
- Zhang, M.L.; Li, Y.K.; Liu, X.Y.; Geng, X. Binary Relevance for Multi-Label Learning: An Overview. Front. Comput. Sci. 2018, 12, 191–202. [Google Scholar] [CrossRef]
- Kajdanowicz, T.; Kazienko, P. Hybrid Repayment Prediction for Debt Portfolio. In Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems; Nguyen, N.T., Kowalczyk, R., Chen, S.M., Eds.; Lecture Notes in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5796, pp. 850–857. [Google Scholar] [CrossRef]
- Read, J.; Pfahringer, B.; Holmes, G.; Frank, E. Classifier Chains: A Review and Perspectives. J. Artif. Intell. Res. 2021, 70, 683–718. [Google Scholar] [CrossRef]
- Ferrandin, M.; Cerri, R. Multi-Label Classification via Closed Frequent Labelsets and Label Taxonomies. Soft Comput. 2023, 27, 8627–8660. [Google Scholar] [CrossRef]
- Dembczyński, K.; Waegeman, W.; Cheng, W.; Hüllermeier, E. Regret analysis for performance metrics in multi-label classification: The case of hamming and subset zero-one loss. In Proceedings of the European Conference on Machine Learning, (ECML PKDD 2010), Barcelona, Spain, 20–24 September 2010; pp. 280–295. [Google Scholar]
- Read, J. Scalable Multi-Label Classification. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 2010. Available online: http://researchcommons.waikato.ac.nz/handle/10289/4645 (accessed on 28 April 2021).
- Valverde-Albacete, F.J.; Peláez-Moreno, C. 100% classification accuracy considered harmful: The normalized information transfer factor explains the accuracy paradox. PLoS ONE 2014, 9, e84217. [Google Scholar] [CrossRef] [PubMed]
- Tarekegn, A.N.; Giacobini, M.; Michalak, K. A Review of Methods for Imbalanced Multi-Label Classification. Pattern Recognit. 2021, 118, 107965. [Google Scholar] [CrossRef]
- Japkowicz, N.; Stephen, S. The Class Imbalance Problem: A Systematic Study. Intell. Data Anal. 2002, 6, 429–449. [Google Scholar] [CrossRef]
- Charte, F.; Rivera, A.; del Jesus, M.J.; Herrera, F. A First Approach to Deal with Imbalance in Multi-label Datasets. In Proceedings of the Hybrid Artificial Intelligent Systems; Pan, J.S., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E., Eds.; Lecture Notes in Artificial Intelligence. Springer: Berlin/Heidelberg, Germany, 2013; pp. 150–160. [Google Scholar] [CrossRef]
- Luo, Y.; Tao, D.; Xu, C.; Xu, C.; Liu, H.; Wen, Y. Multiview Vector-Valued Manifold Regularization for Multilabel Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 709–722. [Google Scholar] [CrossRef]
- Kostovska, A.; Bogatinovski, J.; Dzeroski, S.; Kocev, D.; Panov, P. A Catalogue with Semantic Annotations Makes Multilabel Datasets FAIR. Sci. Rep. 2022, 12, 7267. [Google Scholar] [CrossRef]
- Charte, F.; Charte, F.D. Working with multilabel datasets in R: The mldr package. R. J. 2015, 7, 149–162. [Google Scholar] [CrossRef]
- Charte, F.; Rivera, A.J. mldr.datasets: R Ultimate Multilabel Dataset Repository. 2019. Available online: https://CRAN.R-project.org/package=mldr.datasets (accessed on 30 November 2023).
- Birkhoff, G. Lattice Theory, 3rd ed.; American Mathematical Society: Providence, RI, USA, 1967. [Google Scholar]
- Bogatinovski, J.; Todorovski, L.; Dzeroski, S.; Kocev, D. Explaining the Performance of Multilabel Classification Methods with Data Set Properties. Int. J. Intell. Syst. 2022, 37, 6080–6122. [Google Scholar] [CrossRef]
- Kostovska, A.; Bogatinovski, J.; Treven, A.; Dzeroski, S.; Kocev, D.; Panov, P. FAIRification of MLC Data. arXiv 2022, arXiv:cs/2211.12757. [Google Scholar]
- Davey, B.; Priestley, H. Introduction to Lattices and Order, 2nd ed.; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, XXVII, 379–423, 623–656. [Google Scholar] [CrossRef]
- Valverde-Albacete, F.J.; Peláez-Moreno, C. The Evaluation of Data Sources using Multivariate Entropy Tools. Expert Syst. Appl. 2017, 78, 145–157. [Google Scholar] [CrossRef]
- Ganter, B.; Wille, R. Formal Concept Analysis: Mathematical Foundations; Springer: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
- Valverde-Albacete, F.J.; Peláez-Moreno, C. Two information-theoretic tools to assess the performance of multi-class classifiers. Pattern Recognit. Lett. 2010, 31, 1665–1671. [Google Scholar] [CrossRef]
- Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, USA, 1977. [Google Scholar]
- Tukey, J.W. We need both exploratory and confirmatory. Am. Stat. 1980, 34, 23–25. [Google Scholar]
- Meila, M. Comparing clusterings—An information based distance. J. Multivar. Anal. 2007, 28, 875–893. [Google Scholar] [CrossRef]
- James, R.G.; Ellison, C.J.; Crutchfield, J.P. Anatomy of a bit: Information in a time series observation. Chaos 2011, 21, 037109. [Google Scholar] [CrossRef] [PubMed]
- Hamilton, N.E.; Ferry, M. ggtern: Ternary Diagrams Using ggplot2. J. Stat. Softw. Code Snippets 2018, 87, 1–17. [Google Scholar] [CrossRef]
- Valverde-Albacete, F.J. Entropies—Entropy Triangles. Available online: https://github.com/FJValverde/entropies (accessed on 14 January 2024).
- Wille, R. Restructuring lattice theory: An approach based on hierarchies of concepts. In Ordered Sets, Proceedings of the NATO Advanced Study Institute, Banff, AB, Canada, 28 August–12 September 1981; Reidel: Dordrecht, The Netherlands; Boston, MA, USA; London, UK, 1982; pp. 314–339. [Google Scholar]
- Ganter, B.; Obiedkov, S. Conceptual Exploration; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
- Poelmans, J.; Kuznetsov, S.O.; Ignatov, D.I.; Dedene, G. Formal Concept Analysis in Knowledge Processing: A Survey on Models and Techniques. Expert Syst. Appl. 2013, 40, 6601–6623. [Google Scholar] [CrossRef]
- Valverde-Albacete, F.J.; González-Calabozo, J.M.; Peñas, A.; Peláez-Moreno, C. Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis. Expert Syst. Appl. 2016, 44, 198–216. [Google Scholar] [CrossRef]
- González-Calabozo, J.M.; Valverde-Albacete, F.J.; Peláez-Moreno, C. Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis. BMC Bioinform. 2016, 17, 374. [Google Scholar] [CrossRef]
- Peláez-Moreno, C.; García-Moral, A.I.; Valverde-Albacete, F.J. Analyzing phonetic confusions using Formal Concept Analysis. J. Acoust. Soc. Am. 2010, 128, 1377–1390. [Google Scholar] [CrossRef] [PubMed]
- Erné, M.; Koslowski, J.; Melton, A.; Strecker, G.E. A Primer on Galois Connections. Ann. N. Y. Acad. Sci. 1993, 704, 103–125. [Google Scholar] [CrossRef]
- Aitchison, J. The Statistical Analysis of Compositional Data; The Blackburn Press: Caldwell, NJ, USA, 1986. [Google Scholar]
- Pawlowsky-Glahn, V.; Egozcue, J.J.; Tolosana-Delgado, R. Modelling and Analysis of Compositional Data; John Wiley & Sons, Ltd.: Chichester, UK, 2015. [Google Scholar]
- Burusco, A.; Fuentes-González, R. The Study of the L-fuzzy Concept Lattice. Mathw. Soft Comput. 1994, 3, 209–218. [Google Scholar]
- Belohlavek, R. Fuzzy Galois Connections; Technical Report, Institute for Research and Application of Fuzzy Modeling; University of Ostrava: Ostrava, Czech Republic, 1998. [Google Scholar]
- Valverde-Albacete, F.J.; Peláez-Moreno, C. Extending conceptualisation modes for generalised Formal Concept Analysis. Inf. Sci. 2011, 181, 1888–1909. [Google Scholar] [CrossRef]
- Wille, R. Conceptual landscapes of knowledge: A pragmatic paradigm for knowledge processing. In Proceedings of the Second International Symposium on Knowledge Retrieval, Use and Storage for Efficiency, Vancouver, BC, Canada, 11–13 August 1997; Mineau, G., Fall, A., Eds.; pp. 2–13. [Google Scholar]
- Valverde-Albacete, F.J.; Peláez-Moreno, C. Leveraging Formal Concept Analysis to Improve N-Fold Validation in Multilabel Classification. In Proceedings of the Workshop Analyzing Real Data with Formal Concept Analysis (RealDataFCA 2021), Strasbourg, France, 29 June 2021; Braud, A., Dolquès, X., Missaoui, R., Eds.; Volume 3151, pp. 44–51. [Google Scholar]
- Valverde Albacete, F.J.; Peláez-Moreno, C.; Cabrera, I.P.; Cordero, P.; Ojeda-Aciego, M. Exploratory Data Analysis of Multi-Label Classification Tasks with Formal Context Analysis. In Proceedings of the Concept Lattices and Their Applications CLA, Tallinn, Estonia, 29 June–1 July 2020; Trnecka, M., Valverde Albacete, F.J., Eds.; pp. 171–183. [Google Scholar]
- Wieczorkowska, A.; Synak, P.; Raś, Z.W. Multi-Label Classification of Emotions in Music. In Proceedings of the Intelligent Information Processing and Web Mining Conference, Advances in Intelligent and Soft Computing. Ustron, Poland, 19–22 June 2006; Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 35, pp. 307–315. [Google Scholar] [CrossRef]
- Briggs, F.; Lakshminarayanan, B.; Neal, L.; Fern, X.Z.; Raich, R.; Hadley, S.J.K.; Hadley, A.S.; Betts, M.G. Acoustic Classification of Multiple Simultaneous Bird Species: A Multi-Instance Multi-Label Approach. J. Acoust. Soc. Am. 2012, 131, 4640. [Google Scholar] [CrossRef]
- Cordero, P.; Enciso, M.; López-Rodríguez, D.; Mora, A. fcaR: Formal Concept Analysis with R. R. J. 2022, 14, 341–361. [Google Scholar] [CrossRef]
# | Dataset Name | Concepts | Distinct Labelsets | Labels | Instances (n) | Features (d)
---|---|---|---|---|---|---
1 | flags | 79 | 54 | 7 | 194 | 19 |
2 | yeast | 686 | 198 | 14 | 2417 | 103 |
3 | ng20 | 58 | 55 | 20 | 19,300 | 1006 |
4 | emotions | 30 | 27 | 6 | 593 | 72 |
5 | scene | 17 | 15 | 6 | 2407 | 294 |
6 | bookmarks | 150,337 | 18,716 | 208 | 87,856 | 2150 |
7 | delicious | 9,343,385 | 15,806 | 983 | 16,105 | 500 |
8 | enron | 1595 | 753 | 53 | 1702 | 1001 |
9 | bibtex | 6298 | 2856 | 159 | 7395 | 1836 |
10 | corel5k | 5702 | 3175 | 374 | 5000 | 499 |
11 | corel16k002 | 6498 | 4868 | 164 | 13,761 | 500 |
12 | corel16k003 | 6354 | 4812 | 154 | 13,760 | 500 |
13 | corel16k010 | 6245 | 4692 | 144 | 13,618 | 500 |
14 | corel16k004 | 6547 | 4860 | 162 | 13,837 | 500 |
15 | corel16k001 | 6478 | 4803 | 153 | 13,766 | 500 |
16 | corel16k006 | 6649 | 5009 | 162 | 13,859 | 500 |
17 | corel16k007 | 7017 | 5158 | 174 | 13,915 | 500 |
18 | corel16k005 | 6841 | 5034 | 160 | 13,847 | 500 |
19 | corel16k008 | 6479 | 4956 | 168 | 13,864 | 500 |
20 | corel16k009 | 6972 | 5175 | 173 | 13,884 | 500 |
21 | genbase | 39 | 32 | 27 | 662 | 1186 |
22 | tmc2007 | 2072 | 1341 | 22 | 28,596 | 49,060 |
23 | medical | 98 | 94 | 45 | 978 | 1449 |
24 | tmc2007_500 | 1820 | 1172 | 22 | 28,596 | 500 |
25 | eurlexev | 54,479 | 16,467 | 3993 | 19,348 | 5000 |
26 | eurlexdc | 1712 | 1615 | 412 | 19,348 | 5000 |
27 | birds | 154 | 133 | 19 | 645 | 260 |
28 | foodtruck | 250 | 116 | 12 | 407 | 21 |
29 | langlog | 337 | 304 | 75 | 1460 | 1004 |
30 | cal500 | 2,560,365 | 502 | 174 | 502 | 68 |
31 | mediamill | 20,013 | 6555 | 101 | 43,907 | 120 |
32 | stackex_coffee | 207 | 174 | 123 | 225 | 1763 |
33 | stackex_cooking | 8070 | 6386 | 400 | 10,491 | 577 |
34 | stackex_cs | 6528 | 4749 | 274 | 9270 | 635 |
35 | stackex_chess | 1573 | 1078 | 227 | 1675 | 585 |
36 | stackex_chemistry | 3890 | 3032 | 175 | 6961 | 540 |
37 | stackex_philosophy | 3168 | 2249 | 233 | 3971 | 842 |
38 | rcv1sub4 | 1429 | 816 | 101 | 6000 | 47,229 |
39 | rcv1sub1 | 2012 | 1028 | 101 | 6000 | 47,236 |
40 | rcv1sub5 | 1828 | 946 | 101 | 6000 | 47,235 |
41 | rcv1sub3 | 1645 | 939 | 101 | 6000 | 47,236 |
42 | rcv1sub2 | 1781 | 954 | 101 | 6000 | 47,236 |
43 | yahoo_reference | 327 | 275 | 33 | 8027 | 39,679 |
44 | yahoo_business | 335 | 233 | 30 | 11,214 | 21,924 |
45 | yahoo_social | 479 | 361 | 39 | 12,111 | 52,350 |
46 | yahoo_health | 510 | 335 | 32 | 9205 | 30,605 |
47 | yahoo_education | 663 | 511 | 33 | 12,030 | 27,534 |
48 | imdb | 7273 | 4503 | 28 | 120,919 | 1001 |
49 | ohsumed | 1335 | 1147 | 23 | 13,929 | 1002 |
50 | yahoo_recreation | 1120 | 530 | 22 | 12,828 | 30,324 |
51 | yahoo_science | 601 | 457 | 40 | 6428 | 37,187 |
52 | yahoo_society | 2418 | 1054 | 27 | 14,512 | 31,802 |
53 | yahoo_entertainment | 490 | 337 | 21 | 12,730 | 32,001 |
54 | reutersk500 | 956 | 811 | 103 | 6000 | 500 |
55 | slashdot | 159 | 156 | 22 | 3782 | 1079 |
56 | yahoo_arts | 1071 | 599 | 26 | 7484 | 23,146 |
Cluster | Dataset Name | Concepts | Distinct Labelsets | Labels | Instances (n) | Features (d)
---|---|---|---|---|---|---
1 | flags | 79 | 54 | 7 | 194 | 19 |
2 | emotions | 30 | 27 | 6 | 593 | 72 |
3 | enron | 1595 | 753 | 53 | 1702 | 1001 |
4 | eurlexdc | 1712 | 1615 | 412 | 19,348 | 5000 |
5 | birds | 154 | 133 | 19 | 645 | 260 |
7 | rcv1sub1 | 2012 | 1028 | 101 | 6000 | 47,236 |
8 | slashdot | 159 | 156 | 22 | 3782 | 1079 |