A New Predictive Method for Classification Tasks in Machine Learning: Multi-Class Multi-Label Logistic Model Tree (MMLMT)

Ghasemkhani, Bita; Balbal, Kadriye Filiz; Birant, Derya

doi:10.3390/math12182825

by Bita Ghasemkhani ¹, Kadriye Filiz Balbal ² and Derya Birant ³,*

¹ Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey
² Department of Computer Science, Dokuz Eylul University, Izmir 35390, Turkey
³ Department of Computer Engineering, Dokuz Eylul University, Izmir 35390, Turkey
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(18), 2825; https://doi.org/10.3390/math12182825

Submission received: 4 August 2024 / Revised: 4 September 2024 / Accepted: 10 September 2024 / Published: 12 September 2024

(This article belongs to the Special Issue Advances in Machine Learning and Applications)

Abstract:

This paper introduces a novel classification method for multi-class multi-label datasets, named multi-class multi-label logistic model tree (MMLMT). Our approach supports multi-label learning to predict multiple class labels simultaneously, thereby enhancing the model’s capacity to capture complex relationships within the data. The primary goal is to improve the accuracy of classification tasks involving multiple classes and labels. MMLMT integrates the logistic regression (LR) and decision tree (DT) algorithms, yielding interpretable models with high predictive performance. By combining the strengths of LR and DT, our method offers a flexible and powerful framework for handling multi-class multi-label data. Extensive experiments demonstrated the effectiveness of MMLMT across a range of well-known datasets with an average accuracy of 85.90%. Furthermore, our method achieved an average of 9.87% improvement compared to the results of state-of-the-art studies in the literature. These results highlight MMLMT’s potential as a valuable approach to multi-label learning.

Keywords:

machine learning; multi-label learning; classification; mathematics; multi-class multi-label dataset; logistic model tree; artificial intelligence

MSC:

68T01

1. Introduction

Artificial intelligence (AI) is transforming our world by advancing systems’ capabilities to perform tasks traditionally demanding human intelligence, such as analyzing data, understanding patterns, and predicting complex behaviors. Machine learning (ML), a key discipline within AI, empowers systems to derive insights from data and continually enhance their performance. Rooted in mathematics, ML utilizes statistics, algebra, and optimization to develop robust algorithms that identify patterns, make predictions, and learn from extensive data. It contributes to solving a wide array of problems and driving innovation across several industries [1].

Supervised learning (SL) in ML is used when labeled data are available, enabling applications such as medical diagnostics, image recognition, and fraud detection. Within the realm of supervised learning, classification aims at assigning labels or categories to input data instances based on their features. The process involves training a model on a labeled dataset, where the correct output is known, allowing the model to learn patterns and relationships within the data. Once trained, the model can predict labels for new, unseen data instances. Classification is broadly used in several applications, such as image recognition [2], medical diagnosis [3], crop identification [4], fault detection [5], and quality control [6]. Common algorithms for classification tasks include the decision tree, support vector machine (SVM), neural network (NN), logistic regression (LR), k-nearest neighbors (KNN), reduced error pruning tree (REPTree), and ensemble methods like random forest (RF) and extreme gradient boosting (XGBoost), each offering unique strengths. Additionally, the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory network (LSTM), and deep belief network (DBN) are used, along with random tree (RT) and k-star. Probabilistic graphical models such as the Bayesian network (BN) and Gaussian mixture model (GMM) are applied for tasks requiring the capture of relationships between variables and handling mixed data types, as well as naïve Bayes (NB). For instance-based learning, locally weighted learning (LWL) is utilized, and the Hoeffding tree (HT) is employed for various scenarios.

The logistic model tree (LMT) [7] is another significant classification algorithm in machine learning, which employs logistic regression models at the leaves of a decision tree. This integration allows LMT to benefit from the transparency of decision trees and the probabilistic modeling proficiency of logistic regression simultaneously. The hybrid structure of LMT helps capture both linear and nonlinear relationships between inputs and the target feature. Additionally, LMT can generate probabilistic outputs rather than simple class labels, which suits problems that require such predictions and threshold adjustments. LMT inherently deals with both continuous and categorical data, without the need for encoding during preprocessing. Furthermore, LMT typically applies regularization techniques, such as pruning, to reduce overfitting and improve generalization. These advantages make the LMT algorithm noteworthy in practical applications across multiple domains, e.g., hydrology [8], seismology [9], geography [10], healthcare [11], forestry [12], biometrics [13], energy [14], cybersecurity [15], agriculture [16], and more.

Classification tasks in machine learning can also be broadly categorized based on the number of targets and the range of possible values for each target. Binary classification deals with assigning instances to one of two possible categories for a single target variable. Multi-class classification extends this by allowing the single target to belong to one of several distinct categories. Multi-label classification is characterized by the assignment of multiple binary labels to each instance, reflecting that an instance can belong to several categories simultaneously. Multi-class multi-label classification (also known as multi-target classification or multi-output classification) is a task in which the dataset has multiple target attributes, where each target attribute in the dataset can have two or more different values. The versatility of multi-label classification proves valuable in actual circumstances where objects belong to multiple categories or exhibit diverse attributes concurrently [17].
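To make the four task types above concrete, the following minimal sketch distinguishes them by the number of target attributes and the number of values each target can take. The function name `task_type` and the toy target names are hypothetical and chosen only for illustration; treating "all targets binary" as multi-label and "at least one target with more than two values" as multi-class multi-label is a simplifying assumption, not a definition taken from the paper.

```python
def task_type(domains):
    """Name the classification task from a mapping: target name -> set of allowed values.

    Assumption for illustration: a task with several targets is called
    "multi-label" when every target is binary, and "multi-class multi-label"
    as soon as one target has more than two possible values.
    """
    n_targets = len(domains)
    max_values = max(len(values) for values in domains.values())
    if n_targets == 1:
        return "binary" if max_values == 2 else "multi-class"
    return "multi-label" if max_values == 2 else "multi-class multi-label"


# Hypothetical target domains, one per task type.
binary = {"spam": {"yes", "no"}}
multi_class = {"topic": {"sports", "politics", "tech"}}
multi_label = {"sports": {0, 1}, "politics": {0, 1}, "tech": {0, 1}}
multi_class_multi_label = {
    "weather": {"sunny", "rainy", "snowy"},
    "severity": {"low", "high"},
}
```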

The multi-label paradigm is widely applied in domains where instances possess multiple inherent characteristics, such as physical activity recognition [18], image identification [19], text classification [20], mathematics [21], educational application [22], recommendation systems [23], sentiment analysis [24], radiology [25], and more. For example, genomics aids in classifying genes into multiple functional categories based on their sequences, reflecting their involvement in diverse biological processes. Medical imaging assists in diagnosing diseases by analyzing images that might contain multiple abnormalities or conditions, each requiring a separate label concurrently. In text classification, multiple tags or categories are assigned to documents, optimizing the precision of content classification across numerous topics or themes. In recommendation systems, multi-label classification plays a pivotal role in suggesting multiple items or objects adapted to the multi-dimensional preferences and interests of users. In e-commerce platforms, it recommends a variety of products based on customers’ browsing history and past purchases, involving multiple product categories simultaneously. These applications highlight the efficacy of multi-label classification in different environments, where the richness of the data cannot be captured by a single label.

Combining multi-class multi-label classification with logistic model trees enriches the classification process by leveraging the strengths of both approaches. Multi-class multi-label classification involves assigning multiple classes–labels to each instance. Logistic model trees enhance the predictive performance and robustness of each classification problem by integrating the decision tree’s ability to model complex interactions with logistic regression’s capacity to manage various data types. Each logistic model tree provides comprehensible models for individual labels, facilitating easier understanding and trust in predictions. LMTs usually achieve high predictive accuracy, with their pruning mechanism contributing significantly to the optimization of classification performance. This combination offers flexibility in model design, allowing customization of the logistic model trees to match the specific characteristics of the data distribution for each label. This suitability for different scenarios provides an efficient solution for multi-label classification tasks.

This study introduces a new method, multi-class multi-label logistic model tree (MMLMT), designed to address complex classification problems. MMLMT offers a promising avenue for advancing the field of multi-label classification. This study is distinguished from the previous studies through the following key contributions:

(i): Novel method: It proposes a novel approach (MMLMT) that integrates a logistic model tree with a multi-class multi-label classification technique, marking its first introduction in the literature.
(ii): Comprehensive integration: MMLMT incorporates both multi-class and multi-label classifications, offering a more comprehensive approach to handling multiple target attributes and predicting one class among the multiple potential classes for each target attribute. This integration offers a more detailed categorization, enhancing the model’s ability to address complex multi-label problems effectively. The proposed method enhances the model’s capability to manage a wide range of label combinations within multi-class multi-label datasets.
(iii): Performance improvement over counterparts: Achieving an average accuracy of 85.90% across a range of well-known datasets, our method demonstrated a 3.91% improvement compared to the counterpart methods, based on mathematical evaluations.
(iv): Higher accuracy results than the state-of-the-art studies: Experiments on the same datasets showed that MMLMT achieved a 9.87% improvement in average accuracy compared to the results of state-of-the-art studies in the literature.
(v): Methodological advantages: It implements the LMT classifier to provide model interpretability, understandability, and explainability while maintaining high predictive accuracy. Since it constructs a decision tree-based model like a flowchart, it can be considered a highly explainable artificial intelligence (XAI) method.
(vi): Practical implications: Demonstrating MMLMT’s effectiveness across a diverse set of eight datasets spanning multiple domains underscores its applicability and reliability in undertaking multi-class multi-label classification challenges in various fields. This highlights its potential for advanced multi-label learning applications.

This paper is structured as follows: Section 2 covers a review of related works, followed by Section 3 detailing the methods and materials. Section 4 presents the experimental studies that were conducted on eight publicly available datasets, while Section 5 discusses the comparative results. Section 6 outlines the conclusions drawn from the findings and provides future research directions.

2. Related Works

Although many successful algorithms have been proposed for single-label learning problems in the literature, different algorithms should be implemented to provide effective solutions for multi-label problems, so research on multi-label learning (MLL) problems is needed [26]. MLL has been applied in different areas such as disease prediction [27,28], drug repurposing [29], biomedical applications [30], image classification [31,32,33,34], natural language processing [35,36,37,38], education [39], industry [40], and transportation [41].

In [27], researchers applied multi-label active learning algorithms on a heart disease dataset. The adaptive synthetic data-based multi-label classification (ASDMLC) approach was proposed in [28] and it was tested with three different disease datasets. Although the computation time was longer, higher performance was achieved than other models. In another study [29], a multi-label learning framework was proposed for drug repurposing. Their results showed that it could generalize well in a huge drug space without the information of drug target protein and chemical structures. In [30], researchers used multi-label extreme learning machines for taxonomic categorization of DNA sequences. Their study drew attention to the deep learning methods to increase classification performance in biological taxonomy.

In the field of image classification, a number of multi-label learning methods have been successfully developed. For example, the classification task was performed on noisy multi-label food images [31]. The data were evaluated with three different deep neural networks: ResNet-50, attentive feature mixup (AFM), and the proposed attentive feature cam-driven mixup (AFCM). With the AFCM method, more successful results were obtained compared to other methods in noisy multi-label image data. In [32], the researchers focused on the problem of classifying multi-label chest X-ray images. In their study, MLL was performed with the EfficientNet transfer learning technique. In another study [33], deep learning frameworks were used to classify movie genres. Successful results were achieved with VGG16, ResNet, DenseNet, Inception, MobileNet, and ConvNeXt models that were trained on a dataset consisting of multi-label movie posters. In [34], researchers proposed an ensemble model based on gradient-weighted class activation mapping (Grad-CAM) and they applied it to the retinal fundus multi-disease image dataset (RFMiD). It was stated in the study that the Grad-CAM method was effective in lesion detection.

One of the MLL studies conducted in the field of natural language processing is by [35], which was carried out to evaluate comments with more than one tag. In [36], MLL techniques were used to analyze people’s emotional states based on their posts on social media. A deep learning-based approach was used as a solution to the multiple emotion classification problem. In [37], an effective MLL approach was proposed for multi-label Arabic text classification with ensemble learning and the genetic algorithm-based metaheuristic feature selection method.

MLL has also been used in the field of education to predict students’ learning styles. For example, in [39], the authors used a multi-label classification method by considering that a student may have more than one dominant learning style. In [40], the problem of classifying multi-label unbalanced big data in the industrial field has been addressed and evaluated with the extreme gradient boosting classifier and the histogram gradient boosting classifier. It was stated that their study provided practical insights into solving problems that may be encountered in real industrial data. Deep learning studies have been conducted on multi-label datasets to support transportation in smart cities. For instance, in [41], which was carried out with nine different datasets, it was concluded that YOLO versions (YOLOv8, v7, v6, and v5) were more successful than other deep learning methods, with an overall accuracy rate of 90%.

Multi-class multi-label classification tasks have garnered significant attention in recent research due to their ability to handle complex datasets that have multiple target attributes, each of which can have two or more different values. Traditional classification methods often fall short in these scenarios, necessitating the development of specialized approaches in various domains such as natural disaster management [42], ophthalmology [43], medical diagnosis [44], and pedestrian attribute recognition [45]. In [42], a two-stage BERT-based model was proposed for multi-class multi-label classification of typhoon damage in social media texts. The first stage identifies damage-related texts using sentence vectors, and the second stage classifies these texts into damage categories with word matrices. In [43], CNN-based models were developed for multi-class multi-label classification of ophthalmological diseases using fundus images, and the experiments showed that the VGG16 architecture with a stochastic gradient descent optimizer yielded the best performance.

In [44], the Stacked Dark COVID-Net was introduced, which preprocesses images using contrast-limited adaptive histogram equalization (CLAHE) and classifies them to assist in COVID-19 detection. The model addresses challenges like overfitting and computational overhead, achieving high accuracy in diagnosing COVID-19 from chest X-rays. In [45], a CNN-based approach was presented for recognizing pedestrian attributes from images. The authors experimented with various convolutional layer depths, demonstrating that deeper models achieved higher performance on the dataset with a range of features.

In studies comparing the performances of machine learning algorithms in the literature, it has been indicated that the LMT algorithm for classification tasks was more successful than other machine learning techniques [46,47,48,49]. To create a landslide susceptibility map, it was suggested to use the LMT algorithm in [46], where the performances of five different machine learning algorithms, namely SVM, ANN, LR, NBT, and LMT, were compared.

Similarly, the authors emphasized that the LMT approach in landslide susceptibility mapping is more promising than other machine learning approaches, according to the results obtained from the AUC metric [47]. In [48], where four different machine learning algorithms were compared to create flash flood susceptibility maps, it was stated that the LMT method displayed the highest performance. In [49], where the LMT algorithm was evaluated to determine underground column stability, more successful results were obtained compared to the other machine learning algorithms in the literature. Based on the success of the LMT method in these previous studies, we preferred to use this method in our study.

In summary, Figure 1 illustrates the timeline of the investigated related works from 2019 to 2024. This figure includes the references, highlighting the authors’ names and the domain of study for each, providing a comprehensive overview of the developments and contributions in this field over the specified period. Despite advancements in classification techniques, developing methods for multi-class multi-label datasets is still limited in the literature. There is a need for novel approaches that provide satisfactory results in different domains. The combination of multi-class multi-label classification with LMT remains unexplored in the existing literature. Implementing LMT for multi-class multi-label classification tasks could offer significant improvements in accuracy and insights for complex datasets.

3. Materials and Methods

3.1. Proposed Method

This paper proposes a new method for multi-class multi-label datasets, named the multi-class multi-label logistic model tree (MMLMT). Our approach leverages the strengths of multi-label learning, which permits the simultaneous prediction of multiple classes–labels, enhancing the ability of the model to capture complex relationships within the data. The core of MMLMT is the logistic model tree (LMT) algorithm, which combines logistic regression and decision tree algorithms. LMT offers the advantage of producing understandable models for explainable artificial intelligence with high predictive accuracy. By combining the strengths of decision trees and logistic regression, our method provides a flexible and powerful framework for handling multi-class multi-label data, resulting in an efficient model for classification tasks. MMLMT, supported by strong mathematical foundations, is crafted to make precise predictions. Extensive experiments demonstrated the efficiency of MMLMT across a range of well-known datasets, showcasing its potential as a valuable approach in multi-label learning.

The general structure of the proposed MMLMT method is illustrated in Figure 2 and described as follows. The multi-class multi-label dataset, consisting of $N$ samples with $k$ features and $p$ target attributes, undergoes a data preparation phase such as data cleaning. After that, it is converted into $p$ individual datasets, each of which includes $k$ features and one target attribute. Each of these datasets is then used to train an LMT model. Specifically, a logistic model tree is constructed for each dataset, resulting in $p$ distinct trees corresponding to the $p$ target attributes. These trees are then aggregated to form the final predictive model. The model aggregation step combines the outputs of the individual trees to generate the final prediction. MMLMT tends to make accurate multi-label classifications, benefiting from the strengths of both logistic regression and decision trees. The next step evaluates the aggregated model’s performance, confirming the capability of the MMLMT in dealing with complex multi-class multi-label datasets and making accurate predictions. In this step, diverse mathematical metrics such as accuracy, sensitivity, and precision–recall curve area can be used to comprehensively evaluate performance. The final step involves the usage of the model to make predictions based on the query input.

The logistic model tree (LMT) algorithm can be effectively employed in multi-label classification tasks by applying it individually to each set of class labels. The MMLMT approach allows LMT to efficiently address multi-class multi-label datasets and can potentially improve classification accuracies across multiple labels. The LMT algorithm integrates logistic regression with decision tree principles, forming a flexible tree structure that adapts well to various data types, including binary, nominal, and numeric data. LMT is capable of predicting class labels using both qualitative and quantitative predictors. Additionally, it allows for the extraction of rule sequences from the tree to generate predictions based on input values, making it a versatile approach to handling complex datasets. Moreover, LMT can be considered a highly explainable artificial intelligence (XAI) method since it constructs a decision tree-based model, like a flowchart, that is easy to interpret. A limitation of the LMT method lies in its handling of missing values, since it utilizes a simple global imputation scheme to fill them in [7]. Even though our experimental studies did not show any drawback in this regard, a more advanced scheme for handling missing values might be used for further analysis in domains where they appear frequently.

3.2. Formal Expression

In this section, we detail the theoretical foundation of the proposed MMLMT method for classification tasks in machine learning. Traditional supervised learning algorithms are designed for single-label scenarios, where each training sample is associated with only one label that defines its characteristics. Conversely, multi-label learning algorithms deal with training samples linked to multiple labels, and the aim is to accurately predict the set of labels for new samples. The final goal of multi-label learning algorithms is to develop a machine learning classifier $C$ that, for a given unlabeled instance $S = (x, ?)$, predicts its subset of labels $Y$ accurately. This prediction is indicated as $C(S) \to Y$, where $Y$ consists of the labels associated with instance $S$. From the perspective of mathematics, a multi-label classification algorithm aims to learn the function $f : X \to Y$ that maps feature vectors $x \in X$ to subsets of labels $y \subseteq Y$, such that $f(x) \subseteq Y$ is the correct subset $y$ for unseen instances. The concept of the proposed MMLMT method is formally defined as follows.

Let $D$ denote the multi-class multi-label dataset, consisting of $N$ instances $S_{i} = (x_{i}, Y_{i})$, where $i = 1, 2, \dots, N$. Each instance $S_{i}$ includes a feature vector $x_{i} = (x_{i1}, x_{i2}, \dots, x_{ik})$ with $k$ elements and is associated with a subset of labels $Y_{i} \subseteq L$. Here, $L = \{L_{1}, L_{2}, \dots, L_{p}\} = \{y_{j} \mid j = 1, \dots, q\}$ symbolizes a total set of $q$ possible class labels organized into $p$ target attributes. This data representation is illustrated in Table 1. It shows a multi-class multi-label dataset in which each instance $S$ is linked to a subset of labels represented by $Y$. For example, the $S_{1}$ instance is linked to the label set $Y_{1}$, indicating that this instance has labels $y_{1}$, $y_{3}$, and $y_{7}$. Here, the label set $Y_{1}$ comprises the concatenation of labels, including $y_{1}$, $y_{3}$, and $y_{7}$. These representations underscore the dataset’s multi-label characteristics, illustrating instances that can be assigned multiple labels concurrently. Moreover, the classes of target attributes involve $L_{1} = \{y_{1}, y_{2}\}$, $L_{2} = \{y_{3}, y_{4}, y_{5}\}$, and $L_{3} = \{y_{6}, y_{7}, y_{8}, y_{9}, y_{10}\}$. Therefore, $L = \{L_{1}, L_{2}, L_{3}\} = \{y_{1}, y_{2}, y_{3}, y_{4}, y_{5}, y_{6}, y_{7}, y_{8}, y_{9}, y_{10}\}$ is the total set of class labels for all $N$ instances, where a three-element subset of $L$ is assigned to each instance: $Y_{1} = \{y_{1}, y_{3}, y_{7}\}$ is assigned to the first instance, $Y_{2} = \{y_{2}, y_{4}, y_{9}\}$ is assigned to the second instance, and so on. Here, $Y = \{Y_{1}, Y_{2}, Y_{3}, \dots, Y_{N}\}$ is the total set of labels. In this example, $p = 3$ and $q = 10$.

In the MMLMT method, the original task, which has $q$ class labels in the set $L = \{y_{1}, y_{2}, \dots, y_{q}\}$ organized into $p$ target attributes $L = \{L_{1}, L_{2}, \dots, L_{p}\}$, is decomposed into multiple classification tasks. Essentially, this method transforms the original multi-class multi-label training dataset into $p$ individual datasets $D_{j}$, for $j = 1, 2, \dots, p$. Each dataset $D_{j}$ includes all instances from the original dataset but with only the corresponding target attribute $L_{j}$.

Table 2 presents three individual training datasets obtained after converting the multi-class multi-label dataset given in Table 1. Each dataset $D_{j}$ includes a different target attribute $L_{j}$ from the original dataset. Each row in Table 2 pertains to an instance $(S_{1}, S_{2}, \dots, S_{N})$ from the initial dataset, while each outcome column is a distinct class–label ($L_{1}$, $L_{2}$, or $L_{3}$). For each dataset $D_{j}$, the instances are labeled with the corresponding class values in $L_{j}$. For example, in Table 1, $S_{1}$ is linked to the label set $Y_{1} = \{y_{1}, y_{3}, y_{6}\}$, indicating that this instance has labels $y_{1}$ in $D_{1}$, $y_{3}$ in $D_{2}$, and $y_{6}$ in $D_{3}$. This is reflected in Table 2 by the presence of $y_{1}$, $y_{3}$, and $y_{6}$ in the $L_{1}$, $L_{2}$, and $L_{3}$ output columns, respectively. This transformation allows traditional classification algorithms to be applied to each individual dataset, simplifying the multi-class multi-label classification task.
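The transformation into $p$ single-target datasets can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `split_by_target`, the tuple-based data layout, and the toy feature values are all assumptions, and each instance is assumed to carry exactly one class value per target attribute in the order $L_{1}, \dots, L_{p}$.

```python
def split_by_target(dataset, p):
    """Turn a list of (features, labels) pairs into p single-target datasets.

    Assumption: `labels` is a tuple with exactly one class value per
    target attribute, ordered as L_1 ... L_p.
    """
    datasets = [[] for _ in range(p)]
    for features, labels in dataset:
        if len(labels) != p:
            raise ValueError("each instance needs one label per target attribute")
        for j in range(p):
            # D_j keeps the full feature vector but only the j-th target value.
            datasets[j].append((features, labels[j]))
    return datasets


# Toy data (hypothetical feature values): 2 features, p = 3 target attributes.
D = [
    ((0.9, 0.3), ("y2", "y4", "y9")),
    ((0.5, 1.2), ("y1", "y5", "y10")),
]
D1, D2, D3 = split_by_target(D, p=3)
```

Each resulting dataset has the same number of rows as `D`, so any ordinary single-target classifier can be trained on it without modification.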

After the data transformation process, $p$ independent classifiers $C_{j}$ for $j = 1, 2, \dots, p$ are established using the datasets $D_{j}$. These models are then combined to form the general classifier $C$, as presented in Equation (1).

$$C = \{C_{j}(x) \to \hat{y}_{j} \mid \hat{y}_{j} \in L,\ j = 1, \dots, p\} \tag{1}$$

LMT employs the LogitBoost [50] algorithm for building additive logistic regression functions at each tree node by selecting the most relevant attributes in the data. LMT also uses the classification and regression tree (CART) algorithm for pruning, leading to improved classification performance. A key advantage of LMT is its combination of logistic regression and classification with a validation technique to determine the optimal number of LogitBoost iterations, thereby preventing overfitting. The algorithm employs a least-squares fit $B_{c}(x)$ for each class $c$, as mathematically detailed in Equation (2).

$$B_{c}(x) = \sum_{i = 1}^{k} \beta_{i} x_{i} + \beta_{0} \tag{2}$$

where

$x$ : the input vector containing $k$ features or predictors;
$\beta_{i}$ : the coefficient associated with the $i$th feature $x_{i}$;
$\beta_{0}$ : the intercept term added to the weighted sum;
$B_{c}(x)$ : the logit or linear combination of input features $x_{i}$ weighted by coefficients $\beta_{i}$ for class $c$.

The algorithm also employs logistic regression to calculate the probabilities assigned after observing the data at each node of the tree. This process is defined mathematically by Equation (3).

$$\Pr(c \mid x) = \frac{\exp(B_{c}(x))}{\sum_{c' = 1}^{q} \exp(B_{c'}(x))} \tag{3}$$

where

$q$ : the total number of possible labels;
$\Pr(c \mid x)$ : the probability of the instance belonging to class $c$ given the input vector $x$, normalized by the sum of the exponential values of the logits across all possible labels $c'$.
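Equations (2) and (3) can be evaluated directly, as in the following minimal sketch. It only shows how class probabilities follow from a given set of coefficients; it does not fit them with LogitBoost. The function names and the coefficient values are hypothetical, and subtracting the maximum logit before exponentiation is a standard numerical-stability step that leaves the result of Equation (3) unchanged.

```python
import math


def linear_logit(x, beta, beta0):
    """B_c(x) = sum_i beta_i * x_i + beta_0, as in Equation (2)."""
    return sum(b * xi for b, xi in zip(beta, x)) + beta0


def class_probabilities(x, params):
    """Normalized exponentials of the per-class logits, as in Equation (3).

    `params` maps each class to a (beta, beta0) pair; values are assumed given.
    """
    logits = {c: linear_logit(x, beta, beta0) for c, (beta, beta0) in params.items()}
    shift = max(logits.values())  # stability shift; cancels in the ratio
    exps = {c: math.exp(v - shift) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical coefficients for a two-class target attribute with k = 2 features.
params = {"y1": ([1.0, -0.5], 0.0), "y2": ([0.2, 0.3], 0.1)}
probs = class_probabilities([1.0, 2.0], params)
```

With these values the logits are 0.0 for `y1` and 0.9 for `y2`, so `y2` receives the larger probability and the two probabilities sum to one.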

This theoretical framework is straightforwardly applicable for configuring the prediction process in machine learning models. It is supported by strong mathematical foundations to make precise predictions.

3.3. Algorithm

The proposed multi-class multi-label logistic model tree (MMLMT) algorithm is designed to tackle multi-label classification problems by transforming the problem into multiple classification tasks. This approach leverages logistic model trees to handle each classification task separately, thus efficiently managing the complexity associated with multi-label learning. The detailed process of the MMLMT algorithm is outlined in Algorithm 1. The algorithm operates on dataset $D$, which consists of $N$ instances $(x_{i}, Y_{i})$, where each instance has a feature vector $x_{i}$ and a corresponding set of labels $Y_{i}$. The labels $Y_{i}$ are subsets of a larger set of all possible labels $L = \{L_{1}, L_{2}, \dots, L_{p}\}$, which are grouped into $p$ target attributes. The main goal of the MMLMT algorithm is to predict the class labels for a test set $T$. This is achieved by first generating an individual dataset $D_{j}$ for each target attribute $j$. Each dataset $D_{j}$ corresponds to a classification problem focused on predicting the different possible values of the label $y_{j}$. The algorithm proceeds as follows:

Dataset generation: The algorithm iterates through each of the $p$ target attributes to generate individual datasets $D_{j}$. This is accomplished by scanning the entire dataset $D$ and, for each instance $(x_{i}, Y_{i})$, adding the feature vector $x_{i}$ to $D_{j}$ along with the corresponding label $y_{j}$. This step effectively transforms the original multi-label problem into $p$ separate classification tasks, each focusing on a specific target attribute.
Model training: For each individual dataset $D_{j}$, the algorithm builds a logistic model tree classifier $C_{j}$. By training a separate classifier for each target attribute, the algorithm ensures that each label is predicted independently.
Classification phase: During the classification phase, the algorithm applies each classifier $C_{j}$ to predict the corresponding label for each instance in the test set $T$. Specifically, for each test instance $x$, each classifier $C_{j}$ generates a predicted label $y_{j}$. The outputs of all classifiers are aggregated to form a vector $\hat{Y_{j}}$ for each test instance. This aggregation step effectively reconstructs the multi-label nature of the problem by combining the individual predictions into a final set of predicted labels.
Final output: The final set of predicted labels $\hat{Y}$ for the test set $T$ is stored in a list. This list represents the algorithm’s best estimate of the true labels for each test instance, with the target attributes treated independently through the individual classification tasks.

Overall, the MMLMT algorithm provides a structured approach to managing the complexity of multi-label classification by decomposing it into more manageable sub-problems. This method not only simplifies the problem but also utilizes the strength of logistic model trees to provide accurate and interpretable predictions.

Algorithm 1: Multi-class Multi-label Logistic Model Tree (MMLMT)
Inputs:
$D = \{(x_{i}, Y_{i})\}_{i = 1}^{N}$: dataset with $N$ instances, containing features $x_{i}$ and labels $Y_{i} \subseteq L$
$L = \{L_{1}, L_{2}, \dots, L_{p}\} = \{y_{1}, y_{2}, \dots, y_{q}\}$: set of all possible class labels, grouped into $p$ target attributes
$p$: the number of target attributes
$q$: the number of total class labels
$T$: testing set to be predicted
Outputs:
$\hat{Y}$: predicted labels for the instances in $T$
Begin:
for each $(x_{i}, Y_{i})$ in $D$	// Generation of individual datasets
    for $j$ = 1 to $p$
        $D_{j}.\mathrm{Insert}(x_{i}, y_{j})$
    end for
end for each
for $j$ = 1 to $p$
    $C_{j}$ = $\mathrm{LMT}(D_{j})$	// Build classifiers
end for
for each $x$ in $T$	// Classification
    for $j$ = 1 to $p$
        $y$ = $C_{j}(x)$
        $\hat{Y_{j}}$ = $\hat{Y_{j}} \cup y$
    end for
    $\hat{Y}$ = $\hat{Y} \cup \hat{Y_{j}}$
end for each
End
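
The decomposition in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C#/Weka implementation: the function names `mmlmt_fit` and `mmlmt_predict` are hypothetical, and `MajorityClassifier` is a deliberately trivial placeholder standing in for a real logistic model tree learner, which would be plugged in through `make_classifier`.

```python
from collections import Counter


class MajorityClassifier:
    """Placeholder base learner (NOT an LMT): predicts the most frequent class."""

    def fit(self, X, y):
        # Remember the most common label seen during training.
        self.prediction_ = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, x):
        return self.prediction_


def mmlmt_fit(D, p, make_classifier=MajorityClassifier):
    """Build one classifier C_j per target attribute j (hypothetical helper)."""
    # Dataset generation: D_j collects (x_i, y_j) for every instance in D.
    datasets = [([], []) for _ in range(p)]
    for x_i, Y_i in D:
        for j in range(p):
            datasets[j][0].append(x_i)
            datasets[j][1].append(Y_i[j])
    # Model training: an independent classifier for each D_j.
    return [make_classifier().fit(X_j, y_j) for X_j, y_j in datasets]


def mmlmt_predict(classifiers, T):
    """Classification: one predicted label per target attribute, per test instance."""
    return [[C_j.predict_one(x) for C_j in classifiers] for x in T]


# Toy data (invented for illustration): one feature, p = 2 target attributes.
D = [([0.1], ["a", "x"]), ([0.2], ["a", "y"]), ([0.3], ["b", "y"])]
classifiers = mmlmt_fit(D, p=2)
print(mmlmt_predict(classifiers, [[0.5], [0.9]]))
```

Because each $C_{j}$ is trained and applied independently, any base learner exposing the same `fit`/`predict_one` interface could replace the placeholder without changing the surrounding loops.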

In practice, with $N$ samples, the computational cost of MMLMT grows linearly with the size $p$ of the label set $L$. Since the complexity is determined by the base classifier LMT, denoted by $O(C)$, the overall complexity of MMLMT is $p \times O(C)$. Hence, the MMLMT method is particularly well suited for scenarios where the number of labels is moderate.

4. Experimental Studies

4.1. Dataset Description

This research utilizes eight real-world multi-class multi-label datasets [51,52,53,54,55,56,57,58] to showcase the functionalities of the presented MMLMT method. Table 3 provides an overview of the characteristics of these datasets. Each dataset—Drug-Consumption, Enron, HackerEarth-Adopt-A-Buddy, Music-Emotions, Scene, Solar-Flare-2, Thyroid-L7, and Yeast—is publicly available from various machine learning repositories. These datasets encompass a wide range of features, spanning from 10 to 1001, with instances varying from 593 to 18,834, and label numbers from 2 to 53. They originate from diverse domains including drugs, text, animals, music, image, physics, healthcare, and biology. The datasets contain categorical, numerical, and mixed-type values, reflecting their varied nature and suitability for different analytical machine-learning techniques.

4.1.1. Drug-Consumption Dataset

The Drug-Consumption dataset consists of 1885 records, each representing an individual respondent characterized by 12 real-valued features. These features primarily include personality measurements to evaluate the risk factors associated with drug consumption. The dataset addresses 18 classification tasks, targeting different drug usage behaviors. Each classification task involves categorizing respondents into one of seven classes based on their reported drug usage frequencies such as never used, used in last year, used in last week, and so on.

4.1.2. Enron Dataset

The Enron dataset is primarily concerned with classifying emails into various categories. This dataset comprises 1702 instances and 53 labels with a cardinality of 3.78. The dataset has 1001 attributes, making it a valuable resource for research in machine learning and natural language processing. It encompasses the content and metadata of emails, making it useful for tasks like text classification, social network analysis, anomaly detection, and studying organizational communication patterns.

4.1.3. HackerEarth-Adopt-A-Buddy

The HackerEarth-Adopt-A-Buddy dataset, sourced from Kaggle, is a multi-label dataset consisting of 18,834 instances with 11 features. This dataset was designed to support virtual pet adoption efforts during the pandemic, helping to engage potential pet owners by providing a virtual experience of animals available for adoption. Machine learning methods can utilize this dataset to predict pet types and breeds based on various attributes. This comprehensive dataset offers a robust foundation for developing and evaluating predictive models.

4.1.4. Music-Emotions Dataset

The Music-Emotions dataset was designed to study the classification of emotions elicited by music. It comprises 593 instances, each characterized by 72 attributes that are divided into 64 timbre features and 8 rhythmic features. Each instance in the dataset has been labeled by a group of experts based on the emotions the music produces such as amazed–surprised, relaxing–calm, happy–pleased, sad–lonely, and so on. This dataset is an outstanding resource in the field of music information retrieval and affective computing. It supports applications in automatic music emotion recognition, personalized music recommendation, and adaptive music therapy systems.

4.1.5. Scene Dataset

The Scene dataset consists of 2407 instances representing 6 distinct classes: beach, mountain, urban, field, fall foliage, and sunset. Each image is subdivided into 49 blocks, and the dataset includes 294 numeric attributes derived from spatial color moments. With a cardinality of 1.074, this dataset is utilized for multi-label classification tasks in computer vision and pattern recognition. Researchers preprocessed the dataset by normalizing feature values to ensure consistency across images and optimize classification accuracy. Its applications span various domains, such as remote sensing, image retrieval, and environmental monitoring, emphasizing its relevance in academic research and practical implementations.

4.1.6. Solar-Flare-2 Dataset

The Solar-Flare-2 dataset is a multivariate dataset used to predict solar flare activities. It consists of 1066 instances, each representing features extracted from active regions on the sun. These features, categorized into ten attributes, include sunspot area, magnetic field complexity, and historical flare occurrences data. Each categorical feature records the frequency of specific types of solar flares within 24 h. Researchers utilize this dataset to explore correlations between solar activities and flare occurrences, contributing to advancement in solar physics and space weather forecasting capabilities.

4.1.7. Thyroid-L7 Dataset

The Thyroid-L7 dataset focuses on various aspects of thyroid health, involving a range of medical attributes and conditions. It consists of 29 attributes, 9172 instances, and seven labels for thyroid-related conditions, including hyperthyroid, hypothyroid, binding protein disorders, general health issues, replacement therapy needs, anti-thyroid treatments, and discordant results. The dataset also encompasses various demographic and medical attributes such as age, sex, and measurements of thyroid-related hormones. The referral source for each instance is categorized into six possible values, including WEST, STMW, and SVI. The dataset structure supports comprehensive analysis and prediction of thyroid disorders, focusing on identifying multiple co-occurring issues within individual patients.

4.1.8. Yeast Dataset

The Yeast dataset aims to predict functional classes in the genome of the yeast Saccharomyces cerevisiae and has been used for physiological data modeling contests. It contains a total of 2417 instances, each representing a gene characterized by 103 numeric feature attributes. In this dataset, biological functions are represented by 14 target variables that indicate various gene functional groups, such as metabolism and energy. This dataset provides information about several types of genes within particular organisms, facilitating multi-label classification tasks and enabling researchers to identify and categorize the functional roles of different genes within the yeast genome.

4.2. Experiment Details

This study introduces a novel method, named MMLMT, specifically designed for multi-class multi-label classification tasks. This method integrates insights from logistic model trees to handle the inherent structure of multi-class multi-label classification models, making them efficient to implement. The effectiveness of the MMLMT method was demonstrated through validation on eight datasets: Drug-Consumption, Enron, HackerEarth-Adopt-A-Buddy, Music-Emotions, Scene, Solar-Flare-2, Thyroid-L7, and Yeast. Our method was implemented in the C# programming language by using the Weka library [59]. To ensure the reproducibility of our results, the source code is publicly available in the GitHub repository (https://github.com/BitaGhasemkhani/MMLMT, accessed on 4 July 2024). This repository includes all relevant code and documentation, providing the complete implementation of the algorithm along with detailed instructions for replicating the experiments. The LMT classifier was constructed in our experiments with the hyperparameter settings shown in Table 4.

One of the hyperparameters of the LMT algorithm is “MinNumInstances”, which refers to the minimum number of instances at which a tree node is considered for splitting. For instance, if it is set to 15, a node is not split if it contains fewer than 15 samples. Therefore, changing this value can affect the size of the tree (i.e., the number of nodes). Another hyperparameter is “NumBoostingIterations”, which specifies the number of boosting iterations to be performed. This hyperparameter has been reported to depend on the domain; however, it does not change much across different subsets of a particular dataset, such as those encountered at lower levels of the tree [7]. Although a large number of iterations may not change the accuracy much, it increases the computational cost. For this reason, a practical choice is to set it to −1, in which case the number of boosting iterations is determined automatically.

During the experiments, we utilized the 10-fold cross-validation approach to train and evaluate our classification model. We employed a range of standard metrics to assess the performance of the MMLMT method across various dimensions in the evaluation step. These metrics include accuracy, sensitivity (recall), and precision–recall curve (PRC) area. Each metric provides insight into a different aspect of the model’s behavior in multi-label classification scenarios. The mathematical formulas for these metrics, which use true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), are presented in Equations (4) to (6) as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(4)

\mathrm{Recall} = \frac{TP}{TP + FN}

(5)

\mathrm{Precision} = \frac{TP}{TP + FP}

(6)
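
As a quick illustration of Equations (4) to (6), the following Python sketch computes the three quantities from confusion counts. The function names and the example counts are invented for illustration, and returning 0.0 when a denominator is zero is an assumed convention rather than something specified in this study.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (4): share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0  # 0.0 on empty input is an assumption


def recall(tp, fn):
    """Equation (5): share of actual positives that were found (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Equation (6): share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts, not taken from any table in this paper.
tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn))  # 85 correct out of 100 -> 0.85
print(recall(tp, fn))            # 40 found out of 50 positives -> 0.8
print(precision(tp, fp))         # 40 correct out of 45 predicted positives
```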

Furthermore, we employed several non-parametric statistical analyses, including the Friedman Aligned Ranks, Quade, and Wilcoxon tests, to validate the significance of the experimental results with acceptable p-values.

4.3. Results

Table 5 summarizes the accuracy of eight classification algorithms across eight datasets, highlighting the superior performance of our proposed MMLMT algorithm. The seven baseline methods are multi-class multi-label random tree (MMRT), multi-class multi-label naïve Bayes (MMNB), multi-class multi-label k-nearest neighbors (MMKNN), multi-class multi-label logistic regression (MMLR), multi-class multi-label k-star (MMK-Star), multi-class multi-label locally weighted learning (MMLWL), and multi-class multi-label Hoeffding tree (MMHT). These algorithms were selected for their popularity and proven effectiveness in classification tasks, particularly within the context of multi-class and multi-label problems.

The MMRT algorithm utilizes the random tree [60] classifier, known for its robustness and ability to handle large datasets with high-dimensional feature spaces. MMNB is based on the naïve Bayes [61] classifier, favored for its simplicity and computational efficiency. The MMKNN algorithm employs the k-nearest neighbors [62] method, a non-parametric approach widely used due to its effectiveness in capturing local patterns in data. MMLR leverages logistic regression [63], a linear model well regarded for its interpretability and reliable performance in both binary and multi-class classification tasks. The MMK-Star algorithm involves k-star [64] which offers a flexible similarity measure based on entropy. MMLWL implements locally weighted learning [65], an instance-based method that adapts well to varying data distributions by giving more weight to nearby instances. Lastly, MMHT utilizes the Hoeffding tree [66], a decision tree algorithm designed for streaming data classification.

These methods were chosen not only for their prevalence in the literature but also for their complementary strengths across different types of datasets. By applying these algorithms within a multi-class multi-label framework, we provide a thorough comparison, ensuring that the evaluation of our MMLMT method is both robust and comprehensive. The proposed MMLMT method outperformed other algorithms by achieving the highest average accuracy of 85.90%. It can be noted here that it obtained a 3.91% improvement on average compared to other methods.

Additionally, the computational complexity of the discussed classification algorithms is summarized in Table 6. This table outlines the training and prediction complexities of each method, offering a clear view of their computational demands. Here, the symbols $n$, $k$, and $c$ represent the number of instances, features, and classes, respectively, within each algorithm. Understanding these complexities helps in evaluating the practical applicability of the algorithms, especially when dealing with large datasets or resource constraints. This comparative overview aids in appreciating the balance between computational requirements and the performance of the algorithms, providing context for their use in various scenarios.

The experimental results revealed that the MMLMT method achieved accuracy equal to or higher than that of the other methods on every dataset. Notably, the highest accuracy (99.03%) was obtained by MMLMT on the Thyroid-L7 dataset. MMLMT also achieved top accuracies of 95.40% and 93.03% on the Enron and Solar-Flare-2 datasets, respectively, and showed strong performance across the other datasets, reinforcing its adaptability. Comparative analysis shows that MMLR, MMLWL, and MMHT also performed well, with average accuracies of 83.86%, 83.27%, and 83.10%, respectively, but were outperformed by MMLMT. Algorithms such as MMNB and MMRT had lower average accuracies of 78.15% and 80.81%, respectively. These results underscore the ability of MMLMT to attain high accuracy in multi-class multi-label classification tasks across diverse datasets.

The sensitivity results obtained for the MMLMT method across various datasets are presented in Figure 3. The sensitivity metric indicates the true positive rate for each dataset, ranging from 0 to 1. The results show that the highest sensitivity (0.99) was achieved on the Thyroid-L7 dataset, followed closely by Enron with a sensitivity of 0.95. The Scene and Solar-Flare-2 datasets also have high sensitivity values in the range of 0.90 to 0.93. The HackerEarth-Adopt-A-Buddy, Yeast, and Music-Emotions datasets have sensitivity values between 0.80 and 0.87, while the Drug-Consumption dataset has the lowest sensitivity of 0.63. These results show how well the MMLMT method identifies true positives across the various datasets.

The results for the PRC Area metric using the MMLMT method revealed varying levels of performance across different datasets, as presented in Figure 4. The highest value (0.99) was obtained on the Thyroid-L7 dataset, followed closely by the HackerEarth-Adopt-A-Buddy, Scene, and Enron datasets with scores of approximately 0.95. This demonstrates the MMLMT method’s effectiveness in handling multi-class multi-label classification tasks for these datasets. Additionally, scores ranging from 0.78 to 0.88 were observed for Yeast, Music-Emotions, and Solar-Flare-2, reflecting acceptable performance. Conversely, the Drug-Consumption dataset exhibited the lowest PRC Area value.

4.4. Statistical Analysis of Results

In this study, we rigorously assess the performance of our proposed MMLMT method against several established classification approaches, including MMRT, MMNB, MMKNN, MMLR, MMK-Star, MMLWL, and MMHT. Our comparative analysis spans multiple datasets to ensure reliable and comprehensive results. To substantiate our findings, we applied a series of non-parametric statistical tests, including Friedman Aligned Ranks [67], Quade [68], and Wilcoxon [69]. The Friedman Aligned Ranks test aligns dataset rankings to competently compare multiple algorithms, while the Quade test adjusts ranks to account for variations between distinct blocks of data. With a significance level of 0.05, we obtained p-values of 0.00086 and 0.00013 for the Friedman Aligned Ranks and Quade tests, respectively. The results underscore the robustness of our findings, indicating that MMLMT offers a statistically significant enhancement in predictive accuracy compared to the other approaches tested in our experimental setup.

The mathematical formulas of Friedman Aligned Ranks and Quade tests are presented in Equations (7) and (8), respectively.

Q = \frac{12}{n k (k + 1)} \sum_{j = 1}^{k} {(R_{j} - \frac{n (k + 1)}{2})}^{2}

(7)

where

$n$ : the number of blocks in the experimental design;
$k$ : the number of methods being compared;
$R_{j}$ : the sum of ranks for method $j$ , which is calculated based on how well each method performs across the blocks;
$Q$ : the Friedman Aligned Ranks test result.
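
To make the symbols in Equation (7) concrete, the sketch below evaluates the formula exactly as printed for a small accuracy table (rows are blocks, columns are methods). It is an illustrative assumption in several respects: the function names and the toy table are invented, ranks are assigned within each block with rank 1 for the highest score and average ranks for ties, and no alignment step or p-value computation is attempted.

```python
def rank_row(scores):
    """Rank one block's scores: 1 = highest; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = average_rank
        pos = end + 1
    return ranks


def equation7_statistic(table):
    """Q from Equation (7): n blocks (rows), k methods (columns)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k  # R_j: sum of ranks of method j over all blocks
    for row in table:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    expected = n * (k + 1) / 2
    squared_dev = sum((R - expected) ** 2 for R in rank_sums)
    return 12 * squared_dev / (n * k * (k + 1))


# Toy table (invented): 3 blocks, 3 methods, the same ordering in every block.
toy = [[0.90, 0.80, 0.70],
       [0.95, 0.85, 0.75],
       [0.88, 0.82, 0.60]]
print(equation7_statistic(toy))  # rank sums 3, 6, 9 -> Q = 6.0
```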

T = \frac{\sum_{i = 1}^{n} {(\bar{R_{i}} - \bar{R})}^{2}}{\frac{1}{k - 1} \sum_{j = 1}^{k} {(\bar{R_{j}} - \bar{R})}^{2}}

(8)

where

$n$ : the total number of observations or data points;
$k$ : the number of methods (treatments) being compared;
$\bar{R_{i}}$ : the average rank of the $i$ th block, indicating how well methods perform within each block;
$\bar{R} :$ the overall average rank across all blocks, providing a baseline for comparison;
$\bar{R_{j}}$ : the average rank of method $j$ across all blocks, indicating the overall performance of each method averaged across blocks;
$T$ : test statistic representing the Quade test result.

The Wilcoxon test evaluates whether the mean ranks of related samples differ significantly. This test supports the superior performance of MMLMT, indicating that the observed gains in predictive accuracy are statistically significant rather than the result of random chance. The result of the Wilcoxon test is shown in Table 7, which demonstrates that MMLMT consistently outperformed its counterparts with all p-values below the significance level of 0.05. Specifically, the Wilcoxon test yields a p-value of 0.01172 for MMRT, MMNB, MMKNN, MMLR, and MMK-Star, and a p-value of 0.01796 for MMLWL and MMHT. This underscores the robustness and statistical significance of MMLMT’s performance gains.

The mathematical formula for the Wilcoxon test is presented in Equation (9).

W = \sum_{i = 1}^{n} {R_{i}}^{+}

(9)

where

$n$ : the total number of paired observations or data points;
${R_{i}}^{+}$ : the rank assigned to the positive differences between paired observations, indicating how much each pair contributes to the test statistic;
$W$ : the Wilcoxon test statistic, which is calculated based on the sum of ranks of positive differences between paired observations.
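
The statistic in Equation (9) can be computed by hand for small samples, as in the following Python sketch. The function name and the paired scores are hypothetical; dropping zero differences and giving tied absolute differences their average rank are common conventions assumed here, and the p-value lookup is left out.

```python
def wilcoxon_w_plus(a, b):
    """W from Equation (9): sum of ranks of the positive paired differences."""
    # Zero differences are discarded (assumed convention).
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Rank the differences by absolute value, smallest first.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Tied absolute differences share the average of their ranks.
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = average_rank
        pos = end + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Hypothetical paired scores for two methods on four datasets.
method_a = [5, 3, 8, 6]
method_b = [4, 5, 5, 6]
print(wilcoxon_w_plus(method_a, method_b))  # differences 1, -2, 3 -> ranks 1 + 3 = 4.0
```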

4.5. LMT Structure Analysis

Figure 5 shows an example logistic model tree built by the proposed MMLMT method for one target attribute of the Thyroid-L7 dataset. This LMT is a tree of height 7, containing 27 nodes, including 13 internal nodes and 14 leaves. The target attribute was processed with an LMT configured with no iteration limit and a minimum of 15 instances per leaf. The structure begins with a split on thyroid-stimulating hormone (TSH) values at the root, leading to further splits on attributes such as free thyroxine index (FTI), query_hypothyroid, T3, and on_thyroxine. Each internal node represents a decision based on an attribute, and the leaf nodes contain logistic regression models predicting class probabilities. For example, if TSH ≤ 6 and FTI ≤ 61, the tree further splits on query_hypothyroid and T3, leading to leaf nodes LM_1 and LM_2. In LMT, each leaf node is represented by a logistic model identifier such as LM_1, LM_2, and so on. The numbers in a leaf give statistical information about it, such as the total number of instances reaching this leaf. According to the tree, if TSH > 6, the tree splits on FTI and further on T3, on_antithyroid_medication, thyroid_surgery, and TT4, leading to various logistic models such as LM_5, LM_6, and LM_7. This hybrid approach offers the understandability of decision trees and the predictive power of logistic regression, handling diverse attributes and providing valuable insights into medical diagnoses.

A significant advantage of a tree model is its easy interpretability. A path in a tree essentially corresponds to a conjunction of Boolean expressions of the form ‘attribute ≤ value’ (for numeric attributes) or ‘attribute = value’ (for nominal attributes). Therefore, a tree model can be seen as a collection of rules that explain how to classify instances. This explanatory power is particularly beneficial in applications, where understanding the rationale behind predictions is crucial for gaining the trust of professionals and ensuring alignment with informed decisions. By examining the specific paths and splits in the tree structure, experts can identify the most influential factors in the data. In real-world scenarios, LMT has demonstrated its interpretability across various domains, providing clear, rule-based insights that enhance decision-making processes. For instance, in finance, LMT can assist institutions in creating transparent models for credit scoring, where the model clearly outlines the factors leading to loan approval or rejection. This transparency ensures that stakeholders understand the reasoning behind financial decisions, fostering trust and compliance with regulatory standards. Such capabilities make LMT a versatile tool in many fields where clear, understandable decision-making is essential. The easy comprehensibility of the model is an important property in terms of explainable artificial intelligence (XAI) requirements. The LMT’s ability to provide concise rules, combined with robust prediction capabilities, makes it an excellent tool for applications where both accuracy and interpretability are paramount.
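
The reading of a tree path as a conjunction of attribute tests can be illustrated with a short Python sketch. The representation (a list of `(attribute, operator, value)` triples) and the helper name `path_matches` are assumptions made for illustration; only the two conditions TSH ≤ 6 and FTI ≤ 61 come from the tree discussed above, and the instance values are invented.

```python
import operator

# Supported comparison operators for numeric and nominal attribute tests.
OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}


def path_matches(path, instance):
    """True if the instance satisfies every condition on the path (a conjunction)."""
    return all(OPS[op](instance[attr], value) for attr, op, value in path)


# Prefix of a path in the example tree: TSH <= 6 AND FTI <= 61.
path_prefix = [("TSH", "<=", 6), ("FTI", "<=", 61)]

# Hypothetical patients (values invented for illustration).
print(path_matches(path_prefix, {"TSH": 2.5, "FTI": 50}))  # both conditions hold
print(path_matches(path_prefix, {"TSH": 9.0, "FTI": 50}))  # TSH condition fails
```

Listing the triples of a path in order gives exactly the kind of rule a domain expert can read and check against their own knowledge.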

In the context of the proposed MMLMT method, the logistic regression model applied to the target attribute of the Thyroid-L7 dataset offers a clear and quantitative representation of the predictive factors influencing diseases. As an example, the mathematical expression of logistic regression for Class NEG in LM_1 is given in Equation (10). This logistic model captures the complex relationships among various medical attributes, quantifying their contributions to the likelihood of a disease diagnosis. For instance, being on thyroxine significantly increases the log odds of a positive classification (6.15), whereas higher T3 levels decrease these odds (−15.62). The model’s parameters provide valuable insights, such as the strong positive impact of querying hyperthyroidism (38.41) and the substantial negative influence of T4U (−7.39). These coefficients, derived from the logistic regression model at leaf nodes of the decision tree, exemplify the hybrid approach’s strength. Notably, Equation (10) is a specific instance of the more general logit formula presented in Equation (2), mathematically outlining the general form of logistic regression used across various leaf nodes in the LMT structure.

\begin{aligned}
\text{LM\_1: Class NEG} = {} & -8.3 + 0.15 \times [\text{age}] - 0.12 \times [\text{sex} = \text{F}] \\
& + 6.15 \times [\text{on\_thyroxine} = \text{t}] + 1.6 \times [\text{query\_on\_thyroxine} = \text{t}] \\
& + 1.6 \times [\text{on\_antithyroid\_medication} = \text{t}] + 6.77 \times [\text{sick} = \text{t}] \\
& + 4.24 \times [\text{thyroid\_surgery} = \text{t}] - 0.54 \times [\text{query\_hypothyroid} = \text{t}] \\
& + 38.41 \times [\text{query\_hyperthyroid} = \text{t}] - 0.8 \times [\text{TSH\_measured} = \text{t}] \\
& - 1.1 \times [\text{TSH}] - 3.21 \times [\text{T3\_measured} = \text{t}] - 15.62 \times [\text{T3}] \\
& + 0.87 \times [\text{TT4}] - 7.39 \times [\text{T4U}] + 0.06 \times [\text{FTI}] \\
& - 22.04 \times [\text{referral\_source} = \text{other}]
\end{aligned}

(10)
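
To show how Equation (10) is evaluated, the sketch below computes the linear score of class NEG in LM_1 for a given instance. The coefficients are copied from Equation (10); the function name, the dictionary encoding (indicator features as 0/1 and numeric features as plain numbers), and the example inputs are assumptions. The score is not a probability on its own, since turning it into one also requires the linear functions of the other classes in the same leaf, which are not listed here.

```python
LM1_NEG_INTERCEPT = -8.3
LM1_NEG_COEFFS = {
    "age": 0.15,
    "sex=F": -0.12,
    "on_thyroxine=t": 6.15,
    "query_on_thyroxine=t": 1.6,
    "on_antithyroid_medication=t": 1.6,
    "sick=t": 6.77,
    "thyroid_surgery=t": 4.24,
    "query_hypothyroid=t": -0.54,
    "query_hyperthyroid=t": 38.41,
    "TSH_measured=t": -0.8,
    "TSH": -1.1,
    "T3_measured=t": -3.21,
    "T3": -15.62,
    "TT4": 0.87,
    "T4U": -7.39,
    "FTI": 0.06,
    "referral_source=other": -22.04,
}


def lm1_neg_score(x):
    """Linear score of class NEG in leaf LM_1; features missing from x count as 0."""
    return LM1_NEG_INTERCEPT + sum(
        coeff * x.get(name, 0.0) for name, coeff in LM1_NEG_COEFFS.items()
    )


# Hypothetical inputs: all features at zero, then only on_thyroxine switched on.
print(lm1_neg_score({}))                     # intercept only
print(lm1_neg_score({"on_thyroxine=t": 1}))  # intercept + 6.15
```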

Table 8 illustrates an example confusion matrix for the target attribute with class-labels L, M, N, and NEG. The matrix shows the classifier’s performance with the diagonal elements representing the number of instances correctly classified for each label, e.g., M was accurately classified 121 times and NEG 8792 times. Off-diagonal elements indicate misclassifications, such that 8 instances of M are classified as NEG. The confusion matrix provides a detailed view of how the model performs across these labels, clarifying the accuracy and areas of misclassification. This analysis is essential for understanding the model’s effectiveness in managing the label structures of the dataset.

5. Discussion

In this section, MMLMT is compared with several state-of-the-art methods [70,71,72,73,74,75,76,77,78,79,80,81] in the field. Our analysis covers the accuracy metric over the Drug-Consumption, Enron, Music-Emotions, Scene, Solar-Flare-2, Thyroid-L7, and Yeast datasets, shown in Table 9. According to the results, our method showed a 9.87% improvement on average compared to state-of-the-art methods on the same datasets. MMLMT achieved higher accuracy than the naïve Bayes method [70] on the Drug-Consumption dataset. Similarly, the MMLMT method outperformed its counterpart [71], namely lblMLTC, with an accuracy of 95.40% on the Enron dataset. MMLMT also showed its superiority over a wide range of techniques [72,73,74,75], including MLkNN, DASMLKNN, PLDLDSA, and more, for the Music-Emotions dataset, with a substantial improvement of 8.19% on average. Similarly, MMLMT reached 93.03% accuracy on the Solar-Flare-2 dataset, a 16.11% enhancement compared to previous approaches, such as 1R, KNN, RIPPER, and more [77,78,79,80], whose accuracies range from 66.04% to 85.40%. This outcome highlights the ability of the proposed method to correctly classify instances in the dataset. The MMLMT method attained a small improvement on the Thyroid-L7 dataset when compared to its state-of-the-art peer, ELM. Finally, MMLMT presented a considerable improvement of 5.75% over the previous multi-label-based techniques (e.g., PLDLDSA, ML-kNN) on the Yeast dataset. These results underscore the efficacy of MMLMT across various datasets, demonstrating important improvements over the state-of-the-art methods in terms of accuracy.

6. Conclusions and Future Works

This paper presents a novel classification method, named the multi-class multi-label logistic model tree (MMLMT). This method enhances flexibility in multi-label classification by integrating a multi-class approach. By utilizing the logistic model tree (LMT) algorithm, MMLMT effectively addresses the challenges inherent in multi-class multi-label classification tasks. This method combines the interpretability of decision trees with the predictive accuracy of logistic regression, providing an efficient solution for complex classification problems. In the experimental studies, comprehensive evaluation using metrics such as accuracy, sensitivity, and PRC Area, along with non-parametric statistical analyses including the Friedman Aligned Ranks, Quade, and Wilcoxon tests, confirmed the reliability and significance of our results with acceptable p-values.

The main findings of this study can be summarized as follows:

This study proposes the MMLMT method, which uniquely integrates logistic model trees with a multi-class multi-label classification technique, establishing a novel approach in the literature.
MMLMT addresses the classification tasks in which the dataset has multiple target attributes, where each target attribute in the dataset can have two or more different values.
According to the experimental results, the MMLMT method achieved an average accuracy of 85.90% across eight well-known datasets: Drug-Consumption, Enron, HackerEarth-Adopt-A-Buddy, Music-Emotions, Scene, Solar-Flare-2, Thyroid-L7, and Yeast.
MMLMT showed a 3.91% improvement over previous methods (MMRT, MMNB, MMKNN, MMLR, MMK-Star, MMLWL, and MMHT), reflecting its enhanced performance.
Comparative research with existing studies provided valuable benchmarks, demonstrating the superior performance of MMLMT. It achieved a 9.87% higher average accuracy compared to the results reported in the state-of-the-art studies, underscoring its advanced predictive capability.
The use of the logistic model tree classifier ensures a balance between high accuracy and model interpretability, making MMLMT a notably explainable artificial intelligence (XAI) method.
MMLMT demonstrated its effectiveness across a range of datasets from various domains, supporting its applicability and reliability in multi-class multi-label classification tasks.

In conclusion, our method offers a practical and accessible solution for tackling multi-label problems, effectively balancing interpretability and accuracy in classification tasks. The MMLMT method has been proven to have robust performance across various domains, making it a versatile approach for diverse applications in advanced multi-label learning. Its ability to handle different datasets with high accuracy while maintaining transparency in its decision-making process emphasizes its potential as a valuable approach for researchers and practitioners alike. This method contributes significantly to multi-label classification and establishes a solid foundation for addressing real-world challenges in the machine learning field.

While the MMLMT method has shown promising results, several avenues for future research remain. First, ensemble learning solutions can be integrated with MMLMT by aggregating the outputs of multiple models to handle more complex datasets. Moreover, a web application can be developed to provide an interface to the MMLMT model that enables users to perform analyses. Another potential area of exploration is the application of MMLMT to other real-world problems in diverse fields beyond those addressed in this study. Conducting domain-specific studies could provide valuable insights into the practical benefits of implementing the MMLMT method in different areas.

Author Contributions

Conceptualization, B.G. and K.F.B.; methodology, B.G. and K.F.B.; software, B.G.; validation, B.G.; formal analysis, K.F.B.; investigation, K.F.B.; resources, B.G. and K.F.B.; data curation, K.F.B.; writing—original draft preparation, B.G. and K.F.B.; writing—review and editing, D.B.; visualization, B.G.; supervision, D.B.; project administration, D.B.; funding acquisition, K.F.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The “Drug-Consumption” dataset [51] is publicly available in the UCI machine learning repository (https://archive.ics.uci.edu/dataset/373/drug+consumption+quantified, accessed on 4 July 2024). The “Enron” dataset [52] is publicly available in the Mulan machine learning repository (http://mulan.sourceforge.net/datasets-mlc.html, accessed on 4 July 2024). The “HackerEarth-Adopt-A-Buddy” dataset [53] is publicly available in the Kaggle machine learning repository (https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption, accessed on 4 July 2024). The “Music-Emotions” dataset [54] is publicly available in the Mulan machine learning repository (http://mulan.sourceforge.net/datasets-mlc.html, accessed on 4 July 2024). The “Scene” dataset [55] is publicly available in the Mulan machine learning repository (http://mulan.sourceforge.net/datasets-mlc.html, accessed on 4 July 2024). The “Solar-Flare-2” dataset [56] is publicly available in the UCI machine learning repository (https://archive.ics.uci.edu/dataset/89/solar+flare, accessed on 4 July 2024). The “Thyroid-L7” dataset [57] is publicly available on the MEKA platform (https://osdn.net/projects/sfnet_meka/downloads/Datasets/thyroid-L7.arff/, accessed on 4 July 2024). The “Yeast” dataset [58] is publicly available in the Mulan machine learning repository (http://mulan.sourceforge.net/datasets-mlc.html, accessed on 4 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this paper:

AFCM	Attentive feature class activation mapping-driven mixup
AFM	Attentive feature mixup
AI	Artificial intelligence
AIC	Akaike information criterion
ASDMLC	Adaptive synthetic data-based multi-label classification
AUC	Area under the ROC curve
BDT	Boosted decision tree
BERT	Bidirectional encoder representations from transformers
BN	Bayesian network
CART	Classification and regression tree
CC	Classifier chain
CLAHE	Contrast limited adaptive histogram equalization
CNN	Convolutional neural network
DASMLKNN	Discriminative adaptive sets multi-label k-nearest neighbors
DBN	Deep belief network
DT	Decision tree
ELM	Extreme learning machines
ENN	Edited nearest neighbors
GMM	Gaussian mixture model
GNB	Gaussian naïve Bayes
Grad-CAM	Gradient-weighted class activation mapping
ID3	Iterative dichotomiser 3
KNN	K-nearest neighbors
lblMLTC	Multi-label text classification
LMT	Logistic model tree
LR	Logistic regression
LSTM	Long short-term memory
LSTSVM	Least squares twin support vector machine
MLC-ACL	Multi-label classification approach based on correlation among labels
ML	Machine learning
ML-kNN	Multi-label k-nearest neighbor
MLL	Multi-label learning
MMHT	Multi-class multi-label Hoeffding tree
MMKNN	Multi-class multi-label k-nearest neighbors
MMK-Star	Multi-class multi-label k-star
MMLMT	Multi-class multi-label logistic model tree
MMLR	Multi-class multi-label logistic regression
MMLWL	Multi-class multi-label locally weighted learning
MMNB	Multi-class multi-label naïve Bayes
MMRT	Multi-class multi-label random tree
NB	Naïve Bayes
NBT	Naïve Bayes tree
NN	Neural network
PLDLDSA	Partial multi-label dependence learning via deep supervised autoencoder
PLDLDSA-DT	Partial multi-label dependence learning via deep supervised autoencoder–decision tree
PRC	Precision–recall curve
QWML	Quick weighted algorithm for multi-label learning
RAkEL	Random k-labelsets
REPTree	Reduced error pruning tree
RF	Random forest
RFMiD	Retinal fundus multi-disease image dataset
RIPPER	Repeated incremental pruning to produce error reduction
RNN	Recurrent neural network
SVM	Support vector machine
TPR	True positive rate
VOGAC-PC	Variable Ordering Genetic Algorithm–PC (Peter and Clark)
XGBoost	Extreme gradient boosting

References

1. Talaei Khoei, T.; Kaabouch, N. Machine Learning: Models, Challenges, and Research Directions. Future Internet 2023, 15, 332.
2. Wang, Y.; Dong, H.; Bai, S.; Yu, Y.; Duan, Q. Image Recognition and Classification of Farmland Pests Based on Improved Yolox-tiny Algorithm. Appl. Sci. 2024, 14, 5568.
3. Xu, X.; Li, J.; Zhu, Z.; Zhao, L.; Wang, H.; Song, C.; Chen, Y.; Zhao, Q.; Yang, J.; Pei, Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering 2024, 11, 219.
4. Hoppe, H.; Dietrich, P.; Marzahn, P.; Weiß, T.; Nitzsche, C.; Freiherr von Lukas, U.; Wengerek, T.; Borg, E. Transferability of Machine Learning Models for Crop Classification in Remote Sensing Imagery Using a New Test Methodology: A Study on Phenological, Temporal, and Spatial Influences. Remote Sens. 2024, 16, 1493.
5. Maldonado-Correa, J.; Valdiviezo-Condolo, M.; Artigao, E.; Martín-Martínez, S.; Gómez-Lázaro, E. Classification of Highly Imbalanced Supervisory Control and Data Acquisition Data for Fault Detection of Wind Turbine Generators. Energies 2024, 17, 1590.
6. Shim, H.; Kim, S.K. Classification of LED Packages for Quality Control by Discriminant Analysis, Neural Network and Decision Tree. Micromachines 2024, 15, 457.
7. Landwehr, N.; Hall, M.; Frank, E. Logistic model trees. Mach. Learn. 2005, 59, 161–205.
8. Kamali Maskooni, E.; Naghibi, S.A.; Hashemi, H.; Berndtsson, R. Application of Advanced Machine Learning Algorithms to Assess Groundwater Potential Using Remote Sensing-Derived Data. Remote Sens. 2020, 12, 2742.
9. Debnath, P.; Chittora, P.; Chakrabarti, T.; Chakrabarti, P.; Leonowicz, Z.; Jasinski, M.; Gono, R.; Jasińska, E. Analysis of Earthquake Forecasting in India Using Supervised Machine Learning Classifiers. Sustainability 2021, 13, 971.
10. Zhao, X.; Chen, W. Optimization of Computational Intelligence Models for Landslide Susceptibility Evaluation. Remote Sens. 2020, 12, 2180.
11. Lee, S.-W.; Kung, H.-C.; Huang, J.-F.; Hsu, C.-P.; Wang, C.-C.; Wu, Y.-T.; Wen, M.-S.; Cheng, C.-T.; Liao, C.-H. The Clinical Application of Machine Learning-Based Models for Early Prediction of Hemorrhage in Trauma Intensive Care Units. J. Pers. Med. 2022, 12, 1901.
12. Reyes-Bueno, F.; Loján-Córdova, J. Assessment of Three Machine Learning Techniques with Open-Access Geographic Data for Forest Fire Susceptibility Monitoring—Evidence from Southern Ecuador. Forests 2022, 13, 474.
13. Gorka, M.; Thomas, A.; Bécue, A. Differentiating Individuals through the Chemical Composition of Their Fingermarks. Forensic Sci. Int. 2023, 346, 111645.
14. Togay, B.O.; Firat, C. Comprehensive Faults Analysis on the Direct Current Side of Photovoltaic Systems Using Logistic Model Tree Algorithm. SSRN-Social Sci. Res. Network 2024, 4819154.
15. Binsawad, M. Enhancing PDF Malware Detection through Logistic Model Trees. CMC-Comput. Mater. Continua 2024, 78, 3645–3663.
16. Amirruddin, A.D.; Muharam, F.M.; Ismail, M.H.; Tan, N.P.; Ismail, M.F. Synthetic Minority Over-Sampling TEchnique (SMOTE) and Logistic Model Tree (LMT)-Adaptive Boosting Algorithms for Classifying Imbalanced Datasets of Nutrient and Chlorophyll Sufficiency Levels of Oil Palm (Elaeis Guineensis) Using Spectroradiometers and Unmanned Aerial Vehicles. Comput. Electron. Agric. 2022, 193, 106646.
17. Cambuí, B.G. Neural Networks for Feature-Extraction in Multi-Target Classification. Master’s Thesis, Federal University of São Carlos, São Carlos, Brazil, 2020. Available online: https://repositorio.ufscar.br/handle/ufscar/13795 (accessed on 4 July 2024).
18. Mo, L.; Zhu, Y.; Zeng, L. A Multi-label based physical activity recognition via cascade classifier. Sensors 2023, 23, 2593.
19. Wu, R.; Liu, X.; Zhang, T.; Xia, J.; Li, J.; Zhu, M.; Gu, G. An Efficient Multi-Label Classification-Based Municipal Waste Image Identification. Processes 2024, 12, 1075.
20. Alfaro, R.; Allende-Cid, H.; Allende, H. Multilabel Text Classification with Label-Dependent Representation. Appl. Sci. 2023, 13, 3594.
21. Valverde-Albacete, F.J.; Peláez-Moreno, C. A Formalization of Multilabel Classification in Terms of Lattice Theory and Information Theory: Concerning Datasets. Mathematics 2024, 12, 346.
22. Zhang, P.; Ma, Z.; Ren, Z.; Wang, H.; Zhang, C.; Wan, Q.; Sun, D. Design of an Automatic Classification System for Educational Reform Documents Based on Naive Bayes Algorithm. Mathematics 2024, 12, 1127.
23. Janrao, S.; Shah, K.; Pavate, A.; Patil, R.; Bankar, S.; Vasoya, A. Conglomerate Crop Recommendation by Using Multi-Label Learning via Ensemble Supervised Clustering Techniques. Int. Res. J. Multidiscip. Technovation 2024, 6, 90–100.
24. Kang, E.; Choi, Y.; Kim, J. Advancements in Korean Emotion Classification: A Comparative Approach Using Attention Mechanism. Mathematics 2024, 12, 1637.
25. Katona, T.; Tóth, G.; Petró, M.; Harangi, B. Developing New Fully Connected Layers for Convolutional Neural Networks with Hyperparameter Optimization for Improved Multi-Label Image Classification. Mathematics 2024, 12, 806.
26. Filippakis, P.; Ougiaroglou, S.; Evangelidis, G. Prototype Selection for Multilabel Instance-Based Learning. Information 2023, 14, 572.
27. El-Hasnony, I.M.; Elzeki, O.M.; Alshehri, A.; Salem, H. Multi-Label Active Learning-Based Machine Learning Model for Heart Disease Prediction. Sensors 2022, 22, 1184.
28. Priyadharshini, M.; Banu, A.F.; Sharma, B.; Chowdhury, S.; Rabie, K.; Shongwe, T. Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning. Sensors 2023, 23, 6836.
29. Mei, S.; Zhang, K. A Multi-Label Learning Framework for Drug Repurposing. Pharmaceutics 2019, 11, 466.
30. Hossain, P.S.; Kim, K.; Uddin, J.; Samad, M.A.; Choi, K. Enhancing Taxonomic Categorization of DNA Sequences with Deep Learning: A Multi-Label Approach. Bioengineering 2023, 10, 1293.
31. Morales, R.; Martinez-Arroyo, A.; Aguilar, E. Robust Deep Neural Network for Learning in Noisy Multi-Label Food Images. Sensors 2024, 24, 2034.
32. Kufel, J.; Bielówka, M.; Rojek, M.; Mitręga, A.; Lewandowski, P.; Cebula, M.; Krawczyk, D.; Bielówka, M.; Kondoł, D.; Bargieł-Łączek, K.; et al. Multi-Label Classification of Chest X-ray Abnormalities Using Transfer Learning Techniques. J. Pers. Med. 2023, 13, 1426.
33. Unal, F.Z.; Guzel, M.S.; Bostanci, E.; Acici, K.; Asuroglu, T. Multilabel Genre Prediction Using Deep-Learning Frameworks. Appl. Sci. 2023, 13, 8665.
34. Li, Z.; Xu, M.; Yang, X.; Han, Y.; Wang, J. A Multi-Label Detection Deep Learning Model with Attention-Guided Image Enhancement for Retinal Images. Micromachines 2023, 14, 705.
35. Deniz, E.; Erbay, H.; Coşar, M. Multi-Label Classification of E-Commerce Customer Reviews via Machine Learning. Axioms 2022, 11, 436.
36. Jabreel, M.; Moreno, A. A Deep Learning-Based Approach for Multi-Label Emotion Classification in Tweets. Appl. Sci. 2019, 9, 1123.
37. Alzanin, S.M.; Gumaei, A.; Haque, M.A.; Muaad, A.Y. An Optimized Arabic Multilabel Text Classification Approach Using Genetic Algorithm and Ensemble Learning. Appl. Sci. 2023, 13, 10264.
38. Ahanin, Z.; Ismail, M.A.; Singh, N.S.S.; AL-Ashmori, A. Hybrid Feature Extraction for Multi-Label Emotion Classification in English Text Messages. Sustainability 2023, 15, 12539.
39. Goštautaitė, D.; Sakalauskas, L. Multi-Label Classification and Explanation Methods for Students’ Learning Style Prediction and Interpretation. Appl. Sci. 2022, 12, 5396.
40. Ho, M.H.; Ponchet Durupt, A.; Vu, H.C.; Boudaoud, N.; Caracciolo, A.; Sieg-Zieba, S.; Xu, Y.; Leduc, P. Ensemble Learning for Multi-Label Classification with Unbalanced Classes: A Case Study of a Curing Oven in Glass Wool Production. Mathematics 2023, 11, 4602.
41. Shokri, D.; Larouche, C.; Homayouni, S. A Comparative Analysis of Multi-Label Deep Learning Classifiers for Real-Time Vehicle Detection to Support Intelligent Transportation Systems. Smart Cities 2023, 6, 2982–3004.
42. Zou, L.; He, Z.; Zhou, C.; Zhu, W. Multi-Class Multi-Label Classification of Social Media Texts for Typhoon Damage Assessment: A Two-Stage Model Fully Integrating the Outputs of the Hidden Layers of BERT. Int. J. Digit. Earth 2024, 17, 2348668.
43. Gour, N.; Khanna, P. Multi-class multi-label ophthalmological disease detection using transfer learning based convolutional neural network. Biomed. Signal Process. Control 2021, 66, 102329.
44. Anila Glory, H.; Meghana, S.; Kesav Kumar, J.S.; Shankar Sriram, V.S. Stacked Dark COVID-Net: A Multi-Class Multi-Label Classification Approach for Diagnosing COVID-19 Using Chest X-ray Images. In Recent Trends in Image Processing and Pattern Recognition; Santosh, K., Hegadi, R., Pal, U., Eds.; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2022; Volume 1576, pp. 61–75.
45. Wardana, W.A.; Siradjuddin, I.A.; Muntasa, A. Identification of Pedestrians Attributes Based on Multi-Class Multi-Label Classification Using Convolutional Neural Network (CNN). J. Data Sci. Appl. 2020, 3, 8–18.
46. Nhu, V.-H.; Shirzadi, A.; Shahabi, H.; Singh, S.K.; Al-Ansari, N.; Clague, J.J.; Jaafari, A.; Chen, W.; Miraki, S.; Dou, J.; et al. Shallow Landslide Susceptibility Mapping: A Comparison between Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine Algorithms. Int. J. Environ. Res. Public Health 2020, 17, 2749.
47. Nhu, V.-H.; Mohammadi, A.; Shahabi, H.; Ahmad, B.B.; Al-Ansari, N.; Shirzadi, A.; Geertsema, M.; Kress, V.R.; Karimzadeh, S.; Valizadeh Kamran, K.; et al. Landslide Detection and Susceptibility Modeling on Cameron Highlands (Malaysia): A Comparison between Random Forest, Logistic Regression and Logistic Model Tree Algorithms. Forests 2020, 11, 830.
48. Pham, B.T.; Phong, T.V.; Nguyen, H.D.; Qi, C.; Al-Ansari, N.; Amini, A.; Ho, L.S.; Tuyen, T.T.; Yen, H.P.H.; Ly, H.-B.; et al. A Comparative Study of Kernel Logistic Regression, Radial Basis Function Classifier, Multinomial Naïve Bayes, and Logistic Model Tree for Flash Flood Susceptibility Mapping. Water 2020, 12, 239.
49. Li, N.; Zare, M.; Yi, C.; Jimenez, R. Stability Risk Assessment of Underground Rock Pillars Using Logistic Model Trees. Int. J. Environ. Res. Public Health 2022, 19, 2136.
50. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting. Ann. Stat. 2000, 28, 337–407.
51. Fehrman, E.; Muhammad, A.K.; Mirkes, E.M.; Egan, V.; Gorban, A.N. The five factor model of personality and evaluation of drug consumption risk. In Data Science; Springer: Berlin/Heidelberg, Germany, 2017; pp. 231–242.
52. Carnegie Mellon University. Enron Email Dataset. Available online: https://www.cs.cmu.edu/~enron/ (accessed on 4 July 2024).
53. Kaggle. HackerEarth ML Challenge: Adopt a Buddy. Available online: https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption (accessed on 4 July 2024).
54. Mulan Multi-Label Dataset Repository. Emotions Dataset. Available online: http://mulan.sourceforge.net/datasets-mlc.html (accessed on 4 July 2024).
55. Mulan Multi-Label Dataset Repository. Scene Dataset. Available online: http://mulan.sourceforge.net/datasets-mlc.html (accessed on 4 July 2024).
56. UCI Machine Learning Repository. Solar Flare Dataset. Available online: https://archive.ics.uci.edu/dataset/89/solar+flare (accessed on 4 July 2024).
57. MEKA. Thyroid-L7 Dataset. Available online: https://osdn.net/projects/sfnet_meka/downloads/Datasets/thyroid-L7.arff/ (accessed on 4 July 2024).
58. Elisseeff, A.; Weston, J. A kernel method for multi-labelled classification. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2001; pp. 681–687.
59. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann: Cambridge, MA, USA, 2016; pp. 1–664.
60. Drmota, M. Random Trees: An Interplay between Combinatorics and Probability; Springer: New York, NY, USA, 2009; pp. 1–458.
61. Webb, G.I. Naïve Bayes. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2010.
62. Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Springer: Berlin/Heidelberg, Germany, 2013; Volume 51, pp. 13–23.
63. Bisong, E. Logistic regression. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Springer: Berlin/Heidelberg, Germany, 2019; pp. 243–250.
64. Cleary, J.G.; Trigg, L.E. K*: An instance-based learner using an entropic distance measure. In Proceedings of the 12th International Conference on Machine Learning, Tahoe City, CA, USA, 9–12 July 1995; pp. 108–114.
65. Atkeson, C.G.; Moore, A.W.; Schaal, S. Locally Weighted Learning. Artif. Intell. Rev. 1997, 11, 11–73.
66. Pfahringer, B.; Holmes, G.; Kirkby, R. New Options for Hoeffding Trees. In AI 2007: Advances in Artificial Intelligence; Orgun, M.A., Thornton, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 90–99.
67. Eisinga, R.; Heskes, T.; Pelzer, B.; Te Grotenhuis, M. Exact P-Values for Pairwise Comparison of Friedman Rank Sums, with Application to Comparing Classifiers. BMC Bioinform. 2017, 18, 68.
68. Quade, D. Using weighted rankings in the analysis of complete blocks with additive block effects. J. Am. Stat. Assoc. 1979, 74, 680–683.
69. Zimmerman, D.W.; Zumbo, B.D. Relative power of the wilcoxon test, the friedman test, and repeated-measures anova on ranks. J. Exp. Educ. 1993, 62, 75–86.
70. Rizal, K.; Adinugroho, S.; Rahayudi, B. Penentuan Waktu Terakhir Penggunaan Ganja Menggunakan. J. Pengemb. Teknol. Inf. Dan Ilmu Komput. 2019, 3, 9341–9347.
71. Dharmadhikari, S.C.; Ingle, M.; Kulkarni, P. A novel multi label text classification model using semi supervised learning. Int. J. Data Min. Knowl. Manag. Process 2012, 2, 11–20.
72. Ghani, M.U.; Rafi, M.; Tahir, M.A. Discriminative Adaptive Sets for Multi-Label Classification. IEEE Access 2020, 8, 227579–227595.
73. Lian, S.-M.; Liu, J.-W.; Lu, R.-K.; Luo, X.-L. Captured multi-label relations via joint deep supervised autoencoder. Appl. Soft Comput. 2019, 74, 709–728.
74. Resende, V.H.; Carneiro, M.G. Towards a High-Level Multi-Label Classification from Complex Networks. In Proceedings of the IEEE 31st International Conference on Tools with Artificial Intelligence, Portland, OR, USA, 4–6 November 2019; pp. 1140–1147.
75. Alazaidah, R.; Thabtah, F.; Al-Radaideh, Q. A Multi-Label Classification Approach Based on Correlations Among Labels. Int. J. Adv. Comput. Sci. Appl. 2015, 6, 52–59.
76. Tomar, D.; Agarwal, S. A Multilabel Approach Using Binary Relevance and One-versus-Rest Least Squares Twin Support Vector Machine for Scene Classification. In Proceedings of the Second International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, India, 12–13 February 2016; pp. 37–42.
77. Mendialdua, I.; Arruti, A.; Jauregi, E.; Lazkano, E.; Sierra, B. Classifier Subset Selection to construct multi-classifiers by means of estimation of distribution algorithms. Neurocomputing 2015, 157, 46–60.
78. Hruschka, E.R., Jr.; dos Santos, E.B.; Galvao, S.D.C.d.O. Variable Ordering in the Conditional Independence Bayesian Classifier Induction Process: An Evolutionary Approach. In Proceedings of the 7th International Conference on Hybrid Intelligent Systems (HIS 2007), Kaiserslautern, Germany, 17–19 September 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 204–209.
79. Schetinin, V.; Zharkova, V.; Zharkov, S. Bayesian Decision Tree Averaging for the Probabilistic Interpretation of Solar Flare Occurrences. In Proceedings of the KES 2006 Knowledge-Based Intelligent Information and Engineering Systems, Bournemouth, UK, 9–11 October 2006; Gabrys, B., Howlett, R.J., Jain, L.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 523–532.
80. Bylander, T. Estimating generalization error on two-class datasets using out-of-bag estimates. Mach. Learn. 2002, 48, 287–297.
81. Juneja, K. Expanded and Filtered Features Based ELM Model for Thyroid Disease Classification. Wireless Pers. Commun. 2022, 126, 1805–1842.

Figure 1. Timeline of related works for the MMLMT method across various domains, including (a) disease prediction [27,28], (b) drug repurposing [29], (c) biomedical applications [30], (d) image classification [31,32,33,34], (e) natural language processing [35,36,37,38], (f) education [39], (g) industry [40], (h) transportation [41], (i) natural disaster management [42], (j) ophthalmology [43], (k) medical diagnosis [44], (l) pedestrian attribute recognition [45], (m) landslide susceptibility mapping [46,47], (n) flash flood susceptibility mapping [48], (o) underground column stability assessment [49].

Figure 2. An overview of the MMLMT method.

Figure 3. The MMLMT performance over various datasets in the sensitivity metric.

Figure 4. The MMLMT performance over various datasets in the PRC Area metric.

Figure 5. An example LMT tree structure from the Thyroid-L7 dataset.

Table 1. Example illustration of instances in a multi-class multi-label dataset.

Instances (S)	Features (X)					Target Attributes (L)			Labels (Y)
Instances (S)	Features (X)					$L_{1}$	$L_{2}$	$L_{3}$	Labels (Y)
$S_{1}$	$x_{11}$	$x_{12}$	$x_{13}$	…	$x_{1 k}$	$y_{1}$	$y_{3}$	$y_{7}$	$Y_{1} = \{y_{1}, y_{3}, y_{7}\}$
$S_{2}$	$x_{21}$	$x_{22}$	$x_{23}$	…	$x_{2 k}$	$y_{2}$	$y_{4}$	$y_{9}$	$Y_{2} = \{y_{2}, y_{4}, y_{9}\}$
$S_{3}$	$x_{31}$	$x_{32}$	$x_{33}$	…	$x_{3 k}$	$y_{1}$	$y_{3}$	$y_{6}$	$Y_{3} = \{y_{1}, y_{3}, y_{6}\}$
$S_{4}$	$x_{41}$	$x_{42}$	$x_{43}$	…	$x_{4 k}$	$y_{2}$	$y_{5}$	$y_{10}$	$Y_{4} = \{y_{2}, y_{5}, y_{10}\}$
…	…	…	…	…	…	…	…	…	…
$S_{N}$	$x_{N 1}$	$x_{N 2}$	$x_{N 3}$	…	$x_{N k}$	$y_{1}$	$y_{5}$	$y_{8}$	$Y_{N} = \{y_{1}, y_{5}, y_{8}\}$

Table 2. Illustration of individual training datasets.

$D_{1}$	X	$L_{1}$	$D_{2}$	X	$L_{2}$	$D_{3}$	X	$L_{3}$
$S_{1}$	$(x_{11} \dots x_{1 K})$	$y_{1}$	$S_{1}$	$(x_{11} \dots x_{1 K})$	$y_{3}$	$S_{1}$	$(x_{11} \dots x_{1 K})$	$y_{7}$
$S_{2}$	$(x_{21} \dots x_{2 K})$	$y_{2}$	$S_{2}$	$(x_{21} \dots x_{2 K})$	$y_{4}$	$S_{2}$	$(x_{21} \dots x_{2 K})$	$y_{9}$
$S_{3}$	$(x_{31} \dots x_{3 K})$	$y_{1}$	$S_{3}$	$(x_{31} \dots x_{3 K})$	$y_{3}$	$S_{3}$	$(x_{31} \dots x_{3 K})$	$y_{6}$
$S_{4}$	$(x_{41} \dots x_{4 K})$	$y_{2}$	$S_{4}$	$(x_{41} \dots x_{4 K})$	$y_{5}$	$S_{4}$	$(x_{41} \dots x_{4 K})$	$y_{10}$
…	…	…	…	…	…	…	…	…
$S_{N}$	$(x_{N 1} \dots x_{N K})$	$y_{1}$	$S_{N}$	$(x_{N 1} \dots x_{N K})$	$y_{5}$	$S_{N}$	$(x_{N 1} \dots x_{N K})$	$y_{8}$
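
The decomposition illustrated in Table 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_by_target` is hypothetical, the feature values are made up, and only the label pattern follows the first four rows of the table.

```python
from typing import List, Sequence, Tuple


def split_by_target(
    features: Sequence[Sequence[float]],
    targets: Sequence[Sequence[str]],
) -> List[List[Tuple[Sequence[float], str]]]:
    """Build one single-target dataset D_j per target attribute L_j.

    Every D_j reuses the same feature vectors and keeps only the j-th label.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    n_targets = len(targets[0])
    if any(len(row) != n_targets for row in targets):
        raise ValueError("every instance needs a value for every target")
    return [
        [(x, y[j]) for x, y in zip(features, targets)]
        for j in range(n_targets)
    ]


# Toy data: label columns follow rows S1..S4 of the table; features are invented.
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
Y = [
    ["y1", "y3", "y7"],
    ["y2", "y4", "y9"],
    ["y1", "y3", "y6"],
    ["y2", "y5", "y10"],
]

D1, D2, D3 = split_by_target(X, Y)
print([label for _, label in D1])  # labels of target L1 only
```

Each resulting dataset is an ordinary multi-class problem, so any single-target learner (an LMT in this paper) could be trained on it independently.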

Table 3. Characteristics of multi-class multi-label datasets used in the study.

ID	Ref.	Dataset Name	#Features	#Instances	#Labels	Domain	Source	Link
1	[51]	Drug-Consumption	12	1885	18	Drugs	UCI	https://archive.ics.uci.edu/dataset/373/drug+consumption+quantified
2	[52]	Enron	1001	1702	53	Text	Mulan	http://mulan.sourceforge.net/datasets-mlc.html
3	[53]	HackerEarth-Adopt-A-Buddy	11	18,834	2	Animals	Kaggle	https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption
4	[54]	Music-Emotions	72	593	6	Music	Mulan	http://mulan.sourceforge.net/datasets-mlc.html
5	[55]	Scene	294	2407	6	Image	Mulan	http://mulan.sourceforge.net/datasets-mlc.html
6	[56]	Solar-Flare-2	10	1066	3	Physics	UCI	https://archive.ics.uci.edu/dataset/89/solar+flare
7	[57]	Thyroid-L7	29	9172	7	Healthcare	MEKA	https://osdn.net/projects/sfnet_meka/downloads/Datasets/thyroid-L7.arff/
8	[58]	Yeast	103	2417	14	Biology	Mulan	http://mulan.sourceforge.net/datasets-mlc.html
All links accessed on 4 July 2024.

Table 4. LMT hyperparameters.

Hyperparameter	Description	Value
BatchSize	The number of instances to process per batch	100
ConvertNominal	Indicates whether nominal attributes should be converted to binary	false
Debug	When true, outputs additional debug information	false
DoNotCheckCapabilities	Skips the capability check if true	false
DoNotMakeSplitPointActualValue	Controls whether split points are adjusted to actual data values	false
errorOnProbabilities	If true, minimizes the error on probabilities rather than the misclassification error when cross-validating the number of boosting iterations	false
FastRegression	Allows the use of a faster regression algorithm	true
MinNumInstances	Sets the minimum number of instances per leaf	15
NumBoostingIterations	Number of boosting iterations; a negative value means the number is determined automatically	−1
NumDecimalPlaces	Controls the number of decimal places for output	2
SplitOnResiduals	Determines if splits are made on residuals	false
UseAIC	When true, uses the Akaike information criterion (AIC) for model selection	false
weightTrimBeta	Defines the threshold for trimming weights in the boosting algorithm	0.0

Table 5. Comparison of MMLMT with various classifiers in accuracy (%) metric.

ID	Dataset	MMRT	MMNB	MMKNN	MMLR	MMK-Star	MMLWL	MMHT	MMLMT
1	Drug-Consumption	51.31	59.53	53.22	62.99	54.85	62.54	62.10	63.12
2	Enron	93.28	80.87	93.58	89.05	93.64	94.48	93.58	95.40
3	HackerEarth-Adopt-A-Buddy	81.94	82.91	84.15	85.41	82.85	81.06	82.01	86.80
4	Music-Emotions	72.21	75.03	76.32	78.60	76.52	75.14	76.15	79.79
5	Scene	85.76	75.90	88.99	85.94	84.08	84.54	83.15	89.99
6	Solar-Flare-2	91.74	88.59	91.96	92.09	92.37	93.03	93.03	93.03
7	Thyroid-L7	98.24	92.44	96.98	97.31	97.40	96.87	96.08	99.03
8	Yeast	71.98	69.95	75.62	79.48	75.13	78.50	78.68	80.02
	Average	80.81	78.15	82.60	83.86	82.11	83.27	83.10	85.90

Table 6. Computational complexity of various classification algorithms.

Algorithm	Time Complexity for Training	Time Complexity for Prediction	Space Complexity
Logistic model tree	$O(n \times k \times c)$	$O(\log n + k)$	$O(n \times k)$
Random tree	$O(n \times \log n)$	$O(\log n)$	$O(n)$
Naïve Bayes	$O(n \times k)$	$O(c \times k)$	$O(c \times k)$
K-nearest neighbors	$O(1)$	$O(n \times k)$	$O(n \times k)$
Logistic regression	$O(n \times k)$	$O(k)$	$O(k)$
K-star	$O(1)$	$O(n)$	$O(n \times k)$
Locally weighted learning	$O(1)$	$O(n \times k)$	$O(n \times k)$
Hoeffding tree	$O(\log n)$	$O(\log n)$	$O(n \times k)$

Table 7. The results of the Wilcoxon test for comparing MMLMT with its counterparts.

Method	Wilcoxon Test (p-Value)
MMRT	0.01172
MMNB	0.01172
MMKNN	0.01172
MMLR	0.01172
MMK-Star	0.01172
MMLWL	0.01796
MMHT	0.01796
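
The p-values above can be approximately reproduced from the accuracies in Table 5. The sketch below is a pure-Python Wilcoxon signed-rank test under assumptions the paper does not state explicitly: a two-sided test, the normal approximation without continuity or tie correction, and zero differences discarded. The function names are hypothetical.

```python
import math

# Accuracy (%) on datasets 1-8, copied from Table 5.
ACC = {
    "MMRT": [51.31, 93.28, 81.94, 72.21, 85.76, 91.74, 98.24, 71.98],
    "MMNB": [59.53, 80.87, 82.91, 75.03, 75.90, 88.59, 92.44, 69.95],
    "MMKNN": [53.22, 93.58, 84.15, 76.32, 88.99, 91.96, 96.98, 75.62],
    "MMLR": [62.99, 89.05, 85.41, 78.60, 85.94, 92.09, 97.31, 79.48],
    "MMK-Star": [54.85, 93.64, 82.85, 76.52, 84.08, 92.37, 97.40, 75.13],
    "MMLWL": [62.54, 94.48, 81.06, 75.14, 84.54, 93.03, 96.87, 78.50],
    "MMHT": [62.10, 93.58, 82.01, 76.15, 83.15, 93.03, 96.08, 78.68],
    "MMLMT": [63.12, 95.40, 86.80, 79.79, 89.99, 93.03, 99.03, 80.02],
}


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_normal_p(a, b):
    """Two-sided signed-rank p-value via the normal approximation."""
    diffs = [round(x - y, 2) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]  # discard ties between the two methods
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


p_values = {
    name: wilcoxon_normal_p(ACC["MMLMT"], scores)
    for name, scores in ACC.items()
    if name != "MMLMT"
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.5f}")
```

Under these assumptions, the five methods that MMLMT outperforms on all eight datasets share one p-value, while MMLWL and MMHT share a slightly larger one because their tie with MMLMT on Solar-Flare-2 leaves only seven usable pairs. An exact test, or a library routine with different defaults, could give different values.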

Table 8. An example confusion matrix from the Thyroid-L7 dataset.

	L	M	N	NEG
L	75	0	19	22
M	0	121	0	8
N	17	0	90	3
NEG	20	2	3	8792
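
As a worked example, the usual summary metrics can be derived from this matrix. The sketch assumes that rows are actual classes and columns are predicted classes, which the table itself does not state; overall accuracy does not depend on that choice, but per-class recall and precision swap roles if the orientation is reversed.

```python
LABELS = ["L", "M", "N", "NEG"]

# Counts copied from Table 8 (assumed layout: rows = actual, columns = predicted).
CM = [
    [75, 0, 19, 22],
    [0, 121, 0, 8],
    [17, 0, 90, 3],
    [20, 2, 3, 8792],
]

total = sum(sum(row) for row in CM)
correct = sum(CM[i][i] for i in range(len(CM)))
accuracy = correct / total

# Recall (sensitivity, TPR): diagonal count divided by its row sum.
recall = {LABELS[i]: CM[i][i] / sum(CM[i]) for i in range(len(CM))}

# Precision: diagonal count divided by its column sum.
precision = {
    LABELS[j]: CM[j][j] / sum(row[j] for row in CM) for j in range(len(CM))
}

print(f"instances = {total}, accuracy = {accuracy:.4f}")
for label in LABELS:
    print(f"{label}: recall = {recall[label]:.4f}, precision = {precision[label]:.4f}")
```

The instance total agrees with the 9172 Thyroid-L7 instances listed in Table 3. The accuracy of this single matrix need not equal the Thyroid-L7 figure in Table 5, which presumably aggregates over all seven targets.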

Table 9. The comparison of MMLMT with state-of-the-art methods on various datasets.

Dataset	Ref	Year	Method	Accuracy (%)
Drug-Consumption	[70]	2019	Naïve Bayes	42.86
			Proposed (MMLMT)	63.12
Enron	[71]	2012	lblMLTC	90.00
			Proposed (MMLMT)	95.40
Music-Emotions	[72]	2020	MLkNN	68.20
	[72]	2020	DASMLKNN	64.10
	[73]	2019	PLDLDSA	68.00
	[73]	2019	PLDLDSA-DT	68.00
	[74]	2019	ML-kNN	72.10
			BR (SVM)	74.30
			CC (SVM)	74.50
			BR (GNB)	74.20
			CC (GNB)	75.90
	[75]	2015	MLC-ACL	76.70
			Proposed (MMLMT)	79.79
Scene	[73]	2019	PLDLDSA	85.00
	[73]	2019	PLDLDSA-DT	73.00
	[74]	2019	ML-kNN	91.00
			BR (SVM)	90.10
			CC (SVM)	88.80
			BR (GNB)	85.70
			CC (GNB)	86.10
	[76]	2016	ML-kNN	62.90
			ML C4.5	56.90
			RAkEL	73.40
			QWML	68.30
			CC	72.00
			BR-SVM	68.90
			BR One-versus-rest LSTSVM	72.60
			Proposed (MMLMT)	89.99
Solar-Flare-2	[77]	2015	1R	66.04
			KNN	75.05
			RIPPER	73.64
			NB	73.45
			C4.5	75.23
			K*	75.33
			BN	72.89
			NBT	74.39
			RF	75.33
			SVM	76.36
	[78]	2007	VOGAC-PC	83.30
	[78]	2007	VOGAC-MarkovPC	82.83
	[79]	2006	BDT1	82.10
	[79]	2006	BDT2	82.50
	[80]	2002	ID3	85.40
			Proposed (MMLMT)	93.03
Thyroid-L7	[81]	2022	ELM	98.88
			Proposed (MMLMT)	99.03
Yeast	[73]	2019	PLDLDSA	61.00
	[73]	2019	PLDLDSA-DT	78.00
	[74]	2019	ML-kNN	79.10
			BR (SVM)	79.80
			CC (SVM)	79.10
			BR (GNB)	72.40
			CC (GNB)	70.50
			Proposed (MMLMT)	80.02

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

