Article

Development of a Clinical Decision Support System Using Artificial Intelligence Methods for Liver Transplant Centers

1 Department of Computer Engineering, Faculty of Engineering, Ataturk University, Erzurum 25240, Turkey
2 Ataturk University Organ Transplant Center, Ataturk University, Erzurum 25240, Turkey
3 Department of Computer Technologies, Vocational School of Technical Sciences, Bayburt University, Bayburt 69000, Turkey
4 Faculty of Open and Distance Education, Ataturk University, Erzurum 25240, Turkey
5 Department of Computer Engineering, Faculty of Engineering, Erzurum Technical University, Erzurum 25050, Turkey
6 Department of Statistics, Faculty of Economics and Administrative Sciences, Ataturk University, Erzurum 25240, Turkey
7 Department of Medical Biochemistry, Faculty of Medicine, Ataturk University, Erzurum 25240, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1248; https://doi.org/10.3390/app15031248
Submission received: 17 December 2024 / Revised: 17 January 2025 / Accepted: 23 January 2025 / Published: 26 January 2025
(This article belongs to the Special Issue Artificial Intelligence for Healthcare)

Abstract

The objective of this study is to utilize artificial intelligence techniques for the diagnosis of complications and diseases that may arise after liver transplantation, as well as for the identification of patients in need of transplantation. To achieve this, an interface was developed to collect patient information from Atatürk University Research Hospital, specifically focusing on individuals who have undergone liver transplantation. The collected data were subsequently entered into a comprehensive database. Additionally, relevant patient information was obtained through the hospital’s information processing system, which was used to create a data pool. The classification of data was based on four dependent variables, namely, the presence or absence of death (“exitus”), recurrence location, tumor recurrence, and cause of death. Techniques such as Principal Component Analysis and Linear Discriminant Analysis (LDA) were employed to enhance the performance of the models. Among the various methods employed, the LDA method consistently yielded superior results in terms of accuracy during k-fold cross-validation. Following k-fold cross-validation, the model achieved the highest accuracy of 98% for the dependent variable “exitus”. For the dependent variable “recurrence location”, the highest accuracy obtained after k-fold cross-validation was 91%. Furthermore, the highest accuracy of 99% was achieved for both the dependent variables “tumor recurrence” and “cause of death” after k-fold cross-validation.

1. Introduction

The liver is situated beneath the rib cage, in the upper right portion of the abdomen, and weighs more in men than in women. It is the largest internal organ in the human body and is responsible for converting sugar obtained from food into glycogen and storing it. It also participates in tasks such as protein and fat synthesis, as well as their storage [1].
Liver diseases can manifest either as congenital conditions present from birth or as acquired disorders that develop later in life. While congenital liver diseases are commonly observed in children and are often linked to complications during maternal pregnancy, liver diseases in adults tend to emerge through various factors. Preventive measures play a critical role in averting the development of liver diseases in adults. Essential considerations include ensuring complete vaccination against hepatitis, abstaining from alcohol consumption and avoiding obesity, maintaining a regular and nutritious diet, and avoiding unnecessary and excessive medication usage. Timely intervention for gallbladder stones, which have the potential to obstruct the bile ducts and inflict liver damage, is imperative [2,3].
Liver transplantation is a surgical procedure performed using two primary methods: live donor transplantation and cadaver donor transplantation. In cadaver donor transplantation, liver tissue is procured from individuals who have suffered brain death while their vital bodily functions persist. The diseased liver tissue is then replaced with the donated liver tissue. Alternatively, live donor transplantation involves the surgical removal of a portion of the liver from compatible individuals, which is subsequently transplanted into the recipient.
Several crucial factors require careful consideration before and after liver transplantation. These factors include patient selection, surgical techniques utilized during the transplantation procedure, optimal storage conditions for the donated organ, graft size, post-transplantation medication regimen, and the meticulous selection of both the donor and recipient [4].
Following the transplantation procedure, patients undergo regular assessments that involve monitoring their blood pressure, pulse, body temperature, level of consciousness, and other general examination findings. Furthermore, liver function tests, infection-related tests, radiological evaluations, such as ultrasound, tomography, and magnetic resonance imaging (MRI), as well as pathological examinations, such as biopsies and tissue analyses, are conducted to identify any potential complications.
Specialist physicians determine the timing and evaluation of these tests, which can be influenced by their previous experience or by analyzing data obtained from national and international studies. The reliability of the obtained results increases with the availability of more data. Based on these findings, the physician establishes an individualized treatment program. Throughout the treatment process, regular examinations are conducted, and the progress of the patient is closely monitored. Depending on the patient’s condition, follow-up and treatment may be continued or modified accordingly. The development of a machine or deep learning model would follow a similar modeling approach as described above.

2. Related Works

In the study by Jin et al. [5], liver disease prediction was carried out using machine learning (ML) algorithms on the Indian Liver Patient Dataset (ILPD). The ML algorithms employed in the study included Decision Tree (DT), K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Naive Bayes (NB), Logistic Regression (LR), and Random Forest (RF). Accuracy, precision, sensitivity, and specificity were used as performance criteria for these algorithms. The average success rates for these criteria were as follows: specificity (98.6%), accuracy (96.55%), precision (93.698%), and sensitivity (92.1%).
Ayeldeen et al. [6] conducted a study on the presence of liver fibrosis and the estimation of its degree in patients with hepatitis C virus. The study analyzed data from 100 Egyptian patients with hepatitis C virus. A machine learning technique based on a Decision Tree (DT) classifier was employed. The classification achieved an accuracy rate of 93.7%.
Abdar et al. [7] performed a study utilizing the ILPD dataset and employed two algorithms, Boosted C5.0 and CHAID, for analysis. Comparison of the performance of these algorithms revealed that the Boosted C5.0 algorithm achieved an accuracy of 93.75%, indicating superior performance compared to the CHAID algorithm, which exhibited an accuracy of 65.00%.
Yu et al. [8] aimed to compare the performance of traditional statistical models with machine learning approaches for predicting survival after liver transplantation. In their study, five machine learning methods and four traditional statistical models were compared, and the AUC-ROC value for 3-month survival was calculated as 0.86.
Ramirez et al. [9] estimated the probability that a liver from a donor would belong to the survival graft class after one year of transplantation with a decision support system. In this study, a feed-forward neural network with generalized radial basis functions in the hidden layer was used.
Hashem et al. [10] conducted a study on the diagnosis of the most common type of hepatitis C disease in the liver using machine learning algorithms. Data were collected by experts from Cairo University and Kasr Al-Aini Hospital in Egypt. Linear Regression, Alternative Decision Tree, Classification and Regression Trees (CART), and REPTree classifier algorithms were employed. The performance of the classifiers was assessed using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and AUC-ROC. The alternative decision tree algorithm achieved the highest accuracy of 95.6% and the highest AUC value of 99%.
In the study by Briceno et al. [11], an artificial neural network (ANN) was used for donor–recipient matching in liver transplantation, and its accuracy was compared with validated scores. The best performance was obtained with multiple regression in estimating the probability of graft survival (90.79%) and graft loss (71.42%) for each donor–recipient match.
In the Chen et al. study [12], postoperative sepsis was predicted in liver transplant patients using machine learning algorithms. The random forest classifier model estimated 0.731 AUC with 71.6% accuracy.
Ozer conducted a study on hepatitis diagnosis using recurrent neural networks. The dataset was categorized into four classes: blood donor, hepatitis, fibrosis, and cirrhosis. Models such as Feed-forward Neural Network (FFNN), Simple RNN, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) were employed. Performance evaluation criteria included accuracy, precision, recall, and the F1-score. Among these models, Simple RNN achieved the best performance with an accuracy of 97.72%. Additionally, the Simple RNN model exhibited the highest F1-score for the classification of all diseases [13].
Various studies in the literature have pursued the same objective using different methods. Different machine learning (ML) and deep learning (DL) algorithms have been employed to enhance the performance of the models. Jin et al. [5] emphasized the significance of selecting an effective algorithm, as it directly influences the predictive performance of the model. This inference should be considered in future studies. Ayeldeen et al. [6] focused exclusively on hepatitis C disease, diverging from the approach of other studies. Consequently, examining various liver diseases would broaden the study's scope. The performance of the models in study [6] may have been influenced by the relatively small dataset size of 100 patients; increasing the dataset size is likely to improve success rates. Abdar et al. [7] found that men were more susceptible to the disease than women, as indicated by the rules generated by the Boosted C5.0 algorithm. These derived rules provided an advantage for the study.
Zhang et al. [14] developed a lightweight and efficient convolutional neural network (CNN) model for industrial surface defect detection. From an image-processing perspective, the model combines an inverted residual architecture with a coordinate attention mechanism to better extract multi-scale features. The multi-scale strategy eases the difficulties of small-object detection and improves the accuracy of the model, achieving accuracies comparable to existing state-of-the-art techniques with fewer parameters and computations.
Zhang et al. [15] proposed a new deep convolutional neural network algorithm for surface defect detection. The proposed model performs feature extraction with a DCP-Darknet backbone network based on dense connections and offers higher accuracy and speed in detecting surface defects while reducing computational parameters. It also integrates a new hierarchy designed using a cross-stage feature fusion strategy and depthwise separable convolutions. Experiments show that the proposed algorithm provides significant performance improvement over existing models in industrial applications.
Serban et al. [16] recently emphasized in their review that liver transplantation offers a vital solution for patients with end-stage liver disease, but the mismatch between the waiting list and available organs has led to the development of expanded donation criteria and various scoring systems (Child-Pugh, MELD, DRI, SOFT). However, current scores do not reliably predict survival after transplantation. Artificial intelligence has recently become an important tool in this field, with methods such as random forests, artificial neural networks, decision trees, Bayesian networks, and support vector machines showing better results in predicting post-transplant survival than traditional statistical models.
Estimating the risk of death after liver transplantation is an important process for medical experts to make decisions, determine the risk, and select patients. Although traditional scoring systems have limitations, machine learning models have recently become useful in estimating mortality rates. Prasad and Kasiemobi [17] estimated the risk of death using feed-forward neural networks (FNNs) and the results obtained showed that the proposed model was more effective than other advanced algorithms with an accuracy rate of approximately 95.5%.

3. Materials and Method

This section presents the details of the classification algorithms utilized for model creation in the study, encompassing K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). Moreover, the performance criteria employed to assess the models and the dataset utilized are meticulously described. To provide a comprehensive understanding, Figure 1 offers an overview of the workflow at this stage, elucidating each step with explanations.

3.1. Data Input

The study involved a comprehensive analysis of the hospital database, patient files, and laboratory values to uncover critical data pertaining to liver transplant patients before and after surgery. While some data on all previous liver transplant patients were recorded in the hospital information system, certain diagnosis and treatment information was physically documented in the patient files. Consequently, a database was established to input patient information from the files. Additionally, Hospital Information Management System (HIMS) Web services were developed to facilitate the entry of data for new patients into the system.
Figure 1. The workflow of the performance analysis of various supervised machine learning algorithms.

3.2. Dataset

During this phase of the study, the dataset was manually entered, encompassing comprehensive information concerning liver transplant patients, including details about the patient, the donor, as well as laboratory, radiological, and pathological procedures. The dataset consists of 188 independent variables and 4 dependent variables. The dependent variables include Ex (indicating the presence or absence of death), Recurrence site (categorizing into no tumor, tumor without recurrence, or tumor recurrence), Tumor recurrence (classifying into no recurrence, local recurrence, or distant recurrence), and Cause of death (distinguishing between cardiac arrest, multi-organ failure, or no specific cause of death). The main objective of the study was to perform classification processes on these 4 dependent variables. To accomplish this, the Python programming language was utilized for dataset separation, preprocessing, and feature extraction. Various Python libraries were employed throughout the study. The dataset was divided into 80% for training purposes and 20% for testing. As an example, Table 1 showcases the average, maximum, and minimum values of the first 10 features extracted from the dataset.
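As a minimal illustration of this split, the following Python sketch assumes a hypothetical CSV export of the database with hypothetical column names for the four dependent variables; it is not the study's actual loading code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV export of the transplant database; the file name and
# column names are illustrative placeholders, not the study's identifiers.
data = pd.read_csv("liver_transplant.csv")

X = data.drop(columns=["Ex", "RecurrenceSite", "TumorRecurrence", "CauseOfDeath"])
y = data["Ex"]  # one of the four dependent variables

# 80% training / 20% testing, as described above; stratification on the
# labels is an assumption for keeping class proportions comparable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```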

3.3. Data Preprocessing

To improve the training and testing phase of the dataset and obtain better results, data preprocessing was performed to address irrelevant and redundant data. The data preprocessing stage involved identifying repetitive values, empty data, and outliers in the dataset. Empty data points were detected and filled with the average value of the attribute to which they belonged. Outliers, which are extreme values at the endpoints of the attribute range, were identified. These outlier values were replaced with the average value of the corresponding attribute. In order to ensure that all values have an equal contribution to the performance of the algorithms used in this study, it is important to scale the data to a standardized range.
The dataset used in this study was small and had a non-normal distribution; therefore, the mean method was preferred for imputing missing values. Mean imputation can make the impact of extreme values more pronounced in skewed data, but because outliers were detected and replaced separately (as described below), the data structure remained largely intact during imputation. In small datasets, the mean method fills in missing values quickly and provided a reasonable representation of the data during model training. In summary, the mean method was chosen because it is straightforward, works efficiently on small datasets, and, combined with the outlier treatment described below, is suitable for a skewed distribution.
When detecting outliers, factors such as dataset size, structure, computational capacity, and application domain should be considered. If the dataset is small, statistical methods such as the Z-score and IQR are used. These methods are fast and require minimal computation. The structure of the dataset relates to whether the data distribution is normal. If the dataset has a normal distribution, the Z-score method is more effective. If the distribution is non-normal, the IQR method is more suitable. Statistical methods are appropriate for small to medium-sized datasets because they require low computational power. For basic data analyses in specific application domains, statistical methods are usually sufficient.
These factors were considered when selecting the method. The IQR method was chosen based on the dataset’s small size, non-normal distribution, low computational power, and basic data analysis requirements. The IQR method determines outliers by using the interquartile range of the data. Values greater than or less than 1.5 times the IQR are considered outliers. The presence of outliers in the dataset’s attributes was identified, and the outlier data were replaced with the mean of the respective attribute. Figure 2 provides an example histogram showing the data distribution of the ’donor weight’ attribute from the dataset used in this study. The distribution of the data is visualized in this graph.
Outliers are identified as follows: for each attribute, the 0.25 quantile (first quartile) and 0.75 quantile (third quartile) are calculated, and their difference gives the interquartile range (IQR). Values that fall more than 1.5 times the IQR outside this range are considered outliers. After detecting outliers and performing data preprocessing, the distribution of the donor weight data changed as shown in Figure 3.
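A minimal Python sketch of the mean imputation and IQR-based outlier handling described above might look as follows (pandas assumed; column handling is generic rather than the study's exact code):

```python
import pandas as pd

def impute_and_clip_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values with the column mean, then replace IQR outliers
    with that same mean, mirroring the preprocessing described above."""
    df = df.copy()
    for col in df.select_dtypes(include="number").columns:
        mean_val = df[col].mean()
        df[col] = df[col].fillna(mean_val)          # mean imputation

        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = (df[col] < lower) | (df[col] > upper)
        df.loc[outliers, col] = mean_val            # replace outliers with the mean
    return df
```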
To achieve this, feature standardization was applied, which rescales each feature to have zero mean and unit variance. The standardization process involves subtracting the mean value ( μ ) of each column from the variable values (x) and dividing the result by the standard deviation ( σ ) of the column, as shown in Formula (1).
$z = \dfrac{x - \mu}{\sigma}$
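Continuing the preprocessing sketch, the standardization of Formula (1) can be applied with scikit-learn's StandardScaler; fitting the scaler on the training split only is an assumption here, since the paper does not state how the scaler was fitted.

```python
from sklearn.preprocessing import StandardScaler

# z = (x - mu) / sigma applied column-wise; fit on the training split only
# to avoid information leaking from the test set (an assumption, not a
# detail stated in the paper).
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
```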

3.4. Feature Extraction

The dataset utilized in this study faced the challenge of high dimensionality within its feature space, posing potential issues for subsequent analyses. To address this challenge, effective feature selection and feature extraction methods were employed to mitigate the inclusion of irrelevant features and reduce the dimensionality of the feature space. Among the commonly employed techniques for this purpose, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were selected as the preferred methods in this study.
PCA is a straightforward and widely used dimensionality reduction technique. Its objective is to reduce the dimensionality of a high-dimensional dataset by eliminating irrelevant features while preserving the underlying structure of the data. It is crucial to determine the number of principal components in PCA. The optimal number of components, denoted as p, should be selected to best represent the data [18]. In this study, the optimal value of p was determined as 80. As a result, the original dataset with dimensions 109 × 188 was reduced to a dataset with dimensions 109 × 80.
In contrast to PCA, LDA takes class membership information into account when reducing dimensionality. LDA is a dimensionality reduction technique commonly used in classification problems; it aims to find the optimal set of projection vectors that map the original data to a lower-dimensional feature space. LDA effectively handles situations where class frequencies are unequal, and its performance has been examined on randomly generated test data [19,20]. Given the large number of features in the dataset, the ’svd’ solver was used, which automatically adjusts the dimensionality of the data. Consequently, the original dataset with dimensions 109 × 188 was reduced to a dataset with dimensions 109 × 2.
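A hedged scikit-learn sketch of both reductions, continuing the variables from the preprocessing sketch, could look like this; the PCA component count follows the value reported above, and all other settings are library defaults rather than the study's stated configuration.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA: the 188 original features reduced to the 80 components reported above.
pca = PCA(n_components=80)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

# LDA: supervised reduction using the class labels; with the 'svd' solver the
# number of retained discriminants is at most (number of classes - 1).
lda = LinearDiscriminantAnalysis(solver="svd")
X_train_lda = lda.fit_transform(X_train_std, y_train)
X_test_lda = lda.transform(X_test_std)
```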
To understand the effectiveness of the features obtained through PCA and LDA, a feature importance analysis was conducted. This analysis was used to enhance model performance and provide better explanations. Among feature importance methods, the Shapley Additive Explanations (SHAP) method was used, which is a powerful and commonly used technique in ML models. It is also model-agnostic. Figure 4 illustrates the feature importance of the dataset on the Logistic Regression (LR) model without any feature extraction method as an example. In the graph, the bars located on the positive (upper) and negative (lower) axes indicate the impact of each feature on the model. Features on the positive axis contribute positively to the model, while those on the negative axis contribute negatively. This can also be interpreted as positive and negative effects on the model’s output. Features close to zero have no meaningful impact on model performance.
Due to the large number of features in the dataset and the fact that many of them have values close to zero, feature selection was performed. Feature extraction methods were used to remove unnecessary features and optimize the model, making it more efficient and faster. Figure 5a shows the feature importance on the LR model after feature extraction using the PCA method. Figure 5b illustrates the feature importance on the LR model after feature extraction using the LDA method.
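The SHAP analysis on the LR model could be reproduced along the following lines; this is a sketch using the shap library's linear explainer, and the plotting call is illustrative rather than the exact code behind Figures 4 and 5.

```python
import shap
from sklearn.linear_model import LogisticRegression

# Fit the LR model on the PCA-reduced features and explain it with SHAP.
model = LogisticRegression(max_iter=1000).fit(X_train_pca, y_train)

explainer = shap.LinearExplainer(model, X_train_pca)   # background = training data
shap_values = explainer.shap_values(X_test_pca)

# Bar plot of mean |SHAP value| per feature, analogous to Figure 5a.
shap.summary_plot(shap_values, X_test_pca, plot_type="bar")
```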

3.5. Machine and Deep Learning Techniques

ML (Machine Learning) is a field within artificial intelligence that aims to enable machines to learn, make decisions, and take actions autonomously without the need for explicit programming or external intervention. It focuses on developing algorithms and models that can automatically learn and improve from experience or data [21]. Over the years, ML has experienced remarkable growth and has found widespread applications across various domains. It involves the automatic identification of meaningful patterns and relationships within data, empowering systems to make predictions or take actions based on learned patterns. ML algorithms possess the ability to adapt and enhance their performance over time through the assimilation of new data [22].
Machine learning-based predictive models independently examine the fundamental relationships between donor and recipient factors to forecast graft survival in transplantation. These models also facilitate the simultaneous testing of multiple transplant scenarios. ML methods offer versatile and practical tools for predicting graft survival [23]. Consequently, numerous studies have been conducted using ML techniques for graft survival prediction. A review of these studies reveals the use of twenty-nine different machine learning models for this purpose [24]. According to a study conducted in Italy in 2010, ML can serve as a suitable alternative to traditional statistical methods for predicting 5-year graft loss, as it allows for the analysis of interactions among various risk factors, achieving 88.2% sensitivity and 73.8% specificity [25]. Another study corroborates this, highlighting ML’s superior performance over traditional statistical models in predicting graft survival [23]. In a study involving over 100,000 patients, ML predictions paired with conventional statistical models outperformed in forecasting post-transplant patient survival [26].
DL, a subfield of ML, revolves around ANNs. Unlike traditional ML approaches, DL networks do not rely on explicit feature extraction and classification. Instead, they directly learn hierarchical representations of data from raw inputs, enabling them to autonomously discover intricate patterns and relationships within the data [27,28]. ANNs draw inspiration from the structure and functionality of biological neural networks. They consist of interconnected artificial neurons arranged in layers, including an input layer, one or more hidden layers, and an output layer. These networks employ weighted connections between neurons and learn by adjusting these weights to facilitate parallel distributed processing and solve complex tasks [27].
Artificial neural networks can detect complex, nonlinear relationships between dependent and independent variables and identify all possible interactions among variables [29]. A study conducted in Yugoslavia in 1999 found that an ANN was a reliable method for predicting chronic kidney rejection compared to traditional statistical approaches [30]. Many studies involving post-organ transplant patients have utilized ANN models [23,24,25,26,31,32,33,34,35]. In a study with 717 kidney transplant patients, ANNs and Support Vector Machines were effectively used to predict survival outcomes in transplant recipients. The SVM, MLP-ANN, and Logistic Regression models demonstrated sensitivity rates of 98.2%, 97.3%, and 97.5%, and specificity rates of 49.6%, 26.1%, and 17.4%, respectively [34]. A study involving 519 patients used the Random Forest algorithm to predict delayed graft function, achieving a sensitivity of 0.67, specificity of 0.97, and a positive likelihood ratio of 22.33, indicating superior predictive performance [36]. Additionally, in a study with 531 patients to predict the likelihood of severe pneumonia after kidney transplantation, ML models such as SVM, LR, and RF were used, with RF showing excellent performance with 97% specificity [37]. Numerous studies have incorporated the Random Forest algorithm in this context [23,24,25,26,31,32,33,34,35]. Some studies have identified SVM as the most suitable model for predicting graft functionality [38]. Other frequently used algorithms in the literature include SVM [23,24,25,31,32,34], DT [23,24,25,29,39], LR [23,24,25,29,39], KNN [24,34,36], RNN [35,40], NB [34,41], and LSTM networks [35,42].
ANNs (artificial neural networks) can be divided into four main types: single-layer perceptrons, multilayer perceptrons (MLPs), feed-forward neural networks, and feedback neural networks. A single-layer perceptron consists of only one input layer and one output layer. It is restricted to linearly separable problems and has a simple structure. The MLP, on the other hand, is a widely used model designed for learning nonlinear patterns. It consists of an input layer, one or more hidden layers, and an output layer. MLPs allow complex mappings between inputs and outputs. Feed-forward neural networks, as shown in Figure 6, have a unidirectional flow of information. Input data, such as in a multilayer perceptron, are passed through the hidden layers and finally to the output layer. In contrast, feedback neural networks have a bidirectional flow of information between the hidden and output layers. These networks have dynamic memory and can retain information from previous steps, as an output may serve as input in subsequent iterations [43].
In this study, both machine learning algorithms (SVM, NB, KNN, LR, RF, DT) and deep learning algorithms (RNN and LSTM) were utilized.
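A compact sketch of how the six ML models might be instantiated and compared with k-fold cross-validation is shown below; apart from the KNN neighbour count reported later, the hyperparameters are scikit-learn defaults and the fold count of 5 is an assumption, since the paper only states that k-fold cross-validation was used.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=7, p=1),
    "SVM": SVC(probability=True),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
}

for name, clf in models.items():
    scores = cross_val_score(clf, X_train_std, y_train, cv=5, scoring="accuracy")
    print(f"{name}: mean k-fold accuracy = {scores.mean():.3f}")
```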

3.5.1. K-Nearest Neighbors

The K-Nearest Neighbors (KNN) algorithm is a supervised learning technique employed for classification tasks. It is recognized as one of the simplest and oldest classification algorithms. Unlike the NB (Naive Bayes) approach, the KNN algorithm does not rely on probability values but directly utilizes the data for classification without constructing a pre-defined model. In the KNN algorithm, the adjustable parameter is denoted as “k”, which represents the number of nearest neighbors to consider when estimating the class membership of an item. By adjusting the value of “k”, the algorithm determines the number of neighbors to be examined during the classification process.
In this study, the KNN algorithm was applied with the parameter n_neighbors = 7, indicating that it considered the seven nearest neighbors for classification. The range of “k” values is explored, starting from 1 up to the maximum number of neighbors, which corresponds to the number of data points in the training set. The algorithm’s performance is assessed for each “k” value, and the “k” value associated with the highest accuracy is determined by identifying the maximum element and its index in the evaluation array.
Regarding the calculation of distances between neighbors, the study uses the Minkowski method [44], which computes the distance between two neighbors over multiple variables. Formula (2) gives the Minkowski distance for N variables. In this study, a value of p = 1 was chosen, which corresponds to the Manhattan distance. The choice of the p-value affects the model's speed, and with p = 1 the model achieves a trade-off between speed and accuracy.
$d(p, q) = \left( \sum_{i=1}^{N} \lvert p_i - q_i \rvert^{p} \right)^{1/p}$
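The k-selection procedure described above can be sketched as follows, using the Minkowski metric with p = 1 (Manhattan distance); evaluating each k on the held-out split is an assumption about how the accuracies were compared. In the study, this search settled on n_neighbors = 7.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Search k from 1 up to the number of training samples and keep the most
# accurate value, as described above.
k_values = range(1, len(X_train_std) + 1)
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=1)
    knn.fit(X_train_std, y_train)
    accuracies.append(knn.score(X_test_std, y_test))

best_index = int(np.argmax(accuracies))
print("best k:", k_values[best_index], "accuracy:", accuracies[best_index])
```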

3.5.2. Logistic Regression

Logistic regression is a supervised learning algorithm commonly used for binary classification problems [45]. It determines the boundary between classes and calculates the class probabilities based on the distance from this boundary. It is considered a regression model that predicts the class membership of a data item or input using a regression equation [22]. Logistic regression offers several advantages, including simplicity of implementation, computational efficiency, effectiveness in terms of training, and ease of regularization. Unlike some other algorithms, logistic regression does not require scaling of input features [21]. One of the challenges of logistic regression is that it struggles with solving nonlinear problems. It can be overly adaptive to the training data, which may lead to overfitting. Logistic regression utilizes the sigmoid function, also known as the logistic function, for classification purposes. The sigmoid function compresses the output of the regression equation between 0 and 1, representing the probabilities of belonging to a particular class. The mathematical formula for logistic regression, as shown in Equation (3), involves the input data represented by X and the coefficients represented by θ . The sigmoid activation function used in artificial neural networks (ANNs) is also based on the sigmoid function used in logistic regression.
$\sigma(x) = \dfrac{1}{1 + e^{-\theta^{\top} x}}$
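A small numeric illustration of the sigmoid mapping linear scores to class probabilities is given below; the 0.5 decision threshold is the conventional default rather than a value stated in the paper.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function mapping the linear score theta·x to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy example: scores far below 0 map near 0, far above 0 map near 1.
scores = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(scores)
labels = (probs >= 0.5).astype(int)   # conventional 0.5 threshold for the binary class
print(probs, labels)                  # ≈ [0.018 0.5 0.982] [0 1 1]
```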

3.5.3. Naive Bayes

Naive Bayes (NB) is a supervised learning algorithm, where the model is trained using labeled data. It is a classification technique based on Bayes’ theorem. This theorem can determine the likelihood of an event using prior information about the conditions associated with that event.
NB classifiers are a type of basic probabilistic classifier derived from Bayes’ Theorem, which relies on the assumption that features are strongly independent of each other. They are particularly suitable for cases where the input dimensions are high [46].

3.5.4. Support Vector Machine

Support Vector Machine (SVM) is a type of supervised learning algorithm that is highly effective for solving a wide range of classification tasks. SVM models are closely related to traditional multilayer perceptron neural networks. They can classify both linear and nonlinear data. SVM is applicable to both classification and regression tasks. The algorithm operates by calculating the margin. In this process, each data point is represented in an n-dimensional space, where n corresponds to the number of features in the dataset. The value of each feature represents the corresponding coordinate value. It separates the data into distinct classes by identifying a line (hyperplane) that divides the training datasets. The algorithm operates by maximizing the gap, known as the margin, between the nearest data points from both classes and the hyperplane [46].

3.5.5. Decision Tree

Decision Tree (DT) is a supervised learning algorithm and one of the earliest and most well-known machine learning algorithms. DT is employed to address regression and classification issues by repeatedly partitioning data according to a particular feature. The data are split into nodes, with the tree’s leaves representing the final outcomes. The primary objective of a Decision Tree is to learn straightforward decision rules based on the training data to build a model capable of predicting the target variable. The tree is built during the training process using the training data. Leaf nodes hold the class label, while decision nodes are non-leaf nodes. DT handles both categorical and numerical data. A DT models decision logic, i.e., it tests outcomes and maps them in a tree-like structure to classify data points [21].

3.5.6. Random Forest

Random Forest (RF) is a supervised learning algorithm. In this algorithm, an input is provided at the top, and the data are collected in smaller subsets as it moves through the trees. RF takes the concept of decision trees and elevates it by combining them into a forest. The advantage of the RF classifier is its short training times, ability to handle imbalanced datasets, and capability to manage missing data. In RF, the new dataset or test data are distributed across all the decision trees created. Each decision tree in the forest is given the opportunity to determine the class of the data. The model then chooses the most appropriate class through majority voting. This approach can be applied to both regression and classification tasks [21].

3.5.7. Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a type of neural network in which connections between nodes can form directed cycles. Each node has a real-valued activation that changes over time, and each connection (synapse) has a real-valued weight that can be modified during each iteration. Nodes can be input nodes, which receive data from outside the network; output nodes, which produce results; or hidden nodes, which modify the data passing through them on paths from input to output. RNNs are derived from feed-forward neural networks but, unlike traditional feed-forward networks, they can use their internal state, also known as memory, to process input sequences [47].

3.5.8. Long Short-Term Memory

Long Short-Term Memory (LSTM) is an improved version of RNN. These networks are designed to overcome the long-term dependency problem of recurrent networks. RNNs do not retain long-term memories, which is why a different architecture was needed. With the development of LSTM architecture, this issue has been resolved. LSTMs are particularly good at remembering information over long periods. Since previous information can significantly affect the accuracy of the model, LSTMs naturally become the preferred choice for tasks requiring long-term memory [48].
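A minimal Keras sketch of an LSTM classifier for a binary dependent variable such as “Ex” is given below; the layer width, optimizer, epoch count, and the treatment of the tabular features as length-1 sequences are illustrative assumptions, not the study's settings, and swapping the LSTM layer for SimpleRNN yields the RNN counterpart.

```python
import tensorflow as tf

# Assumes X_train_std / X_test_std from the preprocessing sketch and
# 0/1-encoded labels in y_train / y_test.
n_features = X_train_std.shape[1]
X_train_seq = X_train_std.reshape(-1, 1, n_features)   # tabular rows as length-1 sequences
X_test_seq = X_test_std.reshape(-1, 1, n_features)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1, n_features)),
    tf.keras.layers.LSTM(64),   # replace with tf.keras.layers.SimpleRNN(64) for the RNN model
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
model.fit(X_train_seq, y_train, epochs=50, batch_size=16, validation_split=0.1)
print(model.evaluate(X_test_seq, y_test))
```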

4. Experimental Results and Discussion

4.1. Statistical Analysis of Data

In this study, four dependent variables were analyzed, and each variable was subjected to statistical analysis to create a mathematical model. These models provide information about the independent variables that influence the dependent variables. The two-state logit and the ordinal logit methods used in our study provide a direct fit to the type of dependent variables in the dataset (e.g., mortality status or causes of death). The two-state Logit model is a highly effective method for understanding binary decision processes. Binary outcomes, such as variables like mortality risk or survival status, are commonly modeled using this approach. It adds value to the decision-making process by providing probabilistic predictions. The model’s robustness and ease of interpretability make it particularly valuable in addressing sensitive issues such as mortality prediction. In cases where ordinal outcome variables are present, this model is one of the most suitable methods for estimating rank-based effects. For instance, in situations where causes of death or degrees of complications are ranked, the ordinal logit model provides the ability to evaluate both the probability distributions and the impact of independent variables on the ordinal outcomes.
While modern machine learning methods offer more flexibility, these models are preferred for the following reasons: Compared to machine learning models, the results of these models are easily interpretable. This is especially crucial in areas such as clinical decision support systems, where it is essential to explain why the model produced a particular outcome. Moreover, for small to medium-sized datasets, logit models generate results efficiently and do not require complex computational infrastructures. Finally, the long-standing acceptance of statistical approaches has enhanced the reliability and preference for these models. Particularly in the medical field, doctors and researchers face fewer challenges in understanding the results produced by these models.
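Both model families are available in statsmodels; the sketch below shows how a two-state (binary) logit and an ordinal logit could be fitted, with placeholder names for the covariate matrix and outcome vectors rather than the study's actual variables.

```python
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

# `X_stat` is a placeholder DataFrame of candidate independent variables;
# `y_binary` is the 0/1 Ex variable and `y_ordinal` the ordered cause-of-death
# codes. All names are illustrative, not the study's identifiers.
binary_logit = sm.Logit(y_binary, sm.add_constant(X_stat)).fit()
print(binary_logit.summary())

ordinal_logit = OrderedModel(y_ordinal, X_stat, distr="logit").fit(method="bfgs")
print(ordinal_logit.summary())
```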

4.1.1. Model 1: Estimation of the Ex Variable Using the Two-State Logit Model

The estimation results of the two-state logit model for predicting the probability of patients dying (variable “Ex”) are presented below. The mathematical model incorporates the independent variables that affect the variable in question.
$\mathrm{Ex} = 0.469 + 0.2721\,\mathrm{cavaucakvkn} - 0.4111\,\mathrm{hilar} - 0.8010\,\mathrm{biopsy3} - 0.4795\,\mathrm{biopsy4} + 0.7721\,\mathrm{diabetes1} + 0.8469\,\mathrm{diabetes2}$
The interpretation of the model will be based on the odds ratio (OR) values. In this context, when the OR < 1, it indicates a decreasing effect compared to the reference group, while an OR > 1 suggests an increasing effect compared to the reference group. In the current study, a statistically significant relationship was observed between cavaucakvkn1 and the dependent variable “Ex”. The odds ratio analysis revealed that patients with cavaucakvkn1 end-to-end anastomosis had a higher likelihood of mortality compared to patients without this type of anastomosis (OR = 1.313; p < 0.05). Moreover, patients without hilar invasion but with alveolar invasion were less likely to experience mortality compared to patients with hilar invasion (OR = 0.663; p < 0.01).
Additionally, the estimation results of the two-state logit model showed a statistically significant relationship between biopsy and the dependent variable “Ex”. Patients with hepatitis had a lower likelihood of mortality compared to patients with cirrhosis (OR = 0.449; p < 0.05). Similarly, patients with tumors had a lower likelihood of mortality compared to patients with cirrhosis (OR = 0.619; p < 0.05). Furthermore, a statistically significant relationship was identified between diabetes and the dependent variable “Ex” based on the estimation results of the two-state logit model. It was determined that patients using diabetes_insulin had a higher likelihood of mortality compared to patients using insulin and OAD (OR = 2.164; p < 0.05). Similarly, patients using diabetes_OAD had a higher likelihood of mortality compared to patients using insulin and OAD (OR = 2.332; p < 0.05).
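The odds ratios quoted above are simply the exponentiated logit coefficients; continuing the statsmodels sketch, they can be obtained as follows.

```python
import numpy as np

# Odds ratios: values above 1 increase the odds of the outcome relative to
# the reference group, values below 1 decrease them.
odds_ratios = np.exp(binary_logit.params)
print(odds_ratios)

# e.g., a coefficient of 0.7721 corresponds to OR = exp(0.7721) ≈ 2.16,
# matching the diabetes_insulin estimate reported above.
```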

4.1.2. Model 2: Estimation of the Cause of Death Variable Using the Ordinal Model

The estimation results of the ordinal logit model for predicting the cause of death of patients are presented below. The mathematical model incorporates the dependent variable “cause of death” and the independent variables that affect this variable.
$\text{Cause of death} = 26.569 + 28.121 + 0.0034\,\text{totalkcag} + 0.0982\,\text{pab} + 1.7694\,\text{month} + 2.7482\,\text{midhepaticvein}$
The interpretation of the model will consider both the odds ratio (OR) and marginal effect (ME) values. When interpreting the odds for continuous variables, as there is no reference category, the odds are analyzed for different values of the independent variable (Power and Xie, 2000:76). In the present study, a statistically significant relationship was found between the total age and the cause of death. Based on the odds ratio analysis, it was observed that as the weight in the total tocag variable increased, there was a shift towards multiorgan failure in the patients (OR = 1.003; p < 0.10). Furthermore, according to the marginal effect values, an increase in the total tocag weight reduced the cause of death in the patients by 0.002%. Similarly, a statistically significant relationship was identified between the patient’s Pab value and the cause of death. The analysis results indicated that as the Pab value increased, there was a tendency towards multiorgan failure in the patients (OR = 1.103; p < 0.05). According to the marginal effect values, an increase in the Pab value decreased the cause of death in the patients by 0.5%. Moreover, a statistically significant relationship was observed between the month value and the cause of death. The analysis results revealed that as the month value increased, the patients showed a tendency towards multiorgan failure (OR = 5.867; p < 0.10). The marginal effect values indicated that an increase in the month value reduced the cause of death in the patients by 20.5%.
Additionally, based on the odds ratio analysis, a statistically significant relationship was determined between the orthohepatic vein and the cause of death. It was found that the cause of death in patients with orthohepatic_alveolar but without invasion showed a shift towards multiorgan failure compared to patients with invasion (OR = 15.614; p < 0.05). According to the marginal effect results, patients with orthohepatic venous_alveolar but without invasion decreased the cause of death by 36.6%.

4.1.3. Model 3: Estimation of the Tumor Recurrence Variable Using the Two-State Logit Model

The results of the two-state logit model estimation for predicting tumor recurrence in patients are presented below. The mathematical model incorporates the dependent variable “tumor recurrence” and the independent variables that affect this variable.
$\text{Tumor recurrence} = 0.0722 - 0.0064\,\text{donor size} + 0.0028\,\text{primary tumor size} + 0.2672\,\text{number of tumors} - 0.0244\,\text{tumor diameter 2} + 0.5132\,\text{tumor location 5} - 0.1709\,\text{hepatic status} - 0.0052\,\text{pathology tumor diameter 1} + 0.0065\,\text{pathology tumor diameter 2} + 0.7931\,\text{ht}$
The interpretation of the model will be made according to the odds ratio (OR) values. Accordingly, a statistically significant relationship was found between donor size and tumor recurrence. According to the results of the odds ratios, it was determined that there was a decrease towards the category of “patient has tumor and recurrence” as the donor size value was increased (OR = 0.993; p < 0.01). Similarly, a statistically significant relationship was found between the primary tumor size and tumor recurrence. According to the results of the odds ratios, it was determined that there was an increase towards the category of “patient has tumor and recurrence” as the primary tumor size was increased (OR = 1.003; p < 0.01). A statistically significant relationship was found between the number of tumors and tumor recurrence. Accordingly, it was determined that as the number of tumors increased, there was an increase in the patient tumor and recurrence category (OR = 1.306; p < 0.01). A statistically significant relationship was found between tumor diameter 2 and tumor recurrence. Accordingly, it was determined that there was a decrease towards the category of “the patient has tumor and recurrence” as the tumor diameter 2 variable was increased (OR = 0.975; p < 0.01). A statistically significant relationship was detected between the tumor location 5 variable and tumor recurrence. According to the results of the odds ratios, it was determined that there was a decrease towards the category “the patient has tumor and recurrence” as the tumor location 5 was increased (OR = 1.671; p < 0.05). Similarly, a statistically significant relationship was found between the hepatic status variable and tumor recurrence. As the hepatic status variable was increased, there was a decrease towards the category of “the patient has tumor and recurrence” (OR = 0.843; p < 0.01). A statistically significant relationship was found between the pathology tumor diameter 1 variable and tumor recurrence. As the pathology tumor diameter 1 increased, it was determined that there was a decrease towards the category of “the patient has tumor and recurrence” (OR = 0.995; p < 0.05). Similarly, a statistically significant relationship was found between the pathology tumor diameter 2 variable and tumor recurrence. As the pathology tumor diameter 2 increased, there was an increase towards the category of “the patient has tumor and recurrence” (OR = 1.007; p < 0.05). A statistically significant relationship was found between the Ht variable and tumor recurrence. Accordingly, it was determined that as Ht increased, there was an increase towards the category of “the patient has tumor and recurrence” (OR = 2.210; p < 0.01).

4.1.4. Model 4: Estimation of the Recurrence Location Variable Using the Two-State Logit Model

The estimation results of the two-state logit model for predicting the relapse location of patients are presented below. The mathematical model includes the dependent variable “Recurrence Location” and the independent variables that affect this variable.
$\text{Location of recurrence} = 0.0302 + 0.0728\,\text{donor gender male} + 0.0031\,\text{donor size} + 0.0641\,\text{ty} + 0.0033\,\text{primary tumor size} + 0.1959\,\text{number of tumors} - 0.0171\,\text{tumor size 2} + 0.0067\,\text{pat tum diameter 2}$
The interpretation of the model will be based on the odds ratio (OR) values. The results of the analysis revealed statistically significant relationships between various variables and tumor recurrence. Firstly, a statistically significant relationship was found between the donor size variable and tumor recurrence. The odds ratio analysis showed that as the donor size value increased, there was a decrease towards the patient tumor and recurrence category (OR = 0.993; p < 0.01). Similarly, a statistically significant relationship was identified between the primary tumor size and tumor recurrence. The odds ratio analysis indicated that as the primary tumor size increased, there was an increase towards the patient tumor and recurrence category (OR = 1.003; p < 0.01). Furthermore, a statistically significant relationship was observed between the number of tumors and tumor recurrence. It was found that as the number of tumors increased, there was an increase in the patient tumor and recurrence category (OR = 1.306; p < 0.01). Additionally, a statistically significant relationship was found between the tumor diameter 2 variable and tumor recurrence. The odds ratio analysis showed that as the tumor diameter 2 variable increased, there was a decrease towards the patient tumor and recurrence category (OR = 0.975; p < 0.01).
Moreover, a statistically significant relationship was observed between the tumor localization 5 variable and tumor recurrence. The odds ratio analysis indicated that as the tumor localization 5 variable increased, there was a decrease towards the patient tumor and recurrence category (OR = 1.671; p < 0.05). Similarly, a statistically significant relationship was found between the liver status variable and tumor recurrence. It was determined that as the liver status variable increased, there was a decrease towards the patient tumor and recurrence category (OR = 0.843; p < 0.01). Furthermore, a statistically significant correlation was found between the pathology tumor diameter 1 variable and tumor recurrence. As the pathology tumor diameter 1 increased, there was a decrease towards the patient tumor and recurrence category (OR = 0.995; p < 0.05). Similarly, a statistically significant correlation was observed between the pathology tumor diameter 2 variable and tumor recurrence. It was found that as the pathology tumor diameter 2 increased, there was an increase towards the patient tumor and recurrence category (OR = 1.007; p < 0.05). Lastly, a statistically significant relationship was found between the Ht variable and tumor recurrence. The odds ratio analysis showed that as Ht increased, there was an increase towards the patient tumor and recurrence category (OR = 2.210; p < 0.01).

4.2. Results

In this study, machine learning (ML) and deep learning (DL) models were trained and tested, and their performance was evaluated using various performance metrics. The evaluation of model performance involved multiple metrics to ensure a comprehensive assessment, as relying solely on a single metric may not be sufficient for datasets that are not evenly distributed. The performance metrics used in this study included precision (Precision-PR), recall (Recall-RE), F1-score (F1-SC), accuracy (Accuracy-ACC), and Receiver Operating Characteristic (ROC) analysis. These metrics provide valuable insights into different aspects of model performance. For binary classification problems, the 2 × 2 confusion matrix is commonly used to visualize the predictions made by the models. The matrix displays the number of samples predicted for each of the four possible outcomes: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These values form the basis for deriving various classifier performance metrics. In the context of liver transplantation, F1-SC is more meaningful as it represents the overall performance of the model in terms of minimizing both false positives and false negatives. The application of AI in this domain aims to identify high-risk patients more accurately and to support clinical decision-making processes effectively.
Figure 7 illustrates the confusion matrix using the dependent variable “Ex” as an example; the confusion matrix is a useful tool for understanding the model's performance and its predictions. In the matrix:
  • TP represents the number of true positives, instances correctly predicted as “Ex” by the model.
  • FP represents the number of false positives, instances incorrectly predicted as “Ex” by the model while they are actually “Alive”.
  • FN represents the number of false negatives, instances incorrectly predicted as “Alive” by the model while they are actually “Ex”.
  • TN represents the number of true negatives, instances correctly predicted as “Alive” by the model.
Accuracy is a measure of the overall success of the model, calculated by dividing the correctly classified data by the total number of data points. However, it is important to note that accuracy may not be an effective metric in datasets with imbalanced class distributions. Using a combination of these performance metrics provides a comprehensive evaluation of the models’ performance and helps assess their effectiveness in the specific classification problem at hand.
$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + TN + FN}$
Precision measures the probability that the samples predicted as positive by the model are actually positive. It provides an indication of the model’s ability to avoid false positives.
$\text{Precision} = \dfrac{TP}{TP + FP}$
Recall, also known as sensitivity, represents the proportion of actual positive samples that are correctly predicted as positive by the model. It measures the model’s ability to identify positive instances.
$\text{Recall} = \dfrac{TP}{TP + FN}$
The F1-score, as stated, is a measure of the model’s robustness. It combines both precision and recall into a single metric, providing a balanced assessment of the model’s performance. The F1-score is calculated as the harmonic mean of precision and recall, ranging from 0 to 1, where 1 indicates the best possible performance.
$\text{F1-score} = \dfrac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$
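With scikit-learn, these metrics and the confusion matrix can be computed for any of the fitted classifiers as in the sketch below (binary case shown; `clf` stands for one of the models defined in the earlier sketch).

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# Evaluate a fitted classifier on the held-out split; the model is assumed
# to expose predict_proba for the ROC AUC computation.
y_pred = clf.fit(X_train_std, y_train).predict(X_test_std)
y_prob = clf.predict_proba(X_test_std)[:, 1]

print("confusion matrix:\n", confusion_matrix(y_test, y_pred))  # [[TN FP], [FN TP]]
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
```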
The results from the ML and DL models created for the dependent variable “Ex” before applying any feature extraction technique are presented in Table 2. Based on these results, SVM and RF achieved 88% precision; LR, KNN, SVM, and RF achieved 86% recall; LR, KNN, SVM, and RF achieved 83% F1-score; and LSTM achieved an AUC value of 98%. These models also showed higher accuracy compared to the other models.
Furthermore, the results obtained from the models created using the PCA method for the “Ex” dependent variable are provided in Table 2. Among these results, the DT model demonstrated higher performance than the other models, with 92% precision, 91% recall, 90% F1-score, and 90% accuracy, while LSTM achieved an AUC value of 97%. Lastly, the results obtained from the models created using the LDA method for the “Ex” dependent variable are also shown in Table 2. LR, KNN, SVM, and NB achieved 86% precision; LR, KNN, SVM, NB, RNN, and LSTM achieved 82% recall; LR, KNN, SVM, and NB achieved 83% F1-score; LR, KNN, and SVM achieved 82% accuracy; and LSTM achieved an AUC value of 98%. These models demonstrated superior performance compared to the other models evaluated in the study.
Although full configuration details for every model are not repeated here, the reported precision, recall, F1-score, and accuracy values for the “Ex” dependent variable permit a comparative analysis of the models across the different scenarios.
Table 3 presents the results obtained from the ML and DL models for the dependent variable “relapse location” before applying any feature extraction technique. Based on these results:
  • DT model achieved 81% precision.
  • RNN and LSTM models achieved 82% recall.
  • RNN and LSTM models obtained 74% F1-score.
  • RNN and LSTM models achieved 82% accuracy.
  • LSTM model achieved 99% AUC score.
These models demonstrated higher performance compared to other models in terms of the mentioned performance metrics for the “relapse location” dependent variable. Furthermore, Table 3 also includes the results obtained from models created using the PCA method for the “relapse location” dependent variable. Based on these results:
  • RNN and LSTM models achieved 67% precision.
  • RNN and LSTM models achieved 82% recall.
  • RNN and LSTM models obtained 74% F1-score.
  • RNN and LSTM models achieved 82% accuracy.
  • LSTM model achieved an AUC score of 98%.
These models showed higher performance compared to the other models in terms of the mentioned performance metrics. Lastly, the results obtained from the models created using the LDA method for the “relapse location” dependent variable are provided in Table 3 and are not enumerated individually here.
Table 4 presents the results obtained from the ML and DL models for the dependent variable “tumor recurrence” before applying any feature extraction technique. Based on these results:
  • RNN and LSTM models achieved 62% precision.
  • RNN and LSTM models achieved 79% recall.
  • RNN and LSTM models obtained 69% F1-score.
  • RNN and LSTM models achieved 79% accuracy.
  • The LSTM model achieved an AUC score of 97%.
These models demonstrated higher performance compared to other models in terms of the mentioned performance metrics for the “tumor recurrence” dependent variable. Additionally, Table 4 also includes the results obtained from models created using the PCA method for the “tumor recurrence” dependent variable. Based on these results:
  • RNN and LSTM models achieved 62% precision.
  • RNN and LSTM models achieved 79% recall.
  • RNN and LSTM models obtained 69% F1-score.
  • RNN and LSTM models achieved 79% accuracy.
  • LSTM model achieved 99% AUC score.
These models showed higher performance compared to the other models in terms of the mentioned performance metrics. Lastly, the results obtained from the models created using the LDA method for the “tumor recurrence” dependent variable are also provided in Table 4. Overall, the models created without applying any feature extraction technique generally exhibited low performance. LSTM was the strongest model, demonstrating the highest performance across all metrics, with recall, accuracy, and AUC values ranging between 79% and 99%; its AUC reached 97–99% under the PCA and LDA methods. RNN stands out as the second-best model after LSTM. The LDA method improved performance in models such as LR, RF, and DT, while PCA provided performance improvements in some models but was ineffective in others. Overall, KNN and SVM showed lower performance compared to the other models.
Table 5 presents the results obtained from the ML and DL models for the dependent variable “cause of death” before applying any feature extraction technique. Based on these results:
  • RF model achieved 85% precision.
  • LR, DT, and RF models achieved 86% recall.
  • RF model obtained 85% F1-score.
  • DT and RF models achieved 86% accuracy.
  • LSTM model achieved an AUC score of 96%.
These models demonstrated higher performance compared to the other models in terms of the mentioned performance metrics for the “cause of death” dependent variable. Furthermore, Table 5 also includes the results obtained from models created using the PCA method for the “cause of death” dependent variable. Based on these results:
  • LR model achieved 80% precision.
  • LR, DT, RF, RNN, and LSTM models achieved 82% recall.
  • LR model achieved 81% F1-score.
  • SVM, DT, RF, RNN, and LSTM models achieved 82% accuracy.
  • LSTM model achieved an AUC score of 98%.
These models showed higher performance compared to the other models in terms of the mentioned performance metrics. Lastly, the results obtained from the models created using the LDA method for the “cause of death” dependent variable are likewise provided in Table 5. Overall, LSTM demonstrated the highest performance across all metrics and was the best model: it achieved precision between 67% and 85%, while its recall and F1-score ranged between 82% and 85%. Under the LDA method, the LSTM model showed the best performance, with an AUC value of 99%. RNN achieved results close to LSTM in terms of AUC and accuracy. Overall, KNN and SVM showed lower performance compared to the other models; for instance, the AUC value of KNN remained between 76% and 80% across all methods.
The k-fold cross-validation method was employed to address errors that may arise from variations between the training and testing datasets. This technique randomly partitions the data into k subsets; in each iteration, one subset is held out for testing while the remaining subsets are used for training. Through this process, the model’s performance is assessed across different subsets of the data, thereby reducing the influence of any specific data split and enhancing the reliability of the results.
In this study, a k value of 13 was chosen to enhance the effectiveness of the ML models’ performance. The selection of the optimal k value depends on factors such as the size of the dataset and the complexity of the problem. By performing cross-validation with various k values, the accuracy values can be observed and considered before separating the data into training and testing sets. Figure 8 illustrates the process of cross-validation performed for various k values, displaying the accuracy values obtained for both the training and test data. This depiction provides insights into the relationship between the k value and model performance, enabling the selection of an appropriate k value that yields optimal results.
Formula (8) gives the cross-validation score as the mean of the accuracy values $A_i$ obtained on the $N$ folds. The dataset was divided into equal parts according to the chosen k value, and the accuracy score of each partition was recorded.
$$\mathrm{Cross\text{-}validation} = \frac{1}{N}\sum_{i=1}^{N} A_i \tag{8}$$
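As a rough illustration under stated assumptions (synthetic stand-in data sized like the study's cohort, an arbitrary LDA-plus-logistic-regression pipeline, and illustrative k values), the sketch below scans several fold counts and reports the mean fold accuracy corresponding to Formula (8); it is not the authors' actual experiment code.

```python
# Sketch of the cross-validation step; the mean of the per-fold accuracies A_i
# corresponds to Formula (8). All settings are illustrative, not the study's.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (110 samples, 188 features), not the hospital dataset.
X, y = make_classification(n_samples=110, n_features=188, n_informative=20, random_state=0)
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(),
                      LogisticRegression(max_iter=1000))

# Scan several k values (as in Figure 8) before settling on a fold count such as k = 13.
for k in (5, 10, 13):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")  # one accuracy A_i per fold
    print(f"k={k:>2}: mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```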
After conducting cross-validation, the models were again evaluated with the Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) methods. Table 6 showcases the accuracy values of the models created with the dependent variable “Ex” after undergoing cross-validation.
In the study, k-fold cross-validation was performed on the “Ex” dependent variable. Among the machine learning (ML) and deep learning (DL) models, the highest average results were obtained when utilizing the Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) methods. The confusion matrices corresponding to these models, which produced the best outcomes with the LDA method, are depicted in Figure 9. A confusion matrix, also referred to as an error matrix, visually represents the performance of the models by displaying the various outcomes of their predictions. Among the ML models, LR, SVM, NB, and KNN demonstrated similar results and generally performed well. RF and DT showed lower performance compared to the other models. Among the DL models, LSTM did not make any misclassifications. The performance of RNN was significantly lower than LSTM. LSTM was the most successful model among all, while RNN was the weakest model. Let us examine the provided example of the confusion matrix for the model created with the Decision Tree (DT) algorithm:
  • Total Test Data: 22
  • True Positive (TP): The model correctly predicted 14 instances as “Ex”.
  • False Positive (FP): The model incorrectly predicted 4 instances as “Ex” that were actually alive.
  • False Negative (FN): The model incorrectly predicted 1 instance as alive that was actually “Ex”.
  • True Negative (TN): The model correctly predicted 3 instances as alive.
These numbers give insight into how well the model identified “Ex” cases and alive cases; the short calculation below shows how the standard metrics follow directly from these counts.
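A minimal sketch of that arithmetic, using only the four counts quoted above (it reproduces the metric definitions, not a re-analysis of the study's data):

```python
# Worked example with the quoted DT confusion-matrix counts (TP=14, FP=4, FN=1, TN=3).
tp, fp, fn, tn = 14, 4, 1, 3

accuracy = (tp + tn) / (tp + fp + fn + tn)            # 17/22 ≈ 0.77
precision = tp / (tp + fp)                            # 14/18 ≈ 0.78
recall = tp / (tp + fn)                               # 14/15 ≈ 0.93
f1 = 2 * precision * recall / (precision + recall)    # ≈ 0.85

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```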
Figure 10 and Figure 11 display the loss and accuracy function graphs for RNN and LSTM, which are DL models used for the “Ex” dependent variable with the LDA method. The decreasing loss function indicates a reduction in model error, while the synchronous behavior of accuracy in the training and testing phases suggests that the model predicts consistently as it learns.
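As an illustration of how such curves are typically produced, the following Keras sketch trains a small LSTM on synthetic stand-in data and plots the training and validation loss and accuracy; the single-feature input (e.g., one LDA component per patient), layer size, and number of epochs are assumptions made for the example, not the study's actual hyperparameters.

```python
# Illustrative sketch of producing loss/accuracy curves such as those in Figures 10 and 11.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(110, 1, 1)).astype("float32")   # shape: (samples, timesteps, features)
y = (X[:, 0, 0] > 0).astype("float32")               # synthetic binary "Ex" label

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, 1)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, validation_split=0.2, epochs=50, verbose=0)

# Plot training vs. validation curves, as in the figures.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["loss"], label="train"); ax1.plot(history.history["val_loss"], label="test")
ax1.set_title("Loss"); ax1.set_xlabel("epoch"); ax1.legend()
ax2.plot(history.history["accuracy"], label="train"); ax2.plot(history.history["val_accuracy"], label="test")
ax2.set_title("Accuracy"); ax2.set_xlabel("epoch"); ax2.legend()
plt.show()
```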
The receiver operating characteristic (ROC) curve is a crucial criterion for evaluating the predictive performance of classification models, including those created using machine learning (ML) and deep learning (DL) techniques. It provides insights into how well the models can make predictions. The ROC curve represents the relationship between recall (true positive rate) and the false positive rate. By graphically displaying this relationship, the ROC curve enables a clear understanding of the trade-off between recall and the false positive rate. In the graph, the false positive rate (FPR) is plotted on the x-axis, while recall (also known as the true positive rate, TPR) is plotted on the y-axis. An ideal ROC curve demonstrates superior performance when the curve approaches the top-left corner of the graph [49].
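For reference, a ROC curve and its AUC can be computed as in the sketch below; the data and classifier are placeholders, so the curve only illustrates the mechanics described above.

```python
# Sketch: compute and plot a ROC curve (TPR vs. FPR) and its AUC on illustrative data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=110, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]                 # probability of the positive class
fpr, tpr, _ = roc_curve(y_te, scores)

plt.plot(fpr, tpr, label=f"LR (AUC = {roc_auc_score(y_te, scores):.2f})")
plt.plot([0, 1], [0, 1], "--", label="random classifier")
plt.xlabel("False positive rate"); plt.ylabel("True positive rate (recall)")
plt.legend(); plt.show()
```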
Figure 12 presents the ROC curves for various machine learning (ML) and deep learning (DL) models. In Figure 12a, LR and RF, as ML models, produced better results. These models have a high true positive rate and a low false positive rate. In Figure 12b, among the DL models, LSTM performed better than RNN.
The models created with machine learning (ML) and deep learning (DL) for the dependent variable “relapse location” were evaluated using k-fold cross-validation to obtain the mean highest results. Figure 13 presents the confusion matrices for the models generated using the LDA method, which achieved the highest performance. Among the ML models, LR, SVM, RF, DT, and KNN demonstrated similar results and were generally successful in negative classification; overall, their performance was good. NB showed lower performance compared to the other models. Among the DL models, the performance of the LSTM model was similar to that of the ML models, and RNN achieved the same performance as LSTM. All models had low success in positive classification and low true positive rates. As the relapse location variable consists of three states, the confusion matrix is represented as a 3 × 3 matrix. For instance, in the KNN confusion matrix, 16 of the 22 test samples correspond to correct predictions of no tumor recurrence, 5 to correct predictions of tumor presence without recurrence, and 1 to a correct prediction of relapse. Zeros in the matrix indicate class combinations for which the model made no predictions. A hypothetical example after this paragraph illustrates how such a multi-class matrix is read.
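A minimal sketch with invented counts (not the study's KNN results) showing how per-class recall and overall accuracy are read from a 3 × 3 matrix:

```python
# Hypothetical 3x3 confusion matrix for a three-state target; the counts are invented.
import numpy as np

classes = ["no recurrence", "tumor, no recurrence", "relapse"]
cm = np.array([[15, 1, 0],     # rows: true class
               [ 2, 3, 0],     # columns: predicted class
               [ 0, 0, 1]])

per_class_recall = cm.diagonal() / cm.sum(axis=1)       # correct predictions per true class
for name, r in zip(classes, per_class_recall):
    print(f"recall({name}) = {r:.2f}")
print(f"overall accuracy = {cm.trace() / cm.sum():.2f}")  # 19/22 ≈ 0.86
```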
The loss and accuracy function graphs of the DL models, RNN and LSTM, for the relapse dependent variable using the LDA method are shown in Figure 14 and Figure 15. The loss function demonstrates a decreasing trend as the epoch value increases, indicating a reduction in model error. The synchronous accuracy between the training and testing phases indicates that the model effectively predicts while learning.
Figure 16 shows the ROC curve of different ML and DL algorithms for the relapse location dependent variable. As seen in Figure 16, the position of the Logistic Regression model at the top of the ROC curve indicates a high AUC value and a high level of classification accuracy. LR outperforms all other models. While KNN and SVM models show good performance, Gaussian Naive Bayes, Decision Tree Classifier (DTC), and Random Forest models exhibit lower performance. The LSTM model outperforms the RNN model in terms of performance. Furthermore, the ROC curve of the LSTM model demonstrates a notably low false positive rate and a high true positive rate.
The models created with machine learning (ML) and deep learning (DL) for the dependent variable “tumor recurrence” achieved their highest performance through k-fold cross-validation. Figure 17 presents the confusion matrices of the models generated using the LDA method. Among the ML models, LR, SVM, RF, DT, and KNN demonstrated similar results but failed in positive classification; overall, their performance was good. NB showed better performance in the positive class compared to the other models. Among the DL models, the performance of the LSTM model was similar to that of NB, while RNN failed in the positive class. As an example, the SVM confusion matrix shows that, of the 22 test samples, 15 were correctly classified as the absence of tumor, 1 corresponds to a correct prediction of tumor presence without recurrence, and 6 to correct predictions of tumor relapse. Zeros in the matrix indicate class combinations for which the model made no predictions.
Figure 18 shows the ROC curve of different ML and DL algorithms for the tumor recurrence dependent variable. The Logistic Regression, SVM, and Random Forest models are the models with the highest performance. These models have a high true positive rate and a low false positive rate. KNN, Gaussian Naive Bayes, and Decision Trees are closer to a flat line, indicating lower performance. Models that make random predictions have low classification success rates. The LSTM model outperforms the RNN model. While the ROC curve of the LSTM model demonstrates ideal performance in classification problems, the performance of the RNN model is quite limited.
The models created with machine learning (ML) and deep learning (DL) for the dependent variable “cause of death” achieved their highest performance using k-fold cross-validation. Figure 19 presents the confusion matrices of the models generated using the LDA method. Among the ML models, LR, SVM, RF, NB, and KNN demonstrated similar results and performed well in both the positive and negative classes. DT, however, produced a higher number of false positives than the other ML models. Among the DL models, the LSTM model exhibited the best performance, while RNN failed in the positive class and was the weakest model. For instance, in the LSTM confusion matrix, 18 of the 22 test samples were correctly classified as cardiac arrest when the cause of death was indeed cardiac arrest, 3 were correctly identified as deaths caused by multi-organ failure, and 1 corresponds to a case with no recorded cause of death that the model predicted accordingly. Zeros in the matrix indicate class combinations for which the model made no predictions.
Figure 20 and Figure 21 depict the loss and accuracy function graphs of the RNN and LSTM models, the deep learning models used for the cause of death dependent variable. The loss function shows a declining trend as the epoch value increases, indicating a reduction in the model’s error over the course of training. As can be seen from Figures 20 and 21, a very low loss and a high accuracy are observed for the test set.
Figure 22 shows the ROC curves of different ML and DL algorithms for the cause of death dependent variable. Figure 22a shows the performance of various machine learning models, while Figure 22b compares the two neural network models (LSTM and RNN). The Logistic Regression, SVM, and Random Forest models show high performance on the ROC curve, whereas KNN and DTC demonstrate low performance. LSTM is not only the best DL model but also the model with the highest accuracy overall; it correctly separated all positive cases without producing any false positives.

5. Conclusions and Future Work

In this study, the ML and DL methods were employed to evaluate the diagnosis of liver diseases, quality of life, and death status of patients after liver transplantation. The dataset comprised information from 110 patients who underwent liver transplantation at Erzurum Atatürk University Training and Research Hospital’s organ transplantation department. The dataset contained 188 attributes, including the age and gender of living and cadaver donors, the cause of death of cadavers, recipient’s blood results, transport procedures, blood usage, and post-transplant infection occurrence.
The collected data underwent preprocessing, which involved handling outliers, checking for empty data, and removing duplicate entries. Following preprocessing, the dataset was divided into training (80%) and testing (20%) sets. To enhance the model’s performance, feature extraction methods such as PCA and LDA were employed. ML algorithms utilized in the study included NB, SVM, LR, RF, DT, and KNN, while DL algorithms consisted of RNN and LSTM.
Various performance metrics, including precision, recall, F1-score, and accuracy, were employed to assess the models’ performance. The highest accuracy values were obtained using k-fold cross-validation. For the dependent variable “Ex”, applying PCA with k-fold cross-validation yielded LR, KNN, SVM, and NB models with the highest accuracy of 96%. Similarly, applying LDA with k-fold cross-validation resulted in LR, KNN, SVM, and NB models achieving the highest accuracy of 98%. Regarding the dependent variable “relapse location”, applying PCA with k-fold cross-validation yielded LR, KNN, SVM, and NB models with the highest accuracy of 96%. With LDA and k-fold cross-validation, LR, DT, RF, and LSTM models achieved the highest accuracy of 98%.
Concerning the dependent variable “tumor recurrence”, applying PCA with k-fold cross-validation resulted in LR, KNN, SVM, RF, and NB models achieving the highest accuracy of 96%. Applying LDA with k-fold cross-validation yielded LR and DT models with the highest accuracy of 99%. For the dependent variable “cause of death”, applying PCA with k-fold cross-validation yielded LR with the highest accuracy of 87%. Using LDA and k-fold cross-validation, LR and KNN models achieved the highest accuracy of 99%.
This manuscript aims to explore how artificial intelligence and machine learning algorithms can be applied in liver transplantation processes and to provide a proof of concept in this area. While the technical aspects of the models used are important, our primary focus is on how these models can be integrated into clinical decision support systems and contribute to clinical processes.
Specifically, in the context of liver transplantation, the independent variables analyzed include key parameters influencing patient outcomes. The potential contributions of the artificial intelligence methods employed in this study to clinical practice include supporting physicians in decision-making, automating complex data analyses, and providing faster and more accurate insights for patient management. While the primary audience of this study is engineers rather than clinicians, we believe that our findings will guide the development of AI-based systems and their integration into clinical settings in the future.
This study evaluated the factors influencing the quality of life for patients following liver transplantation by utilizing a novel dataset generated through the proposed models. Distinct from existing research, this work introduced a newly developed dataset, offering fresh insights to the field. Future research will focus on achieving improved outcomes by leveraging larger datasets and extending the study’s scope to encompass other types of organ transplants. This study is proof of concept and continues with increasing patient numbers at the center where it is being conducted. With more patients, clinically significant results will be achieved, and it will also contribute to the medical literature.

Author Contributions

M.Y.: Writing—review and editing, software, writing—original draft, methodology; G.Ö.: Project administration, writing—review and editing, supervision; F.B.: Writing—review and editing, writing—original draft, methodology; Z.B.: Writing—review and editing, visualization, methodology; S.K. (Sinan Kul): Writing—review and editing, visualization, software; E.Ş.: Conceptualization, visualization; S.K. (Salih Kara): Investigation; H.E.: Statistical analysis; N.A. (Necip Altundaş): Investigation, methodology; N.A. (Nurhak Aksungur): Investigation, methodology; E.K.: Investigation, methodology; M.S.B.: Visualization; Z.Y.D.: Conceptualization; N.Ö.: Conceptualization, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Scientific and Technological Research Council of Turkey (TUBITAK). Grant No 120E403.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of Ataturk University (protocol number: B.30.2.ATA.0.01.00/11 and date: 26 December 2019). Ethical approval was required because the study involves personal health data of human participants.

Informed Consent Statement

We hereby declare that informed consent was obtained from all patients included in this study.

Data Availability Statement

The data supporting the findings of this study are not publicly available due to restrictions or proprietary agreements.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tanwar, N.; Rahman, K.F. Machine learning in liver disease diagnosis: Current progress and future opportunities. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1022, 012029. [Google Scholar] [CrossRef]
  2. Morisco, F.; Bruno, R.; Bugianesi, E.; Burra, P.; Calvaruso, V.; Cannoni, A.; Caporaso, N.; Caviglia, G.P.; Ciancio, A.; Fargion, S.; et al. AISF position paper on liver disease and pregnancy. Dig. Liver Dis. 2016, 48, 120–137. [Google Scholar] [CrossRef] [PubMed]
  3. Pugliese, N.; Arcari, I.; Aghemo, A.; Lania, A.G.; Lleo, A.; Mazziotti, G. Osteosarcopenia in autoimmune cholestatic liver diseases: Causes, management, and challenges. World J. Gastroenterol. 2022, 28, 1430. [Google Scholar] [CrossRef]
  4. de Villa, G.H.; Chen, C.-T.; Chen, Y.-R. Spontaneous bone regeneration of the mandible in an elderly patient: A case report and review of the literature. Chang. Gung Med. J. 2003, 26, 363–369. [Google Scholar] [PubMed]
  5. Jin, H.; Kim, S.; Kim, J. Decision factors on effective liver patient data prediction. Int. J.-Bio-Sci.-Bio-Technol. 2014, 6, 167–178. [Google Scholar] [CrossRef]
  6. Ayeldeen, H.; Shaker, O.; Ayeldeen, G.; Anwar, K.M. Prediction of liver fibrosis stages by machine learning model: A decision tree approach. In Proceedings of the 2015 Third World Conference on Complex Systems (WCCS), Marrakech, Morocco, 23–25 November 2015; pp. 1–6. [Google Scholar]
  7. Abdar, M.; Zomorodi-Moghadam, M.; Das, R.; Ting, I.-H. Performance analysis of classification algorithms on early detection of liver disease. Expert Syst. Appl. 2017, 67, 239–251. [Google Scholar] [CrossRef]
  8. Yu, Y.-D.; Lee, K.-S.; Kim, J.M.; Ryu, J.H.; Lee, J.-G.; Lee, K.-W.; Kim, B.-W.; Kim, D.-S.; Korean Organ Transplantation Registry Study Group. Artificial intelligence for predicting survival following deceased donor liver transplantation: Retrospective multi-center study. Int. J. Surg. 2022, 105, 106838. [Google Scholar] [CrossRef] [PubMed]
  9. Cruz-Ramírez, M.; Hervás-Martínez, C.; Fernández, J.C.; Briceño, J.; Mata, M.d. Multi-objective evolutionary algorithm for donor–recipient decision system in liver transplants. Eur. J. Oper. Res. 2012, 222, 317–327. [Google Scholar] [CrossRef]
  10. Hashem, S.; ElHefnawi, M.; Habashy, S.; El-Adawy, M.; Esmat, G.; Elakel, W.; Abdelazziz, A.O.; Nabeel, M.M.; Abdelmaksoud, A.H.; Elbaz, T.M.; et al. Machine learning prediction models for diagnosing hepatocellular carcinoma with HCV-related chronic liver disease. Comput. Methods Programs Biomed. 2020, 196, 105551. [Google Scholar] [CrossRef]
  11. Briceño, J.; Cruz-Ramírez, M.; Prieto, M.; Navasa, M.; Urbina, J.O.D.; Orti, R.; Gómez-Bravo, M.-Á.; Otero, A.; Varo, E.; Tomé, S.; et al. Use of artificial intelligence as an innovative donor-recipient matching model for liver transplantation: Results from a multicenter Spanish study. J. Hepatol. 2014, 61, 1020–1028. [Google Scholar] [CrossRef] [PubMed]
  12. Chen, C.; Chen, B.; Yang, J.; Li, X.; Peng, X.; Feng, Y.; Guo, R.; Zou, F.; Zhou, S.; Hei, Z. Development and validation of a machine learning model for prediction of liver transplantation outcomes. Front. Med. 2023, 10, 1066817. [Google Scholar]
  13. Ilyas, Ö. Recurrent neural network based methods for hepatitis diagnosis. Int. Symp. Sci. Res. Innov. Stud. 2021, 22, 25. [Google Scholar]
  14. Zhang, D.; Hao, X.; Wang, D.; Qin, C.; Zhao, B.; Liang, L.; Liu, W. An efficient lightweight convolutional neural network for industrial surface defect detection. Artif. Intell. Rev. 2023, 56, 10651–10677. [Google Scholar] [CrossRef]
  15. Zhang, D.; Hao, X.; Liang, L.; Liu, W.; Qin, C. A novel deep convolutional neural network algorithm for surface defect detection. J. Comput. Des. Eng. 2022, 9, 1616–1632. [Google Scholar] [CrossRef]
  16. Serban, M.; Balescu, I.; Petrea, S.; Gaspar, B.; Pop, L.; Varlas, V.; Stoian, M.; Diaconu, C.; Balalau, C.; Bacalbasa, N. Artificial intelligence and liver transplantation; literature review. J. Mind Med. Sci. 2024, 11, 374–380. [Google Scholar] [CrossRef]
  17. Prasad, R.; Kasiemobi, O.M. Prediction of mortality in liver transplant recipients using neural network. In Proceedings of the 2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT), Greater Noida, India, 9–10 February 2024; Volume 5, pp. 1759–1765. [Google Scholar]
  18. Hidayat, E.; Fajrian, N.A.; Muda, A.K.; Huoy, C.Y.; Ahmad, S. A comparative study of feature extraction using PCA and LDA for face recognition. In Proceedings of the 2011 7th International Conference on Information Assurance and Security (IAS), Melacca, Malaysia, 5–8 December 2011; pp. 354–359. [Google Scholar]
  19. Aburomman, A.A.; Reaz, M.B.I. Ensemble of binary SVM classifiers based on PCA and LDA feature extraction for intrusion detection. In Proceedings of the 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Xi’an, China, 3–5 October 2016; pp. 636–640. [Google Scholar]
  20. Widiantoro, A.D.; Mustafid, M.; Sanjaya, R. Model Analytic in Fintech User Comment Features Using LDA-CNN on Imbalanced Data. Int. J. Intell. Eng. Syst. 2024, 17, 1079–1098. [Google Scholar]
  21. Ibrahim, I.; Abdulazeez, A. The role of machine learning algorithms for diagnosing diseases. J. Appl. Sci. Technol. 2021, 2, 10–19. [Google Scholar] [CrossRef]
  22. Osisanwo, F.Y.; Akinsola, J.E.T.; Awodele, O.; Hinmikaiye, J.O.; Olakanmi, O.; Akinjobi, J. Supervised machine learning algorithms: Classification and comparison. Int. J. Comput. Trends Technol. (IJCTT) 2017, 48, 128–138. [Google Scholar]
  23. Badrouchi, S.; Ahmed, A.; Bacha, M.M.; Abderrahim, E.; Abdallah, T.B. A machine learning framework for predicting long-term graft survival after kidney transplantation. Expert Syst. Appl. 2021, 182, 115235. [Google Scholar] [CrossRef]
  24. Moghadam, P.; Ahmadi, A. A machine learning framework to predict kidney graft failure with class imbalance using Red Deer algorithm. Expert Syst. Appl. 2022, 210, 118515. [Google Scholar] [CrossRef]
  25. Chawla, R.; Balaji, S.; Alabdali, R.N.; Naguib, I.A.; Hamed, N.O.; Zahran, H.Y. Predicting the kidney graft survival using optimized African buffalo-based artificial neural network. J. Healthc. Eng. 2022, 2022, 6503714. [Google Scholar] [CrossRef] [PubMed]
  26. Yoo, K.D.; Noh, J.; Lee, H.; Kim, D.K.; Lim, C.S.; Kim, Y.H.; Lee, J.P.; Kim, G.; Kim, Y.S. A machine learning approach using survival statistics to predict graft survival in kidney transplant recipients: A multicenter cohort study. Sci. Rep. 2017, 7, 890. [Google Scholar] [CrossRef] [PubMed]
  27. Dreiseitl, S.; Ohno-Machado, L. Logistic regression and artificial neural network classification models: A methodology review. J. Biomed. Inform. 2002, 35–36, 352–359. [Google Scholar] [CrossRef]
  28. Paheding, S.; Saleem, A.; Siddiqui, M.F.H.; Rawashdeh, N.; Essa, A.; Reyes, A.A. Advancing horizons in remote sensing: A comprehensive survey of deep learning models and applications in image classification and beyond. Neural Comput. Appl. 2024, 36, 16727–16767. [Google Scholar] [CrossRef]
  29. Paquette, F.X.; Ghassemi, A.; Bukhtiyarova, O.; Cisse, M.; Gagnon, N.; Della Vecchia, A.; Rabearivelo, H.A.; Loudiyi, Y. Machine learning support for decision-making in kidney transplantation: Step-by-step development of a technological solution. Jmir Med. Inform. 2022, 10, e34554. [Google Scholar] [CrossRef] [PubMed]
  30. Simic-Ogrizovic, S.; Furuncic, D.; Lezaic, V.; Radivojevic, D.; Blagojevic, R.; Djukanovic, L. Using ANN in selection of the most important variables in prediction of chronic renal allograft rejection progression. Transplant. Proc. 1999, 31, 368. [Google Scholar] [CrossRef] [PubMed]
  31. Atallah, D.M.; Badawy, M.; El-Sayed, A.; Ghoneim, M.A. Predicting kidney transplantation outcome based on hybrid feature selection and KNN classifier. Multimed. Tools Appl. 2019, 78, 20383–20407. [Google Scholar] [CrossRef]
  32. Mark, E.; Goldsman, D.; Gurbaxani, B.; Keskinocak, P.; Sokol, J. Using machine learning and an ensemble of methods to predict kidney transplant survival. PLoS ONE 2019, 14, e0209068. [Google Scholar] [CrossRef]
  33. Hassani, Z.; Emami, N. Prediction of the survival of kidney transplantation with imbalanced data using intelligent algorithms. Comput. Sci. J. Mold. 2018, 26, 163–181. [Google Scholar]
  34. Ravindhran, B.; Chandak, P.; Schafer, N.; Kundalia, K.; Hwang, W.; Antoniadis, S.; Haroon, U.; Zakri, R.H. Machine learning models in predicting graft survival in kidney transplantation: Meta-analysis. BJS Open 2023, 7, zrad011. [Google Scholar] [CrossRef]
  35. Shi, P.; Fu, C. Time-dependent LSTM for Survival Prediction and Patient Subtyping in Kidney Disease Trajectory. medRxiv 2024. [Google Scholar]
  36. Decruyenaere, A.; Decruyenaere, P.; Peeters, P.; Vermassen, F.; Dhaene, T.; Couckuyt, I. Prediction of delayed graft function after kidney transplantation: Comparison between logistic regression and machine learning methods. BMC Med. Inform. Decis. Mak. 2015, 15, 83. [Google Scholar] [CrossRef] [PubMed]
  37. Luo, Y.; Tang, Z.; Hu, X.; Lu, S.; Miao, B.; Hong, S.; Bai, H.; Sun, C.; Qiu, J.; Liang, H.; et al. Machine learning for the prediction of severe pneumonia during posttransplant hospitalization in recipients of a deceased-donor kidney transplant. Ann. Transl. Med. 2020, 8, 82. [Google Scholar] [CrossRef] [PubMed]
  38. Greco, R.; Papalia, T.; Lofaro, D.; Maestripieri, S.; Mancuso, D.; Bonofiglio, R. Decisional trees in renal transplant follow-up. Transplant Proc. 2010, 42, 1134–1136. [Google Scholar] [CrossRef] [PubMed]
  39. Tolstyak, Y.; Zhuk, R.; Yakovlev, I.; Shakhovska, N.; Gregus ml, M.; Chopyak, V.; Melnykova, N. The ensembles of machine learning methods for survival predicting after kidney transplantation. Appl. Sci. 2021, 11, 10380. [Google Scholar] [CrossRef]
  40. Esteban, C.; Staeck, O.; Baier, S.; Yang, Y.; Tresp, V. Predicting clinical events by combining static and dynamic information using recurrent neural networks. In Proceedings of the 2016 IEEE International Conference on Healthcare Informatics (ICHI), Chicago, IL, USA, 4–7 October 2016; pp. 93–101. [Google Scholar]
  41. Brown, T.S.; Elster, E.A.; Stevens, K.; Graybill, J.C.; Gillern, S.; Phinney, S.; Salifu, M.O.; Jindal, R.M. Bayesian modeling of pretransplant variables accurately predicts kidney graft survival. Am. J. Nephrol. 2012, 36, 561–569. [Google Scholar] [CrossRef]
  42. Lofaro, D.; Maestripieri, S.; Greco, R.; Papalia, T.; Mancuso, D.; Conforti, D.; Bonofiglio, R. Prediction of chronic allograft nephropathy using classification trees. Transplant Proc. 2010, 42, 1130–1133. [Google Scholar] [CrossRef]
  43. Öztemel, E. Artificial Neural Networks; PapatyaYayincilik: Istanbul, Turkey, 2003. [Google Scholar]
  44. Ichino, M.; Yaguchi, H. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. Syst. Man Cybern. 1994, 24, 698–708. [Google Scholar] [CrossRef]
  45. Graf, R.; Zeldovich, M.; Friedrich, S. Comparing linear discriminant analysis and supervised learning algorithms for binary classification—A method comparison study. Biom. J. 2024, 66, 2200098. [Google Scholar] [CrossRef] [PubMed]
  46. Alzubi, J.; Nayyar, A.; Kumar, A. Machine learning from theory to algorithms: An overview. J. Phys. Conf. Ser. 2018, 1142, 012012. [Google Scholar] [CrossRef]
  47. Swapna, G.; Vinayakumar, R.; Soman, K.P. Diabetes detection using deep learning algorithms. ICT Express 2018, 4, 243–246. [Google Scholar]
  48. Hasan, M.W. Design of IoT energy consumption forecasting model for residential buildings based on improved long short-term memory (LSTM). Meas. Energy 2025, 5, 100033. [Google Scholar] [CrossRef]
  49. Ertorsun, A.D.; Bağ, B.; Uzar, G.; Turanoğlu, M.A. Evaluation of the Performance of Diagnostic Tests with ROC (Receiver Operating Characteristic) Curve Method. 2010. Available online: https://tip.baskent.edu.tr/kw/upload/464/dosyalar/cg/sempozyum/ogrsmpzsnm12/10.2.pdf (accessed on 20 January 2025).
Figure 2. Donor weight histogram graph in the original dataset.
Figure 3. Donor weight data distribution graph (histogram) after data preprocessing.
Figure 4. Feature importance on LR model without using feature extraction method.
Figure 5. Feature importance on the LR model after feature extraction with PCA (a), and LDA (b) method.
Figure 6. DNN model.
Figure 7. 2 × 2 Confusion matrix.
Figure 8. N-layer cross-validation.
Figure 9. Confusion matrix of ML and DL models of “Ex” dependent variable with LDA.
Figure 10. LSTM loss (a) and accuracy (b) function graphs of “Ex” dependent variable with LDA.
Figure 11. RNN accuracy (a) and loss (b) function graphs of “Ex” dependent variable with LDA.
Figure 12. ROC curve of ML models (a) and ROC curve of DL models (b) for the “Ex” dependent variable with LDA.
Figure 13. Confusion matrix of ML and DL models of relapse site dependent variable with LDA.
Figure 14. LSTM loss (a) and accuracy (b) function graphs of the relapse location dependent variable with LDA.
Figure 15. RNN loss (a) and accuracy (b) function graphs of the relapse location dependent variable with LDA.
Figure 16. ROC curve of ML models (a) and ROC curve of DL models (b) for the relapse site dependent variable with LDA.
Figure 17. Confusion matrix of ML and DL models of tumor recurrence dependent variable with LDA.
Figure 18. ROC curve of ML models (a) and ROC curve of DL models (b) of tumor recurrence dependent variable with LDA.
Figure 19. Confusion matrix of ML and DL models of cause-of-death dependent variable with LDA.
Figure 20. LSTM loss (a) and accuracy (b) function graphs of the cause of death dependent variable with LDA.
Figure 21. RNN loss (a) and accuracy (b) function graphs of the cause of death dependent variable with LDA.
Figure 22. ROC curve of ML models (a) and ROC curve of DL models (b) of the cause of death dependent variable with LDA.
Table 1. Hospital dataset average, minimum, and maximum values.

|                      | Average | Max. Value | Min. Value |
|----------------------|---------|------------|------------|
| Gender               | 1.44    | 2          | 1          |
| Age                  | 40.33   | 73         | 1          |
| Size (cm)            | 160.67  | 185        | 1          |
| Weight (kg)          | 65.75   | 109        | 12         |
| BMI                  | 30.25   | 36.94      | 9.91       |
| Blood group          | 3.64    | 8          | 1          |
| Donor Age            | 34.53   | 71         | 0          |
| Donor Gender         | 1.34    | 2          | 1          |
| Donor Kinship Degree | 0.87    | 5          | 0          |
| Donor Height         | 168.86  | 190        | 125        |
Table 2. Performance metrics for Ex dependent variable (N-M: non-method). All values are percentages.

| Models | Prec. N-M | Prec. PCA | Prec. LDA | Rec. N-M | Rec. PCA | Rec. LDA | F1 N-M | F1 PCA | F1 LDA | Acc. N-M | Acc. PCA | Acc. LDA | AUC N-M | AUC PCA | AUC LDA |
|--------|-----------|-----------|-----------|----------|----------|----------|--------|--------|--------|----------|----------|----------|---------|---------|---------|
| LR     | 82 | 85 | 86 | 82 | 86 | 82 | 82 | 86 | 83 | 86 | 86 | 82 | 66 | 70 | 80 |
| KNN    | 67 | 75 | 86 | 82 | 86 | 82 | 74 | 76 | 83 | 86 | 77 | 82 | 50 | 70 | 77 |
| SVM    | 88 | 79 | 86 | 86 | 86 | 82 | 83 | 79 | 83 | 86 | 81 | 82 | 84 | 69 | 80 |
| NB     | 72 | 75 | 86 | 55 | 54 | 82 | 60 | 76 | 83 | 54 | 77 | 77 | 52 | 70 | 80 |
| DT     | 75 | 92 | 84 | 77 | 77 | 77 | 76 | 90 | 79 | 77 | 90 | 77 | 56 | 70 | 76 |
| RF     | 88 | 82 | 84 | 86 | 86 | 77 | 83 | 82 | 79 | 86 | 82 | 77 | 93 | 76 | 77 |
| RNN    | 67 | 67 | 67 | 82 | 81 | 82 | 74 | 75 | 74 | 81 | 81 | 81 | 76 | 77 | 70 |
| LSTM   | 67 | 67 | 67 | 82 | 81 | 82 | 74 | 74 | 74 | 81 | 81 | 81 | 98 | 97 | 98 |
Table 3. Performance metrics for the recurrence-location variable (N-M: non-method). All values are percentages.

| Models | Prec. N-M | Prec. PCA | Prec. LDA | Rec. N-M | Rec. PCA | Rec. LDA | F1 N-M | F1 PCA | F1 LDA | Acc. N-M | Acc. PCA | Acc. LDA | AUC N-M | AUC PCA | AUC LDA |
|--------|-----------|-----------|-----------|----------|----------|----------|--------|--------|--------|----------|----------|----------|---------|---------|---------|
| LR     | 53 | 46 | 52 | 73 | 68 | 68 | 61 | 55 | 59 | 72 | 68 | 68 | 90 | 31 | 45 |
| KNN    | 53 | 46 | 53 | 73 | 68 | 73 | 61 | 55 | 61 | 72 | 68 | 73 | 53 | 47 | 49 |
| SVM    | 53 | 46 | 53 | 73 | 68 | 73 | 61 | 55 | 61 | 72 | 68 | 73 | 92 | 45 | 70 |
| NB     | 53 | 46 | 52 | 73 | 68 | 68 | 61 | 55 | 59 | 72 | 68 | 68 | 50 | 38 | 42 |
| DT     | 81 | 46 | 55 | 77 | 68 | 73 | 72 | 55 | 63 | 72 | 68 | 72 | 62 | 47 | 51 |
| RF     | 53 | 46 | 55 | 73 | 68 | 73 | 61 | 55 | 63 | 73 | 68 | 72 | 67 | 47 | 46 |
| RNN    | 67 | 67 | 67 | 82 | 82 | 82 | 74 | 74 | 74 | 82 | 82 | 82 | 75 | 77 | 50 |
| LSTM   | 67 | 67 | 67 | 82 | 82 | 82 | 74 | 74 | 74 | 82 | 82 | 82 | 99 | 98 | 98 |
Table 4. Performance metrics for tumor recurrence dependent variable (N-M: non-method). All values are percentages.

| Models | Prec. N-M | Prec. PCA | Prec. LDA | Rec. N-M | Rec. PCA | Rec. LDA | F1 N-M | F1 PCA | F1 LDA | Acc. N-M | Acc. PCA | Acc. LDA | AUC N-M | AUC PCA | AUC LDA |
|--------|-----------|-----------|-----------|----------|----------|----------|--------|--------|--------|----------|----------|----------|---------|---------|---------|
| LR     | 46 | 46 | 76 | 68 | 68 | 73 | 55 | 55 | 65 | 31 | 68 | 73 | 91 | 91 | 52 |
| KNN    | 46 | 46 | 46 | 68 | 68 | 68 | 55 | 55 | 55 | 68 | 68 | 68 | 53 | 53 | 56 |
| SVM    | 46 | 46 | 46 | 68 | 68 | 68 | 55 | 55 | 55 | 68 | 68 | 68 | 79 | 93 | 76 |
| NB     | 46 | 46 | 76 | 68 | 68 | 73 | 55 | 55 | 65 | 68 | 68 | 73 | 50 | 50 | 50 |
| DT     | 51 | 46 | 49 | 68 | 68 | 68 | 58 | 55 | 57 | 68 | 68 | 68 | 53 | 50 | 51 |
| RF     | 46 | 45 | 49 | 68 | 64 | 68 | 55 | 53 | 57 | 68 | 64 | 68 | 83 | 50 | 56 |
| RNN    | 62 | 62 | 62 | 79 | 79 | 79 | 69 | 69 | 69 | 79 | 79 | 79 | 77 | 75 | 50 |
| LSTM   | 62 | 62 | 62 | 79 | 79 | 79 | 69 | 69 | 69 | 79 | 79 | 79 | 97 | 99 | 99 |
Table 5. Performance metrics for the cause of death dependent variable (N-M: non-method). All values are percentages.

| Models | Prec. N-M | Prec. PCA | Prec. LDA | Rec. N-M | Rec. PCA | Rec. LDA | F1 N-M | F1 PCA | F1 LDA | Acc. N-M | Acc. PCA | Acc. LDA | AUC N-M | AUC PCA | AUC LDA |
|--------|-----------|-----------|-----------|----------|----------|----------|--------|--------|--------|----------|----------|----------|---------|---------|---------|
| LR     | 82 | 80 | 85 | 86 | 82 | 86 | 84 | 81 | 85 | 81 | 81 | 86 | 59 | 60 | 59 |
| KNN    | 73 | 73 | 80 | 77 | 77 | 82 | 75 | 75 | 81 | 77 | 77 | 82 | 66 | 71 | 76 |
| SVM    | 67 | 67 | 85 | 82 | 82 | 86 | 74 | 74 | 85 | 82 | 82 | 86 | 80 | 84 | 94 |
| NB     | 75 | 66 | 85 | 68 | 77 | 86 | 70 | 71 | 85 | 68 | 77 | 68 | 61 | 80 | 61 |
| DT     | 84 | 76 | 86 | 86 | 82 | 73 | 82 | 79 | 75 | 86 | 82 | 73 | 60 | 61 | 73 |
| RF     | 85 | 67 | 84 | 86 | 82 | 82 | 85 | 74 | 81 | 86 | 82 | 73 | 82 | 77 | 88 |
| RNN    | 67 | 67 | 62 | 82 | 82 | 79 | 74 | 74 | 69 | 81 | 82 | 79 | 71 | 77 | 50 |
| LSTM   | 67 | 67 | 62 | 82 | 82 | 79 | 74 | 74 | 69 | 82 | 82 | 79 | 96 | 98 | 99 |
Table 6. Accuracy values of models for Ex dependent variable after cross-validation (N-M: non-method). All values are percentages.

| Models | Prec. N-M | Prec. PCA | Prec. LDA | Rec. N-M | Rec. PCA | Rec. LDA | F1 N-M | F1 PCA | F1 LDA | Acc. N-M | Acc. PCA | Acc. LDA | AUC N-M | AUC PCA | AUC LDA |
|--------|-----------|-----------|-----------|----------|----------|----------|--------|--------|--------|----------|----------|----------|---------|---------|---------|
| LR     | 85 | 82 | 98 | 96 | 96 | 98 | 96 | 96 | 99 | 92 | 87 | 99 | 66 | 65 | 99 |
| KNN    | 87 | 88 | 98 | 96 | 96 | 96 | 96 | 96 | 96 | 89 | 86 | 99 | 50 | 71 | 99 |
| SVM    | 82 | 82 | 98 | 96 | 96 | 96 | 96 | 96 | 96 | 86 | 83 | 98 | 76 | 73 | 100 |
| NB     | 82 | 82 | 98 | 96 | 96 | 96 | 96 | 96 | 96 | 69 | 85 | 95 | 52 | 65 | 99 |
| DT     | 82 | 85 | 97 | 94 | 89 | 98 | 94 | 88 | 99 | 83 | 75 | 97 | 56 | 50 | 95 |
| RF     | 82 | 84 | 97 | 96 | 95 | 98 | 96 | 96 | 97 | 89 | 82 | 98 | 84 | 76 | 99 |
| RNN    | 67 | 82 | 74 | 77 | 94 | 96 | 97 | 94 | 95 | 86 | 82 | 94 | 71 | 79 | 99 |
| LSTM   | 67 | 82 | 74 | 77 | 94 | 98 | 97 | 93 | 98 | 86 | 82 | 94 | 94 | 96 | 99 |