1. Introduction
Non-Alcoholic Fatty Liver Disease (NAFLD) is a prevalent liver condition with a significant worldwide impact and a substantial risk factor for the development of cirrhosis and hepatocellular carcinoma (HCC) [1]. It affects around one-fourth of the global population [2], leading to considerable morbidity and increased mortality and imposing a significant burden on affected individuals, their families, and healthcare systems [3]. Notably, the prevalence of NAFLD in Asian countries is comparable to that in Western countries [2,4]. NAFLD is characterized by the abnormal accumulation of fat in liver cells (hepatocytes) and is increasingly recognized as a major public health concern. With the rise in obesity and metabolic syndrome, NAFLD has become the leading cause of chronic liver disease and abnormal liver function tests in China [2,5,6]. More broadly, NAFLD has recently emerged as the primary cause of chronic liver disease and the fastest-growing indication for liver transplantation [2,3,7].
Liver biopsy has long been regarded as the gold standard for diagnosing NAFLD. However, it has several drawbacks that limit its routine use in public health check-up studies [8,9]. The procedure is invasive, carries a risk of bleeding, and is not suitable for widespread application. Alternative imaging techniques such as ultrasonography, magnetic resonance imaging, and computed tomography can also detect NAFLD, but they can be time-consuming, expensive, and difficult to access, particularly in remote regions. Abdominal ultrasound is non-invasive, relatively affordable, and widely used for the assessment of NAFLD, yet it has limitations of its own. Its sensitivity to mild fat infiltration is low, which often results in false negatives in early-stage NAFLD. Obesity can also reduce the clarity of ultrasound images, making it difficult to assess liver fat content and inflammation accurately. In addition, interpreting ultrasound images for NAFLD requires radiologists or sonographers familiar with the appearance of fatty infiltration, and not all regions have access to such specialists [10,11,12]. Because liver biopsy is invasive and costly, it is not routinely performed, which creates a need for non-invasive and more practical approaches to identifying and managing NAFLD [13].
Recent work on NAFLD has led to the development of several indices for its diagnosis and assessment, including the hepatic steatosis index (HSI) and the fatty liver index (FLI). These indices integrate non-invasive parameters, such as clinical, biochemical, and imaging data, to evaluate NAFLD without a liver biopsy, offering a safer and less burdensome approach to diagnosis and monitoring. They can be easily implemented in routine clinical practice, enabling earlier detection, intervention, and improved management of NAFLD and its potential complications. As research continues, FLI, HSI, and other emerging indices have the potential to improve the diagnosis and management of NAFLD, contributing to better patient outcomes and reducing the disease’s global burden [5,14,15].
HSI is a non-invasive index used to assess the presence of hepatic steatosis, also known as fatty liver disease. It is based on easily obtainable clinical and biochemical parameters, making it a practical tool for diagnosing and monitoring NAFLD. HSI incorporates factors such as body mass index (BMI), and the levels of certain liver enzymes, including aspartate aminotransferase (AST) and alanine aminotransferase (ALT). The index is calculated using a specific formula. The simplicity and effectiveness of HSI make it a valuable screening tool, particularly in settings where more sophisticated imaging or liver biopsy options might be limited. However, like other non-invasive indices, HSI may not provide a definitive diagnosis and is often used in conjunction with other diagnostic tools to assess the severity and progression of hepatic steatosis. Ongoing research and validation studies are further refining the utility of HSI in clinical practice, offering a valuable contribution to the management of NAFLD.
The indices used for NAFLD diagnosis, including HSI, show varying degrees of accuracy and area under the curve (AUC), which underscores the need for new ways to detect NAFLD early and with greater precision. A notable drawback of these indices is that they do not achieve this goal consistently. This motivates more sophisticated methods, such as machine learning techniques, that refine NAFLD prediction by drawing on a wider array of clinical and biochemical variables. Machine learning uses algorithms that identify intricate patterns and relationships within large datasets, supporting the development of more refined and personalized diagnostic models. By integrating diverse clinical, imaging, and biochemical data points, machine learning algorithms can identify hepatic steatosis more precisely and thereby help recognize individuals at risk earlier. This proactive strategy holds significant promise for improving patient outcomes by enabling timely interventions and tailored management strategies.
The field of medicine is abundant with data, yet physicians may inadvertently overlook crucial information necessary for accurate disease diagnosis and treatment. Machine learning techniques, widely employed across diverse areas of health sciences, offer a potential solution to this problem. Leveraging large datasets, these techniques can effectively address the challenges of information extraction and analysis. In fact, machine learning has already been successfully utilized in numerous medical disciplines for disease prediction. Predicting NAFLD plays a crucial role in early detection, resource allocation, and public health planning, ultimately leading to improved management. However, the available data on the use of machine learning models for NAFLD prediction worldwide have been limited.
Machine learning models have been employed for several years to predict NAFLD [16,17,18]. Weidong Ji et al. [16] employed four machine learning algorithms to predict NAFLD in 304,145 adults, with XGBoost showing the best accuracy (0.880) and AUC (0.951). Xu et al. [17] used 11 techniques on a dataset of 2,522 individuals to achieve an 83% accuracy for NAFLD prediction. Liu et al. [18] found XGBoost to have the highest accuracy (0.795) among seven models for diagnosing NAFLD in 15,315 Chinese subjects.
The aim of our study is to develop an ensemble machine learning model for the prediction of NAFLD and compare its performance with HSI, using laboratory tests that are easily obtainable and cost-effective. By utilizing such data, we seek to enhance the accuracy and performance of NAFLD prediction, which could have important implications for both diagnosing and treating NAFLD.
2. Materials and Methods
This research adopts a comprehensive approach to develop an ensemble machine learning model to predict NAFLD, encompassing several key steps. Firstly, the datasets underwent thorough review and preprocessing. Next, feature ranking was performed. Subsequently, multiple machine learning models were developed and applied. The performance of each model was evaluated utilizing various metrics such as F1 score, precision, accuracy, recall, confusion matrix, receiver operating characteristics (ROC), AUC, and the mean absolute error (MAE). Based on the results, the most effective models for predicting NAFLD were determined. Subsequently, a novel ensemble machine learning model was introduced. Additionally, HSI was calculated and compared, and its performance was evaluated using the same set of metrics. Lastly, the two evaluated results were compared to determine their relative performance. These models were developed using the Python programming language on Google Colab (Colaboratory).
2.1. Dataset Description
This study utilized the National Health and Nutrition Examination Survey (NHANES) dataset, administered by the Centers for Disease Control and Prevention (CDC) of the United States [19]. The dataset comprises 2505 subjects. Our focus was on 23 features relevant to NAFLD investigation, selected for their potential significance in improving our understanding of the disease. The selected features include body measurements, blood pressure, and laboratory test levels: Total Cholesterol, Weight, Height, BMI, Waist Circumference, white blood cell count (WBC), red blood cell count (RBC), systolic blood pressure (SBP), diastolic blood pressure (DBP), Hemoglobin, albumin, alkaline phosphatase (ALP), aspartate aminotransferase (AST), alanine aminotransferase (ALT), γ-glutamyl transferase (GGT), Creatinine, Triglycerides, LDL, HDL, and Fasting Glucose. Furthermore, the demographic variables age and gender, alongside the presence of NAFLD, were included.
2.2. Preprocessing Data and Feature Ranking
To ensure the accuracy of our diagnostic models and disease predictions, data cleaning was essential. Before analysis, the dataset contained duplicate records and missing values. Missing values were handled with median imputation, a widely accepted method when data are missing at random. The median is less affected by outliers than the mean and provides a reliable estimate. Median imputation is also straightforward, robust, and commonly used in medical research. It largely retains the original distribution of the data while mitigating the influence of missing values on the analysis outcomes. Earlier studies have reported that it can outperform other methods in enhancing analysis accuracy and reducing bias. Median imputation was therefore deemed suitable for this study.
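The imputation step can be illustrated with a minimal sketch. It is not the study's code: the helper names `fit_medians` and `impute`, the use of `None` to mark a missing value, and the toy values are all assumptions made for illustration.

```python
from statistics import median

def fit_medians(rows):
    """Return the per-column median of the observed (non-None) values."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return medians

def impute(rows, medians):
    """Replace each missing value (None) with the median of its column."""
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]

# Toy data (hypothetical): columns are (BMI, ALT); None marks a missing value.
rows = [
    [22.0, 18.0],
    [31.5, None],
    [None, 40.0],
    [27.0, 25.0],
]
medians = fit_medians(rows)
filled = impute(rows, medians)
```

Keeping the fitting and filling steps separate makes it possible to compute the medians on one subset of the data and reuse them on another.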
After handling missing data, removing duplicates, and converting certain string features to numeric, the datasets underwent data standardization. Following that, the Synthetic Minority Oversampling Technique Edited Nearest Neighbors (SMOTEENN) was applied to tackle the issue of imbalanced data. In imbalanced classification problems, the majority class has significantly more examples than the minority class, which can bias models toward the majority class. SMOTEENN combats this by generating synthetic samples of the minority class using SMOTE and subsequently cleaning the samples using ENN. This approach creates a more balanced dataset and reduces bias in machine learning models, enhancing the representation of the minority class.
In the next step, feature importance analysis was performed to assess the predictive efficacy of input attributes and to prioritize them accordingly. This study utilized embedded methods, a category of feature importance techniques. These methods integrate feature selection into the model training process, assigning weights or coefficients to individual features based on their importance. Features with greater coefficients or weights are considered more significant. By incorporating feature selection during model training, embedded methods select the most relevant features, contributing to improved model accuracy.
To prevent data leakage, a known problem in machine learning studies [20], SMOTEENN was applied exclusively to the training set after the dataset was split, so that the test set contained only original, unresampled records. This safeguards the robustness of our model evaluation.
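The order of operations described here (split first, resample the training portion only) can be sketched as follows. This is a simplified stand-in, not SMOTEENN itself: new minority points are interpolated between random pairs of minority samples rather than nearest neighbors, and the ENN cleaning step is omitted. The function names and toy data are hypothetical.

```python
import random

def split_train_test(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test

def oversample_minority(rows, labels, seed=0):
    """Append interpolated minority points until both classes are equal in size.

    Simplification: pairs are drawn at random from the minority class,
    whereas SMOTE interpolates toward one of the k nearest neighbors.
    """
    rng = random.Random(seed)
    counts = {c: labels.count(c) for c in set(labels)}
    minority = min(counts, key=counts.get)
    target = max(counts.values())
    pool = [r for r, y in zip(rows, labels) if y == minority]
    new_rows, new_labels = list(rows), list(labels)
    while new_labels.count(minority) < target:
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()
        new_rows.append([x + t * (z - x) for x, z in zip(a, b)])
        new_labels.append(minority)
    return new_rows, new_labels

# Toy data (hypothetical): 15 negatives and 5 positives, two features each.
rows = [[float(i), float(i % 3)] for i in range(20)]
labels = [0] * 15 + [1] * 5

# Split first, then resample the training portion only.
(X_train, y_train), (X_test, y_test) = split_train_test(rows, labels)
X_bal, y_bal = oversample_minority(X_train, y_train)
```

Because the test rows are set aside before any synthetic points are generated, no synthetic sample can be derived from a test record.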
2.3. HSI Calculations
HSI is a validated predictive index developed by Lee et al. in 2009 for determining the presence of NAFLD. The index uses body mass index (BMI), the ALT/AST ratio, and the presence of diabetes or female gender to estimate the likelihood of NAFLD, as indicated by the following formula [21,22,23,24]:
HSI = 8 × (ALT/AST) + BMI (+2 if female; +2 if diabetes mellitus)
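A minimal sketch of the calculation, assuming the commonly cited form of the index (8 × ALT/AST plus BMI, with 2 added for female sex and 2 added for diabetes mellitus). The function and argument names are illustrative and are not taken from the study's code.

```python
def hepatic_steatosis_index(alt, ast, bmi, female, diabetes):
    """Compute HSI from ALT and AST (same units), BMI (kg/m^2), and two flags.

    Assumed form: HSI = 8 * (ALT / AST) + BMI, +2 if female, +2 if diabetic.
    """
    if ast <= 0:
        raise ValueError("AST must be positive to form the ALT/AST ratio")
    score = 8.0 * (alt / ast) + bmi
    if female:
        score += 2.0
    if diabetes:
        score += 2.0
    return score

# Hypothetical subject: ALT 40, AST 20, BMI 30, female, no diabetes.
example_hsi = hepatic_steatosis_index(alt=40, ast=20, bmi=30, female=True, diabetes=False)
```

For this hypothetical subject the score is 8 × 2 + 30 + 2 = 48.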
2.4. Model Selection
To predict NAFLD and facilitate early diagnosis and treatment, this study utilized affordable and easily accessible laboratory test data. An approach in machine learning for medical diagnosis was introduced, leveraging the capabilities of diverse algorithms. Specifically, six different algorithms known for their effectiveness in medical diagnosis were selected, including random forest, K-nearest Neighbors (KNN), Logistic Regression, Support Vector Machine (SVM), extreme gradient boosting (XGBoost), and decision tree. Extensive research has demonstrated the efficacy of these algorithms in various medical diagnosis tasks.
A novel ensemble consisting of three of these algorithms was developed to further enhance the robustness and accuracy of the model. By combining the predictions of the base models (SVM, XGBoost, random forest), the ensemble was formed using a voting technique. To achieve a more reliable and accurate outcome, the ensemble aggregates individual predictions and makes the final prediction based on a majority vote, thereby leveraging the collective knowledge of the models. The selection of the base models was carefully executed to create an ensemble that outperforms each individual model in isolation, achieving a superior performance.
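The majority-vote rule can be sketched in a few lines. The three prediction lists below are made-up placeholders for the outputs of the SVM, XGBoost, and random forest base models, and `majority_vote` is a hypothetical helper, not the study's implementation.

```python
from collections import Counter

def majority_vote(*model_predictions):
    """Combine per-model label lists into one list by hard (majority) voting."""
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Placeholder predictions for five subjects (1 = NAFLD, 0 = no NAFLD).
svm_pred = [1, 0, 1, 0, 1]
xgb_pred = [1, 1, 0, 0, 1]
rf_pred = [0, 1, 1, 0, 1]

ensemble_pred = majority_vote(svm_pred, xgb_pred, rf_pred)
```

With three base models and two classes a tie cannot occur, so every subject receives the label chosen by at least two models.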
2.5. Evaluation of Model Performance
To assess the performance of the developed machine learning algorithms, several statistical measures were employed. The counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) were tabulated in a confusion matrix, which provides an overview of the predicted outcomes. From these counts, the F1 score, precision, accuracy, and recall were computed, and the ROC curve and AUC were used to assess discrimination. Furthermore, MAE was computed for the testing and training datasets.
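As an illustration of how these quantities relate, the sketch below derives accuracy, precision, recall, F1 score, and MAE from the four confusion-matrix counts. The labels are invented, and the function names are assumptions rather than the study's code.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, and MAE from the counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # For 0/1 labels, MAE is the fraction of misclassified samples.
        "mae": (fp + fn) / total,
    }

# Invented labels for eight subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Note that for binary 0/1 predictions the MAE equals one minus the accuracy, so the two metrics carry the same information in this setting.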
3. Experimental Results
The dataset used in this study comprised a total of 2505 individuals, including 1310 females and 1195 males. It encompassed 23 features, thoroughly detailed in Table 1, along with additional analysis and information.
After the necessary data preprocessing steps, namely eliminating duplicates, treating missing values, and converting specific string attributes into numerical formats, the datasets underwent standardization. A correlation analysis was then conducted, producing heat maps that visualize the relationships between the dataset’s features. This visualization is presented in Figure 1, providing valuable insights into the correlations between different variables.
Each feature’s importance value was then determined through embedded methods, enabling the ranking of features by their importance. The resulting feature importance rankings are presented in Table 2. Notably, the analysis revealed that gender holds the highest score among the features, indicating its significant impact.
The datasets utilized in this study were divided into a testing set, accounting for 20% of the data, and a training set, representing the remaining 80%.
Table 3 displays the number of subjects in both the training and testing sets, along with the counts of individuals testing positive and negative for NAFLD.
The objective was to determine the most appropriate technique for predicting Non-Alcoholic Fatty Liver Disease (NAFLD), so six distinct machine learning techniques were developed and implemented. Additionally, an ensemble model was constructed to further enhance prediction accuracy. The performance of these models was then evaluated.
After the HSI was calculated, the best cutoff, found to be 46.47, was determined so as to maximize AUC and accuracy. The primary goal of this evaluation was to assess the effectiveness of HSI in predicting NAFLD. The HSI results were then compared with those of the machine learning models to determine their relative performance. The comparative results are presented in Table 4.
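One way such a cutoff can be searched for is sketched below, assuming the criterion is accuracy and that a score at or above the cutoff counts as a positive prediction. The text does not state the exact search procedure or the data split on which the cutoff was chosen, so both the function `best_cutoff` and the toy scores are illustrative assumptions.

```python
def best_cutoff(scores, labels):
    """Return the (cutoff, accuracy) pair with the highest accuracy.

    Every observed score is tried as a cutoff; a score >= cutoff is
    predicted positive. Ties keep the lowest cutoff.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        predictions = [1 if s >= cutoff else 0 for s in scores]
        correct = sum(1 for p, y in zip(predictions, labels) if p == y)
        accuracy = correct / len(labels)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best

# Invented index values and NAFLD labels for six subjects.
scores = [30.1, 33.5, 35.0, 38.2, 41.0, 44.7]
labels = [0, 0, 1, 0, 1, 1]
cutoff, accuracy = best_cutoff(scores, labels)
```

A cutoff tuned and evaluated on the same subjects tends to look better than it would on new data, so in practice the search is usually run on training data only.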
To evaluate the performance of the implemented techniques, several evaluation metrics were considered: accuracy, precision, recall, F1 score, and AUC. These metrics are detailed in Table 4.
Figure 2 visually depicts the AUC values and ROC curves obtained with the ensemble model and the machine learning algorithms. These metrics are crucial in assessing the performance and discrimination capabilities of the models. To compare the predicted values of the implemented techniques with the actual values, the confusion matrix was utilized. The confusion matrix, which outlines the true positives, true negatives, false positives, and false negatives, is detailed in Table 5. Additionally, MAE values for the testing and training datasets are provided in Table 6. These values are crucial indicators for assessing model performance and determining whether the model is overfitting to the training data. By comparing the MAE values, one can evaluate the generalization ability of the model.
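To make the AUC concrete, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and the function name `roc_auc` are illustrative and not taken from the study.

```python
def roc_auc(scores, labels):
    """Compute AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Invented model scores for three positive and three negative subjects.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

Here eight of the nine positive/negative pairs are ordered correctly, giving an AUC of 8/9. The double loop is quadratic in the number of subjects, which is acceptable for a dataset of a few thousand records.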
4. Discussion
Based on the experimental results, an ensemble model combining random forest, XGBoost, and SVM can serve as an effective tool for healthcare professionals in predicting NAFLD from affordable laboratory test data. This ensemble model demonstrates superior performance compared to HSI, an index that has been utilized by doctors for evaluating and screening NAFLD.
Conventional statistical techniques have limitations in directly predicting NAFLD due to their reliance on selecting potential risk factors from data. To overcome these limitations, this study introduces machine learning techniques, which leverage statistical methods for data analysis and evaluation. Machine learning algorithms offer powerful tools for disease study and diagnosis, with the advantage of simultaneously considering multiple features without the need for variable selection. By applying machine learning techniques, this research aims to enhance NAFLD prediction by comprehensively analyzing diverse factors and their complex relationships, providing a data-driven and multivariate approach to improve diagnostic accuracy.
Furthermore, in our study, it was observed that HSI exhibited lower accuracy and AUC compared to the machine learning models utilized. This indicates that HSI was limited in its ability to accurately identify individuals with NAFLD, as it could only detect a small number of patients with the condition. Conversely, all of the machine learning models demonstrated higher F1 scores than HSI. This indicates that the machine learning models outperformed HSI in terms of precision and recall, achieving a better balance between correctly identifying true positives and minimizing false positives and false negatives. The higher F1 scores achieved by the machine learning models provide supporting evidence that they exhibited superior performance in predicting NAFLD compared to HSI.
SMOTEENN was applied exclusively to the training set after the dataset was split, to avoid potential data leakage and to ensure the robustness of our model evaluation. With this adjustment, our evaluation metrics, including accuracy, precision, recall, F1-score, and AUC, remained consistently high. Minor variations observed in certain metrics were within an acceptable range and did not compromise the overall performance of the model. Furthermore, the Mean Absolute Error (MAE) for both the training and testing datasets remained very low, indicating a strong fit of the model to the data. Confusion matrices were generated for both scenarios to provide a comprehensive assessment of our model’s performance. Given these results, we are confident that data leakage did not occur. Additionally, no subject contributed data to both the training and test sets. The consistency of our evaluation metrics supports the robustness of our approach and the validity of our results.
Over the years, the utilization of machine learning techniques in disease prediction has been widely explored by numerous researchers. In a study conducted by Weidong Ji et al. [16], four machine learning algorithms were employed to predict NAFLD using a dataset consisting of 304,145 adults. Among these four algorithms, XGBoost demonstrated the highest performance, with an accuracy of 0.880 and an AUC value of 0.951. Xu et al. [17] undertook a study to evaluate the optimal predictive clinical model for NAFLD using machine learning techniques. The study encompassed a dataset of 2,522 individuals who met the diagnostic criteria for NAFLD. Among the 11 different techniques employed, the best-performing one yielded an accuracy of 83%. Liu et al. [18] conducted a study aiming to explore the predictive capabilities of seven machine learning tools for NAFLD on 15,315 Chinese subjects. Among these models, the XGBoost model exhibited the highest accuracy, achieving a value of 0.795. This model demonstrated the best prediction ability among all the models constructed in the study, highlighting its superior performance in accurately diagnosing NAFLD.
Atsawarungruangkit et al. conducted a study aiming to create machine learning models for predicting NAFLD, utilizing data from the NHANES 1988–1994 dataset comprising 3235 participants, sourced from the National Center for Health Statistics (NCHS). Comparing the results, they found that the ensemble of random undersampling (RUS) boosted trees achieved the highest accuracy at 71.1%. Interestingly, a simpler model, referred to as “coarse trees,” outperformed this with an accuracy of 74.9% [25].
The findings of previous studies strongly support the effectiveness of machine learning tools in predicting NAFLD. These studies provide compelling evidence of the significant potential of machine learning algorithms in screening and identifying individuals at risk of NAFLD. The results affirm that machine learning techniques can be valuable tools in improving the accuracy and efficiency of NAFLD diagnosis. Consistent with the aforementioned studies, our own research corroborated the effectiveness of machine learning tools in NAFLD prediction. In our study, we developed an ensemble model using machine learning techniques, which achieved an impressive accuracy of 99% and an AUC of 100%. These results serve as further validation of the utility and efficacy of machine learning models in predicting NAFLD. Additionally, our findings highlight the potential of employing an ensemble approach to enhance the accuracy of NAFLD predictions. Together, the cumulative evidence from previous studies and our own research underscores the robustness and promising nature of machine learning tools in NAFLD prediction. These results emphasize the valuable contribution of machine learning algorithms in improving the screening and diagnosis of this prevalent liver disease. It is crucial to emphasize that while the ensemble model shows promising results in the accurate prediction of NAFLD, its effectiveness and integration into clinical practice require further research and validation studies. Nonetheless, the current findings indicate its potential as a valuable tool for healthcare professionals, utilizing affordable laboratory test data to aid in the diagnosis of NAFLD. To ensure its reliable implementation, additional research is warranted to validate its performance and establish its role in the healthcare setting.
Gender has emerged as a critically important factor in NAFLD, which is currently the most common liver disorder worldwide. Sexual dimorphism is evident in NAFLD, with notable disparities in both prevalence and severity based on gender. These differences are not solely influenced by sociocultural factors or lifestyle variations, but are also attributed to biological disparities resulting from chromosomal makeup and sex hormone levels [26]. Numerous studies have demonstrated the impact of gender on NAFLD prevalence. For instance, a Japanese study conducted over 12 years found that the average prevalence of fatty liver in men was double that observed in women (26% vs. 13%). Interestingly, women exhibited a gradual increase in prevalence with age, whereas men demonstrated a relatively stable prevalence across all age groups. In the 70–79 age group, the prevalence of NAFLD was higher in females compared to males. Similarly, a study conducted in South China revealed a significantly higher prevalence of NAFLD in men compared to women below the age of 50 (22.4% vs. 7.1%). However, the pattern reversed among individuals over the age of 50, with higher rates observed in women (27.6% vs. 20.6%). In humans, NAFLD predominantly affects men, while premenopausal women are protected from NAFLD much as they are from cardiovascular disease [27]. In our study, we determined the importance value of each feature and ranked the features accordingly. Strikingly, gender exhibited the highest feature importance score, reinforcing its significance in predicting NAFLD. These findings align with previous studies that have demonstrated the prominent role of gender in the prevalence and severity of NAFLD. Understanding the mechanisms underlying gender disparities in NAFLD is crucial for developing targeted therapeutic interventions. Further research is needed to explore the complex interplay between gender, sex hormones, and NAFLD to optimize management strategies and improve patient outcomes.
This study has notable strengths that contribute to its robustness and generalizability. Firstly, it utilizes the NHANES dataset, which encompasses a diverse racial and ethnic background of individuals in the United States. The dataset is specifically designed to represent the non-institutionalized civilian population and includes oversampling of specific demographic groups, ensuring adequate representation. This approach enhances the study’s findings and minimizes potential bias. Additionally, this study leverages large-scale datasets for testing and training the algorithms, enabling a comprehensive evaluation across different models. Notably, an ensemble model is developed to enhance accuracy and optimize algorithm performance. These strengths collectively enhance the reliability and applicability of the study’s results.
Nevertheless, our study has certain limitations, notably the utilization of restricted datasets, the absence of clinical data, and the lack of a clinical trial. For future endeavors, our goal is to integrate further attributes pertaining to NAFLD, which will aid in the creation of more dependable and effective machine learning methods. Additionally, we strongly advocate for the implementation of a clinical trial to authenticate the efficacy of these techniques in real-world situations.