Leveraging Explainable Artificial Intelligence (XAI) for Expert Interpretability in Predicting Rapid Kidney Enlargement Risks in Autosomal Dominant Polycystic Kidney Disease (ADPKD)
1. Introduction
2. Methodology
3. Data
3.1. Data Understanding
3.2. Selecting the Data
3.3. Calculated Attributes
3.3.1. CKD Stage
3.3.2. High-Risk Profile Classification
3.4. Cleansing the Data
- High Variance Value: Because the data were collected from multiple channels over many years, several distinct values can carry the same meaning, either through typographical errors or through differences in terminology. For example, the raw data contain ‘Caffien’, which should be read as ‘Caffein’, and some patients have medical records for ‘Tuberculosis’ while others have ‘TB’, which denote the same condition. We expect this spurious variety to reduce modeling accuracy, so we manually group the values of some attributes. For instance, the substances ‘Caffien’ and ‘Caffein’ are both mapped to the single feature ‘Substance_Caffein’. A doctor assists in this manual grouping, applying domain knowledge to prevent misinterpretation.
- Handle Null Data: Null values can substantially distort modeling outcomes, so they must be handled before modeling. In this research, we use two approaches: (1) fill the null with an exception category such as ‘UNKNOWN’, or (2) exclude the patient from the modeling process. We choose the first approach when the feature is categorical and already has an exception category. For example, out of 2498 patients, 205 (8.21%) have a null value in their ‘race’ feature. Since race already has 7 categories, including ‘UNKNOWN’, we set the race of these 205 patients to ‘UNKNOWN’. We use the second approach when the feature is numeric, which avoids having to estimate the value. For example, 371 patients have no ‘Creatinine’ data from which to calculate their CKD stage; these patients are excluded from the modeling process to reduce bias.
- Irregular Data: The raw data are derived from patient registries, not from clinical studies. Visits occur at irregular intervals, and not all tests or measurements are conducted consistently [27]. This casts doubt on the reliability of time-related data points. Figure 5 depicts a snapshot of one patient’s medical history. The first and second rows both record the same medical history, ‘Abdomen Protube’, at the same visit, ‘visit 1’, and on the same study day for history collection (mhdy feature), ‘1’, so we assume these rows are duplicates. Because such duplicates would inflate a count, we use a max aggregate instead of a count-based approach. For instance, we set the ‘Hypertension’ column to 1 if a patient has at least one recorded instance of hypertension and to 0 if there are no such records.
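The three cleansing steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not the study's actual code: the mapping table `CANONICAL_TERMS`, the function names, and the `(patient_id, term)` row layout are all hypothetical, and only the example values (‘Caffien’, ‘TB’, ‘UNKNOWN’, ‘Creatinine’, ‘Hypertension’) come from the text.

```python
from collections import defaultdict

# Hypothetical variant-to-canonical mapping; in the study this grouping is
# curated manually with a doctor's help.
CANONICAL_TERMS = {
    "caffien": "caffein",
    "caffein": "caffein",
    "tb": "tuberculosis",
    "tuberculosis": "tuberculosis",
}


def normalize_term(raw_term):
    """Map a raw term to its canonical spelling; unmapped terms pass through."""
    key = raw_term.strip().lower()
    return CANONICAL_TERMS.get(key, key)


def fill_missing_category(value, placeholder="UNKNOWN"):
    """Approach 1: replace a missing categorical value with the exception category."""
    return placeholder if value is None or value == "" else value


def drop_missing_numeric(patients, field):
    """Approach 2: exclude patients whose numeric field is missing."""
    return [p for p in patients if p.get(field) is not None]


def binary_flags(history_rows):
    """Max aggregation: a flag is 1 if the term appears at least once.

    Duplicate rows cannot inflate the feature, unlike a count.
    """
    flags = defaultdict(dict)
    for patient_id, term in history_rows:
        flags[patient_id][normalize_term(term)] = 1
    return {pid: dict(terms) for pid, terms in flags.items()}


def flag_for(flags, patient_id, term):
    """Return 1 if the patient has the term recorded, otherwise 0."""
    return flags.get(patient_id, {}).get(term, 0)


rows = [("P1", "Hypertension"), ("P1", "Hypertension"), ("P1", "TB"), ("P2", "Caffien")]
flags = binary_flags(rows)
print(flag_for(flags, "P1", "hypertension"), flag_for(flags, "P2", "hypertension"))
```

Setting a flag rather than counting rows is what makes the duplicated ‘Hypertension’ record for patient P1 harmless in this sketch.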
3.5. Transformed Data
3.6. Dataset Diversity Limitation
4. Modeling
- Data Splitting: The transformed data are split into training and testing sets to ensure robust model evaluation. The data are divided in an 80:20 ratio, where 80% of the data is used for training the model and 20% is reserved for testing during the evaluation phase.
- Dataset Definition: The experimental modeling utilizes three distinct datasets, the details of which are provided in the following section. These datasets are curated to reflect diverse features relevant to high-risk patient identification.
- Handling Imbalanced Data: As discussed earlier, the ratio of high-risk to non-high-risk patient profiles in the dataset is 2.4:1. This class imbalance may cause the model to be biased toward the majority class, potentially leading to over-fitting, where the model performs well on the training data but poorly on unseen test data. To mitigate this issue, two techniques were employed.
- Model Training: Seven machine learning algorithms are selected for this study. The rationale for selecting these algorithms, along with the experimental results, is discussed in detail in a later section.
- Model Performance Metrics: The models are evaluated using several performance metrics, including accuracy, precision, recall, F1 score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). AUC-ROC is chosen as the primary performance metric, as it provides a robust measure of performance for both imbalanced and balanced datasets, being less influenced by the distribution of classes [35].
- Hyperparameter Tuning: The model with the highest AUC score is further optimized through hyperparameter tuning. Optuna, a hyperparameter optimization framework, is utilized to systematically search for the best set of parameters, improving the model’s performance.
- Explainable AI (XAI) Approach: To enhance the interpretability of the selected model, two local model-agnostic XAI techniques are applied: SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). These methods are chosen because they can be applied to any machine learning model, regardless of its architecture.
- Explainability Evaluation: The primary objective of this research is to improve experts’ interpretability of the AI model’s predictions. Therefore, a human evaluation process is conducted. A custom evaluation matrix, focusing on the explainability of the model’s predictions, is developed and distributed to medical experts (doctors) for assessment.
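The splitting, rebalancing, and scoring steps listed above can be illustrated with a small dependency-free sketch. It rests on several assumptions: the function names are hypothetical, the split is a plain shuffled 80:20 split, random oversampling stands in for one of the two imbalance techniques (it is assumed to correspond to the "ROS" label in Appendix C), and AUC is computed from its pairwise-ranking definition. In practice a library such as scikit-learn or imbalanced-learn would normally be used instead.

```python
import random


def train_test_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) index lists."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_test = round(n_rows * test_size)
    return idx[n_test:], idx[:n_test]


def random_oversample(train_idx, labels, seed=42):
    """Resample minority-class rows with replacement until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels with an imbalanced class ratio (illustrative values only).
labels = [1] * 70 + [0] * 30
train_idx, test_idx = train_test_indices(len(labels))
balanced_idx = random_oversample(train_idx, labels)
print(len(train_idx), len(test_idx), len(balanced_idx))
```

The sketch oversamples only the training indices, so resampled copies of a patient cannot leak into the held-out test set.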
4.1. Define Datasets
4.2. Handling Imbalanced Data
4.3. Model Training
4.4. Model Performance Metrics
4.5. Hyperparameter Tuning
4.6. Explainable AI
5. Human Evaluation
5.1. Define Metrics for Explainability
- Trustworthiness: I trust the AI model’s predictions.
- Causality: The model’s explanations help me understand the cause-and-effect relationships behind the predictions.
- Transferability: I believe the model’s insights can be applied to other patients with similar conditions.
- Informativeness: The visualizations and explanations provided by the AI model are informative and clear.
- Confidence: I feel confident in using the AI model’s predictions for decision-making.
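As an illustration of how responses to these five statements might be summarized, the sketch below averages ratings per criterion across respondents. The rating scale is not stated in this section, so the 1 to 5 range is an assumption, as are the function name `summarize_responses` and the dictionary layout of a response.

```python
from statistics import mean

# The five explainability criteria listed above.
CRITERIA = (
    "trustworthiness",
    "causality",
    "transferability",
    "informativeness",
    "confidence",
)


def summarize_responses(responses, scale=(1, 5)):
    """Return the mean rating per criterion, validating each rating against the scale."""
    low, high = scale
    summary = {}
    for criterion in CRITERIA:
        ratings = [response[criterion] for response in responses]
        if any(not low <= rating <= high for rating in ratings):
            raise ValueError(f"rating outside {low}-{high} for {criterion}")
        summary[criterion] = mean(ratings)
    return summary


# Two hypothetical respondents (illustrative ratings, not study data).
example = [
    {"trustworthiness": 4, "causality": 3, "transferability": 4, "informativeness": 5, "confidence": 4},
    {"trustworthiness": 5, "causality": 4, "transferability": 4, "informativeness": 4, "confidence": 3},
]
print(summarize_responses(example))
```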
5.2. Selection of Models for Assessment
5.3. Conducting the Survey with Users
5.4. Summary of Results on Explainability Levels
6. Conclusions and Future Work
Author Contributions
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Feature for Modeling Process
Feature Type | Features | Values | Source |
Demographic (5 features) | age, sex | continuous | raw data |
 | race, ethnic | categorical | raw data with null data handling |
 | flag_death | binary | raw data with null data handling |
Intervention (4 features) | is_liver_procedure, is_cyst_procedure, is_kidney_procedure, is_blood_procedure | binary | performing one-hot-encoding from raw data |
Substance Used (4 features) | is_consume_alcohol, is_consume_tobacco, is_consume_caffeine, is_consume_decaffeine | binary | performing one-hot-encoding from raw data |
Hospitalization (1 feature) | is_hospitalization | binary | performing one-hot-encoding from raw data |
Family history (8 features) | is_affected_mother, is_affected_father, is_affected_siblings, is_affected_aunt_uncle, is_affected_grandparents, is_affected_son, is_affected_daughters, is_affected_others_family_member | binary | performing one-hot-encoding from raw data |
CKD Stage Classification (1 feature) | CKD_classification | categorical | calculated and categorized from raw data |
High Risk Profile Classification (1 feature) | is_high_risk_profile (Target Variable) | binary | calculated and categorized from raw data |
Medical history and clinical record (77 features) | hypertension, flank_pain, back_pain, abdominal_pain, abdomen_protube, anorexia, nausea, anemia, headache, migraine, gout, fatique, changes_in_appetite, back_trouble, acne, asthma, hay_fever, rheumatic, pyelonephritis, rheumatic_heart, tuberculosis, jaundice, heart_attack, allergies, abmass, mood_changes, pneumonia, myeloma, cancer, scarlet_fever, constipation, diarrhea, insomnia, impotence, drowsiness, event_urinary_tract_infection, event_gross_hematuria, event_symptomatic_nephrolithiasis, event_edema, event_nocturia, event_non_exertional_chest_pain, event_shortness_of_breath_at_rest, event_exertional_chest_pain, event_inguinal_hernia, event_umbilical_hernia, event_intracranial_aneurysm, event_kidney_cyst_hemorrhage, event_asymptomatic_nephrolithiasis, event_cardiac_valve_disease, event_heart_palpitations, event_shortness_of_breath_with_exertion, event_diverticulosis, event_end_stage_renal_disease, event_cardiac_arrhythmia, event_shortness_of_breath, event_loss_of_consciousness, event_symptomatic_intracranial_aneurysm, coronary_artery_disease, event_left_ventricular_hypertrophys, event_diabetes_mellitus, event_seizures, event_severe_headache, event_diverticulitis, event_other_acute_neurological_event, event_kidney_cyst_infection, event_congenital_heart_disease, event_abdominal_mass, event_coronary_heart_failure, event_aneurysm, event_hepatic_venous_outflow_obstruction, event_ruptured_intracranial_aneurysm, event_ruptured_intracranial, event_inferio_vena_cava_compression, event_ascending_cholangitis, event_peripheralvascular_disease, event_kidney_stones, event_carotid_disease | binary | performing one-hot-encoding from raw data |
Appendix B. Hyperparameter
Random Forest (RF)
Parameter | Value |
Number of Estimators | 100 |
Max Depth | None |
Min Samples Split | 2 |
Min Samples Leaf | 1 |
Max Features | auto |
Bootstrap | True |
Random State | 42 |
Logistic Regression (LR)
Parameter | Value |
Penalty | l2 |
C | 1.0 |
Solver | lbfgs |
Max Iterations | 100 |
Multiclass | auto |
Random State | None |
Support Vector Machine (SVM)
Parameter | Value |
C | 1.0 |
Kernel | linear |
Degree | 3 |
Gamma | scale |
Max Iterations | −1 |
Random State | None |
LightGBM
Parameter | Value |
Learning Rate | 0.1 |
Number of Estimators | 100 |
Max Depth | −1 |
Num Leaves | 31 |
Subsample | 1.0 |
Colsample By Tree | 1.0 |
Min Child Weight | 1 × 10⁻³ |
Random State | 42 |
Gradient Boosted Trees (GBT)
Parameter | Value |
Learning Rate | 0.1 |
Number of Estimators | 100 |
Max Depth | 3 |
Min Samples Split | 2 |
Min Samples Leaf | 1 |
Subsample | 1.0 |
Loss | deviance |
Random State | 42 |
XGBoost
Parameter | Value |
Learning Rate (eta) | 0.3 |
Max Depth | 6 |
Min Child Weight | 1 |
Subsample | 1 |
Colsample by Tree | 1 |
Number of Estimators | 100 |
Gamma | 0 |
Scale Pos Weight | 1 |
Random State | 42 |
Deep Neural Network (DNN)
Parameter | Value |
Data Scaling | StandardScaler |
Train–Test Split | Test Size: 0.2, Random State: 42 |
Input Layer | Shape (n_features) |
Hidden Layer 1 | Units 64, Activation ReLU |
Hidden Layer 2 | Units 32, Activation ReLU |
Output Layer | Units 1, Activation sigmoid |
Optimizer | Adam, Learning Rate 0.001 |
Loss Function | binary_crossentropy |
Metrics | accuracy |
Epochs | 10 |
Batch Size | 32 |
Validation Split | 0.2 |
Early Stopping | Monitor: val_loss, Patience 5 |
Random State | 42 |
Appendix C. Evaluation Metrics
Set Attribute | Algorithm | Imbalanced Method | Accuracy | Precision | Recall | F1 | AUC |
Set1 | DNN | SMOTE | 0.6864 | 0.6729 | 0.6864 | 0.6777 | 0.6126 |
Set1 | DNN | ROS | 0.6723 | 0.6458 | 0.6723 | 0.6519 | 0.5753 |
Set1 | DNN | - | 0.7119 | 0.6875 | 0.7119 | 0.6781 | 0.5968 |
Set1 | GBT | SMOTE | 0.6667 | 0.6282 | 0.6398 | 0.631 | 0.6397 |
Set1 | GBT | ROS | 0.678 | 0.6624 | 0.6871 | 0.6597 | 0.6871 |
Set1 | GBT | - | 0.7288 | 0.7054 | 0.6018 | 0.6017 | 0.6018 |
Set1 | LightGBM | SMOTE | 0.6949 | 0.6446 | 0.6432 | 0.6439 | 0.6432 |
Set1 | LightGBM | ROS | 0.6977 | 0.6525 | 0.6575 | 0.6547 | 0.6575 |
Set1 | LightGBM | - | 0.7288 | 0.6822 | 0.6385 | 0.6475 | 0.6385 |
Set1 | LR | SMOTE | 0.6808 | 0.6468 | 0.6623 | 0.65 | 0.6623 |
Set1 | LR | ROS | 0.6723 | 0.6417 | 0.6585 | 0.6439 | 0.6585 |
Set1 | LR | - | 0.7288 | 0.6895 | 0.619 | 0.6254 | 0.6189 |
Set1 | RF | SMOTE | 0.709 | 0.6592 | 0.6535 | 0.656 | 0.65348 |
Set1 | RF | ROS | 0.6893 | 0.6402 | 0.6415 | 0.6408 | 0.6415 |
Set1 | RF | - | 0.7232 | 0.678 | 0.6148 | 0.6207 | 0.6148 |
Set1 | SVM | SMOTE | 0.6582 | 0.6279 | 0.6434 | 0.6292 | 0.6434 |
Set1 | SVM | ROS | 0.661 | 0.6491 | 0.6723 | 0.6437 | 0.6723 |
Set1 | SVM | - | 0.7401 | 0.7277 | 0.6174 | 0.6218 | 0.6174 |
Set1 | XGBoost | SMOTE | 0.6751 | 0.6233 | 0.6239 | 0.6236 | 0.6239 |
Set1 | XGBoost | ROS | 0.6695 | 0.6235 | 0.6296 | 0.6258 | 0.6296 |
Set1 | XGBoost | - | 0.726 | 0.6785 | 0.6316 | 0.64 | 0.6316 |
Set2 | DNN | SMOTE | 0.7034 | 0.6999 | 0.7034 | 0.7015 | 0.6504 |
Set2 | DNN | ROS | 0.6356 | 0.6437 | 0.6356 | 0.6393 | 0.5878 |
Set2 | DNN | - | 0.7175 | 0.696 | 0.7175 | 0.6912 | 0.6132 |
Set2 | GBT | SMOTE | 0.6667 | 0.6391 | 0.6569 | 0.6402 | 0.6569 |
Set2 | GBT | ROS | 0.6582 | 0.6434 | 0.6654 | 0.6392 | 0.6654 |
Set2 | GBT | - | 0.7316 | 0.7017 | 0.6137 | 0.6181 | 0.6137 |
Set2 | LightGBM | SMOTE | 0.6921 | 0.6473 | 0.6534 | 0.6498 | 0.6534 |
Set2 | LightGBM | ROS | 0.7006 | 0.6562 | 0.662 | 0.6587 | 0.662 |
Set2 | LightGBM | - | 0.726 | 0.6773 | 0.6389 | 0.6476 | 0.6389 |
Set2 | LR | SMOTE | 0.678 | 0.6491 | 0.6676 | 0.6512 | 0.6676 |
Set2 | LR | ROS | 0.6525 | 0.6323 | 0.6515 | 0.6302 | 0.6515 |
Set2 | LR | - | 0.7316 | 0.7017 | 0.6137 | 0.6181 | 0.6137 |
Set2 | RF | SMOTE | 0.6977 | 0.6475 | 0.6453 | 0.6463 | 0.6453 |
Set2 | RF | ROS | 0.7006 | 0.6495 | 0.6449 | 0.6469 | 0.6449 |
Set2 | RF | - | 0.7401 | 0.7216 | 0.6296 | 0.638 | 0.6296 |
Set2 | SVM | SMOTE | 0.661 | 0.6383 | 0.6577 | 0.6376 | 0.6577 |
Set2 | SVM | ROS | 0.6356 | 0.6367 | 0.6587 | 0.6229 | 0.6587 |
Set2 | SVM | - | 0.7147 | 0.6929 | 0.5695 | 0.5528 | 0.5695 |
Set2 | XGBoost | SMOTE | 0.6836 | 0.6314 | 0.6301 | 0.6307 | 0.6301 |
Set2 | XGBoost | ROS | 0.6977 | 0.6525 | 0.6575 | 0.6547 | 0.6575 |
Set2 | XGBoost | - | 0.7006 | 0.6383 | 0.6008 | 0.6052 | 0.6008 |
Set3 | DNN | SMOTE | 0.7203 | 0.7091 | 0.7203 | 0.7125 | 0.6519 |
Set3 | DNN | ROS | 0.732 | 0.717 | 0.732 | 0.719 | 0.689 |
Set3 | DNN | - | 0.7345 | 0.7202 | 0.7345 | 0.7017 | 0.6206 |
Set3 | GBT | SMOTE | 0.6695 | 0.653 | 0.6761 | 0.6503 | 0.676 |
Set3 | GBT | ROS | 0.6695 | 0.6604 | 0.6859 | 0.654 | 0.6858 |
Set3 | GBT | - | 0.7458 | 0.7363 | 0.6264 | 0.6334 | 0.6264 |
Set3 | LightGBM | SMOTE | 0.7034 | 0.6719 | 0.691 | 0.676 | 0.691 |
Set3 | LightGBM | ROS | 0.7006 | 0.667 | 0.684 | 0.6711 | 0.684 |
Set3 | LightGBM | - | 0.7345 | 0.6908 | 0.6451 | 0.6549 | 0.6451 |
Set3 | LR | SMOTE | 0.6838 | 0.6581 | 0.679 | 0.6596 | 0.679 |
Set3 | LR | ROS | 0.6751 | 0.6605 | 0.6851 | 0.6572 | 0.6851 |
Set3 | LR | - | 0.7288 | 0.6995 | 0.6067 | 0.6089 | 0.6067 |
Set3 | RF | SMOTE | 0.7062 | 0.668 | 0.6808 | 0.6722 | 0.6808 |
Set3 | RF | ROS | 0.6808 | 0.6412 | 0.6525 | 0.6446 | 0.6525 |
Set3 | RF | - | 0.7514 | 0.7193 | 0.6599 | 0.6722 | 0.6599 |
Set3 | SVM | SMOTE | 0.6808 | 0.6718 | 0.699 | 0.6658 | 0.699 |
Set3 | SVM | ROS | 0.6525 | 0.6624 | 0.6882 | 0.6433 | 0.6882 |
Set3 | SVM | - | 0.7147 | 0.6781 | 0.5793 | 0.5707 | 0.5793 |
Set3 | XGBoost | SMOTE | 0.7147 | 0.6825 | 0.7016 | 0.6872 | 0.7016 |
Set3 | XGBoost | ROS | 0.6723 | 0.6433 | 0.661 | 0.6451 | 0.661 |
Set3 | XGBoost | - | 0.7316 | 0.6857 | 0.6455 | 0.6548 | 0.6455 |
Set3 | SVM | 0.7147 | 0.6929 | 0.5695 | 0.5528 | 0.5695 |
Appendix D. Visualization of Extreme Cases
References
1. Mahboob, M.; Rout, P.; Leslie, S.; Bokhari, S. Autosomal Dominant Polycystic Kidney Disease; StatPearls Publishing: Treasure Island, FL, USA, 2024; Volume 3.
2. Otsuka. What Is ADPKD, What Kind of Disease Is It. Available online: https://adpkd.jp/basic/about.html (accessed on 8 August 2024).
3. Willey, C.; Blais, J.; Hall, A.; Krasa, H.; Makin, A.; Czerwiec, F. Prevalence of autosomal dominant polycystic kidney disease in the European Union. Nephrol. Dial. Transplant. 2016, 32, 1356–1363.
4. Chebib, F.; Torres, V. Autosomal dominant polycystic kidney disease: Core curriculum 2016. Am. J. Kidney Dis. 2016, 67, 792–810.
5. Chapman, A.; Devuyst, O.; Eckardt, K.; Gansevoort, R.; Harris, T.; Horie, S.; Kasiske, B.; Odland, D.; Pei, Y.; Perrone, R.; et al. Autosomal-dominant polycystic kidney disease (ADPKD): Executive summary from a Kidney Disease: Improving Global Outcomes (KDIGO) Controversies Conference. Kidney Int. 2015, 88, 17–27.
6. Torres, V.; Chapman, A.; Devuyst, O.; Gansevoort, R.; Grantham, J.; Higashihara, E.; Perrone, R.; Krasa, H.; Ouyang, J.; Czerwiec, F.; et al. Tolvaptan in patients with autosomal dominant polycystic kidney disease. N. Engl. J. Med. 2012, 367, 2407–2418.
7. Otsuka. Otsuka’s JINARC the First-Ever Treatment Approved in Europe for Adults Living with ADPKD, a Chronic Genetic Kidney Disease. Available online: https://www.otsuka.co.jp/en/company/newsreleases/assets/pdf/20150528_1_01.pdf (accessed on 30 May 2024).
8. PKD Foundation. Tolvaptan Treatment for ADPKD. Available online: https://pkdcure.org/tolvaptan/ (accessed on 30 May 2024).
9. Torres, V.; Chapman, A.; Devuyst, O.; Gansevoort, R.; Perrone, R.; Koch, G.; Ouyang, J.; McQuade, R.; Blais, J.; Czerwiec, F.; et al. Tolvaptan in later-stage autosomal dominant polycystic kidney disease. N. Engl. J. Med. 2017, 377, 1930–1942.
10. Zhang, W.; Blumenfeld, J.; Prince, M. MRI in autosomal dominant polycystic kidney disease. J. Magn. Reson. Imaging 2019, 50, 41–51.
11. JYNARQUE. Taking a Holistic Assessment Can Identify Appropriate Patients for JYNARQUE (Tolvaptan). Available online: https://www.jynarquehcp.com/identify-patients (accessed on 30 May 2024).
12. Taylor, J.; Thomas, R.; Metherall, P.; Gastel, M.; Cornec-Le Gall, E.; Caroli, A.; Furlano, M.; Demoulin, N.; Devuyst, O.; Winterbottom, J.; et al. An artificial intelligence generated automated algorithm to measure total kidney volume in ADPKD. Kidney Int. Rep. 2024, 9, 249–256.
13. Ali, O.; Abdelbaki, W.; Shrestha, A.; Elbasi, E.; Alryalat, M.; Dwivedi, Y. A systematic literature review of artificial intelligence in the healthcare sector: Benefits, challenges, methodologies, and functionalities. J. Innov. Knowl. 2023, 8, 100333.
14. Aljaaf, A.; Al-Jumeily, D.; Hussain, A.; Fergus, P.; Al-Jumaily, M.; Abdel-Aziz, K. Toward an optimal use of artificial intelligence techniques within a clinical decision support system. In Proceedings of the 2015 Science and Information Conference (SAI), London, UK, 28–30 July 2015.
15. Doniyorjon, M.; Madinakhon, R.; Shakhnoza, M.; Cho, Y. An Improved Method of Polyp Detection Using Custom YOLOv4-Tiny. Appl. Sci. 2022, 12, 10856.
16. Mukhtorov, D.; Rakhmonova, M.; Muksimova, S.; Cho, Y. Endoscopic Image Classification Based on Explainable Deep Learning. Sensors 2023, 23, 3176.
17. Bernardini, M.; Romeo, L.; Frontoni, E.; Amini, M. A Semi-Supervised Multi-Task Learning approach for predicting short-term kidney Disease evolution. IEEE J. Biomed. Health Inform. 2021, 25, 3983–3994.
18. Almansour, N.; Syed, H.; Khayat, N.; Altheeb, R.; Juri, R.; Alhiyafi, J.; Alrashed, S.; Olatunji, S. Neural network and support vector machine for the prediction of chronic kidney disease: A comparative study. Comput. Biol. Med. 2019, 109, 101–111.
19. Raihan, M.J.; Khan, M.A.M.; Kee, S.H.; Nahid, A.A. Detection of the Chronic Kidney Disease Using XGBoost Classifier and Explaining the Influence of the Attributes on the Model Using SHAP. Sci. Rep. 2023, 13, 6263.
20. Barredo Arrieta, A.; Diaz-Rodriguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
21. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
22. Lötsch, J.; Kringel, D.; Ultsch, A. Explainable artificial intelligence (XAI) in biomedicine: Making AI decisions trustworthy for physicians and patients. BioMedInformatics 2021, 2, 1–17.
23. Gilpin, L.; Bau, D.; Yuan, B.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–3 October 2018.
24. Ribera, M.; Lapedriza, A. Can we do better explanations? A proposal of User-Centered Explainable AI. In Proceedings of the IUI Workshops ’19, Los Angeles, CA, USA, 20 March 2019.
25. Schröer, C.; Kruse, F.; Gómez, J. A systematic literature review on applying CRISP-DM process model. Procedia Comput. Sci. 2021, 181, 526–534.
26. Wirth, R.; Hipp, J. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, Manchester, UK, 11–13 April 2000.
27. Critical Path Institute. Polycystic Kidney Disease Outcomes Consortium. Available online: https://c-path.org/program/polycystic-kidney-disease-outcomes-consortium/ (accessed on 30 May 2024).
28. Clinical Data Interchange Standards Consortium. Study Data Tabulation Model, Implementation Guide: Human Clinical Trials, Version 3.4 (Final). Available online: https://sastricks.com/cdisc/SDTMIG%20v3.4-FINAL_2022-07-21.pdf (accessed on 29 November 2023).
29. Jo, W.; Kim, S.; Kim, K.; Suh, C.; Kim, J.; Kim, H.; Lee, J.; Oh, W.; Choi, S.; Pyo, J. Correlations between renal function and the total kidney volume measured on imaging for autosomal dominant polycystic kidney disease: A systematic review and meta-analysis. Eur. J. Radiol. 2017, 95, 56–65.
30. Park, S.; Jeong, T. Estimated glomerular filtration rates show minor but significant differences between the single and subgroup creatinine-based Chronic Kidney Disease Epidemiology Collaboration equations. Ann. Lab. Med. 2019, 39, 205–208.
31. American Kidney Fund. Stages of Kidney Disease (CKD). Available online: https://www.kidneyfund.org/all-about-kidneys/stages-kidney-disease (accessed on 30 May 2024).
32. Irazabal, M.; Rangel, L.; Bergstralh, E.; Osborn, S.; Harmon, A.; Sundsbak, J.; Bae, K.; Chapman, A.; Grantham, J.; Mrug, M.; et al. Imaging classification of autosomal dominant polycystic kidney disease: A simple model for selecting patients for clinical trials. J. Am. Soc. Nephrol. 2015, 26, 160–172.
33. Rastogi, A.; Ameen, K.; Al-Baghdadi, M.; Shaffer, K.; Nobakht, N.; Kamgar, M.; Lerma, E. Autosomal dominant polycystic kidney disease: Updated perspectives. Ther. Clin. Risk Manag. 2019, 15, 1041–1052.
34. Franklin, G.; Stephens, R.; Piracha, M.; Tiosano, S.; Lehouillier, F.; Koppel, R.; Elkin, P. The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective. Life 2024, 14, 652.
35. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
36. Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449.
37. He, H.; Garcia, E. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
38. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
39. Panda, N. A review on logistic regression in medical research. Natl. J. Community Med. 2022, 13, 265–270.
40. Burges, C. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 1998, 2, 121–167.
41. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2017.
42. Krauss, C.; Do, X.; Huck, N. Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. Eur. J. Oper. Res. 2017, 259, 689–702.
43. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
44. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
45. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
46. Zhou, Z. Machine Learning; Springer: Berlin/Heidelberg, Germany, 2021.
47. Nai, Y.; Teo, B.; Tan, N.; O’Doherty, S.; Stephenson, M.; Thian, Y.; Chiong, E.; Reilhac, A. Comparison of metrics for the evaluation of medical segmentations using prostate MRI dataset. Comput. Biol. Med. 2021, 134, 104497.
48. Müller, D.; Soto-Rey, I.; Kramer, F. Towards a guideline for evaluation metrics in medical image segmentation. BMC Res. Notes 2022, 15, 210.
49. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 29.
50. Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? arXiv 2022, arXiv:2207.08815.
51. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. arXiv 2019, arXiv:1907.10902.
52. Ribeiro, M.; Singh, S.; Guestrin, C. Model-Agnostic Interpretability of Machine Learning. arXiv 2016, arXiv:1606.05386.
53. Ribeiro, M.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, San Diego, CA, USA, 12–17 June 2016.
54. Lundberg, S.; Lee, S. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
55. Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable; Leanpub: Victoria, BC, Canada, 2020.
56. Doshi-Velez, F.; Kim, B. Considerations for Evaluation and Generalization in Interpretable Machine Learning. In Explainable and Interpretable Models in Computer Vision and Machine Learning; Springer: Cham, Switzerland, 2018; pp. 3–17.
57. Joshi, A.; Kale, S.; Chandel, S.; Pal, D. Likert scale: Explored and explained. Br. J. Appl. Sci. Technol. 2015, 7, 396–403.
Process Stage | AI Researcher (the Author) | Domain Experts (Nephrology Doctor) | End Users (General Doctor) |
Business Understanding | Gathers requirements and defines the business problem from the AI perspective. | Provide input on medical requirements, domain knowledge, and expected outcomes. | - |
Data Understanding | Analyzes the dataset, identifies data quality issues, and ensures data are suitable for AI tasks. | Offer insights into interpreting the medical data and validating the features. | - |
Data Preparation | Cleans, transforms, and preprocesses the data for modeling. | Provide feedback on the clinical relevance of data transformations or feature engineering. | - |
Modeling | Builds and trains the AI model. Implements XAI. | Validate the medical accuracy of the model’s results and guide any medical assumptions. | - |
Evaluation | Evaluates the AI model’s performance and accommodates human evaluation. | Perform human evaluation using the proposed explainability matrix. | Perform human evaluation using the proposed explainability matrix. |
File Name/Num. of Rows/Num. of Attributes | Brief Description and Reasons Selection | Statistical Information |
DM/2498/29 | Demographics of patients as subject observations. Describe personal information as the reference for all tables. Age, sex, race, and ethnicity are used as a reference for subject observation. | All 2498 rows have a unique user id. |
PR/764/40 | List of the procedures that have been performed on the patients. The procedure is used because this might be related to how severe the patient is. | Only 326 patients (around 13%) have a procedure record. There are 24 intervention names that are categorized into 4 categories: liver, cyst, stone, and blood procedure. |
SU/7667/43 | Information about the substance used regularly by the patients. The daily substance may be associated with the patient’s daily lifestyle, potentially influencing its severity. | 65% of the patients have records (1623 of 2498 patients). There are five unique substance categories: alcohol, tobacco, water, decaffeine, and caffeine. |
CE/31449/37 | A domain for events containing clinically significant occurrences that are not adverse events. The clinical events describe the health conditions of patients. | 93% of patients have a CE record (2325 of 2498). There are 46 unique clinical event categorizations. One patient can have multiple clinical events (multiple rows). |
HO/1698/29 | Records inpatient or outpatient events, such as hospitalization or rehabilitation, for the patient. Although there is not enough information to tell whether a hospitalization was caused by ADPKD, this information might reflect the severity of the patient’s condition. | The inpatient event records only hospitalization. Only 34% of patients have an HO record (326 of 2498) |
MH/27464/37 | Record the patient’s prior medical history as reported. Medical history documents the patient’s complete range of problems and all the diagnoses that have been established. | Patient’s primary diagnosis is ADPKD. There are 39 unique medical history terms, such as hypertension, migraine, and so on. |
VS/308484/35 | Records the vital signs of patients, such as temperature, height, and weight. The latest height measurement is the only vstest used; these data are used to calculate hTKV (height-TKV) as a basis to define the high-risk kidney enlargement classification. | Eight unique vital sign categories are recorded: body mass index, body surface area, diastolic blood pressure, heart rate, height, pulse rate, systolic blood pressure, and weight. |
FH/4929/29 | Specifically contain records of the family history of ESRD (end-stage renal disease) or ADPKD. Since ADPKD is an inherited disease, the affected relative information becomes important. | There are 1808 patients (72.3%) reported to have a family history of ESRD and/or ADPKD. There are 19 distinct subject categories, such as “FATHER” and “MOTHER”. |
LB/196527/51 | Records the patient’s laboratory test results. We do not use all of the test results; we only use the data needed to calculate the estimated glomerular filtration rate (eGFR). The eGFR is used to assess the presence and degree of renal disease. | There are 3 categories of lab tests (chemistry, hematology, and urinalysis) and 34 unique lab tests or examinations reported (such as creatinine, potassium, and so on). |
MP/29678/42 | Specifically record the kidney measurements such as volume, width, depth, and mass. The kidney measurement is needed to calculate the total kidney volume (TKV). TKV is an essential calculation to define the patient’s high-risk kidney enlargement profile. | 95% of patients (2371 of 2498) have a kidney measurement record. The object is measured and categorized as right kidney, left kidney, or bilateral (both kidneys). The method of measurement is either ultrasound, MRI, or CT. |
Variables | n | % |
Age | ||
15–20 | 128 | 7.23% |
21–30 | 312 | 17.63% |
31–40 | 473 | 26.72% |
41–50 | 489 | 27.63% |
51–60 | 250 | 14.12% |
61–70 | 91 | 5.14% |
71–80 | 22 | 1.24% |
81–85 | 5 | 0.28% |
Sex | ||
Female | 1095 | 61.86% |
Male | 675 | 38.14% |
Ethnicity | ||
Not Hispanic or Latino | 741 | 41.86% |
Hispanic or Latino | 23 | 1.30% |
Not Reported | 1006 | 56.84% |
Race | ||
White | 1543 | 87.18% |
Black or African American | 46 | 2.60% |
Asian | 14 | 0.79% |
American Indian or Alaska Native | 9 | 0.51% |
Native Hawaiian | 9 | 0.51% |
Other | 22 | 1.24% |
Unknown | 135 | 7.63% |
Set | Feature ¹ |
Set 2 (34 attributes) | ‘abdominal_pain’, ‘abdomen_protube’, ‘affacted_father’, ‘affacted_mother’, ‘affacted_sibling’, ‘anemia’, ‘back_pain’, ‘caffeine’, ‘ckd_classification_encoded’, ‘changes_in_appetite’, ‘consume_alcohol’, ‘consume_tobacco’, ‘dthfl_encoded’, ‘ethnic_encoded’, ‘event_cardiac_arrhythmia’, ‘event_cardiac_valve_disease’, ‘event_diverticulosis’, ‘event_end_stage_renal_disease’, ‘event_gross_hematuria’, ‘event_kidney_cyst_hemorrhage’, ‘event_severe_headache’, ‘event_symptomatic_nephrolithiasis’, ‘event_urinary_tract_infection’, ‘fatique’, ‘flank_pain’, ‘headache’, ‘hospitalization’, ‘hypertension’, ‘liver_procedure’, ‘migraine’, ‘mood_changes’, ‘nausea’, ‘race_encoded’, ‘sex_encoded’ |
Set 3 (33 attributes) | ‘abdominal_pain’, ‘abdomen_protube’, ‘affacted_father’, ‘anemia’, ‘anorexia’, ‘back_pain’, ‘blood_procedure’, ‘ckd_classification_encoded’, ‘consume_tobacco’, ‘diarrhea’, ‘ethnic_encoded’, ‘event_aneurysm’, ‘event_cardiac_arrhythmia’, ‘event_edema’, ‘event_end_stage_renal_disease’, ‘event_exertional_chest_pain’, ‘event_gross_hematuria’, ‘event_hepatic_venous_outflow_obstruction’, ‘event_inguinal_hernia’, ‘event_intracranial_aneurysm’, ‘event_nocturia’, ‘event_non_exertional_chest_pain’, ‘event_shortness_of_breath_at_rest’, ‘event_umbilical_hernia’, ‘flank_pain’, ‘hospitalization’, ‘hypertension’, ‘insomnia’, ‘liver_procedure’, ‘migraine’, ‘nausea’, ‘sex_encoded’, ‘stone_procedure’ |
Attribute Set | Training Algorithm | Imbalance Handling Method | Accuracy | Precision | Recall | F1 | AUC |
Set 3 ¹ | XGBoost | SMOTE | 0.715 | 0.682 | 0.702 | 0.687 | 0.702 |
Set 3 | SVM | SMOTE | 0.681 | 0.672 | 0.699 | 0.666 | 0.699 |
Set 3 | LightGBM | SMOTE | 0.703 | 0.672 | 0.691 | 0.676 | 0.691 |
Set 3 | SVM | ROS | 0.652 | 0.662 | 0.688 | 0.643 | 0.688 |
Set 2 | GBT | ROS | 0.678 | 0.661 | 0.687 | 0.660 | 0.687 |
Set 3 | GBT | ROS | 0.670 | 0.660 | 0.686 | 0.654 | 0.686 |
Set 3 | LR | ROS | 0.675 | 0.661 | 0.685 | 0.657 | 0.685 |
Set 3 | LightGBM | ROS | 0.701 | 0.667 | 0.684 | 0.671 | 0.684 |
Set 3 | DNN | ROS | 0.732 | 0.717 | 0.732 | 0.719 | 0.689 |
Set 3 | RF | SMOTE | 0.706 | 0.668 | 0.681 | 0.672 | 0.681 |
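Several rows above use ROS to handle class imbalance. The sketch below illustrates the idea, assuming ROS denotes random oversampling: minority-class rows are duplicated at random until every class is as large as the majority class. It is a plain-Python illustration, not the authors' pipeline; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows for duplication come from the original data only.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: six low-risk (0) and two high-risk (1) patients, illustrative only.
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After resampling, both classes contain six rows. Oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.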
Hyperparameter Tuning | Accuracy | Precision | Recall | F1 | AUC |
‘lambda’: 3.9957070357300086, ‘alpha’: 2.799141734671868, ‘max_depth’: 7, ‘eta’: 0.08770989189740201, ‘gamma’: 0.1671936607245508, ‘colsample_bytree’: 0.9400135473143985, ‘min_child_weight’: 9 | 0.709 | 0.686 | 0.712 | 0.689 | 0.712 |
Model | Trustworthiness | Causality | Transferability | Informativeness | Confidence |
NoXAI | 3.15 | 2.85 ¹ | 3.18 | 2.22 ¹ | 3.00 |
LIME | 3.22 | 3.26 | 3.37 | 3.41 | 3.04 |
LIME + Text | 3.41 | 3.52 | 3.48 | 3.67 | 3.30 |
SHAP | 3.40 | 3.26 | 3.26 | 3.33 | 3.19 |
SHAP + Text ² | 3.70 | 3.78 | 3.63 | 3.82 | 3.48 |
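To compare the explanation formats at a glance, the five criterion scores in each row can be collapsed into a single average. The sketch below does this with an unweighted mean; the equal weighting is an assumption made for illustration and is not a metric reported in the table.

```python
# Scores copied from the table, in the order: Trustworthiness, Causality,
# Transferability, Informativeness, Confidence.
scores = {
    "NoXAI": [3.15, 2.85, 3.18, 2.22, 3.00],
    "LIME": [3.22, 3.26, 3.37, 3.41, 3.04],
    "LIME + Text": [3.41, 3.52, 3.48, 3.67, 3.30],
    "SHAP": [3.40, 3.26, 3.26, 3.33, 3.19],
    "SHAP + Text": [3.70, 3.78, 3.63, 3.82, 3.48],
}


def mean(values):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(values) / len(values)


# Rank the explanation formats from highest to lowest average score.
ranking = sorted(scores, key=lambda name: mean(scores[name]), reverse=True)
for name in ranking:
    print(f"{name}: {mean(scores[name]):.2f}")
```

Under this equal weighting, SHAP + Text ranks first and NoXAI last, which matches the row-by-row pattern in the table.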
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dwiyanti, L., on behalf of PKDOC; Nambo, H.; Hamid, N. Leveraging Explainable Artificial Intelligence (XAI) for Expert Interpretability in Predicting Rapid Kidney Enlargement Risks in Autosomal Dominant Polycystic Kidney Disease (ADPKD). AI 2024, 5, 2037-2065. https://doi.org/10.3390/ai5040100