Assessing the Reliability of Machine Learning Models Applied to the Mental Health Domain Using Explainable AI
Abstract
1. Introduction
- RQ1: How reliable are evaluation metrics such as accuracy in assessing machine learning model performance in making important predictions?
- RQ2: How well do the explainable AI techniques SHAP and LIME complement conventional evaluation metrics?
- RQ3: How do the various machine learning algorithms compare when it comes to the explainability of their outcomes?
1.1. Contribution
1.2. Paper Organization
2. Materials and Methods
2.1. Dataset
2.2. LIME: Local Interpretable Model-Agnostic Explanations
- Generate perturbations: A set of perturbed versions of the original data point is created by slightly modifying its features. This simulates how changes in input features might affect the model’s prediction.
- Query the black box: The model’s predictions for each perturbed data point are obtained.
- Train the surrogate model: A local surrogate model, such as a linear regression, is fit to the generated data and corresponding predictions. This model aims to mimic the black box’s behavior in the vicinity of the original data point.
- Explain the prediction: The weights or coefficients of the trained surrogate model represent the contributions of each feature to the model’s output. These weights are interpreted as the explanation for the original prediction.
The local surrogate is typically a weighted linear model of the form

$$g(x) = w_0 + \sum_{i=1}^{n} w_i x_i$$

where:
- $g(x)$ is the prediction of the surrogate model for an input data point $x$.
- $w_0$ is the intercept.
- $w_i$ are the coefficients for each feature $x_i$.
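To make the procedure concrete, the following is a minimal sketch of generating a LIME explanation for a single prediction with the `lime` package. The variable names (`model`, `X_train`, `X_test`) and the class labels are illustrative assumptions, not the authors' exact pipeline; any fitted scikit-learn classifier on the encoded survey features would work the same way.

```python
from lime.lime_tabular import LimeTabularExplainer

# Assumes: model is a fitted scikit-learn classifier, and X_train / X_test
# are pandas DataFrames of the encoded survey features (illustrative names).
explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=list(X_train.columns),
    class_names=["No Treatment", "Treatment"],  # illustrative labels
    mode="classification",
)

# Explain one test instance: LIME perturbs it, queries the black-box model,
# fits a weighted linear surrogate locally, and returns the feature weights.
exp = explainer.explain_instance(
    data_row=X_test.iloc[0].values,
    predict_fn=model.predict_proba,
    num_features=10,
)
print(exp.as_list())  # (feature condition, weight) pairs = the explanation
```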
2.3. SHAP: Shapley Additive Explanations
SHAP assigns each feature the average of its marginal contributions over all feature coalitions:

$$\phi_i = \sum_{S \subseteq S' \setminus \{x_i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ f(S \cup \{x_i\}) - f(S) \right]$$

where:
- $\phi_i$ represents the SHAP value for feature $x_i$.
- $S$ is a subset of the full feature set $S'$ excluding $x_i$.
- $f(S)$ is the model's prediction using only the features in $S$.
- $n$ is the number of features.
- The combinatorial term $\frac{|S|!\,(n - |S| - 1)!}{n!}$ can be considered a weighting factor.
- $g$ is a link function, such as a sigmoid for binary classification, relating the summed SHAP values to the model's output.
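As a companion to the LIME sketch above, the following is a minimal sketch of computing SHAP values with the `shap` package. Again, `model`, `X_train`, and `X_test` are illustrative placeholders; Kernel SHAP is shown here only because it is model-agnostic, whereas tree-based models would more typically use `shap.TreeExplainer`.

```python
import shap

# Model-agnostic Kernel SHAP: approximates the Shapley values phi_i by
# sampling feature coalitions; a small background sample stands in for
# "missing" features when evaluating f(S).
background = shap.sample(X_train, 100)
explainer = shap.KernelExplainer(model.predict_proba, background)

# SHAP values for a handful of test rows. Depending on the shap version,
# the result is a list with one array per output class.
shap_values = explainer.shap_values(X_test.iloc[:50])

# Global view: mean |phi_i| per feature for the positive class.
shap.summary_plot(shap_values[1], X_test.iloc[:50], plot_type="bar")
```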
3. Results
3.1. Logistic Regression
Analysis of Results for SHAP and LIME
3.2. Explanations Based on the K-Nearest Neighbors Algorithm
SHAP and LIME Analysis of K-NN Model Performance
3.3. Explanations Based on the Decision Tree Algorithm
SHAP and LIME Analysis of the Decision Tree Model’s Performance
3.4. Explanations Based on the Random Forest Algorithm
SHAP and LIME Analysis of the Random Forest Model’s Performance
3.5. Explanations Based on the Gradient Boosting Algorithm
SHAP and LIME Analysis of the Gradient Boosting Model’s Performance
3.6. Explanations Based on the AdaBoost Algorithm
SHAP and LIME Analysis of the AdaBoost Model’s Performance
3.7. Explanations Based on the Stochastic Gradient Descent Classifier Algorithm
SHAP and LIME Analysis of the SGD Classifier Model Performance
3.8. Explanations Based on the Naive Bayes Algorithm
SHAP and LIME Analysis of the Naive Bayes Model’s Performance
3.9. Explanations Based on the Support Vector Machine Algorithm
SHAP and LIME Analysis of the SVM Model’s Performance
3.10. Explanations Based on the XGBoost Algorithm
SHAP and LIME Analysis of the XGBoost Model’s Performance
3.11. Explanations Based on the LightGBM Algorithm
SHAP and LIME Analysis of the LightGBM Model’s Performance
4. Discussion
- RQ1: How reliable are evaluation metrics such as accuracy in assessing machine learning model performance in making important predictions? The experiments demonstrate that the evaluation metrics alone fall short in trustworthiness. The metrics remain high even when the models rely on an unsound ranking of the features.
- RQ2: How well do the explainable AI techniques SHAP and LIME complement conventional evaluation metrics? The attribution techniques SHAP and LIME can add corroborating evidence of the model’s performance. The results from the experiments show the complementary nature of the plots from the explainability methods and the evaluation metrics. LIME is substantially impacted by the choice of hyperparameters and may not fully comply with some legal requirements [36]. However, for the experiments described in this paper, the profile of LIME explanations was mostly consistent with the explanations from SHAP. A summary metric for the explainability aspects may further enhance the utility of these methods.
- RQ3: How do the various machine learning algorithms compare when it comes to the explainability of their outcomes? The majority of the machine learning algorithms behave quite similarly with respect to the explainability of their outcomes. However, some algorithms, such as the SGD Classifier, perform poorly in terms of the explainability of their outcomes, while Naive Bayes performs slightly better.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Open Sourcing Mental Illness. OSMI Mental Health in Tech Survey; Open Sourcing Mental Illness: West Lafayette, IN, USA, 2014. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774. [Google Scholar]
- Sujal, B.; Neelima, K.; Deepanjali, C.; Bhuvanashree, P.; Duraipandian, K.; Rajan, S.; Sathiyanarayanan, M. Mental health analysis of employees using machine learning techniques. In Proceedings of the 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), Bangalore, India, 4–8 January 2022; pp. 1–6. [Google Scholar]
- Mitravinda, K.; Nair, D.S.; Srinivasa, G. Mental Health in Tech: Analysis of Workplace Risk Factors and Impact of COVID-19. SN Comput. Sci. 2023, 4, 197. [Google Scholar] [CrossRef] [PubMed]
- Li, Y. Application of Machine Learning to Predict Mental Health Disorders and Interpret Feature Importance. In Proceedings of the 2023 3rd International Symposium on Computer Technology and Information Science (ISCTIS), Chengdu, China, 7–9 July 2023; pp. 257–261. [Google Scholar]
- Vorbeck, J.; Gomez, C. Algorithms and Anxiety: An Investigation of Mental Health in Tech. MA Data Anal. Appl. Soc. Res. 2020. [Google Scholar]
- Baptista, M.L.; Goebel, K.; Henriques, E.M. Relation between prognostics predictor evaluation metrics and local interpretability SHAP values. Artif. Intell. 2022, 306, 103667. [Google Scholar] [CrossRef]
- Ratul, Q.E.A.; Serra, E.; Cuzzocrea, A. Evaluating attribution methods in machine learning interpretability. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 5239–5245. [Google Scholar]
- Hu, B.; Tunison, P.; Vasu, B.; Menon, N.; Collins, R.; Hoogs, A. XAITK: The explainable AI toolkit. Appl. AI Lett. 2021, 2, e40. [Google Scholar] [CrossRef]
- Ueda, D.; Kakinuma, T.; Fujita, S.; Kamagata, K.; Fushimi, Y.; Ito, R.; Matsui, Y.; Nozaki, T.; Nakaura, T.; Fujima, N.; et al. Fairness of artificial intelligence in healthcare: Review and recommendations. Jpn. J. Radiol. 2024, 42, 3–15. [Google Scholar] [CrossRef] [PubMed]
- Kerz, E.; Zanwar, S.; Qiao, Y.; Wiechmann, D. Toward explainable AI (XAI) for mental health detection based on language behavior. Front. Psychiatry 2023, 14, 1219479. [Google Scholar] [CrossRef] [PubMed]
- Band, S.S.; Yarahmadi, A.; Hsu, C.C.; Biyari, M.; Sookhak, M.; Ameri, R.; Dehzangi, I.; Chronopoulos, A.T.; Liang, H.W. Application of explainable artificial intelligence in medical health: A systematic review of interpretability methods. Inform. Med. Unlocked 2023, 40, 101286. [Google Scholar] [CrossRef]
- Srinivasu, P.N.; Sirisha, U.; Sandeep, K.; Praveen, S.P.; Maguluri, L.P.; Bikku, T. An Interpretable Approach with Explainable AI for Heart Stroke Prediction. Diagnostics 2024, 14, 128. [Google Scholar] [CrossRef] [PubMed]
- Rahmatinejad, Z.; Dehghani, T.; Hoseini, B.; Rahmatinejad, F.; Lotfata, A.; Reihani, H.; Eslami, S. A comparative study of explainable ensemble learning and logistic regression for predicting in-hospital mortality in the emergency department. Sci. Rep. 2024, 14, 3406. [Google Scholar] [CrossRef] [PubMed]
- Pendyala, V.S.; Kim, H. Analyzing and Addressing Data-driven Fairness Issues in Machine Learning Models used for Societal Problems. In Proceedings of the 2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE), Kolkata, India, 20–21 January 2023; pp. 1–7. [Google Scholar]
- Arslan, Y.; Lebichot, B.; Allix, K.; Veiber, L.; Lefebvre, C.; Boytsov, A.; Goujon, A.; Bissyandé, T.F.; Klein, J. Towards refined classifications driven by shap explanations. In Proceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction, Vienna, Austria, 23–26 August 2022; pp. 68–81. [Google Scholar]
- Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling lime and shap: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–8 February 2020; pp. 180–186. [Google Scholar]
- Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
- Viswan, V.; Shaffi, N.; Mahmud, M.; Subramanian, K.; Hajamohideen, F. Explainable artificial intelligence in Alzheimer’s disease classification: A systematic review. Cogn. Comput. 2024, 16, 1–44. [Google Scholar] [CrossRef]
- Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Del Ser, J.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inform. Fusion 2024, 106, 102301. [Google Scholar] [CrossRef]
- Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Methodol. 1958, 20, 215–232. [Google Scholar] [CrossRef]
- Cover, T.M.; Hart, P.E. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar]
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar]
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
- Le, C.; Yann, A. Stochastic gradient descent training for large-scale online linear classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 248–256. [Google Scholar]
- Russell, S.W. The Bayesian approach to automatic speech recognition. IEEE Trans. Speech Audio Process. 1995, 16, 227–252. [Google Scholar]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3146–3154. [Google Scholar]
- Shapley, L.S. A Value for N-Person Games; Princeton University Press: Princeton, NJ, USA, 1953. [Google Scholar]
- Uddin, M.M.; Farjana, A.; Mamun, M.; Mamun, M. Mental health analysis in tech workplace. In Proceedings of the 7th North American International Conference on Industrial Engineering and Operations Management, Orlando, FL, USA, 12–14 June 2022; pp. 12–14. [Google Scholar]
- Bajaj, V.; Bathija, R.; Megnani, C.; Sawara, J.; Ansari, N. Non-Invasive Mental Health Prediction using Machine Learning: An Exploration of Algorithms and Accuracy. In Proceedings of the 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Orlando, FL, USA, 12–14 June 2023; pp. 313–321. [Google Scholar]
- Molnar, C. Interpretable Machine Learning; Independently Published. 2020. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 21 August 2023).
Evaluation Metrics

| Models | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| LR | 0.8279 | 0.8944 | 0.8599 | 0.8456 |
| KNN | 0.7428 | 0.5226 | 0.6135 | 0.6534 |
| DT | 0.7949 | 0.9547 | 0.8675 | 0.8465 |
| RF | 0.7857 | 0.8844 | 0.8321 | 0.8121 |
| GB | 0.8364 | 0.8994 | 0.8668 | 0.8544 |
| AdaBoost | 0.9206 | 0.2914 | 0.4427 | 0.6137 |
| SGD | 0.9206 | 0.2914 | 0.4427 | 0.6137 |
| NB | 0.7483 | 0.5678 | 0.6457 | 0.6719 |
| SVM | 0.7949 | 0.9547 | 0.8675 | 0.8465 |
| XGB | 0.8146 | 0.8391 | 0.8267 | 0.8148 |
| LGBM | 0.8240 | 0.8944 | 0.8578 | 0.8439 |
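For reference, metrics of the kind tabulated above can be computed on a held-out test split with standard scikit-learn functions. The sketch below is illustrative only and assumes the same placeholder names (`model`, `X_test`, `y_test`) as the earlier snippets rather than the authors' exact pipeline.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# model: any fitted classifier; X_test / y_test: held-out features and labels.
y_pred = model.predict(X_test)

print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall:    {recall_score(y_test, y_pred):.4f}")
print(f"F1 score:  {f1_score(y_test, y_pred):.4f}")
print(f"Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
```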
| Model | Hyperparameters |
|---|---|
| LR | max_iter: 1000 |
| KNN | n_neighbors: 5 |
| DT | criterion: "gini", max_depth: 4 |
| RF | n_estimators: 200, max_depth: 12, min_samples_leaf: 6, min_samples_split: 20 |
| GB | n_estimators: 50, learning_rate: 1.0, max_depth: 1 |
| AdaBoost | n_estimators: 50 |
| SGD | max_iter: 1000, loss: "log_loss", tol: 1e-3 |
| NB | none |
| SVM | kernel: "linear", probability: True |
| XGB | none |
| LGBM | num_round: 10 |
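The hyperparameter table maps onto library constructor arguments roughly as sketched below. This is a hedged reconstruction, not the authors' code: the Naive Bayes variant is assumed to be Gaussian, the unlisted parameters are left at library defaults, and the LightGBM `num_round` setting is expressed through the scikit-learn wrapper's `n_estimators`.

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(criterion="gini", max_depth=4),
    "RF": RandomForestClassifier(n_estimators=200, max_depth=12,
                                 min_samples_leaf=6, min_samples_split=20),
    "GB": GradientBoostingClassifier(n_estimators=50, learning_rate=1.0, max_depth=1),
    "AdaBoost": AdaBoostClassifier(n_estimators=50),
    # "log_loss" requires scikit-learn >= 1.1 (older versions used loss="log").
    "SGD": SGDClassifier(max_iter=1000, loss="log_loss", tol=1e-3),
    "NB": GaussianNB(),                       # assumed Gaussian variant
    "SVM": SVC(kernel="linear", probability=True),
    "XGB": XGBClassifier(),                   # library defaults
    "LGBM": LGBMClassifier(n_estimators=10),  # num_round = 10 in the native API
}
```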