A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework
Abstract
1. Introduction
2. Related Work
3. Methodology
4. Taxonomy of LLMs in Healthcare
5. Threats and Vulnerabilities of LLMs in Healthcare
5.1. Data Exfiltration from LLMs
5.2. Data Manipulation in LLMs
5.3. Bypassing Security Measures in LLMs
5.4. Model Manipulation in LLMs
5.5. Availability Attacks in LLMs
5.6. Attacks on User Request/Responses
6. Secure Framework for Implementing LLMs in Healthcare
7. Open Research Challenges
7.1. Misinformation
7.2. Resource Implications
7.3. Bias and Fairness in Healthcare LLMs
7.4. Interpretability and Transparency
7.5. Integration with Clinical Workflows
7.6. Ethical Considerations and Patient Privacy
7.7. Clinical Validation and Real-World Performance
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hamid, R.; Brohi, S. A Review of Large Language Models in Healthcare: Taxonomy, Threats, Vulnerabilities, and Framework. Big Data Cogn. Comput. 2024, 8, 161. https://doi.org/10.3390/bdcc8110161