Digital Diagnostics: The Potential of Large Language Models in Recognizing Symptoms of Common Illnesses
Abstract
1. Introduction
2. Related Work
2.1. Advancements in LLM Capabilities for Healthcare Applications
2.2. Challenges in Integrating LLMs into Healthcare
2.3. LLMs in Clinical Trials and Patient Monitoring
2.4. Applications and Limitations of LLMs in Diagnostics
2.5. Ethical and Technical Considerations for LLM Deployment
3. Method
3.1. Research Strategy
3.2. Description of the LLMs Evaluated
3.3. Data Collection Methods
3.4. Evaluation Metrics for Diagnosing Diseases Through LLMs
- True positive (TP): instances where the LLM correctly identified the disease, i.e., it matched the symptom description to the correct diagnosis.
- False positive (FP): instances where the LLM incorrectly identified a disease, attributing to the symptom description a condition that did not match the disease actually present. These errors lower the model’s precision.
- False negative (FN): instances where the LLM either attributed a different disease than the one actually present or failed to recognize the presence of a disease altogether. These errors lower the model’s recall (diagnostic sensitivity).
- Precision: the exactness of the model’s positive predictions, i.e., the proportion of TP observations among all positive diagnoses made by the model, TP / (TP + FP).
- Recall: the model’s ability to identify all pertinent instances, i.e., the ratio of TP observations to all actual positives in the dataset, TP / (TP + FN).
- F1 score: a single balanced measure of diagnostic performance, useful when precision and recall matter equally. It is the harmonic mean of the two: 2 × Precision × Recall / (Precision + Recall).
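The metric definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the `DiagnosisCounts` container, the function names, and the example counts are all hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisCounts:
    """Outcome counts for one model (hypothetical container, not from the paper)."""
    tp: int  # true positives: correct disease identified
    fp: int  # false positives: a disease asserted that was not the one present
    fn: int  # false negatives: the actual disease missed or misattributed


def precision(c: DiagnosisCounts) -> float:
    """TP / (TP + FP); returns 0.0 when the model made no positive diagnoses."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: DiagnosisCounts) -> float:
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_score(c: DiagnosisCounts) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only; these are NOT results reported in the study.
example = DiagnosisCounts(tp=45, fp=2, fn=3)
print(f"precision={precision(example):.3f}")
print(f"recall={recall(example):.3f}")
print(f"f1={f1_score(example):.3f}")
```

Returning 0.0 for an empty denominator is one common convention; raising an error or returning NaN would be equally defensible, depending on how undefined cases should be reported.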
4. Results
4.1. Overview of Findings
4.2. Comparative Analysis
5. Discussion
5.1. Interpretation of Results
5.2. Enhancing Diagnostic Processes with Large Language Models
5.3. Limitations of the Study
5.4. Future Research Directions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
LLM | Large Language Model |
GPT | Generative Pre-trained Transformer |
HIPAA | Health Insurance Portability and Accountability Act |
NLP | Natural Language Processing |
AI | Artificial Intelligence |
CDC | Centers for Disease Control and Prevention |
WHO | World Health Organization |
EHR | Electronic Health Records |
FP | False positive |
TP | True positive |
FN | False negative |
Model | Diseases Evaluated per Model |
---|---|
Gemini | 50 |
GPT-3.5 | 50 |
GPT-4 | 50 |
o1 Preview Model | 50 |
GPT-4o | 50 |
Total | 250 total evaluations |
Model | Precision | Recall | F1 Score |
---|---|---|---|
Gemini | 0.97 | 0.69 | 0.81 |
GPT-3.5 | 0.91 | 0.85 | 0.88 |
GPT-4 | 0.96 | 0.92 | 0.94 |
o1 Preview | 0.93 | 0.91 | 0.92 |
GPT-4o | 0.95 | 0.88 | 0.91 |
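As a quick consistency check on the table above, the F1 column can be recomputed from the reported precision and recall using the harmonic-mean definition. The sketch below assumes only the rounded values shown in the table; because those inputs are already rounded to two decimals, agreement is checked after rounding rather than exactly. The variable and function names are illustrative.

```python
# (precision, recall, reported F1) copied from the results table.
reported = {
    "Gemini": (0.97, 0.69, 0.81),
    "GPT-3.5": (0.91, 0.85, 0.88),
    "GPT-4": (0.96, 0.92, 0.94),
    "o1 Preview": (0.93, 0.91, 0.92),
    "GPT-4o": (0.95, 0.88, 0.91),
}


def harmonic_f1(p: float, r: float) -> float:
    """F1 as the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Recompute F1 per model and round to the table's two-decimal precision.
recomputed = {
    model: round(harmonic_f1(p, r), 2) for model, (p, r, _) in reported.items()
}

for model, (p, r, f1) in reported.items():
    status = "matches" if recomputed[model] == f1 else "differs"
    print(f"{model}: reported F1={f1:.2f}, recomputed={recomputed[model]:.2f} ({status})")
```

For all five models the recomputed value agrees with the reported F1 at two decimal places.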
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gupta, G.K.; Singh, A.; Manikandan, S.V.; Ehtesham, A. Digital Diagnostics: The Potential of Large Language Models in Recognizing Symptoms of Common Illnesses. AI 2025, 6, 13. https://doi.org/10.3390/ai6010013