Towards Reliable Healthcare LLM Agents: A Case Study for Pilgrims during Hajj
Abstract
:1. Introduction
- Knowledge retrieval: When the model encounters uncertain or ambiguous input, the RAG module retrieves relevant knowledge from specific external resources. This retrieval process enables the model to augment its understanding of the topic at hand and generate more informed responses.
- Validation of uncertain text: After retrieving relevant knowledge, the RAG module validates the uncertain text generated by the GPT-3.5 Turbo model against the retrieved information. By cross-referencing the model’s output with external knowledge sources, the RAG module assesses the accuracy and credibility of the generated text, identifying and correcting any inaccuracies or inconsistencies before finalizing the response.
- Domain-specific fine-tuning of LLM: We fine-tune a large language model (LLM) specifically for the domain of healthcare and cultural sensitivities relevant to Hajj pilgrims. This fine-tuning process ensures that the model is capable of understanding and generating relevant responses within the context of healthcare conversations during the pilgrimage.
- Introducing the HajjHealthQA dataset: To facilitate the development and evaluation of our healthcare chatbot, we introduce the HajjHealthQA dataset. This dataset contains a diverse collection of questions, answers, and conversations relevant to healthcare issues faced by Hajj pilgrims. We also employ synthetic data augmentation techniques (https://github.com/AbeerMostafa/HajjHealthQA-Dataset (accessed on 1 March 2024)).
- RAG module for uncertainty validation: We add a retrieval-augmented generation (RAG) module to validate uncertain information provided by the chatbot. This mechanism enhances the reliability and accuracy of the chatbot’s responses by cross-referencing generated text with external knowledge sources.
- Training a secondary AI agent on the HealthVer dataset: We train two separate models as part of our framework, one on the HajjHealthQA dataset for Hajj-specific healthcare inquiries and another on the HealthVer dataset for medical information verification. The latter is used to verify that the medical information generated by our chatbot is supported by medical evidence.
- Prompt engineering for case study specifics: We employ prompt engineering techniques tailored to the specific case study of building a healthcare chatbot for Hajj pilgrims. This ensures that the chatbot’s responses are optimized for relevance, accuracy, and cultural appropriateness within the context of Hajj-related healthcare scenarios.
- Multilingual support: To accommodate the linguistic diversity of Hajj pilgrims, our chatbot offers multilingual support, allowing users to interact in their preferred language.
2. Related Work
2.1. Health Challenges Faced by Hajj Pilgrims
2.2. Medical Q&A
2.3. Use of Synthetic Data
2.4. Hajj Q&A
3. HajjHealthQA Dataset
4. Methodology
4.1. Model Fine-Tuning
4.2. Retrieval-Augmented Generation
4.3. Evidence-Based Verification
4.4. Prompt Engineering
- Task-specific prompts
- Multilingual support
- Customization for cultural sensitivity
- Contextual awareness and follow-up prompts
- Iterative improvement through user feedback
5. Experimental Setup
5.1. Hyperparameter Tuning
5.1.1. Number of Epochs
5.1.2. Batch Size
5.1.3. Learning Rate Multiplier
5.2. Evaluation Metrics
6. Results and Discussion
6.1. Accuracy Analysis
6.2. Quality Metrics
6.3. Comparison with Benchmark Results
7. Data Privacy and Ethical Considerations
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Abdelmoety, D.; El-Bakri, N.; Almowalld, W.; Turkistani, Z.; Bugis, B.; Baseif, E.; Melbari, M.H.; AlHarbi, K.; Abu-Shaheen, A. Characteristics of Heat Illness during Hajj: A Cross-Sectional Study. BioMed Res. Int. 2018, 2018, 5629474. [Google Scholar] [CrossRef] [PubMed]
- Al-Masud, S.M.R.; Bakar, A.A.; Yussof, S. Determining the types of diseases and emergency issues in Pilgrims during Hajj: A literature review. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 87–94. [Google Scholar]
- Razavi, S.M.; Mardani, M.; Salamati, P. Infectious diseases and preventive measures during hajj mass gatherings: A review of the literature. Arch. Clin. Infect. Dis. 2018, 13, e62526. [Google Scholar] [CrossRef]
- Salmon-Rousseau, A.; Piednoir, E.; Cattoir, V.; de La Blanchardiere, A. Hajj-associated infections. MEdecine Mal. Infect. 2016, 46, 346–354. [Google Scholar] [CrossRef] [PubMed]
- Yezli, S.; Yassin, Y.; Mushi, A.; Almuzaini, Y.; Khan, A. Pattern of utilization, disease presentation, and medication prescribing and dispensing at 51 primary healthcare centers during the Hajj mass gathering. BMC Health Serv. Res. 2022, 22, 143. [Google Scholar] [CrossRef] [PubMed]
- Shen, Y.; Heacock, L.; Elias, J.; Hentel, K.D.; Reig, B.; Shih, G.; Moy, L. ChatGPT and other large language models are double-edged swords. Radiology 2023, 307, e230163. [Google Scholar] [CrossRef] [PubMed]
- Javaid, M.; Haleem, A.; Singh, R.P. ChatGPT for healthcare services: An emerging stage for an innovative perspective. BenchCounc. Trans. Benchmarks Stand. Eval. 2023, 3, 100105. [Google Scholar] [CrossRef]
- De Angelis, L.; Baglivo, F.; Arzilli, G.; Privitera, G.P.; Ferragina, P.; Tozzi, A.E.; Rizzo, C. ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health. Front. Public Health 2023, 11, 1166120. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 5 February 2024).
- Glik, D.C. Risk communication for public health emergencies. Annu. Rev. Public Health 2007, 28, 33–54. [Google Scholar] [CrossRef]
- Almehmadi, M.; Pescaroli, G.; Alqahtani, J.S.; Oyelade, T. Investigating health risk perceptions during the Hajj: Pre-Travel advice and adherence to preven-tative health measures. Afr. J. Respir. Med. 2021, 16, 1–6. [Google Scholar]
- Alqahtani, A.S.; Tashani, M.; Heywood, A.E.; Booy, R.; Rashid, H.; Wiley, K.E. Exploring Australian Hajj Tour Operators’ Knowledge and Practices Regarding Pilgrims’ Health Risks: A Qualitative Study. JMIR Public Health Surveill. 2019, 5, e10960. [Google Scholar] [CrossRef] [PubMed]
- Aljohani, A.; Nejaim, S.; Khayyat, M.; Aboulola, O. E-government and logistical health services during Hajj season. Bull. Natl. Res. Cent. 2022, 46, 112. [Google Scholar] [CrossRef]
- Dzaraly, N.D.; Rahman, N.I.A.; Simbak, N.B.; Ab Wahab, S.; Osman, O.; Ismail, S.; Haque, M. Patterns of communicable and non-communicable diseases in Pilgrims during Hajj. Res. J. Pharm. Technol. 2014, 7, 12. [Google Scholar]
- Abdelhay, M.; Mohammed, A.; Hefny, H.A. Deep learning for Arabic healthcare: MedicalBot. Soc. Netw. Anal. Min. 2023, 13, 71. [Google Scholar] [CrossRef] [PubMed]
- Singh, S.; Susan, S. Healthcare Question–Answering System: Trends and Perspectives. In Proceedings of the International Health Informatics Conference: IHIC 2022, Cuttack, India, 17–19 May 2022; Springer: Cham, Switzerland, 2023; pp. 239–249. [Google Scholar]
- Pal, V.K.; Singh, S.; Sinha, A.; Shekh, M.S. Medical Chatbot using AI and NLP. i-Manag. J. Softw. Eng. 2022, 16, 46. [Google Scholar]
- Long, C.; Subburam, D.; Lowe, K.; Santos, A.d.; Zhang, J.; Hwang, S.; Saduka, N.; Horev, Y.; Su, T.; Cote, D.; et al. ChatENT: Augmented Large Language Model for Expert Knowledge Retrieval in Otolaryngology-Head and Neck Surgery. medRxiv 2023, 2023-08. [Google Scholar] [CrossRef] [PubMed]
- Chen, S.; Li, Y.; Lu, S.; Van, H.; Aerts, H.J.W.L.; Savova, G.K.; Bitterman, D.S. Evaluating the ChatGPT family of models for biomedical reasoning and classification. J. Am. Med. Inform. Assoc. JAMIA 2024, 31, 940–948. [Google Scholar] [CrossRef]
- Puri, R.; Spring, R.; Shoeybi, M.; Patwary, M.; Catanzaro, B. Training Question Answering Models From Synthetic Data. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 5811–5826. [Google Scholar] [CrossRef]
- Wei, J.; Huang, D.; Lu, Y.; Zhou, D.; Le, Q.V. Simple synthetic data reduces sycophancy in large language models. arXiv 2023, arXiv:2308.03958. [Google Scholar]
- Sulaiman, S.; Mohamed, H.; Arshad, M.R.M.; Rashid, N.A.A.; Yusof, U.K. Hajj-QAES: A Knowledge-Based Expert System to Support Hajj Pilgrims in Decision Making. In Proceedings of the 2009 International Conference on Computer Technology and Development, Kota Kinabalu, Malaysia, 13–15 November 2009; Volume 1, pp. 442–446. [Google Scholar] [CrossRef]
- Sharef, N.M.; Murad, M.A.; Mustapha, A.; Shishechi, S. Semantic question answering of umrah pilgrims to enable self-guided education. In Proceedings of the 2013 13th International Conference on Intellient Systems Design and Applications, Salangor, Malaysia, 8–10 December 2013; pp. 141–146. [Google Scholar] [CrossRef]
- Mohamed, H.H.; Arshad, M.R.H.M.; Azmi, M.D. M-HAJJ DSS: A mobile decision support system for Hajj pilgrims. In Proceedings of the 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), Kuala Lumpur, Malaysia, 15–17 August 2016; pp. 132–136. [Google Scholar] [CrossRef]
- Nusuk: Your Official Guide to Makkah and Madinah. Available online: https://www.nusuk.sa/ (accessed on 1 November 2023).
- Mecca WABot: Smart System Makes Hajj and Umrah Pilgrims Easy to Worship. Available online: https://kumparan.com/beritaanaksurabaya/mecca-wabot-sistem-pintar-mudahkan-jemaah-haji-dan-umrah-beribadah-20f6faH8EMZ/2 (accessed on 1 November 2023).
- Ministry of Health in the Kingdom of Saudi Arabia. Available online: https://www.moh.gov.sa/en/ (accessed on 1 November 2023).
- WHO Chronic Respiratory Diseases. Available online: https://www.who.int/health-topics/chronic-respiratory-diseases#tab=tab_3 (accessed on 1 November 2023).
- Ministry of Hajj and Umrah in the Kingdom of Saudi Arabia. Available online: https://www.haj.gov.sa/Home (accessed on 1 November 2023).
- CGD Society—FAQ Lung Issues. Available online: https://cgdsociety.org/living-with-cgd/managing-cgd/common-problems/lung-problems/faqs-lung-issues/ (accessed on 1 November 2023).
- Top Doctors—Frequently Asked Questions about Lung Diseases. Available online: https://www.topdoctors.co.uk/medical-articles/frequently-asked-questions-about-lung-diseases# (accessed on 1 November 2023).
- Hajj and Umrah Health Requirements. Available online: https://www.saudiembassy.net/hajj-and-umrah-health-requirements. (accessed on 1 November 2023).
- Health Requirements for Hajj. Available online: https://www.moh.gov.sa/en/HealthAwareness/Pilgrims_Health/Pages/default.aspx (accessed on 1 November 2023).
- Sarrouti, M.; Abacha, A.B.; M’rabet, Y.; Demner-Fushman, D. Evidence-based fact-checking of health-related claims. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 16–20 November 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 3499–3512. [Google Scholar]
- Phatak, A.; Mago, V.K.; Agrawal, A.; Inbasekaran, A.; Giabbanelli, P.J. Narrating Causal Graphs with Large Language Models. arXiv 2024, arXiv:2403.07118. [Google Scholar]
- Gao, M.; Hu, X.; Ruan, J.; Pu, X.; Wan, X. LLM-based NLG Evaluation: Current Status and Challenges. arXiv 2024, arXiv:2402.01383. [Google Scholar]
- Saadany, H.; Orǎsan, C. BLEU, METEOR, BERTScore: Evaluation of Metrics Performance in Assessing Critical Translation Errors in Sentiment-Oriented Text. In Proceedings of the Translation and Interpreting Technology Online Conference, Online, 5–7 July 2021; pp. 48–56. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Akter, S.N.; Yu, Z.; Muhamed, A.; Ou, T.; Bäuerle, A.; Cabrera, Á.A.; Dholakia, K.; Xiong, C.; Neubig, G. An In-depth Look at Gemini’s Language Abilities. arXiv 2023, arXiv:2312.11444. [Google Scholar]
Dataset | GPT-3.5 Turbo | Fine-Tuning | Fine-Tuning + RAG |
---|---|---|---|
Real data only | 68.1% | 73.3% | 79.8% |
Synthetic data only | 86.6% | 93.3% | 97.4% |
Real and synthetic 50/50 | 76.4% | 83.5% | 89% |
Dataset | ROUGE | Precision | F1-Score |
---|---|---|---|
Real data only | 0.78 | 0.76 | 0.76 |
Synthetic data only | 0.92 | 0.89 | 0.9 |
Real and synthetic 50/50 | 0.87 | 0.84 | 0.84 |
Dataset | Recall | Precision | F1-Score |
---|---|---|---|
Real data only | 0.873 | 0.844 | 0.86 |
Synthetic data only | 0.93 | 0.91 | 0.92 |
Real and synthetic 50/50 | 0.91 | 0.9 | 0.898 |
Dataset | Recall | Precision | F1-Score |
---|---|---|---|
Real data only | 0.87 | 0.85 | 0.86 |
Synthetic data only | 0.89 | 0.88 | 0.88 |
Real and synthetic | 0.88 | 0.86 | 0.86 |
Prompt | Output |
---|---|
Please decide if the following claim supports the evidence. Claim: Engage in light to moderate physical activities, such as walking, and avoid strenuous exercises. Rest when needed to prevent overexertion. Evidence: Engage in light exercises, such as walking, and pace yourself during rituals. Listen to your body, take breaks, and avoid strenuous activities that may lead to exhaustion. | SUPPORTS |
Please decide if the following claim supports the evidence. Claim: Elderly pilgrims should consult with their healthcare provider to ensure they are physically able to participate in Hajj. They should also take precautions to prevent heat-related illnesses and stay hydrated. Evidence: Elderly pilgrims should undergo a thorough medical evaluation before Hajj. Consider factors such as mobility, medication management, and the overall impact on their health. | SUPPORTS |
Please decide if the following claim supports the evidence. Claim: Yes, there are medical facilities available during Hajj to provide emergency care. Evidence: Yes, medical facilities are set up along the Hajj route, and hospitals are equipped to handle emergencies. | SUPPORTS |
Dataset | Gemini Pro | GPT-3.5 Turbo | GPT 4 Turbo | Mixtral |
---|---|---|---|---|
MMLU (5-shot) | 65.22 | 67.75 | 80.48 | 68.81 |
MMLU (CoT) | 62.09 | 70.07 | 78.95 | 59.57 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alghamdi, H.M.; Mostafa, A. Towards Reliable Healthcare LLM Agents: A Case Study for Pilgrims during Hajj. Information 2024, 15, 371. https://doi.org/10.3390/info15070371
Alghamdi HM, Mostafa A. Towards Reliable Healthcare LLM Agents: A Case Study for Pilgrims during Hajj. Information. 2024; 15(7):371. https://doi.org/10.3390/info15070371
Chicago/Turabian StyleAlghamdi, Hanan M., and Abeer Mostafa. 2024. "Towards Reliable Healthcare LLM Agents: A Case Study for Pilgrims during Hajj" Information 15, no. 7: 371. https://doi.org/10.3390/info15070371
APA StyleAlghamdi, H. M., & Mostafa, A. (2024). Towards Reliable Healthcare LLM Agents: A Case Study for Pilgrims during Hajj. Information, 15(7), 371. https://doi.org/10.3390/info15070371