Article

Enhancing Anesthetic Patient Education Through the Utilization of Large Language Models for Improved Communication and Understanding

by Jeevan Avinassh Ratnagandhi 1, Praghya Godavarthy 1, Mahindra Gnaneswaran 1, Bryan Lim 2,* and Rupeshraj Vittalraj 3

1 Department of Emergency Medicine, Peninsula Health, Melbourne, VIC 3199, Australia
2 Department of Emergency Medicine, Western Health, Melbourne, VIC 3011, Australia
3 Department of Anesthesia, Eastern Health, Melbourne, VIC 3128, Australia
* Author to whom correspondence should be addressed.
Anesth. Res. 2025, 2(1), 4; https://doi.org/10.3390/anesthres2010004
Submission received: 17 December 2024 / Revised: 15 January 2025 / Accepted: 17 January 2025 / Published: 30 January 2025

Abstract

Background/Objectives: The rapid development of Large Language Models (LLMs) presents promising applications in healthcare, including patient education. In anesthesia, where patient anxiety is common due to misunderstandings and fears, LLMs could alleviate perioperative anxiety by providing accessible and accurate information. This study explores the potential of LLMs to enhance patient education on anesthetic and perioperative care, addressing the time constraints faced by anesthetists. Methods: Three language models—ChatGPT-4o, Claude 3, and Gemini—were evaluated using fifteen common patient prompts. To minimize bias, incognito mode was used. Readability was assessed with the Flesch–Kincaid, Flesch Reading Ease, and Coleman–Liau indices. Response quality was rated for clarity, comprehension, and informativeness using the DISCERN score and a Likert Scale. Results: Claude 3 required the highest reading level, delivering detailed responses but lacking citations. ChatGPT-4o offered accessible and concise answers but missed key details. Gemini provided reliable and comprehensive information and emphasized professional guidance but lacked citations. According to the DISCERN and Likert scores, Gemini ranked highest for reliability and patient friendliness. Conclusions: This study found that Gemini provided the most reliable information, followed by Claude 3, although no significant differences were observed. All models showed limitations in bias and lacked sufficient citations. While ChatGPT-4o was the most comprehensible, it lacked clinical depth. Further research is needed to balance simplicity with clinical accuracy, explore Artificial Intelligence (AI)–physician collaboration, and assess AI’s impact on patient safety and medical education.

1. Introduction

The swift progression of Artificial Intelligence (AI) has given rise to Large Language Models (LLMs), sophisticated algorithms that leverage deep learning techniques and extensive datasets to comprehend and generate information across a wide spectrum of topics, presenting it in a manner that closely emulates human communication [1,2]. The capabilities of LLMs have prompted evaluations of their potential application within the healthcare sector, where their aptitude for learning and pattern recognition presents promising opportunities for tasks such as diagnostic formulation, genotype analysis, and treatment planning [1,3]. Patient education and counseling represent a particularly promising domain for the application of LLMs, owing to their advanced natural language processing (NLP) capabilities, enabling them to generate human-like responses by predicting words based on vast training data [2,3].
Anesthesia is a topic that can induce anxiety and apprehension in patients, primarily stemming from a limited understanding of the underlying procedures, fear of mortality, and concerns about potential side effects [4]. This sense of unease is compounded by global studies revealing widespread confusion among patients regarding the role of anesthetists, the common risks associated with anesthesia, and enduring misconceptions about postoperative side effects such as pain [5,6,7,8]. Such uncertainty can contribute to considerable perioperative anxiety, which poses a substantial challenge given its potential to negatively impact postoperative outcomes and increase morbidity [4].
This underscores the critical importance of patient education, which, when delivered effectively during the perioperative period, can provide essential psychosocial support and alleviate preoperative anxiety, ultimately leading to improved postoperative outcomes [9]. Anesthetists’ capacity to provide patient education and relieve preoperative anxiety can, however, be limited by the time constraints of other clinical and non-clinical duties. The advent of LLMs offers a potential solution to mitigate these pressures, enabling the efficient delivery of patient education without overburdening anesthetists. This study seeks to explore the potential of LLMs to provide comprehensive and patient-friendly information regarding anesthetic and perioperative care.

2. Materials and Methods

Three language models—ChatGPT-4o (OpenAI, San Francisco, CA, USA), Claude 3 (Anthropic, San Francisco, CA, USA), and Gemini (Google, Mountain View, CA, USA)—were rigorously assessed using a set of fifteen distinct prompts (Table 1). The models were selected for their extensive popularity among users, each attracting millions of monthly visitors [10,11]. Additionally, their free-to-use accessibility makes them the LLMs most likely to be utilized by the general public. The authors Ratnagandhi and Lim (Junior Medical Officers) collaboratively determined that these prompts encompassed the questions most frequently asked by patients. To minimize potential bias related to prior online behavior, this study employed the browser’s incognito mode. Consistency in the testing procedure was ensured by having a single author present all prompts on the same day, utilizing the same device and account.
Readability was assessed using the Flesch–Kincaid Grade Level (FKGL), Flesch Reading Ease Score (FRES), and Coleman–Liau Index (Table 2) [12,13,14].
The Flesch Reading Ease score ranges from 0 to 100, with higher values indicating easier readability. The Flesch–Kincaid Grade Level gauges the educational attainment needed for understanding, where a score of 8 signifies suitability for individuals at an eighth-grade education level in the United States. The Coleman–Liau Index has no fixed upper bound and likewise corresponds to the US school grade necessary for comprehension; for example, a score of 6 indicates appropriateness for a sixth-grade reading level. For these grade-level indices, scores between 13 and 16 correspond to college-level understanding, while those exceeding 16 are categorized as professional level. The quality of the responses generated by the LLMs was evaluated using the DISCERN instrument (Table 2) and a five-point Likert Scale (Table 3) rating clarity, comprehension, readability, patient friendliness, and informativeness [15,16]. The DISCERN and Likert Scale scores were derived by an anesthetist (Vittalraj) and junior medical officers (Ratnagandhi and Lim). A paired t-test (significance threshold p < 0.05) was also conducted to compare the LLMs’ scores against each other (Table 4).
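The three readability indices follow standard published formulas, so the analysis can in principle be reproduced computationally. The Python sketch below computes all three from raw text; the vowel-group syllable counter is a simplifying assumption (dedicated readability tools use dictionary-based syllable counts), so its scores may differ slightly from those reported in Table 2.

import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups, subtract a trailing silent "e".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(c.isalpha() for w in words for c in w)

    # Flesch Reading Ease: 0-100, higher = easier to read.
    fres = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    # Flesch-Kincaid Grade Level: US school grade required for comprehension.
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    # Coleman-Liau Index: letters (L) and sentences (S) per 100 words.
    L = letters / n_words * 100
    S = sentences / n_words * 100
    cli = 0.0588 * L - 0.296 * S - 15.8
    return {"FRES": round(fres, 1), "FKGL": round(fkgl, 1), "CLI": round(cli, 1)}

print(readability("Fasting before general anesthesia reduces the risk of "
                  "aspirating stomach contents while you are unconscious."))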

3. Results

According to Table 2, the Flesch–Kincaid Grade Level Analysis indicates that responses generated by Claude 3 necessitate the highest level of educational attainment in the United States (US) for comprehension, with an average score of 16.5 ± 3.9, equivalent to a postgraduate level of education [17]. In contrast, outputs from ChatGPT-4o required the lowest US grade reading level, with an average score of 11.6 ± 2.4, placing it below Gemini, which scored 13.2 ± 2.1. This trend is further corroborated by the Flesch Reading Ease Score Analysis, where Claude 3 again demanded the most advanced reading proficiency (18.4 ± 13.6), followed by Gemini (31.3 ± 10.9) and ChatGPT-4o (35.2 ± 12.6). Consistent findings emerged from the Coleman–Liau Readability Index Analysis, with Claude 3 being identified as the most challenging for readers, recording an average score of 17.8 ± 2.3. Conversely, ChatGPT-4o exhibited the highest level of accessibility according to this index, achieving a score of 16.3 ± 2.3, while Gemini recorded a score of 16.8 ± 1.7.
The DISCERN score assessment indicated that Gemini delivered the most reliable information in response to the prompts, achieving a score of 50.9 ± 3.4. It was succeeded by Claude 3, which received a score of 47.9 ± 4.7, while ChatGPT-4o registered a mean score of 42.6 ± 5.5.
The Likert Scale (Table 3) scoring system evaluated Claude 3 as the preeminent LLM, achieving an aggregate score of 23 and securing a perfect score of 5 across all domains, with the exception of readability. This was closely followed by Gemini, which attained perfect scores in the categories of clarity and patient friendliness, aggregating a score of 21. Conversely, the Likert Scale assessment identified ChatGPT-4o as the lowest-performing LLM, with a total score of 19; nonetheless, it was the sole model to achieve a perfect score of 5 in readability.

3.1. ChatGPT-4o

ChatGPT-4o provided responses that effectively addressed each prompt, employing straightforward language and occasionally utilizing bullet points to segment information into manageable portions. This approach earned it a perfect score of 5 in the patient friendliness category on the Likert Scale (Table 3). However, in comparison to Claude 3 and Gemini, its answers were somewhat superficial, lacking the depth and elaboration present in their responses. This discrepancy was particularly apparent in the model’s response to prompt 5, where it failed to adequately address the potential risks associated with the mishandling of insulin prior to anesthesia. This shortcoming was reflected in its DISCERN score of 40, a significant deviation from the higher scores of Claude 3 and Gemini, which were 53 and 51, respectively. Similarly, for prompt 8, ChatGPT-4o received a DISCERN score of 42, compared to Claude 3’s score of 48, which can be attributed to the omission of critical information, particularly regarding red flags to monitor during the postoperative period, as well as the lack of detailed pharmacological options for pain management. It is also worth noting that ChatGPT-4o’s responses were typically more concise than those of the other two LLMs, often significantly shorter. Furthermore, ChatGPT-4o did not reference any sources for the information and advice presented. While the responses from ChatGPT-4o were accessible and easy to comprehend, they were ultimately unremarkable in terms of content depth and nuance.

3.2. Claude 3

Although Claude 3 did not match the linguistic simplicity of ChatGPT-4o, as evidenced by its consistently less favorable FKGL, FRES, and Coleman–Liau Index scores (Table 2), its responses were markedly more detailed and nuanced. The model distinguished itself through its tendency to provide supplementary information. This was reflected in its response to prompt 1, which not only articulated the need for fasting but also presented quantifiable and actionable fasting guidelines for patients. Similarly, in prompt 12, the model outlined specific prevention strategies designed to mitigate risks associated with anesthesia. These qualities were clearly reflected in Claude 3’s perfect Likert Scale (Table 3) score of 5 for both informativeness and comprehension, as opposed to ChatGPT-4o’s score of 3 in the same categories.
Moreover, Claude 3 regularly employed bullet points to organize its responses, enhancing the clarity and efficiency of its information delivery. Additionally, Claude 3 demonstrated a unique feature in its response to prompt 2, proactively offering to provide patients with further reading material and resources. Unfortunately, this was only demonstrated in that single response. However, despite the substantial depth of clinical information, Claude 3 consistently lacked citations and references to substantiate the data presented. Overall, Claude 3’s responses provided patients with detailed and expansive information to their questions.

3.3. Gemini

Similar to Claude 3, Gemini consistently delivered detailed and comprehensive responses to prompts, often providing additional insights beyond the scope of the questions asked. For instance, in its response to prompt 12, Gemini outlined a broad spectrum of risks associated with anesthesia, categorizing them by threat level, while also offering practical recommendations to mitigate these risks. This approach enabled Gemini to notably outperform both Claude 3 and ChatGPT-4o, achieving a DISCERN score of 52, compared to their respective scores of 44 and 38 on the same prompt. Likewise, in response to prompt 11, the model not only addressed the direct query by listing precautions to take prior to anesthesia but also suggested healthy lifestyle habits for patients to adopt in the lead-up to the procedure.
Gemini also employed bullet points effectively to organize its responses, presenting complex information in a clear and digestible format. However, like Claude 3, it did not provide sources or citations to substantiate the clinical information presented.
Furthermore, echoing Claude 3, Gemini underscored the critical role of healthcare professionals in guiding patient care. Throughout its responses to prompts 4, 5, 6, 13, and 15, the model emphasized the necessity of involving anesthetists or other qualified medical practitioners in decision-making, stressing that patient treatment should always be informed by professional expertise.
In conclusion, while Gemini lacked citations for its information, it provided in-depth and nuanced responses enriched with additional practical advice.

4. Discussion

Anesthesia has been an indispensable cornerstone of modern medicine for over 175 years, playing a critical role in facilitating a wide range of surgical and diagnostic procedures [18]. Its development and widespread use have revolutionized the ability to perform complex operations and interventions that would have been otherwise impossible for patients to endure, and it is involved in over 230 million procedures globally each year, underscoring its ubiquity and essential role in contemporary healthcare [19]. Despite its widespread use, anesthesia carries significant risks, ranging from minor complications to life-threatening events. This necessitates meticulous preoperative evaluation and a detailed informed consent process. Given the complexity and volume of cases, the time investment is considerable, impacting both healthcare providers and patients. The patient education process must strike a delicate balance between ensuring comprehensive understanding and maintaining the efficiency of medical workflows.
The evaluation of the three LLMs revealed significant concerns regarding clinical accuracy and potential bias, as reflected in the subjective DISCERN scores. All models consistently scored poorly on questions assessing bias detection and understanding of information sources. Notably, none of the models disclosed sources or citations for the provided information. This lack of transparency is particularly troubling given that LLMs rely on internet-sourced data, which may include inaccurate or biased information [20]. The “Black Box” problem is a significant hurdle in the development and deployment of LLMs within healthcare. It refers to the opacity of these complex systems: while we can observe the inputs and outputs, the internal processes by which LLMs transform data into responses remain largely hidden. This lack of transparency has profound implications for their understanding, trust, and safe implementation in clinical practice.
The “Black Box” opacity of LLMs raises concerns about biased outputs and hinders clinical integration. LLMs trained on limited data, such as data from a single continent, may produce inaccurate recommendations for diverse populations [21]. Additionally, the inability to scrutinize the training data (guidelines, textbooks, and research), together with the lack of real-time updates on medical information or guidelines unless prompted by the user, necessitates input from medical practitioners and can foster distrust and reluctance to adopt these tools [22,23,24].
To safely utilize LLMs in healthcare, the “Black Box” problem must be addressed. This involves increasing data transparency regarding training sources, developing explainable AI (XAI) to make LLM reasoning clearer, and implementing human-in-the-loop systems where clinicians retain oversight of LLM recommendations [21,25].
All three models consistently recommended consulting an anesthetist or healthcare professional to mitigate bias in patient care decisions, as reflected in their high scores for “Does it provide support for shared decision-making?” in the DISCERN evaluation. This underscores the critical role of healthcare professionals in patient education and further emphasizes the need for human-in-the-loop systems in complex clinical decisions to ensure patient safety.
Claude 3 and Gemini achieved relatively higher scores in the DISCERN assessment, largely due to the breadth and depth of clinical information incorporated in their responses. Both models consistently delivered answers that extended beyond the immediate scope of the prompts, offering comprehensive insights that anticipated potential follow-up questions. This was reflected in both LLMs’ responses to prompt 11, “Is there anything I should avoid after having anesthesia?”, where Claude 3 and Gemini additionally provided practices that patients could incorporate to alleviate the effects or minimize the risks of anesthesia. This proactive approach is particularly valuable as patients may not always possess the knowledge to formulate the right questions. By providing detailed and anticipatory responses, these models help fill knowledge gaps that patients may not even know about. However, while this aspect of the responses is commendable, there is room for improvement, particularly in the inclusion of external resources, citations, and recommended readings. This would lend greater credibility and validation to the information provided, enhancing the overall reliability of the responses.
In terms of readability, ChatGPT-4o consistently recorded the lowest required reading grade across multiple readability assessments, including the Flesch–Kincaid Grade Level, Flesch Reading Ease Score, and Coleman–Liau Index. These results were further corroborated by its performance on the subjective Likert Scale, where it achieved a perfect score of 5 in the readability category, making it the only model to do so. This raises a critical question: should LLMs prioritize delivering detailed and technically accurate information, or should they focus on simplifying that information to make it more accessible? A 2014 Canadian study highlighted the difficulties patients face with information recall, underscoring that even well-meaning and detailed explanations can be difficult for patients to retain [26]. A related study in Spain revealed that 21% of patients did not read the consent forms they signed, and two-thirds could not recall the information given by their anesthetists [27]. These findings suggest that a range of factors, including cultural and functional illiteracy, contribute to patients’ difficulty in processing and remembering medical information [26].
From this perspective, ChatGPT-4o’s approach of emphasizing clear and patient-friendly language that is easier to comprehend might significantly enhance patients’ engagement in medical education and improve the retention of essential information. However, the trade-off lies in the potential oversimplification, which could mean that patients are not receiving enough clinical detail to make fully informed decisions. Given that many patients already struggle with complex medical terminology and detailed explanations, there is an inherent need for LLMs to strike a delicate balance between Claude 3 and Gemini’s depth of information and ChatGPT-4o’s readability. Too much detail may overwhelm patients, while too little may undermine their ability to make informed decisions. Achieving this balance is crucial if LLMs are to be trusted as reliable tools for independent patient education with minimal intervention from healthcare professionals. Further research could help determine the optimal level of detail, with patient feedback being central to understanding how these models can best serve as educational resources. Specifically, studies assessing patients’ experiences with LLM-generated responses to anesthesia-related queries could offer valuable insights into how these tools can be refined to maximize efficacy and minimize time constraints for healthcare professionals.
The use of LLMs in patient education offers a range of compelling benefits that can enhance both the quality and accessibility of healthcare information. LLMs’ proficiency in linguistics, vocabulary, and grammar allows for tailored explanations of complex medical concepts in a manner that is clear and easily understood by patients [28]. This can be particularly valuable in situations where patients need to comprehend intricate procedures, risks, or treatment options. Many patients have a limited understanding of what anesthesia entails, and this lack of knowledge can lead to heightened stress, misconceptions, and reluctance towards treatment, and can even increase the risk of intraoperative and postoperative complications [29,30]. By providing real-time, tailored, and on-demand educational support, LLMs can help bridge knowledge gaps, reduce patient anxiety, and foster a sense of empowerment [31,32,33]. LLMs can also improve accessibility by overcoming language and literacy barriers, making anesthesia education more accessible to diverse patient populations [33,34].
By improving comprehension and reducing misunderstandings, LLMs can play a critical role in enhancing patient safety and satisfaction in the perioperative setting whilst also alleviating the cognitive load on anesthetists and other healthcare professionals responsible for delivering information in person.
While AI holds significant promise in the healthcare sector, offering the potential to enhance efficiency and alleviate the work burden on physicians, this study highlights the substantial modifications and improvements that remain necessary before AI can be seamlessly and safely integrated into clinical practice. Beyond technical challenges, it is crucial to consider the broader implications of AI’s role in the healthcare workforce, particularly the risk of displacing physicians. Additionally, the widespread adoption of AI could inadvertently diminish career satisfaction within the medical profession by reducing the intellectual stimulation traditionally derived from diagnostic problem-solving [25]. As discussed, the implementation of human-in-the-loop systems, where physicians actively engage in overseeing AI activities, would not only mitigate safety concerns but also preserve the central role of doctors in patient care [21,25,35]. Far from replacing medical professionals, such an approach could expand opportunities within the field, positioning doctors as pivotal figures in guiding and supervising AI applications in healthcare, thereby enhancing both patient outcomes and career prospects [35,36,37].

5. Conclusions

This study demonstrated that Gemini delivered the most reliable information among the three LLMs, as evidenced by its superior average DISCERN score, although no statistically significant differences were observed between the three models (Table 4). It was followed by Claude 3, which exhibited comparable performance across both subjective evaluation metrics and recorded the highest aggregate Likert score. Despite this, all three LLMs revealed significant limitations regarding bias, primarily due to the lack of external resources and citations to substantiate the information presented.
ChatGPT-4o, while emerging as the LLM that produced the most easily comprehensible responses, was found to lack sufficient clinical depth in its answers. This raises the need for further research to explore the optimal balance between simplicity of language and the clinical detail required for informed patient decision-making. Such studies could help determine how best to tailor LLM-generated content so that patients are not only able to understand the information but are also equipped with enough clinical insight to make informed choices regarding their healthcare. Further research is also required to evaluate the capacity of physicians to collaborate effectively with AI, as such collaboration has implications for the delivery of medical education and, in turn, patient safety. In addition, it would be beneficial to investigate physicians’ satisfaction with the integration of AI into clinical practice, exploring its influence on workflow, decision-making processes, and overall professional experience.

Author Contributions

Conceptualization, J.A.R. and B.L.; Methodology, J.A.R.; Data Curation, J.A.R., P.G. and M.G.; Formal Analysis, J.A.R.; Writing, J.A.R., P.G., M.G. and B.L.; Resources, J.A.R. and P.G.; Review and Editing, J.A.R., B.L. and R.V.; Supervision, B.L. and R.V. All authors have read and agreed to the published version of the manuscript.

Funding

None of the authors received any funding or financial support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I.; Almohareb, S.N.; Aldairem, A.; Alrashed, M.; Saleh, K.B.; Badreldin, H.A.; et al. Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Med. Educ. 2023, 23, 689. [Google Scholar] [CrossRef] [PubMed]
  2. Meng, X.; Yan, X.; Zhang, K.; Liu, D.; Cui, X.; Yang, Y.; Zhang, M.; Cao, C.; Wang, J.; Wang, X.; et al. The application of large language models in medicine: A scoping review. iScience 2024, 27, 109713. [Google Scholar] [CrossRef]
  3. Bajwa, J.; Munir, U.; Nori, A.; Williams, B. Artificial Intelligence in healthcare: Transforming the practice of medicine. Future Healthc. J. 2021, 8, 188–194. [Google Scholar] [CrossRef] [PubMed]
  4. Lumb, A.B.; Latchford, G.J.; Bekker, H.L.; Hetmanski, A.R.; Thomas, C.R.; Schofield, C.E. Investigating the causes of patient anxiety at induction of anaesthesia: A mixed methods study. J. Perioper. Pract. 2021, 31, 246–254. [Google Scholar] [CrossRef]
  5. Arefayne, N.R.; Getahun, A.B.; Melkie, T.B.; Endalew, N.S.; Nigatu, Y.A. Patients’ knowledge and perception of anesthesia and the anesthetists: Cross-sectional study. Ann. Med. Surg. 2022, 78, 103740. [Google Scholar] [CrossRef] [PubMed]
  6. Chew, S.T.; Tan, T.; Tan, S.S.; Ip-Yam, P.C. A survey of patients’ knowledge of anaesthesia and perioperative care. Singap. Med. J. 1998, 39, 399–402. [Google Scholar]
  7. Eyelade, O.R.; Akinyemi, J.O.; Adewole, I.F. Patients’ perception and knowledge of anaesthesia and anaesthetists: A questionnaire survey: Original research. S. Afr. J. Anaesth. Analg. 2010, 16, 28–31. [Google Scholar] [CrossRef]
  8. Singh, P.M.; Kumar, A.; Anjan, T. Rural perspective about anesthesia and anesthesiologist: A cross-sectional study. J. Anaesthesiol. Clin. Pharmacol. 2013, 29, 228–234. [Google Scholar] [CrossRef] [PubMed]
  9. Ali, Z.; Ahsan, Z.; Liaqat, N.; Din, I. Bridging the gap: Evaluation of preoperative patients’ education by comparing expectations and real perioperative surgical experiences: A mixed-methods descriptive cross-sectional study. BMC Health Serv. Res. 2024, 24, 964. [Google Scholar] [CrossRef] [PubMed]
  10. Menon, D.; Shilpa, K. “Chatting with GPT”: Analyzing the factors influencing users’ intention to use the Open AI’s ChatGPT using the UTAUT model. Heliyon 2023, 9, e20962. [Google Scholar] [CrossRef]
  11. Zhou, S.; Luo, X.; Chen, C.; Jiang, H.; Yang, C.; Ran, G.; Yu, J.; Yin, C. The performance of large language model-powered chatbots compared to oncology physicians on colorectal cancer queries. Int. J. Surg. 2024, 110, 6509–6517. [Google Scholar] [CrossRef] [PubMed]
  12. Thomas, G.; Hartley, R.D.; Kincaid, J.P. Test-Retest and Inter-Analyst Reliability of the Automated Readability Index, Flesch Reading Ease Score, and the Fog Count. J. Read. Behav. 1975, 7, 149–154. [Google Scholar] [CrossRef]
  13. Flesch, R. A New Readability Yardstick. J. Appl. Psychol. 1948, 32, 221–233. [Google Scholar] [CrossRef]
  14. Coleman, M.; Liau, T.L. A Computer Readability Formula Designed for Machine Scoring. J. Appl. Psychol. 1975, 60, 283–284. [Google Scholar] [CrossRef]
  15. Fefer, M.; Lamb, C.C.; Shen, A.H.; Clardy, P.; Muralidhar, V.; Devlin, P.M.; Dee, E.C. Multilingual Analysis of the Quality and Readability of Online Health Information on the Adverse Effects of Breast Cancer Treatments. JAMA Surg. 2020, 155, 781. [Google Scholar] [CrossRef] [PubMed]
  16. Sullivan, G.M.; Artino, A.R. Analyzing and Interpreting Data from Likert-Type Scales. J. Grad. Med. Educ. 2013, 5, 541–542. [Google Scholar] [CrossRef]
  17. Flesch Reading Ease and the Flesch Kincaid Grade Level. Readable. Available online: https://readable.com/readability/flesch-reading-ease-flesch-kincaid-grade-level/ (accessed on 21 October 2024).
  18. Anesthesia. Available online: https://www.nigms.nih.gov/education/fact-sheets/Pages/anesthesia.aspx (accessed on 5 November 2024).
  19. Gottschalk, A.; Van Aken, H.; Zenz, M.; Standl, T. Is Anesthesia Dangerous? Dtsch. Ärzteblatt Int. 2011, 108, 469–474. [Google Scholar] [CrossRef] [PubMed]
  20. Park, B.; Choi, J. Identifying the Source of Generation for Large Language Models. arXiv 2024, arXiv:2407.12846. [Google Scholar]
  21. Wang, D.; Yang, Q.; Abdul, A.; Lim, B.Y. Designing Theory-Driven User-Centric Explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems—CHI’19 2019, Glasgow, UK, 4–9 May 2019. [Google Scholar] [CrossRef]
  22. Lomas, A.; Broom, M.A. Large Language Models for Overcoming Language Barriers in Obstetric Anaesthesia: A Structured Assessment. Int. J. Obstet. Anesth. 2024, 60, 104249. [Google Scholar] [CrossRef] [PubMed]
  23. Menick, J.; Trebacz, M.; Mikulik, V.; Aslanides, J.; Song, F.; Chadwick, M.; Glaese, M.; Young, S.; Campbell-Gillingham, L.; Irving, G.; et al. Teaching language models to support answers with verified quotes. arXiv 2024, arXiv:2203.11147. [Google Scholar]
  24. Limitations of LLMs: Bias, Hallucinations, and More. Available online: https://learnprompting.org/docs/basics/pitfalls#5-bias (accessed on 23 November 2024).
  25. Artificial Intelligence in Primary Care. RACGP. Available online: https://www.racgp.org.au/advocacy/position-statements/view-all-position-statements/clinical-and-practice-management/artificial-intelligence-in-primary-care (accessed on 5 January 2025).
  26. Tait, A.R.; Malviya, S.; Voepel-Lewis, T. Obtaining Informed Consent. Anesthesiology 2002, 96, 1278. [Google Scholar] [CrossRef]
  27. Rosique, I.; Pérez-Cárceles, M.D.; Romero-Martin, M.; Osuna, E.; Luna, A. The Use and Usefulness of Information for Patients Undergoing Anaesthesia. Med. Law 2006, 25, 715–727. [Google Scholar]
  28. Kianian, R.; Sun, D.; Rojas-Carabali, W.; Agrawal, R.; Tsui, E. Large Language Models May Help Patients Understand Peer-Reviewed Scientific Articles About Ophthalmology: Development and Usability Study. J. Med. Internet Res. 2024, 26, e59843. [Google Scholar] [CrossRef] [PubMed]
  29. Jlala, H. Anesthesiologists’ Perception of Patients’ Anxiety under Regional Anesthesia. Local Reg. Anesth. 2010, 3, 65–71. [Google Scholar] [CrossRef] [PubMed]
  30. Celik, F.; Edipoglu, I.S. Evaluation of Preoperative Anxiety and Fear of Anesthesia Using APAIS Score. Eur. J. Med. Res. 2018, 23, 41. [Google Scholar] [CrossRef] [PubMed]
  31. Aydin, S.; Karabacak, M.; Vlachos, V.; Margetis, K. Large Language Models in Patient Education: A Scoping Review of Applications in Medicine. Front. Med. 2024, 11, 1477898. [Google Scholar] [CrossRef] [PubMed]
  32. Busch, F.; Hoffmann, L.; Rueger, C.; van Dijk, E.H.; Kader, R.; Ortiz-Prado, E.; Makowski, M.R.; Saba, L.; Hadamitzky, M.; Kather, J.N.; et al. Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges. medRxiv (Cold Spring Harb. Lab.) 2024. [Google Scholar] [CrossRef]
  33. Ravi, A.; Neinstein, A.; Murray, S.G. Large Language Models and Medical Education: Preparing for a Rapid Transformation in How Trainees Will Learn to Be Doctors. ATS Sch. 2023, 4, 282–292. [Google Scholar] [CrossRef]
  34. Lim, B.; Seth, I.; Bulloch, G.; Xie, Y.; Hunter-Smith, D.J.; Rozen, W.M. Evaluating the Efficacy of Major Language Models in Providing Guidance for Hand Trauma Nerve Laceration Patients: A Case Study on Google’s AI BARD, Bing AI, and ChatGPT. Plast. Aesthetic Res. 2023, 10, 43. [Google Scholar] [CrossRef]
  35. Mesko, B.; Gorog, M. A short guide for medical professionals in the era of artificial intelligence. NPJ Digit. Med. 2020, 3, 126. [Google Scholar] [CrossRef]
  36. Seo, J.B. The Role of Doctors in the Age of Artificial Intelligence. J. Korean Med. Assoc. 2019, 62, 136–139. [Google Scholar] [CrossRef]
  37. Liu, X.; Keane, P.A.; Denniston, A.K. Time to regenerate: The doctor in the age of artificial intelligence. J. R. Soc. Med. 2018, 111, 113–116. [Google Scholar] [CrossRef] [PubMed]
Table 1. Prompts input into each LLM.

Prompt Number   Prompt
1               Why should I fast before anesthesia?
2               How long before my anesthesia should I give up smoking?
3               How many standards of alcohol per day is acceptable before my anesthesia?
4               What medications should I stop taking before my anesthesia? And how long before my anesthesia should I stop taking them?
5               I am an insulin-dependent diabetic, should I still take my insulin before anesthesia?
6               I have atrial fibrillation, is it safe for me to have anesthesia?
7               How is the anesthetic administered?
8               Will I have pain after the procedure with anesthesia?
9               How long will it take for the anesthesia to wear off?
10              What is the difference between anesthesia and sedation?
11              Is there anything I should avoid after having anesthesia?
12              What are the risks of anesthesia?
13              Will anesthesia stop me from breathing?
14              What if I wake up in the middle of the anesthesia?
15              How will anesthesia affect my pregnancy?
Table 2. Readability and reliability scores.

LLM          Prompt No.   FKGL          FRES          Coleman–Liau Index   DISCERN Score
ChatGPT-4o   1            10.7          36.9          16.9                 45
             2            12.9          30.1          17.7                 47
             3            15.3          15.4          19.3                 57
             4            8.9           43.1          15.5                 39
             5            12.5          33.7          16.7                 40
             6            14.9          16.3          19.3                 43
             7            9.6           44.4          12.6                 35
             8            12.0          40.5          14.9                 42
             9            7.7           59.4          11.5                 38
             10           9.5           48.7          15.5                 44
             11           9.6           41.3          15.9                 36
             12           14.2          25.4          18.1                 38
             13           14.2          16.3          19.5                 47
             14           11.8          38.1          15.9                 46
             15           10.6          39.0          15.5                 38
             Mean ± SD    11.6 ± 2.4    35.2 ± 12.6   16.3 ± 2.3           42.6 ± 5.5
Claude 3     1            12.4          36.0          16.6                 56
             2            25.8          2.1           16.7                 54
             3            18.4          13.1          17.1                 46
             4            13.5          30.1          16.8                 51
             5            19.2          20.7          15.6                 53
             6            19.1          3.9           20.2                 48
             7            19.4          6.0           18.1                 42
             8            14.4          31.2          16.3                 48
             9            15.4          25.3          15.3                 46
             10           19.5          0.0           21.0                 44
             11           14.9          22.0          19.0                 41
             12           18.6          0.0           23.7                 44
             13           13.3          31.4          16.6                 47
             14           12.3          26.9          17.2                 55
             15           11.8          31.1          17.3                 44
             Mean ± SD    16.5 ± 3.9    18.4 ± 13.6   17.8 ± 2.3           47.9 ± 4.7
Gemini       1            11.8          37.4          16.7                 55
             2            11.4          45.8          15.4                 50
             3            14.3          21.3          18.5                 53
             4            12.3          32.2          17.6                 53
             5            16.5          19.8          17.8                 51
             6            12.2          31.9          16.9                 58
             7            10.2          44.8          13.7                 47
             8            14.0          31.0          16.7                 44
             9            13.2          34.1          15.2                 49
             10           11.8          39.7          15.5                 50
             11           11.3          40.6          15.5                 52
             12           16.5          9.4           20.7                 52
             13           11.1          41.2          15.8                 49
             14           16.3          18.4          17.9                 53
             15           15.0          21.6          18.3                 47
             Mean ± SD    13.2 ± 2.1    31.3 ± 10.9   16.8 ± 1.7           50.9 ± 3.4
Table 3. Likert Scale of the LLMs’ responses.

Criteria               ChatGPT-4o   Claude 3   Gemini
Clarity                3            5          5
Comprehension          3            5          4
Readability            5            3          3
Patient friendliness   5            5          5
Informativeness        3            5          4
Total                  19           23         21
Table 4. Paired t-test of the LLMs’ scores against each other (t statistic, p-value).

             ChatGPT-4o         Claude 3           Gemini
ChatGPT-4o   -                  −0.42, p = 0.67    0.016, p = 0.99
Claude 3     0.42, p = 0.67     -                  0.99, p = 0.33
Gemini       −0.016, p = 0.99   −0.99, p = 0.33    -
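Pairwise comparisons of this kind can be run with a paired t-test, since the same fifteen prompts were posed to each model. The Python/SciPy sketch below illustrates the procedure using the per-prompt DISCERN scores transcribed from Table 2 as assumed inputs; the exact score vectors used to produce Table 4 are not specified in the text, so the resulting statistics may not match the reported values.

from scipy import stats

# Per-prompt DISCERN scores from Table 2 (prompts 1-15).
discern = {
    "ChatGPT-4o": [45, 47, 57, 39, 40, 43, 35, 42, 38, 44, 36, 38, 47, 46, 38],
    "Claude 3":   [56, 54, 46, 51, 53, 48, 42, 48, 46, 44, 41, 44, 47, 55, 44],
    "Gemini":     [55, 50, 53, 53, 51, 58, 47, 44, 49, 50, 52, 52, 49, 53, 47],
}

# Paired t-test: scores are paired by prompt across models.
for a, b in [("ChatGPT-4o", "Claude 3"),
             ("ChatGPT-4o", "Gemini"),
             ("Claude 3", "Gemini")]:
    t, p = stats.ttest_rel(discern[a], discern[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.3f}")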