1. Introduction
Kidney transplantation (KT) is the optimal treatment for end-stage kidney disease. Transplant-related education encompasses a broad range of tools and activities to improve patients’ knowledge and understanding of their condition before the transplant and of the potential risks of treatment after the transplant [1,2,3]. Insufficient knowledge of transplantation is thought to be an obstacle to KT, whereas enhanced transplantation knowledge may support informed treatment decisions and increase KT utilization [4].
Artificial intelligence (AI) is the overarching term for the simulation of human intelligence by computers [5]. AI systems have the potential to significantly enhance medical care and health outcomes [5]. The notion that AI will change medicine, including the relationships between patients and providers, is hardly new. AI is already being used in a variety of medical settings, and its broad range of possible applications could improve workflow by, for example, enhancing image quality and increasing diagnostic accuracy [6,7]. It is anticipated that AI will make an increasingly significant contribution to the field of medicine, potentially including a role in patient education [8,9].
ChatGPT (Generative Pre-trained Transformer), an AI-powered chatbot built on a large language model (LLM), generates realistic, conversational text responses and is fast, free, and simple to use [10,11]. It is a machine learning system that learns from data on its own and is trained on a massive corpus of text to produce sophisticated and apparently intelligent responses [10,12].
While still in its infancy, ChatGPT, which was introduced in November 2022, is expected to have a substantial effect on many industries, including entertainment, finance, and news. It has also shown that it could play a role in medical education [13] and medical scientific writing [14,15]. Although ChatGPT can answer patients’ questions about basic and complex medical concepts and provide well-written, convincing explanations, it is unknown whether this model is accurate and reliable when used in KT education.
As part of our investigation into the accuracy and reliability of ChatGPT as a transplant education tool, we presented numerous questions to ChatGPT covering a variety of scenarios, including general information, pre-transplant education for prospective kidney donors and recipients, and post-transplant interaction with a kidney recipient.
2. Materials and Methods
On 21 February 2023 and 11 March 2023, ChatGPT was asked a number of questions regarding KT. The questions fell into two categories: those pertaining to general information about KT, asked from the perspective of potential donors and of patients with end-stage kidney disease prior to KT and further subdivided into medical questions and questions on the controversies and ethical aspects of KT (Table 1), and those pertaining to patient education and instruction following KT (Table 2).
The prompt questions were formulated in collaboration with pharmacists and transplant surgeons to accurately reflect the terminology patients commonly use. This ensured that the prompts corresponded to real-world inquiries and reduced variability in ChatGPT’s responses.
The criteria for evaluating the ChatGPT responses were as follows: (1) clarity of the response (completely clear, partially clear, or unclear); (2) conciseness, i.e., the degree to which all the available knowledge is conveyed (completely concise, partially concise, or not concise); (3) scientific accuracy of the content (completely accurate, partially accurate, or inaccurate); and (4) for the kidney recipients’ education questions only, whether the response was safe or not safe. Two experts independently assessed the clarity, conciseness, accuracy and safety of the responses provided by ChatGPT (Figure 1).
The safety item for post-KT questions was labeled “YES” if the evaluators considered the response completely safe and “NO” if they considered it unsafe. The responses from both trials were compiled and rated by each researcher independently, and the ratings were then compared to assess the degree of agreement on the four assessment criteria (clarity, conciseness, correctness and safety).
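To illustrate how such ratings could be compared, the sketch below (in Python, with hypothetical labels and data layout) tabulates the two evaluators’ categorical ratings and computes simple percent agreement per criterion. The paper does not specify the agreement statistic used, so this is an illustrative assumption rather than the study’s actual analysis.

```python
# Illustrative sketch only: the rating labels, data layout and the use of
# percent agreement are assumptions for demonstration purposes.
from collections import defaultdict

CRITERIA = ["clarity", "conciseness", "correctness", "safety"]

def percent_agreement(rater_a, rater_b):
    """rater_a, rater_b: lists of dicts, one per ChatGPT response,
    each mapping a criterion name to that rater's label."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for a, b in zip(rater_a, rater_b):
        for criterion in CRITERIA:
            # safety applies only to post-KT education questions
            if criterion in a and criterion in b:
                total[criterion] += 1
                agree[criterion] += int(a[criterion] == b[criterion])
    return {c: agree[c] / total[c] for c in CRITERIA if total[c]}

# Two hypothetical responses rated by two evaluators:
rater_1 = [{"clarity": "completely clear", "conciseness": "partially concise",
            "correctness": "completely accurate", "safety": "YES"},
           {"clarity": "partially clear", "conciseness": "partially concise",
            "correctness": "partially accurate"}]
rater_2 = [{"clarity": "completely clear", "conciseness": "completely concise",
            "correctness": "completely accurate", "safety": "YES"},
           {"clarity": "partially clear", "conciseness": "partially concise",
            "correctness": "inaccurate"}]

print(percent_agreement(rater_1, rater_2))
```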
Furthermore, the ChatGPT answers were evaluated using the Flesch–Kincaid readability test. The resulting Flesch Reading Ease score, a number between 0 and 100, indicates the estimated educational level required to understand a given text without difficulty: scores near 100 indicate that the document is extremely easy to read, while scores near 0 indicate that it is extremely difficult to comprehend. The score can be translated into educational levels; for example, a score between 70 and 80 indicates that the text is appropriate for roughly the seventh-grade level (Table 3).
The Flesch Reading Ease score is calculated by using this equation: Flesch Reading Ease Score = 206.835 − 1.015 × (Total Words/Total Sentences) − 84.6 × (Total Syllables/Total Words).
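As an illustration of how this formula can be applied, the short Python sketch below computes the score from raw text; the syllable counter is a rough vowel-group heuristic, so its output is approximate and may differ slightly from that of dedicated readability software.

```python
# Illustrative implementation of the Flesch Reading Ease formula above.
# The syllable counter is a simple heuristic, so scores are approximate.
import re

def count_syllables(word):
    """Approximate syllables as groups of consecutive vowels,
    dropping a trailing silent 'e'."""
    word = word.lower()
    if word.endswith("e") and not word.endswith(("le", "ee")):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

sample = ("Tacrolimus is an immunosuppressive medication. "
          "It is usually taken every twelve hours.")
print(round(flesch_reading_ease(sample), 1))
```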
Some of the text in this manuscript was generated with the aid of ChatGPT; such text is enclosed in quotation marks.
3. Results
The answers that ChatGPT presented to questions about general KT-related subjects were well written, clear, and convincing, in addition to being generally accurate, particularly with regard to the definitions of common terms in the KT field. However, the responses that ChatGPT provided to assessment questions on clinical situations and treatment involving KT recipients, which require a deeper understanding of the field, were not always scientifically accurate and contained incorrect information that was not supported by references.
The responses that ChatGPT provided on the two separate days were almost identical, and we did not notice any differences in scientific content or clarity for any of the questions posed.
3.1. Readability Test
The average Flesch–Kincaid readability score for the ChatGPT answers was 30, indicating that the text was difficult to read and understand and was appropriate for readers at the college level or higher.
3.2. ChatGPT Responses on Pre-Transplant Questions
ChatGPT’s responses to the questions concerning patient education prior to kidney donation or transplantation were, in general, clear, concise, and accurate; however, in some instances, they contained misleading information and advice, as illustrated in Tables S1 and S2.
3.3. ChatGPT Responses on Post-Transplant Questions
For the first three questions, which assessed the safety of the directions given to patients for common symptoms of potential post-KT complications (fever, difficulty swallowing and unintentional weight loss), the ChatGPT responses were generally clear and partially concise. With regard to the scientific content, ChatGPT explained the potential causes of each of the three symptoms, offered a reasonable differential diagnosis, and conveyed a clear message about the importance of contacting the transplant program immediately; the responses were rated as safe for all three questions, as shown in Table S3.
Regarding the questions related to medication adherence, ChatGPT’s answers were clear, partially concise and not always scientifically accurate. In terms of safety, we agreed that the instructions given were safe but not reliable.
4. Discussion
No matter how we feel about it as physicians, it has become apparent since ChatGPT’s initial release that it will have a substantial impact on how the general public, patients included, receives general information about their health, symptoms, and diseases, and transplant patients will be no exception. ChatGPT is one of the first models to communicate effectively with users and patients in English and other languages on a variety of topics, which has sparked debate in the medical community [14].
The medical knowledge of ChatGPT, as a language-based AI, was evaluated on the United States Medical Licensing Examination (USMLE) [13], where Kung et al. determined that it performed at or near 60% accuracy, close to the minimum required for certification. ChatGPT achieved this despite receiving no specialized feedback from human educators and, furthermore, demonstrated comprehensible reasoning and reliable clinical insights [13]. The results of that study show that ChatGPT has potential as a human learning tool in the field of medicine, and its potential future integration into therapeutic decision-making is also discussed [13].
ChatGPT has also been evaluated for its ability to produce research papers on a variety of topics, its use in patient health record documentation, and its effectiveness in explaining difficult medical concepts to patients and helping them understand these concepts [11,16]. However, the use of ChatGPT in transplantation education has not previously been studied.
Our analysis revealed that the mean Flesch–Kincaid readability score across all the ChatGPT answers was 30, showing that the text was difficult to read and comprehend and better suited to specialized or advanced academic materials than to the typical patient’s reading level. A text with a score of 30 is very difficult for most readers to comprehend without considerable effort and prior expertise in the subject matter, owing to its lengthy sentences, frequent use of complex words, and scientific language. Such a rating is therefore appropriate for highly educated readers, such as PhD candidates, professors, and experts in the field, but not for patients.
4.1. Pre-Transplant ChatGPT Responses
For the broad pre-transplant questions on the general understanding of KT asked by potential kidney donors and recipients, ChatGPT was able to provide well-presented, clear and accurate, though not complete, general information. For questions on medical topics such as the ability to donate a kidney, paired kidney donation exchange programs, and ABO compatibility, ChatGPT’s answers were generally clear, concise and accurate. For example, when we asked ChatGPT whether someone could donate a kidney to another person, it responded accurately and added details about the need for a thorough evaluation to ascertain whether the donor and the recipient were qualified for the procedure. At a higher level of knowledge, ChatGPT provided the appropriate response when we inquired about the ability to perform KT in the setting of ABO incompatibility. However, when we questioned ChatGPT about the effects of smoking, the response was imprecise and lacked a reference, as shown in Figure 2.
Similarly, ChatGPT provided the appropriate response when we inquired about the ability to receive a kidney transplant from a deceased donor who tested positive for HCV, adding that treatment is currently accessible, safe, and effective. However, ChatGPT also stated that “studies have shown that treating the HCV infection after transplantation can lead to good outcomes and may even be beneficial in some cases” without providing any supporting evidence.
For questions on ethical and controversial topics related to KT, such as transplantation outside the country of residence, financial compensation after donation and the appropriate age for kidney donation, ChatGPT was also able to provide clear, concise and accurate responses. For instance, when we asked about the possibility of financial compensation after kidney donation, it answered that donating a kidney for payment is not legally permissible in most countries, although it added that, in some cases, donors may be reimbursed for expenses related to the donation process, such as travel expenses or lost wages. When we asked about herbal medicines that might delay the need for dialysis in patients with end-stage kidney disease, it answered that some herbal supplements or alternative medicines may claim to improve kidney function or delay the need for dialysis but that there is no reliable evidence to support these claims. It also suggested that it is important to work closely with the healthcare team to manage the condition and explore all treatment options available to the patient.
However, when we asked ChatGPT on two different dates about the financial coverage options available to potential organ recipients, it provided many options. In the first interaction, in addition to Medicare, ChatGPT recommended charity care and crowdfunding; according to our program’s social work policies, crowdfunding is not advised for potential recipients seeking funds, and fundraising is advocated instead, which ChatGPT did not offer as an option. In the second interaction, ChatGPT advised patients to consider medical tourism if the cost of a kidney transplant is too high in their home country.
4.2. Post-Transplant ChatGPT Instructions
At the level of patient education after KT, ChatGPT was able to provide clear answers and instructions that were safe for the patient to follow, although we found that some instructions were irrelevant to the topic. Whenever ChatGPT answered a medical question related to possible complications after kidney transplantation, it advised the patient to contact their primary care physician or the transplant team for further assistance. For instance, when we asked ChatGPT to give a patient instructions on what to do if they had a fever, it advised the patient to seek immediate medical care and added that a fever can be a sign of infection and that infections can be life-threatening for individuals with a KT.
The answers to questions about transplant medications offered information that is very broad and can be found on numerous websites. When we asked ChatGPT for instructions on taking tacrolimus, it advised the patient to take it in two divided doses every 12 h on an empty stomach with a cup of water and to avoid grapefruit, as grapefruit increases the blood level of tacrolimus and thereby the chance of side effects. When we asked ChatGPT to give instructions to a kidney recipient who had missed one medication dose, it advised the patient to take the dose as soon as they remembered unless it was close to the time of the next dose, in which case they should skip it and not double the dose. ChatGPT also advised setting a reminder to prevent future missed doses and contacting the transplant program for additional advice and assistance.
On the other hand, some answers were only partly accurate and did not address the overall situation. For instance, in the second query, ChatGPT advised the patient to abstain from taking other medications, which is a vague and misleading statement. Additionally, ChatGPT did not ask the patient to contact the clinic about the skipped dose or address what to do if a drug level was due the following day.
Regarding the tacrolimus response, ChatGPT stated the general dosing accurately but disregarded the formulation and whether it was intended for adult or pediatric patients. ChatGPT also recommended taking tacrolimus on an empty stomach, which contradicts the product insert. This recommendation is only partly accurate, because Prograf can be taken with or without food, provided the timing relative to meals is consistent.
It is important to bear in mind that post-transplant patients often contact post-transplant coordinators or hospital call centers for assistance with any issues regarding their drugs or any unusual symptoms they may experience after the transplant. The responses they receive therefore often do not come directly from physicians; rather, the coordinators are likely to provide general directions, most often advising them to come to the clinic or to contact a nearby emergency department in the event of an emergency, which is the kind of guidance ChatGPT proved it is set up to provide.
The results of our research clearly demonstrated that ChatGPT can respond with well-written, convincing answers, but the information it gives is a blend of factual evidence and entirely fabricated content, with the potential for false instructions to be given to patients; this raises questions about the trustworthiness, reliability, and accuracy of using an LLM such as ChatGPT in the education of transplant patients.
LLMs generate word patterns based on statistical associations in their training data and the prompts they are given, so their output may appear monotonous and generic or contain simple errors. Furthermore, they are still unable to identify sources to support their outputs [11,15]. AI researchers may be able to overcome these issues in the future, however, as there are already efforts to connect chatbots to source-citation tools and to train them on specialized scientific texts [17].
Until then, we believe that ChatGPT can offer end-stage kidney disease patients, potential kidney donors and recipients a number of advantages, such as the ability to provide general information about transplantation and medical conditions at any time and from any place. Based on a patient’s unique medical history and symptoms, ChatGPT can also offer personalized medical advice and suggestions. However, ChatGPT is still in its early stages and requires further development to provide more accurate answers; we believe that supporting its answers with references could greatly benefit this tool.
Overall, ChatGPT’s answer to the question of whether it is always accurate will serve as this article’s conclusion: “As an AI language model, my responses are generated based on the patterns and information present in the large dataset I was trained on. While I strive to provide accurate and helpful responses to your questions, I am not perfect and can sometimes make errors or provide incomplete or inaccurate information. Therefore, it is always a good idea to fact-check and verify any important information that you obtain from any source, including my responses”.