Article
Peer-Review Record

AI-Powered Renal Diet Support: Performance of ChatGPT, Bard AI, and Bing Chat

Clin. Pract. 2023, 13(5), 1160-1172; https://doi.org/10.3390/clinpract13050104
by Ahmad Qarajeh 1,2, Supawit Tangpanithandee 1,3, Charat Thongprayoon 1, Supawadee Suppadungsuk 1,3, Pajaree Krisanapan 1,4, Noppawit Aiumtrakul 5, Oscar A. Garcia Valencia 1, Jing Miao 1, Fawad Qureshi 1 and Wisit Cheungpasitporn 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 27 August 2023 / Revised: 15 September 2023 / Accepted: 25 September 2023 / Published: 26 September 2023

Round 1

Reviewer 1 Report

This is an interesting study about the use of different AI models to improve dietary support in patients with CKD. This is a relevant topic because of the high and increasing prevalence of CKD worldwide, and the need for support other than medical in different aspects of the disease in these patients. Nutritional support is one of the most important, and the use of AI could be a very helpful tool in the future.
The main question of the research is to evaluate the efficacy of 4 AI models in discerning potassium and phosphorus content in foods. It is correctly answered and discussed, and although the results do have to be validated in large cohorts of patients, it could be a very helpful tool in the future for these patients.
I find the authors discuss correctly the limitations of their study: being a controlled experimental environment, it is difficult to generalize the use of these tools.
Perhaps I would suggest putting Tables 1 and 2 in the supplementary material due to their length.

 

Author Response

Reviewer 1

This is an interesting study about the use of different AI models to improve dietary support in patients with CKD. This is a relevant topic because of the high and increasing prevalence of CKD worldwide, and the need for support other than medical in different aspects of the disease in these patients. Nutritional support is one of the most important, and the use of AI could be a very helpful tool in the future.

Response: We appreciate your interest in our study and agree that AI can play a significant role in improving dietary support for CKD patients.


The main question of the research is to evaluate the efficacy of 4 AI models in discerning potassium and phosphorus content in foods. It is correctly answered and discussed, and although the results do have to be validated in large cohorts of patients, it could be a very helpful tool in the future for these patients.

Response: The primary question of our research was to evaluate the efficacy of AI models in discerning potassium and phosphorus content in foods, and we are pleased that you found it to be correctly answered and discussed. We acknowledge the need for validation in larger patient cohorts, and this is indeed an important avenue for future research.


I find the authors discuss correctly the limitations of their study: being a controlled experimental environment, it is difficult to generalize the use of these tools.


Perhaps I would suggest putting Tables 1 and 2 in the supplementary material due to their length.

Response: Your suggestion to place Tables 1 and 2 in the supplementary material due to their length is noted. We have made this change in the revised manuscript as suggested.

We have also created a new Figure 1, a flowchart overview of the study.

 

Figure 1. Flowchart overview of the study.

 

 

We value your detailed, respectful, and constructive feedback, which will contribute to the improvement of our study. Thank you for your thoughtful review.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Qarajeh et al. assessed the performance of 4 different generative AI models in evaluating a renal diet (focused on K and Pi), benchmarked against the Mayo Clinic Renal Diet book.

 

Introductions:

Authors should provide more information on generative AI and the difference between those 4 models assessed in their paper.

 

Method:

The authors claimed that Mayo Clinic’s renal diet compendium is reputable. They should provide additional information including dietitian assessment or relevant references to support their statement.

Dates of analysis should be provided, as these AI models are constantly being updated; therefore, results may vary.

Prompt used to generate the response from these AI models should be provided for comprehensive review.

 

Results

The authors did not report the concordance between 2 separate GPT4 sessions.

Tables 1/2 are too large for main results, suggest a different more concise way to present the data and move extra information to supplement.

 

Discussions

Not all CKD patients need to follow a low-K or low-Pi diet. Many vegetables and fruits may contain high amounts of potassium but offer additional health benefits. The authors should provide more nuanced discussion.

needs some improvement

Author Response

Reviewer 2

Qarajeh et al. assessed the performance of 4 different generative AI models in evaluating a renal diet (focused on K and Pi), benchmarked against the Mayo Clinic Renal Diet book.

Comment:

Introductions:

Authors should provide more information on generative AI and the difference between those 4 models assessed in their paper.

Response: Thank you for taking the time to review our study and for your insightful feedback. We recognize the importance of providing a more detailed overview of generative AI and elucidating the differences between the four models we assessed. We value the importance of clarity and comprehensive information for our readers.

In light of your comments, we have expanded our introduction to encompass a more in-depth explanation of generative AI and the distinctive features of ChatGPT 3.5, ChatGPT 4, Bard AI, and Bing Chat.

Generative AI models have gained prominence in recent years due to their ability to generate new content by learning patterns and structures from vast amounts of data. These models are designed to understand context, predict subsequent sequences, and produce information that is coherent and contextually relevant. Such attributes make them potential tools in diverse applications, including healthcare.

ChatGPT 3.5 and ChatGPT 4 are both products of OpenAI, with the latter being an advanced version of the former. ChatGPT 4 boasts improved performance, finer-tuned algorithms, and an enhanced ability to handle complex tasks over its predecessor, ChatGPT 3.5. Both models are designed for a myriad of tasks, from straightforward information retrieval to complex problem solving.

Bard AI, on the other hand, emphasizes its capabilities in comprehending and generating narratives. It's tailored to grasp context over long passages and produce coherent extended responses, which could be invaluable when dealing with patient histories or explaining intricate dietary requirements.

Bing Chat, developed by Microsoft, has been optimized for web-based interactions and tends to generate concise and direct responses. This model, with its swift processing capabilities, could be particularly beneficial in scenarios where rapid information retrieval is essential.

The rationale behind evaluating these specific models was due to their prominence in the AI community and their potential applicability to the healthcare sector. While all these models operate on the foundation of generative AI, they each have unique strengths, algorithms, and operational nuances that we believed would bring varied perspectives and capabilities to the intricate task of renal diet assessment.

We hope this revised introduction addresses your concerns and provides the reader with a clearer understanding of generative AI models and their distinctions. Once again, we appreciate your invaluable feedback and remain open to any further suggestions or queries.

Comment:

Method:

The authors claimed that Mayo Clinic’s renal diet compendium is reputable. They should provide additional information including dietitian assessment or relevant references to support their statement.

Dates of analysis should be provided, as these AI models are constantly being updated; therefore, results may vary.

Response: Thank you for your feedback and recommendations on our manuscript. We concur with your suggestions and understand the importance of solidifying our claims with relevant references and offering transparency regarding the dates of our analysis. Given the evolving nature of AI models, the dates are indeed pertinent to the results obtained.

Following your suggestions, we have made revisions to the 'Materials and Methods' section to address these points.

2.3. Evaluation of AI Model Performance

Each of the chosen AI models was tasked with categorizing the curated dietary items based on their potassium and phosphorus content. This categorization procedure involved classifying the items into distinct categories: those possessing high or low potassium content, as well as those with high phosphorus content. Each model performed this categorization on the following dates:

ChatGPT 3.5: First session on 10th of April and second session on 24th of April, 2023

ChatGPT 4: 10th of August, 2023

Bard AI: 15th of August, 2023

Bing Chat: 13th of August, 2023

 

Comment:

Prompt used to generate the response from these AI models should be provided for comprehensive review.

Response: We appreciate your feedback and understand the necessity of providing clarity regarding the prompts utilized to obtain results from the AI models. The inclusion of the exact prompts will indeed offer more transparency and will contribute to a comprehensive understanding of our methodology.

In response to your comment, we are amending the manuscript to incorporate the specific prompts we employed during our investigation.

2.3. Evaluation of AI Model Performance

Each of the chosen AI models was tasked with categorizing the curated dietary items based on their potassium and phosphorus content. This categorization procedure involved classifying the items into distinct categories: those possessing high or low potassium content, as well as those with high phosphorus content. To generate responses from the AI models, the following prompts were utilized:

  1. Is ___ considered a low or high potassium/phosphorus diet?
  2. Classify the following as low or high potassium/phosphorus diet: ___.

Furthermore, each AI model was tasked with categorizing the selected dietary items based on their potassium and phosphorus content on the respective dates:

ChatGPT 3.5: First session on 10th of April and second session on 24th of April, 2023

ChatGPT 4: 10th of August, 2023

Bard AI: 15th of August, 2023

Bing Chat: 13th of August, 2023

We trust these revisions elucidate the procedure more vividly and respond to the concerns raised. We highly value your expert review and feedback and are open to further clarifications or suggestions.

 

Comment:

Results

The authors did not report the concordance between 2 separate GPT4 sessions.

Response: Thank you for bringing to our attention the omission of the concordance between the two separate GPT-4 sessions. We value your thorough review and understand the significance of including such data for the comprehensiveness of our manuscript.

In response to your comment, the absence of the concordance data between the two GPT-4 sessions was due to the constraints of the current subscription model we have for GPT-4. Specifically, we are restricted to a maximum of 50 queries within a 4-hour window. Given this limitation, it was challenging to assess the model broadly within our dataset in a consistent manner. We recognize the importance of such an assessment and had intended to provide a comprehensive evaluation, but these restrictions precluded us from doing so. We hope this clarification addresses your concern, and we are grateful for your understanding regarding the limitations we encountered during our study. We remain open to any further feedback or inquiries.

Comment:

Tables 1/2 are too large for the main results; suggest a different, more concise way to present the data and move the extra information to the supplement.

Response: Your suggestion to place Tables 1 and 2 in the supplementary material due to their length is noted. We have made this change in the revised manuscript as suggested.

 

Comment:

Discussions

Not all CKD patients need to follow a low-K or low-Pi diet. Many vegetables and fruits may contain high amounts of potassium but offer additional health benefits. The authors should provide more nuanced discussion.

Response: Thank you for your valuable input on the Discussion section of our manuscript. We concur with your observation that not all CKD patients are prescribed a low potassium (K) or low phosphorus (Pi) diet, and that many foods high in these nutrients have additional health benefits that should not be overlooked. We understand the importance of presenting a balanced view and have made the necessary revisions to incorporate these nuances as suggested.

It is crucial to emphasize that not all CKD patients necessitate a low potassium or low phosphorus diet. While our study highlights the importance of accurately categorizing nutrient content, it is essential to understand that dietary prescriptions for CKD patients are multifaceted and highly individualized. Various factors, including the stage of CKD, comorbid conditions, and individual patient needs, influence dietary recommendations [32,33]. Furthermore, numerous vegetables and fruits, while containing significant amounts of potassium, are packed with other essential nutrients and health benefits. Antioxidants, fibers, and other phytochemicals present in these foods play a crucial role in overall health [33]. Therefore, it is essential to strike a balance between restricting certain nutrients and ensuring the intake of other beneficial components. This nuance is vital for a comprehensive understanding of dietary recommendations in CKD and other medical conditions.

We value your detailed, respectful, and constructive feedback, which will contribute to the improvement of our study. Thank you for your thoughtful review.

Author Response File: Author Response.pdf

Reviewer 3 Report

AI-Powered Renal Diet Support: Performance of ChatGPT, Bard AI, and Bing Chat

 

In this manuscript, the authors explored the capabilities of four well-known artificial intelligence models: ChatGPT 3.5, ChatGPT 4, Bard AI, and Bing Chat, revealing insights into their potential to enhance medical nutrition therapy.

 

-In the introduction section, describe why these two elements are relevant for the kidneys.

 

-The authors should indicate the source of the datasets used and what criteria were used to define low or high levels of these elements (potassium and phosphorus). Explain the cut-offs used.

 

-The authors must explain whether the datasets used are validated.

 

-Describe the strategy used by the algorithms to check whether they respected the values of the datasets used.

 

-A two-week interval between each instance to mitigate the likelihood of fortuitous results seems short. Why did the authors proceed in this way? What criteria were used?

Moderate editing of English language required

Author Response

Reviewer 3

AI-Powered Renal Diet Support: Performance of ChatGPT, Bard AI, and Bing Chat

In this manuscript, the authors explored the capabilities of four well-known artificial intelligence models: ChatGPT 3.5, ChatGPT 4, Bard AI, and Bing Chat, revealing insights into their potential to enhance medical nutrition therapy.

Comment

-In the introduction section, describe why these two elements are relevant for the kidneys.

Response: Thank you for taking the time to review our manuscript titled, "AI-Powered Renal Diet Support: Performance of ChatGPT, Bard AI, and Bing Chat." We value and appreciate your constructive feedback and are committed to addressing the comments raised.

Specifically, regarding your comment on the introduction section, where you pointed out the need to clarify the relevance of potassium and phosphorus for the kidneys, we concur that providing this context will enhance readers' comprehension and underscore the significance of our study's focus. We have revised our introduction accordingly.

“Chronic kidney disease (CKD), a medical condition characterized by the gradual decline in kidney function over time, presents a range of challenges for both patients and healthcare providers [1]. According to the World Health Organization (WHO) and multiple epidemiological studies, CKD affects approximately 10-13% of the population worldwide [2]. With hundreds of millions grappling with the condition, CKD has become a silent pandemic [3]. An alarming aspect of CKD is its association with disorders of potassium and phosphorus metabolism. The kidneys play an indispensable role in maintaining the balance of these minerals. Specifically, they filter out excess potassium and phosphorus from the blood, excreting them in the urine. When kidney function is compromised, as seen in CKD, their ability to maintain this balance diminishes, leading to harmful elevations in the levels of these minerals in the bloodstream, known as hyperkalemia and hyperphosphatemia [5]. These conditions can have severe cardiovascular and musculoskeletal implications, necessitating vigilant dietary monitoring and interventions.”

With this amendment, we hope to have adequately addressed the relevance of potassium and phosphorus in the context of renal function and the overarching importance of monitoring their levels in CKD patients. We believe that this enhancement to the introduction provides the necessary foundation for our subsequent discussions on the application of AI models in renal dietary planning.

 

Comment

-The authors should indicate the source of the datasets used and what criteria were used to define low or high levels of these elements (potassium and phosphorus). Explain the cut-offs used.

Response: Thank you for taking the time to review our manuscript and for bringing to our attention the necessity to specify the datasets' source and the criteria used for categorizing potassium and phosphorus content. We agree with the reviewer and have revised the Methods section of our manuscript accordingly, including the cut-offs used and supporting references.

2.5. Comparative Analysis with Established References

The outcomes yielded by the AI models were subsequently juxtaposed with the dietary recommendations furnished in the Mayo Clinic Renal Diet book. This comparison enabled the scrutiny of the accuracy and correspondence of the AI-generated classifications with the well-regarded standards advocated by an authoritative source.

To determine the cut-offs for low vs. high potassium and phosphorus content in dietary items, we relied on standard guidelines and inputs from dietitians at the Mayo Clinic, along with established recommendations from several renowned organizations such as the National Kidney Foundation [19,20].

For potassium, the criteria are as follows:

The Academy of Nutrition and Dietetics suggests a limitation of potassium to 2-3 grams per day for patients on dialysis or with end-stage renal disease. Translated, this is equivalent to 2000-3000 mg of potassium daily. Given standard serving sizes, this guideline aligns with the recommendations to restrict high potassium foods (200-400 mg per serving) to 1-2 servings daily [21].

The National Kidney Foundation proposes an intake of 1500-2700 mg of potassium daily for patients with varying severities of chronic kidney disease. The lower end of this spectrum again emphasizes restricting high potassium foods to 1-2 servings every day [19].

The FDA stipulates that foods comprising more than 200 mg of potassium per serving are viewed as high in potassium [20].

For phosphorus, the criteria are:

High phosphorus foods exceed 300 mg of phosphorus per serving or surpass 30% Daily Value (DV) for phosphorus. The DV for phosphorus stands at 1250 mg per day. Consequently, 10% DV corresponds to 125 mg of phosphorus per serving. It is pertinent to note that phosphorus levels can vary extensively depending on the food type, brand, and preparation method. Thus, inspecting ingredient labels for phosphorus additives is also advocated [22].
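The cut-offs quoted above can be expressed as a small reference classifier. This is an illustrative sketch only: the threshold constants follow the per-serving criteria stated here (more than 200 mg of potassium, more than 300 mg of phosphorus), and the nutrient values in the example are hypothetical, not drawn from the study's dataset.

```python
# Per-serving thresholds taken from the criteria above:
# FDA: > 200 mg potassium per serving is "high potassium";
# > 300 mg phosphorus per serving is "high phosphorus".
POTASSIUM_HIGH_MG = 200
PHOSPHORUS_HIGH_MG = 300

def classify(potassium_mg, phosphorus_mg):
    """Label one food serving against the stated cut-offs."""
    return {
        "potassium": "high" if potassium_mg > POTASSIUM_HIGH_MG else "low",
        "phosphorus": "high" if phosphorus_mg > PHOSPHORUS_HIGH_MG else "low",
    }

# Hypothetical nutrient values for a single serving
print(classify(potassium_mg=422, phosphorus_mg=23))
```

A deterministic rule of this kind makes the benchmark reproducible: AI model outputs can be checked against fixed labels rather than only against a printed list.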

 

Comment

-The authors must explain whether the datasets used are validated.

Response: Thank you for highlighting the crucial point regarding the validation of the datasets used in our study. Your inquiry on this matter is of utmost significance as it ensures the reliability and authenticity of our study's foundation.

While the Mayo Clinic Renal Diet book serves as a trusted guide and has been heavily referenced in our study, we acknowledge that it has not undergone formal validation. However, it is essential to emphasize that our methodology is not solely grounded on this single source. We have judiciously followed and referenced established recommendations from several esteemed organizations, most notably the National Kidney Foundation. These organizations have been long-standing pillars in the realm of nephrology and renal dietary guidelines, and their recommendations have been widely accepted and applied in clinical practice. To ensure that this matter is lucidly communicated to our readers, we have incorporated revisions in the method section of our manuscript as suggested.

 

Comment

-Describe the strategy used by the algorithms to check whether they respected the values of the datasets used.

Response: Thank you for pointing out the need to elucidate the strategy employed by the AI algorithms in respecting the values from our datasets. We recognize the importance of this aspect, as it is crucial to understand the underpinnings of the AI models' decision-making processes to ensure accuracy and reliability.

In response to your query:

The AI models, including ChatGPT 3.5, ChatGPT 4, Bard AI, and Bing Chat, fundamentally operate using large-scale transformer architectures, which are trained on vast datasets. These models utilize a method known as pattern recognition based on their training data. In the context of our study:

  1. Each model was primed with specific prompts related to potassium and phosphorus content in foods. This "priming" phase aimed to orient the model's subsequent responses towards the renal diet context.
  2. When a dietary item from our dataset was inputted into the AI model, the model sought patterns from its training data that matched or closely resembled the input. The AI would then categorize the item based on its recognized patterns, essentially predicting the most likely categorization (low or high potassium/phosphorus) based on its prior knowledge.
  3. It is imperative to mention that while the models have vast knowledge, they do not "understand" the information in the same manner humans do. Instead, they generate outputs (responses) that have the highest probability of being correct based on patterns from their training data.

We have incorporated this detailed strategy in our manuscript to provide readers with a clearer understanding of the AI models' operation in relation to our dataset. The revised section is as follows:

2.8. AI Algorithm Strategy and Dataset Value Adherence

The selected AI models, namely ChatGPT 3.5, ChatGPT 4, Bard AI, and Bing Chat, utilize large-scale transformer architectures, which are adept at pattern recognition derived from extensive training datasets. These models were oriented to the context of renal diets through specific priming prompts. As dietary items were fed into the models, the algorithms matched the input with recognized patterns from their training, predicting the most probable categorization for potassium and phosphorus content. It is vital to recognize that these models, while extensive in their knowledge base, operate based on probabilistic pattern matching rather than human-like understanding.

 

 

Comment

-A two-week interval between each instance to mitigate the likelihood of fortuitous results seems short. Why did the authors proceed in this way? What criteria were used?

 

Response: Thank you for drawing attention to the decision regarding the two-week interval implemented between each instance of our study. We appreciate the opportunity to clarify the rationale behind this specific choice.

The two-week interval was determined based on several considerations:

  1. AI model behavior: With continuous improvements and updates, AI models like ChatGPT and Bard AI undergo incremental changes over time. A short interval, such as two weeks, is less likely to witness drastic model updates or retrainings. Therefore, the two-week period allows for a retest that is primarily free of model version bias, ensuring that discrepancies in results are more attributable to model inconsistency than to significant version changes.
  2. Operational considerations: The processing, evaluation, and comparison of results from four different AI models across hundreds of dietary items is an extensive task. Two weeks provided an optimal window to thoroughly assess initial results, make necessary preparations, and rerun the evaluation without significant logistical challenges.
  3. Temporal fluctuations: In the realm of machine learning, there might be temporal nuances in how models interact with data due to factors like server loads, slight algorithm adjustments, or other transient aspects. The two-week interval was deemed sufficient to factor in these minor fluctuations while ensuring that the fundamental model architecture and training remained largely unchanged.
  4. Balancing rigor with feasibility: While longer intervals might offer further separation between instances and potentially reduce the likelihood of similar errors repeating, they also introduce greater chances of encountering different external variables. Two weeks struck a balance between ensuring a fresh evaluation and maintaining operational and methodological feasibility.

In response to your feedback, we've expanded upon this in the manuscript to ensure clarity for readers.

 

The revised section is as follows:

Rationale for the Two-Week Interval

The interval of two weeks between each instance was a deliberate choice made after considering multiple factors. One primary consideration was the evolving nature of AI models, which undergo frequent updates. A two-week span minimizes the risk of significant model alterations, ensuring consistency in our evaluations. Additionally, from an operational standpoint, this duration allowed for comprehensive assessment, adjustments, and preparations for the subsequent test. It also accounted for potential short-term fluctuations in AI responses due to transient technical factors. This time frame, thus, represented a balanced approach that prioritized both methodological rigor and feasibility.

 

We hope this explanation provides clarity on our decision-making process for the specified interval. Your astute observations have enabled us to present our methodology with greater detail and precision, and for that, we are genuinely appreciative.

We value your detailed, respectful, and constructive feedback, which will contribute to the improvement of our study. Thank you for your thoughtful review.

With utmost respect and gratitude,

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

satisfied with revision

Reviewer 3 Report

Dear, Thank you for your modifications.

Minor editing of English language required
