The Emerging Role of Large Language Models in Improving Prostate Cancer Literacy
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design and Question Formulation
2.1.1. Blinding and Randomization of Responses
2.1.2. Experts and Expert Evaluation
2.1.3. Evaluation Criteria and Scoring
2.1.4. Language and Cultural Considerations
2.2. Statistical Analysis
3. Results
4. Discussion
4.1. Limitations
4.2. Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56.
- Clusmann, J.; Kolbinger, F.R.; Muti, H.S.; Carrero, Z.I.; Eckardt, J.-N.; Laleh, N.G.; Löffler, C.M.L.; Schwarzkopf, S.-C.; Unger, M.; Veldhuizen, G.P.; et al. The future landscape of large language models in medicine. Commun. Med. 2023, 3, 141.
- Haupt, C.E.; Marks, M. AI-Generated Medical Advice—GPT and Beyond. JAMA 2023, 329, 1349–1350.
- Walters, R.; Leslie, S.J.; Polson, R.; Cusack, T.; Gorely, T. Establishing the efficacy of interventions to improve health literacy and health behaviours: A systematic review. BMC Public Health 2020, 20, 1040.
- Shahid, R.; Shoker, M.; Chu, L.M.; Frehlick, R.; Ward, H.; Pahwa, P. Impact of low health literacy on patients’ health outcomes: A multicenter cohort study. BMC Health Serv. Res. 2022, 22, 1148.
- Amin, K.S.; Mayes, L.C.; Khosla, P.; Doshi, R.H. Assessing the efficacy of Large Language Models in health literacy: A comprehensive cross-sectional study. Yale J. Biol. Med. 2024, 97, 17–27.
- McMullan, M. Patients using the Internet to obtain health information: How this affects the patient–health professional relationship. Patient Educ. Couns. 2006, 63, 24–28.
- Federatia Asociatiilor Bolnavilor de Cancer. Available online: https://shorturl.at/U8PSQ (accessed on 12 May 2024).
- Zhu, L.; Mou, W.; Chen, R. Can the ChatGPT and other large language models with internet-connected database solve the questions and concerns of patient with prostate cancer and help democratize medical knowledge? J. Transl. Med. 2023, 21, 269.
- Iannantuono, G.M.; Bracken-Clarke, D.; Floudas, C.S.; Roselli, M.; Gulley, J.L.; Karzai, F. Applications of large language models in cancer care: Current evidence and future perspectives. Front. Oncol. 2023, 13, 1268915.
- Geantă, M. Large Language Models and Prostate Cancer; Zenodo: Geneva, Switzerland, 2024.
- Zhang, Y.; Kim, Y. Consumers’ evaluation of web-based health information quality: Meta-analysis. J. Med. Internet Res. 2022, 24, e36463.
- Sbaffi, L.; Rowley, J. Trust and credibility in web-based health information: A review and agenda for future research. J. Med. Internet Res. 2017, 19, e218.
- Stellefson, M.; Chaney, B.; Barry, A.E.; Chavarria, E.; Tennant, B.; Walsh-Childers, K.; Sriram, P.S.; Zagora, J. Web 2.0 chronic disease self-management for older adults: A systematic review. J. Med. Internet Res. 2013, 15, e35.
- Keselman, A.; Browne, A.C.; Kaufman, D.R. Consumer health information seeking as hypothesis testing. J. Am. Med. Inform. Assoc. 2008, 15, 484–495.
- Boone, H.N.; Boone, D.A. Analyzing Likert data. J. Ext. 2012, 50, 48.
- Alasker, A.; Alsalamah, S.; Alshathri, N.; Almansour, N.; Alsalamah, F.; Alghafees, M.; AlKhamees, M.; Alsaikhan, B. Performance of Large Language Models (LLMs) in providing prostate cancer information. Res. Sq. 2023.
- Sezgin, E. Redefining virtual assistants in health care: The future with Large Language Models. J. Med. Internet Res. 2024, 26, e53225.
- Marcus, C. Strategies for improving the quality of verbal patient and family education: A review of the literature and creation of the EDUCATE model. Health Psychol. Behav. Med. 2014, 2, 482–495.
- Abd-Alrazaq, A.; AlSaad, R.; Alhuwail, D.; Ahmed, A.; Healy, P.M.; Latifi, S.; Aziz, S.; Damseh, R.; Alabed Alrazak, S.; Sheikh, J. Large Language Models in medical education: Opportunities, challenges, and future directions. JMIR Med. Educ. 2023, 9, e48291.
- Lucas, H.C.; Upperman, J.S.; Robinson, J.R. A systematic review of large language models and their implications in medical education. Med. Educ. 2024, in press.
- Li, H.; Moon, J.T.; Purkayastha, S.; Celi, L.A.; Trivedi, H.; Gichoya, J.W. Ethics of large language models in medicine and medical research. Lancet Digit. Health 2023, 5, e333–e335.
- Uriel, K.; Cohen, E.; Shachar, E.; Sommer, J.; Fink, A.; Morse, E.; Shreiber, B.; Wolf, I. GPT versus resident physicians—A benchmark based on official board scores. NEJM AI 2024, 1, AIdbp2300192.
- Bano, M.; Zowghi, D.; Whittle, J. AI and human reasoning: Qualitative research in the age of Large Language Models. AI Ethics J. 2023, 3, 1–15.
- Ong, L.M.L.; de Haes, J.C.J.M.; Hoos, A.M.; Lammes, F.B. Doctor-patient communication: A review of the literature. Soc. Sci. Med. 1995, 40, 903–918.
- Chen, S.; Guevara, M.; Moningi, S.; Hoebers, F.; Elhalawani, H.; Kann, B.H.; Chipidza, F.E.; Leeman, J.; Aerts, H.J.W.L.; Miller, T.; et al. The effect of using a large language model to respond to patient messages. Lancet Digit. Health 2024, 6, e379–e381.
- Guevara, M.; Chen, S.; Thomas, S.; Chaunzwa, T.L.; Franco, I.; Kann, B.H.; Moningi, S.; Qian, J.M.; Goldstein, M.; Harper, S.; et al. Large language models to identify social determinants of health in electronic health records. NPJ Digit. Med. 2024, 7, 6.
- Lerner, J.; Tranmer, M.; Mowbray, J.; Hâncean, M.-G. REM beyond dyads: Relational hyperevent models for multi-actor interaction networks. arXiv 2019.
- Lerner, J.; Hâncean, M.-G. Micro-level network dynamics of scientific collaboration and impact: Relational hyperevent models for the analysis of coauthor networks. Netw. Sci. 2023, 11, 5–35.
- Meskó, B.; Topol, E.J. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit. Med. 2023, 6, 120.
- European Parliament. Available online: https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence (accessed on 14 May 2024).
Criterion | Expert ID | ChatGPT | Gemini | CoPilot | Guide
---|---|---|---|---|---
Grand total | 1 | 432 | 373 | 432 | 409
Grand total | 2 | 408 | 275 | 335 | 249
Grand total | 3 | 377 | 341 | 377 | 343
Grand total | 4 | 434 | 394 | 410 | 376
Grand total | 5 | 411 | 388 | 394 | 372
Grand total | 6 | 456 | 363 | 416 | 359
Grand total | 7 | 451 | 406 | 448 | 435
Grand total | 8 | 363 | 354 | 366 | 349
Accuracy | 1 | 109 | 93 | 105 | 101
Accuracy | 2 | 99 | 66 | 78 | 62
Accuracy | 3 | 96 | 80 | 93 | 81
Accuracy | 4 | 112 | 101 | 101 | 101
Accuracy | 5 | 100 | 96 | 95 | 88
Accuracy | 6 | 113 | 91 | 106 | 90
Accuracy | 7 | 109 | 96 | 106 | 102
Accuracy | 8 | 88 | 79 | 84 | 77
Timeliness | 1 | 108 | 93 | 110 | 104
Timeliness | 2 | 100 | 66 | 77 | 66
Timeliness | 3 | 98 | 80 | 101 | 98
Timeliness | 4 | 112 | 101 | 103 | 104
Timeliness | 5 | 99 | 96 | 93 | 94
Timeliness | 6 | 109 | 91 | 100 | 95
Timeliness | 7 | 112 | 96 | 115 | 114
Timeliness | 8 | 92 | 79 | 97 | 98
Comprehensiveness | 1 | 98 | 78 | 102 | 90
Comprehensiveness | 2 | 101 | 63 | 86 | 58
Comprehensiveness | 3 | 94 | 74 | 90 | 78
Comprehensiveness | 4 | 106 | 87 | 96 | 80
Comprehensiveness | 5 | 97 | 89 | 93 | 82
Comprehensiveness | 6 | 118 | 79 | 106 | 78
Comprehensiveness | 7 | 109 | 89 | 107 | 99
Comprehensiveness | 8 | 93 | 86 | 89 | 87
Ease of use | 1 | 117 | 108 | 115 | 114
Ease of use | 2 | 108 | 75 | 94 | 63
Ease of use | 3 | 89 | 93 | 93 | 86
Ease of use | 4 | 104 | 105 | 110 | 91
Ease of use | 5 | 115 | 109 | 113 | 108
Ease of use | 6 | 116 | 100 | 104 | 96
Ease of use | 7 | 121 | 113 | 120 | 120
Ease of use | 8 | 90 | 95 | 96 | 87
Model 1: General

Random effects | Variance | Std. Dev.
---|---|---
Groups (intercept) | 1266.8 | 35.59
Residual | 531.8 | 23.06

Fixed effects | Estimate | SE | df | t value | Pr(>\|t\|) | Sig.
---|---|---|---|---|---|---
(Intercept) | 361.50 | 14.99 | 11.25 | 24.109 | 0.000000 | ***
CoPilot | 35.75 | 11.53 | 21.00 | 3.100 | 0.005418 | **
Gemini | 0.25 | 11.53 | 21.00 | 0.022 | 0.982907 |
ChatGPT | 55.00 | 11.53 | 21.00 | 4.770 | 0.000103 | ***
Model 2: Accuracy

Random effects | Variance | Std. Dev.
---|---|---
Groups (intercept) | 105.11 | 10.252
Residual | 26.27 | 5.125

Fixed effects | Estimate | SE | df | t value | Pr(>\|t\|) | Sig.
---|---|---|---|---|---|---
(Intercept) | 87.80 | 4.05 | 9.59 | 21.654 | 0.000000 | ***
CoPilot | 8.25 | 2.56 | 21.00 | 3.219 | 0.00411 | **
Gemini | 0.00 | 2.56 | 21.00 | 0.000 | 1.00000 |
ChatGPT | 15.50 | 2.56 | 21.00 | 6.049 | 0.00000 | ***
Model 3: Timeliness

Random effects | Variance | Std. Dev.
---|---|---
Groups (intercept) | 88.73 | 9.419
Residual | 40.95 | 6.399

Fixed effects | Estimate | SE | df | t value | Pr(>\|t\|) | Sig.
---|---|---|---|---|---|---
(Intercept) | 96.625 | 4.026 | 11.645 | 24.000 | 0.0000 | ***
CoPilot | 2.875 | 3.200 | 21.000 | 0.899 | 0.3791 |
Gemini | −8.875 | 3.200 | 21.000 | −2.774 | 0.0114 | *
ChatGPT | 7.125 | 3.200 | 21.000 | 2.227 | 0.0370 | *
Model 4: Comprehensiveness

Random effects | Variance | Std. Dev.
---|---|---
Groups (intercept) | 39.37 | 6.274
Residual | 50.66 | 7.118

Fixed effects | Estimate | SE | df | t value | Pr(>\|t\|) | Sig.
---|---|---|---|---|---|---
(Intercept) | 81.500 | 3.355 | 17.793 | 24.295 | 0.0000 | ***
CoPilot | 14.625 | 3.559 | 21.000 | 4.110 | 0.0005 | ***
Gemini | −0.875 | 3.559 | 21.000 | −0.246 | 0.8082 |
ChatGPT | 20.500 | 3.559 | 21.000 | 5.760 | 0.0000 | ***
Model 5: Ease of use

Random effects | Variance | Std. Dev.
---|---|---
Groups (intercept) | 132.68 | 11.518
Residual | 52.87 | 7.271

Fixed effects | Estimate | SE | df | t value | Pr(>\|t\|) | Sig.
---|---|---|---|---|---|---
(Intercept) | 95.625 | 4.816 | 11.050 | 19.856 | 0.00000 | ***
CoPilot | 10.000 | 3.636 | 21.000 | 2.751 | 0.01198 | *
Gemini | 4.125 | 3.636 | 21.000 | 1.135 | 0.26932 |
ChatGPT | 11.875 | 3.636 | 21.000 | 3.266 | 0.00369 | **
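The random-intercept structure behind these models (one intercept per evaluating expert, with the printed patient guide as the reference level) was presumably fit in R with lme4; as a hedged stand-in, the sketch below reproduces Model 1 in Python with statsmodels, using the grand-total scores from the expert-evaluation table. In this balanced design, the fixed-effect estimates equal the raw mean differences between each chatbot and the guide.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Grand-total scores per expert (experts 1-8), transcribed from the evaluation table
scores = {
    "ChatGPT": [432, 408, 377, 434, 411, 456, 451, 363],
    "Gemini":  [373, 275, 341, 394, 388, 363, 406, 354],
    "CoPilot": [432, 335, 377, 410, 394, 416, 448, 366],
    "Guide":   [409, 249, 343, 376, 372, 359, 435, 349],
}
long = pd.DataFrame(
    [(expert, platform, s)
     for platform, vals in scores.items()
     for expert, s in enumerate(vals, start=1)],
    columns=["expert", "platform", "score"],
)
# Make the patient guide the reference level, as in Model 1
long["platform"] = pd.Categorical(
    long["platform"], categories=["Guide", "CoPilot", "Gemini", "ChatGPT"])

# Random intercept per expert; fixed effects for information source
model = smf.mixedlm("score ~ platform", long, groups=long["expert"]).fit()
print(model.summary())
```

With REML estimation (the statsmodels default, matching lme4), the platform coefficients recover the table's estimates of 35.75 (CoPilot), 0.25 (Gemini), and 55.00 (ChatGPT) over the guide's intercept of 361.50.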
Contrast | Estimate | SE | df | t.Ratio | p Value |
---|---|---|---|---|---|
Model 1: General | |||||
Guide–CoPilot | −35.75 | 11.05 | 21 | −3.100 | 0.0257 |
Guide–Gemini | −0.25 | 11.05 | 21 | −0.022 | >0.9999 |
Guide–ChatGPT | −55.00 | 11.05 | 21 | −4.770 | 0.0006 |
CoPilot–Gemini | 35.50 | 11.05 | 21 | 3.079 | 0.0270 |
CoPilot–ChatGPT | −19.25 | 11.05 | 21 | −1.669 | 0.3638 |
Gemini–ChatGPT | −54.75 | 11.05 | 21 | −4.748 | 0.0006 |
Model 2: Accuracy | |||||
Guide–CoPilot | −8.25 | 2.56 | 21 | −3.219 | 0.0198 |
Guide–Gemini | 0.00 | 2.56 | 21 | 0.000 | >0.9999 |
Guide–ChatGPT | −15.50 | 2.56 | 21 | −6.049 | <0.0001 |
CoPilot–Gemini | 8.25 | 2.56 | 21 | 3.219 | 0.0198 |
CoPilot–ChatGPT | −7.25 | 2.56 | 21 | −2.829 | 0.0458 |
Gemini–ChatGPT | −15.50 | 2.56 | 21 | −6.049 | <0.0001 |
Model 3: Timeliness | |||||
Guide–CoPilot | −2.88 | 3.2 | 21 | −0.899 | 0.8057 |
Guide–Gemini | 8.88 | 3.2 | 21 | 2.774 | 0.0514 |
Guide–ChatGPT | −7.12 | 3.2 | 21 | −2.227 | 0.1485 |
CoPilot–Gemini | 11.75 | 3.2 | 21 | 3.672 | 0.0072 |
CoPilot–ChatGPT | −4.25 | 3.2 | 21 | −1.328 | 0.5559 |
Gemini–ChatGPT | −16.00 | 3.2 | 21 | −5.001 | 0.0003 |
Model 4: Comprehensiveness | |||||
Guide–CoPilot | −14.625 | 3.56 | 21 | −4.110 | 0.0026 |
Guide–Gemini | 0.875 | 3.56 | 21 | 0.246 | 0.9946 |
Guide–ChatGPT | −20.500 | 3.56 | 21 | −5.760 | 0.0001 |
CoPilot–Gemini | 15.500 | 3.56 | 21 | 4.355 | 0.0015 |
CoPilot–ChatGPT | −5.875 | 3.56 | 21 | −1.651 | 0.3734 |
Gemini–ChatGPT | −21.375 | 3.56 | 21 | −6.006 | <0.0001 |
Model 5: Ease of use | |||||
Guide–CoPilot | −10.00 | 3.64 | 21 | −2.751 | 0.0539 |
Guide–Gemini | −4.12 | 3.64 | 21 | −1.135 | 0.6729 |
Guide–ChatGPT | −11.88 | 3.64 | 21 | −3.266 | 0.0179 |
CoPilot–Gemini | 5.88 | 3.64 | 21 | 1.616 | 0.3915 |
CoPilot–ChatGPT | −1.88 | 3.64 | 21 | −0.516 | 0.9544 |
Gemini–ChatGPT | −7.75 | 3.64 | 21 | −2.132 | 0.1757 |
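The Tukey-adjusted contrasts above are derived from the mixed models (in R, typically via emmeans). As a rough plausibility check only, a plain Tukey HSD on the grand-total scores recovers the same Estimate column; its standard errors and p-values will differ from the table's because it ignores the per-expert blocking.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Grand-total scores from the expert table (8 experts x 4 information sources)
data = {
    "ChatGPT": [432, 408, 377, 434, 411, 456, 451, 363],
    "Gemini":  [373, 275, 341, 394, 388, 363, 406, 354],
    "CoPilot": [432, 335, 377, 410, 394, 416, 448, 366],
    "Guide":   [409, 249, 343, 376, 372, 359, 435, 349],
}
scores = np.concatenate([data[k] for k in data])
labels = np.repeat(list(data), 8)

# Unadjusted-for-blocking pairwise comparisons; mean differences match
# the Estimate column of the Model 1 contrasts (up to sign convention)
res = pairwise_tukeyhsd(scores, labels)
print(res)
```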
Normality Assumption | Homogeneity of Variances |
---|---|
Overall evaluation | |
Guide, W = 0.91237, p = 0.3711 | Levene’s test: F(3, 28) = 0.1927, p = 0.9005 |
CoPilot, W = 0.98284, p = 0.9756 | Bartlett’s K-squared = 1.945, df = 3, p = 0.5839 |
Gemini, W = 0.88878, p = 0.2280 | Fligner–Killeen test: med chi-squared = 0.28729, df = 3, p = 0.9624 |
ChatGPT, W = 0.93274, p = 0.5414 | |
Accuracy evaluation | |
Guide, W = 0.90624, p = 0.3284 | Levene’s test: F(3, 28) = 0.3158, p = 0.8138 |
CoPilot, W = 0.88498, p = 0.2100 | Bartlett’s K-squared = 1.4537, df = 3, p = 0.693 |
Gemini, W = 0.90508, p = 0.3207 | Fligner–Killeen test: med chi-squared = 0.97968, df = 3, p = 0.8062 |
ChatGPT, W = 0.91362, p = 0.3802 | |
Timeliness evaluation | |
Guide, W = 0.83798, p = 0.0718 | Levene’s test: F(3, 28) = 0.1206, p = 0.9472 |
CoPilot, W = 0.94442, p = 0.6551 | Bartlett’s K-squared = 2.4542, df = 3, p = 0.4836 |
Gemini, W = 0.90508, p = 0.3207 | Fligner–Killeen test: med chi-squared = 0.19041, df = 3, p = 0.9791 |
ChatGPT, W = 0.90091, p = 0.2944 | |
Comprehensiveness | |
Guide, W = 0.92876, p = 0.5048 | Levene’s test: F(3, 28) = 0.1023, p = 0.9580 |
CoPilot, W = 0.91887, p = 0.4207 | Bartlett’s K-squared = 1.2837, df = 3, p = 0.733 |
Gemini, W = 0.87805, p = 0.1804 | Fligner–Killeen test: med chi-squared = 0.090494, df = 3, p = 0.993 |
ChatGPT, W = 0.91983, p = 0.4285 | |
Ease of use |
Guide, W = 0.95716, p = 0.7827 | Levene’s test: F(3, 28) = 0.675, p = 0.5746 |
CoPilot, W = 0.91231, p = 0.3706 | Bartlett’s K-squared = 2.4896, df = 3, p = 0.4772 |
Gemini, W = 0.90322, p = 0.3088 | Fligner–Killeen test: med chi-squared = 1.4162, df = 3, p = 0.7017 |
ChatGPT, W = 0.87014, p = 0.1512 |
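The normality and homogeneity checks in this table can be re-run with standard routines; the sketch below uses scipy.stats on the grand-total ("overall evaluation") scores from the expert table. Note that R's common Levene implementation defaults to the median-centered (Brown–Forsythe) variant, which scipy's `levene` also uses by default.

```python
from scipy import stats

# Grand-total scores per expert, transcribed from the evaluation table
overall = {
    "ChatGPT": [432, 408, 377, 434, 411, 456, 451, 363],
    "Gemini":  [373, 275, 341, 394, 388, 363, 406, 354],
    "CoPilot": [432, 335, 377, 410, 394, 416, 448, 366],
    "Guide":   [409, 249, 343, 376, 372, 359, 435, 349],
}

# Per-source normality (Shapiro-Wilk)
for name, vals in overall.items():
    w, p = stats.shapiro(vals)
    print(f"{name}: W = {w:.5f}, p = {p:.4f}")

# Homogeneity of variances across the four sources
groups = list(overall.values())
print("Levene:", stats.levene(*groups, center="median"))
print("Bartlett:", stats.bartlett(*groups))
print("Fligner-Killeen:", stats.fligner(*groups))
```

All tests come out non-significant (p > 0.05), consistent with the table: the parametric mixed-model assumptions are not rejected for the overall scores.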
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Geantă, M.; Bădescu, D.; Chirca, N.; Nechita, O.C.; Radu, C.G.; Rascu, Ș.; Rădăvoi, D.; Sima, C.; Toma, C.; Jinga, V. The Emerging Role of Large Language Models in Improving Prostate Cancer Literacy. Bioengineering 2024, 11, 654. https://doi.org/10.3390/bioengineering11070654