A comparative analysis of GPT-3.5, GPT-4 and GPT-4.o in heart failure

Polatkan, Şeyda Günay; Sığırlı, Deniz

dc.contributor.author	Polatkan, Şeyda Günay
dc.contributor.author	Sığırlı, Deniz
dc.date.accessioned	2025-02-25T05:22:59Z
dc.date.available	2025-02-25T05:22:59Z
dc.date.issued	2024-11-18
dc.description.abstract	Digitalization has increasingly penetrated healthcare. Generative artificial intelligence (AI) is a type of AI technology that can generate new content. Patients can use AI-powered chatbots to get medical information. Heart failure is a syndrome with high morbidity and mortality. Patients commonly search many websites for information about heart failure. This study aimed to assess large language models (LLMs), namely ChatGPT 3.5, GPT-4 and GPT-4.o, in terms of their accuracy in answering questions about heart failure (HF). Thirteen questions regarding the definition, causes, signs and symptoms, complications, treatment and lifestyle recommendations of HF were evaluated. These questions, designed to assess the knowledge and awareness of medical students about heart failure, were taken from a previous study in the literature. Of the students who participated in that study, 158 (58.7%) were first-year students and 111 (41.3%) were sixth-year students; the cardiology internship is taken in the fourth year. The questions were entered in Turkish, and two cardiologists with over ten years of experience evaluated the responses generated by GPT-3.5, GPT-4 and GPT-4.o. ChatGPT-3.5 yielded “correct” responses to 8/13 (61.5%) of the questions, whereas GPT-4 yielded “correct” responses to 11/13 (84.6%) of the questions. All of the responses of GPT-4.o were accurate and complete. For no question did the medical students achieve 100% correct answers. This study revealed that the performance of GPT-4.o was superior to that of GPT-3.5 but similar to that of GPT-4.
dc.description.abstract	Dijitalleşme sağlık hizmetleri alanında giderek daha fazla yer almaktadır. Üretken yapay zeka yeni içerik üretebilen bir yapay zeka teknolojisi türüdür. Hastalar tıbbi bilgi almak için yapay zeka destekli sohbet robotlarını kullanabilmektedir. Kalp yetersizliği, yüksek morbidite ve mortaliteye sahip bir sendromdur. Hastalar genellikle birçok web sitesinde kalp yetersizliği hakkında arama yapmaktadır. Bu çalışma, kalp yetersizliği hakkındaki soruları yanıtlamadaki doğrulukları açısından Büyük Dil Modelleri (LLM'ler) - ChatGPT 3.5, GPT-4 ve GPT-4.o'yu karşılaştırmayı amaçlamaktadır. Çalışmada kalp yetersizliğinin tanımı, nedenleri, belirti ve semptomları, komplikasyonları, tedavisi ve yaşam tarzı önerileriyle ilgili on üç soru soruldu. Bu sorular, tıp fakültesi öğrencilerinin kalp yetmezliği hakkındaki bilgi ve farkındalığını değerlendirmek için yapılan önceki bir çalışmadan alındı. Bu çalışmaya katılmış olan öğrencilerin 158 tanesi (%58,7) 1. Sınıf öğrencisi iken, 111 tanesi (%41,3) 6. Sınıf öğrencisiydi ve kardiyoloji stajı 4. sınıfta alınmaktaydı. Sorular yapay zeka destekli modellere Türkçe dilinde soruldu ve on yılı aşkın deneyime sahip 2 kardiyolog, GPT-3.5, GPT-4 ve GPT-4.o tarafından üretilen yanıtları değerlendirdi. ChatGPT-3.5 soruların 8/13'üne (61.5%) "doğru" yanıt verirken, GPT-4 soruların 11/13'üne (84.6%) "doğru" yanıt verdi. GPT-4.o'nun tüm yanıtları doğru ve eksiksizdi. Tıp fakültesi öğrencilerinin performansı hiçbir soru için %100 doğru yanıt içermiyordu. Bu çalışma GPT-4.o' nun performansının GPT-3.5'ten üstün olduğunu ancak GPT-4 ile benzer olduğunu ortaya koydu.
dc.identifier.doi	10.32708/uutfd.1543370
dc.identifier.endpage	447
dc.identifier.issue	3
dc.identifier.startpage	443
dc.identifier.uri	https://doi.org/10.32708/uutfd.1543370
dc.identifier.uri	https://dergipark.org.tr/tr/pub/uutfd/issue/89968/1543370
dc.identifier.uri	https://dergipark.org.tr/tr/download/article-file/4189791
dc.identifier.uri	https://hdl.handle.net/11452/50597
dc.identifier.volume	50
dc.language.iso	en
dc.publisher	Bursa Uludağ Üniversitesi
dc.relation.journal	Uludağ Üniversitesi Tıp Fakültesi Dergisi
dc.relation.publicationcategory	Makale - Uluslararası Hakemli Dergi
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Heart failure
dc.subject	Artificial intelligence
dc.subject	Medical knowledge
dc.subject	Kalp yetersizliği
dc.subject	Yapay zeka
dc.subject	Tıbbi bilgi
dc.title	A comparative analysis of GPT-3.5, GPT-4 and GPT-4.o in heart failure
dc.title.alternative	Kalp yetersizliğinde GPT-3,5, GPT-4 ve GPT-4.o performansının karşılaştırılması
dc.type	Article

Files

Original bundle

Name: 50_3_11.pdf
Size: 452.39 KB
Format: Adobe Portable Document Format

Collections

2024 Volume 50 Issue 3