Performance of generative AI models on cardiology practice in emergency service: A pilot evaluation of GPT-4.o and Gemini-1.5-Flash

dc.contributor.buuauthor	GÜNAY POLATKAN, ŞEYDA
dc.contributor.buuauthor	SIĞIRLI, DENİZ
dc.contributor.buuauthor	DURAK, VAHİDE ASLIHAN
dc.contributor.buuauthor	ALAK, ÇETİN
dc.contributor.buuauthor	KAN, İREM İRİS
dc.contributor.department	Tıp Fakültesi
dc.contributor.department	Kardiyoloji Ana Bilim Dalı
dc.contributor.department	Biyoistatistik Ana Bilim Dalı
dc.contributor.department	Acil Tıp Ana Bilim Dalı
dc.contributor.department	Kalp Damar Cerrahisi Ana Bilim Dalı
dc.contributor.orcid	0000-0003-0012-345X
dc.contributor.orcid	0000-0002-4006-3263
dc.contributor.orcid	0000-0003-0836-7862
dc.contributor.orcid	0000-0003-1875-2078
dc.contributor.orcid	0000-0002-1600-9531
dc.date.accessioned	2025-09-25T06:07:50Z
dc.date.issued	2025-07-02
dc.description.abstract	In healthcare, emergent clinical decision-making is complex, and large language models (LLMs) may enhance both the quality and efficiency of care by aiding physicians. Case scenario-based multiple-choice questions (CS-MCQs) are valuable for testing analytical skills and knowledge integration. Moreover, readability is as important as content accuracy. This study aims to compare the diagnostic and treatment capabilities of GPT-4.o and Gemini-1.5-Flash and to evaluate the readability of their responses for cardiac emergencies. A total of 70 single-answer MCQs were randomly selected from the Medscape Case Challenges and ECG Challenges series. The questions concerned cardiac emergencies and were further categorized into four subgroups according to whether they included a case presentation or an image. The selected questions were submitted to the ChatGPT and Gemini platforms. The Flesch–Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores were used to evaluate the readability of the responses. GPT-4.o had a correct response rate of 65.7%, outperforming Gemini-1.5-Flash, which had a correct response rate of 58.6% (p=0.010). When compared by question type, GPT-4.o was inferior to Gemini-1.5-Flash only for non-case questions (52.5% vs. 62.5%, p=0.011). For all other question types, there were no significant performance differences between the two models (p>0.05). Both models performed better on easy questions than on difficult ones, and on questions without images than on those with images. Additionally, GPT-4.o performed better on case questions than on non-case questions. Gemini-1.5-Flash’s FRE score was higher than GPT-4.o’s (median [min-max], 23.75 [0-64.60] vs. 17.0 [0-56.60], p<0.001). Although GPT-4.o outperformed Gemini-1.5-Flash overall, both models demonstrated an ability to comprehend the case scenarios and provided reasonable answers.
dc.description.abstract	In healthcare, emergency clinical decision-making is complex, and large language models (LLMs) may improve both the quality and the efficiency of care by assisting physicians. Case scenario-based multiple-choice questions (CS-MCQs) are valuable for testing analytical skills and knowledge integration. Moreover, readability is as important as content accuracy. This study aims to compare the diagnostic and treatment capabilities of GPT-4.o and Gemini-1.5-Flash and to evaluate the readability of the responses for cardiac emergencies. A total of 70 single-answer MCQs were randomly selected from the Medscape Case Challenges and ECG Challenges series. The questions concerned cardiac emergencies and were divided into four subgroups according to whether the question included a case presentation or an image. The ChatGPT and Gemini platforms were used to evaluate the selected questions. The Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores were used to evaluate the readability of the responses. GPT-4.o had a correct response rate of 65.7% and outperformed Gemini-1.5-Flash, which had a correct response rate of 58.6% (p=0.010). When compared by question type, GPT-4.o scored lower than Gemini-1.5-Flash only on non-case questions (52.5% vs. 62.5%, p=0.011). For all other question types, there was no significant performance difference between the two models (p>0.05). Both models performed better on easy questions than on difficult ones, and on questions without images than on questions with images. In addition, GPT-4.o performed better on case questions than on non-case questions. Gemini-1.5-Flash's FRE score was higher than GPT-4.o's (median [min-max], 23.75 [0-64.60] vs. 17.0 [0-56.60], p<0.001). Although GPT-4.o outperformed Gemini-1.5-Flash overall, both models showed an ability to understand the case scenarios and provided reasonable answers.
dc.identifier.doi	10.32708/uutfd.1718121
dc.identifier.endpage	246
dc.identifier.issn	2645-9027
dc.identifier.issue	2
dc.identifier.startpage	239
dc.identifier.uri	https://dergipark.org.tr/tr/pub/uutfd/issue/92411/1718121
dc.identifier.uri	https://doi.org/10.32708/uutfd.1718121
dc.identifier.uri	https://dergipark.org.tr/tr/download/article-file/4952341
dc.identifier.uri	https://hdl.handle.net/11452/55172
dc.identifier.volume	51
dc.language.iso	en
dc.relation.journal	Uludağ Üniversitesi Tıp Fakültesi Dergisi
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Cardiology
dc.subject	Decision making
dc.subject	Artificial intelligence
dc.subject	GPT-4.o
dc.subject	Gemini-1.5-Flash
dc.subject	Kardiyoloji
dc.subject	Karar verme
dc.subject	Yapay zeka
dc.title	Performance of generative AI models on cardiology practice in emergency service: A pilot evaluation of GPT-4.o and Gemini-1.5-Flash
dc.title.alternative	Kardiyak acil durumların yönetiminde ChatGPT ve Gemini	tr
dc.type	Article
dspace.entity.type	Publication
local.contributor.department	Tıp Fakültesi/Kardiyoloji Ana Bilim Dalı
local.contributor.department	Tıp Fakültesi/Biyoistatistik Ana Bilim Dalı
local.contributor.department	Tıp Fakültesi/Acil Tıp Ana Bilim Dalı
local.contributor.department	Tıp Fakültesi/Kalp Damar Cerrahisi Ana Bilim Dalı
relation.isAuthorOfPublication	2fce7938-9be9-404c-b4d0-3798583496b8
relation.isAuthorOfPublication	f8b7b771-12ea-4f9a-889d-25079d8c862d
relation.isAuthorOfPublication	fef584c2-9e17-4aaf-a681-04eda6a3ea30
relation.isAuthorOfPublication	06de923b-9893-47dd-b557-884c23a68b99
relation.isAuthorOfPublication	49f4aee2-9b3a-46a4-b336-4a9d36508f72
relation.isAuthorOfPublication.latestForDiscovery	2fce7938-9be9-404c-b4d0-3798583496b8
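
The abstract reports readability as Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. As a minimal sketch of how those two scores are conventionally defined, the snippet below applies the standard published formulas to pre-counted words, sentences, and syllables. The function names and the sample counts are illustrative assumptions; the record does not say which tool the authors used or how syllables were counted, and some tools additionally clamp or round the result.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; higher means harder to read."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for one model response (not data from the study).
    words, sentences, syllables = 100, 5, 150
    print(round(flesch_reading_ease(words, sentences, syllables), 3))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

Taking counts as inputs keeps the sketch independent of any particular syllable-counting heuristic, which is where readability tools most often disagree.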