Publication:
Evaluating the performance of ChatGPT, Gemini, and Bing compared with resident surgeons in the otorhinolaryngology in-service training examination

dc.contributor.author: Mete, Utku
dc.contributor.buuauthor: METE, UTKU
dc.contributor.department: Tıp Fakültesi
dc.contributor.department: Kulak Burun Boğaz Ana Bilim Dalı
dc.contributor.orcid: 0000-0003-0902-8061
dc.contributor.researcherid: JQZ-2315-2023
dc.date.accessioned: 2025-01-28T06:42:06Z
dc.date.available: 2025-01-28T06:42:06Z
dc.date.issued: 2024-06-01
dc.description.abstract: Objective: Large language models (LLMs) are used in various fields for their ability to produce human-like text. They are particularly useful in medical education, aiding residents' clinical management skills and exam preparation. This study aimed to evaluate and compare the performance of ChatGPT (GPT-4), Gemini, and Bing with each other and with otorhinolaryngology residents in answering in-service training exam questions, and to provide insight into the usefulness of these models in medical education and healthcare. Methods: Eight otorhinolaryngology in-service training exams were used for comparison. A total of 316 questions were prepared from the Resident Training Textbook of the Turkish Society of Otorhinolaryngology Head and Neck Surgery. These questions were presented to the three artificial intelligence models, and the exam results were evaluated to determine the accuracy of both the models and the residents. Results: GPT-4 achieved the highest accuracy among the LLMs at 54.75% (GPT-4 vs. Gemini p=0.002, GPT-4 vs. Bing p<0.001), followed by Gemini at 40.50% and Bing at 37.00% (Gemini vs. Bing p=0.327). However, senior residents outperformed all LLMs and the other residents with an accuracy rate of 75.5% (p<0.001). The LLMs could compete only with junior residents: GPT-4 and Gemini performed similarly to juniors, whose accuracy was 46.90% (p=0.058 and p=0.120, respectively), while juniors still outperformed Bing (p=0.019). Conclusion: The LLMs currently fall short of the medical accuracy achieved by senior and mid-level residents. However, they outperform residents in specific subspecialties, indicating potential usefulness in certain medical fields.
dc.identifier.doi: 10.4274/tao.2024.3.5
dc.identifier.eissn: 2667-7474
dc.identifier.endpage: 57
dc.identifier.issn: 2667-7466
dc.identifier.issue: 2
dc.identifier.startpage: 48
dc.identifier.uri: https://doi.org/10.4274/tao.2024.3.5
dc.identifier.uri: https://www.turkarchotolaryngol.net/articles/evaluating-the-performance-of-chatgpt-gemini-and-bing-compared-with-resident-surgeons-in-the-otorhinolaryngology-in-service-training-examination/doi/tao.2024.3.5
dc.identifier.uri: https://hdl.handle.net/11452/49858
dc.identifier.volume: 62
dc.identifier.wos: 001352319700001
dc.indexed.wos: WOS.ESCI
dc.language.iso: en
dc.relation.journal: Türk Kulak Burun Boğaz
dc.relation.publicationcategory: Article - International Refereed Journal
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Artificial intelligence
dc.subject: ChatGPT
dc.subject: Otorhinolaryngology
dc.subject: Otorhinolaryngology in-service examination
dc.subject: Resident education
dc.subject: Science & technology
dc.subject: Life sciences & biomedicine
dc.title: Evaluating the performance of ChatGPT, Gemini, and Bing compared with resident surgeons in the otorhinolaryngology in-service training examination
dc.type: Article
dspace.entity.type: Publication
local.contributor.department: Tıp Fakültesi/Kulak Burun Boğaz Ana Bilim Dalı
local.indexed.at: WOS
relation.isAuthorOfPublication: fdadf4a0-7bbe-46b0-90b4-36275b6ddf52
relation.isAuthorOfPublication.latestForDiscovery: fdadf4a0-7bbe-46b0-90b4-36275b6ddf52
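
The pairwise p-values in the abstract above (e.g., GPT-4 vs. Gemini p=0.002) come from comparing per-model accuracy on a shared question set. As a minimal sketch of how such a comparison can be computed, the Python snippet below runs a two-proportion chi-square test with scipy; the correct-answer counts and the choice of test are assumptions for illustration only, since this record does not specify the paper's statistical method.

    # Minimal sketch (not from the paper): pairwise accuracy comparison via a
    # two-proportion chi-square test. Counts are illustrative; the record does
    # not state per-model correct counts or the exact test used.
    from scipy.stats import chi2_contingency

    n_questions = 316                             # question pool size from the abstract
    correct_gpt4 = round(0.5475 * n_questions)    # hypothetical count at 54.75%
    correct_gemini = round(0.4050 * n_questions)  # hypothetical count at 40.50%

    # 2x2 contingency table: rows = models, columns = correct / incorrect
    table = [
        [correct_gpt4, n_questions - correct_gpt4],
        [correct_gemini, n_questions - correct_gemini],
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")

Note that for a paired design, where every model answers the same questions, McNemar's test on the per-question agreement table would be the more conventional choice than the unpaired test sketched here.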

Files

Original bundle

Name:
Utku_2024.pdf
Size:
552.92 KB
Format:
Adobe Portable Document Format