Publication:
Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use

dc.contributor.author: Coşkun, Belkıs Nihan
dc.contributor.author: Yağız, Burcu
dc.contributor.author: Ocakoğlu, Gökhan
dc.contributor.author: Dalkılıç, Ediz
dc.contributor.author: Pehlivan, Yavuz
dc.contributor.buuauthor: COŞKUN, BELKIS NİHAN
dc.contributor.buuauthor: YAĞIZ, BURCU
dc.contributor.buuauthor: OCAKOĞLU, GÖKHAN
dc.contributor.buuauthor: DALKILIÇ, HÜSEYİN EDİZ
dc.contributor.buuauthor: PEHLİVAN, YAVUZ
dc.contributor.department: Bursa Uludağ University/Faculty of Medicine/Department of Rheumatology.
dc.contributor.department: Bursa Uludağ University/Faculty of Medicine/Department of Biostatistics.
dc.contributor.orcid: 0000-0002-1114-6051
dc.contributor.researcherid: AAH-5180-2021
dc.contributor.researcherid: JQW-5031-2023
dc.contributor.researcherid: CMF-4757-2022
dc.contributor.researcherid: IRX-3951-2023
dc.contributor.researcherid: HLG-6346-2023
dc.contributor.researcherid: AAG-7155-2021
dc.date.accessioned: 2024-09-13T05:15:13Z
dc.date.available: 2024-09-13T05:15:13Z
dc.date.issued: 2023-09-14
dc.description.abstract: We aimed to assess the accuracy and completeness of Large Language Models (LLMs), namely ChatGPT 3.5 and 4, BARD, and Bing, when answering Methotrexate (MTX)-related questions about the treatment of rheumatoid arthritis. We employed 23 questions about MTX concerns from an earlier study. These questions were entered into the LLMs, and the responses generated by each model were evaluated by two reviewers using Likert scales to assess accuracy and completeness. The GPT models achieved a 100% correct answer rate, while BARD and Bing scored 73.91%. In terms of accuracy of the outputs (completely correct responses), GPT-4 achieved a score of 100%, GPT-3.5 secured 86.96%, and BARD and Bing each scored 60.87%. BARD produced 17.39% incorrect responses and 8.7% non-responses, while Bing recorded 13.04% incorrect responses and 13.04% non-responses. The ChatGPT models produced significantly more accurate responses than Bing for the "mechanism of action" category, and the GPT-4 model showed significantly higher accuracy than BARD in the "side effects" category. There were no statistically significant differences among the models for the "lifestyle" category. In terms of completeness, GPT-4 achieved a comprehensive output rate of 100%, followed by GPT-3.5 at 86.96%, BARD at 60.86%, and Bing at 0%. In the "mechanism of action" category, both ChatGPT models and BARD produced significantly more comprehensive outputs than Bing. For the "side effects" and "lifestyle" categories, the ChatGPT models showed significantly higher completeness than Bing. The GPT models, particularly GPT-4, demonstrated superior performance in providing accurate and comprehensive patient information about MTX use. However, the study also identified inaccuracies and shortcomings in the generated responses.
dc.identifier.doi: 10.1007/s00296-023-05473-5
dc.identifier.endpage: 515
dc.identifier.issn: 0172-8172
dc.identifier.issue: 3
dc.identifier.startpage: 509
dc.identifier.uri: https://doi.org/10.1007/s00296-023-05473-5
dc.identifier.uri: https://link.springer.com/article/10.1007/s00296-023-05473-5
dc.identifier.uri: https://hdl.handle.net/11452/44674
dc.identifier.volume: 44
dc.identifier.wos: 001071597600001
dc.indexed.wos: WOS.SCI
dc.language.iso: en
dc.publisher: Springer Heidelberg
dc.relation.journal: Rheumatology International
dc.relation.publicationcategory: Article - International Refereed Journal
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Rheumatoid-arthritis
dc.subject: Mechanism
dc.subject: Accuracy
dc.subject: Artificial intelligence
dc.subject: Completeness
dc.subject: Large language models
dc.subject: Methotrexate
dc.subject: Rheumatology
dc.title: Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use
dc.type: Article
dspace.entity.type: Publication
relation.isAuthorOfPublication: faabfe30-a620-4cbe-8b6d-3db71b10ce0e
relation.isAuthorOfPublication: 02b3cfbb-e8e7-4a95-b025-294888ae9a91
relation.isAuthorOfPublication: 8ff963e8-284c-49e2-99b9-a46777690e8c
relation.isAuthorOfPublication: 1613225c-2f43-4052-9f82-210c854edcf4
relation.isAuthorOfPublication: 0075f2ae-ae8a-4690-bd46-128775e8efac
relation.isAuthorOfPublication.latestForDiscovery: faabfe30-a620-4cbe-8b6d-3db71b10ce0e

Files

Original bundle

Name: Coskun_vd_2023.pdf
Size: 685.84 KB
Format: Adobe Portable Document Format