Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use

Coşkun, Belkıs Nihan; Yağız, Burcu; Ocakoğlu, Gökhan; Dalkılıç, Ediz; Pehlivan, Yavuz

dc.contributor.author	Coşkun, Belkıs Nihan
dc.contributor.author	Yağız, Burcu
dc.contributor.author	Ocakoğlu, Gökhan
dc.contributor.author	Dalkılıç, Ediz
dc.contributor.author	Pehlivan, Yavuz
dc.contributor.buuauthor	COŞKUN, BELKIS NİHAN
dc.contributor.buuauthor	YAĞIZ, BURCU
dc.contributor.buuauthor	OCAKOĞLU, GÖKHAN
dc.contributor.buuauthor	DALKILIÇ, HÜSEYİN EDİZ
dc.contributor.buuauthor	PEHLİVAN, YAVUZ
dc.contributor.department	Faculty of Medicine
dc.contributor.department	Faculty of Medicine
dc.contributor.department	Department of Biostatistics
dc.contributor.department	Department of Rheumatology
dc.contributor.orcid	0000-0002-1114-6051
dc.contributor.researcherid	AAH-5180-2021
dc.contributor.researcherid	JQW-5031-2023
dc.contributor.researcherid	CMF-4757-2022
dc.contributor.researcherid	IRX-3951-2023
dc.contributor.researcherid	HLG-6346-2023
dc.contributor.researcherid	AAG-7155-2021
dc.date.accessioned	2024-09-13T05:15:13Z
dc.date.available	2024-09-13T05:15:13Z
dc.date.issued	2023-09-14
dc.description.abstract	We aimed to assess the accuracy and completeness of large language models (LLMs), namely ChatGPT (GPT-3.5 and GPT-4), BARD, and Bing, when answering methotrexate (MTX)-related questions about the treatment of rheumatoid arthritis. We used 23 questions about MTX concerns taken from an earlier study. These questions were entered into the LLMs, and two reviewers evaluated the responses generated by each model using Likert scales for accuracy and completeness. The GPT models achieved a 100% correct answer rate, while BARD and Bing each scored 73.91%. In terms of accuracy of the outputs (completely correct responses), GPT-4 achieved a score of 100%, GPT-3.5 achieved 86.96%, and BARD and Bing each scored 60.87%. BARD produced 17.39% incorrect responses and 8.7% non-responses, while Bing recorded 13.04% incorrect responses and 13.04% non-responses. The ChatGPT models produced significantly more accurate responses than Bing in the "mechanism of action" category, and the GPT-4 model showed significantly higher accuracy than BARD in the "side effects" category. There were no statistically significant differences among the models in the "lifestyle" category. In terms of completeness, the rate of comprehensive outputs was 100% for GPT-4, followed by 86.96% for GPT-3.5, 60.86% for BARD, and 0% for Bing. In the "mechanism of action" category, both ChatGPT models and BARD produced significantly more comprehensive outputs than Bing. In the "side effects" and "lifestyle" categories, the ChatGPT models showed significantly higher completeness than Bing. The GPT models, particularly GPT-4, demonstrated superior performance in providing accurate and comprehensive patient information about MTX use. However, the study also identified inaccuracies and shortcomings in the generated responses.
dc.identifier.doi	10.1007/s00296-023-05473-5
dc.identifier.endpage	515
dc.identifier.issn	0172-8172
dc.identifier.issue	3
dc.identifier.scopus	2-s2.0-85172274072
dc.identifier.startpage	509
dc.identifier.uri	https://doi.org/10.1007/s00296-023-05473-5
dc.identifier.uri	https://link.springer.com/article/10.1007/s00296-023-05473-5
dc.identifier.uri	https://hdl.handle.net/11452/44674
dc.identifier.volume	44
dc.identifier.wos	001071597600001
dc.indexed.wos	WOS.SCI
dc.language.iso	en
dc.publisher	Springer Heidelberg
dc.relation.journal	Rheumatology International
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Rheumatoid-arthritis
dc.subject	Mechanism
dc.subject	Accuracy
dc.subject	Artificial intelligence
dc.subject	Completeness
dc.subject	Large language models
dc.subject	Methotrexate
dc.subject	Rheumatology
dc.title	Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use
dc.type	Article
dspace.entity.type	Publication
local.contributor.department	Faculty of Medicine/Department of Rheumatology
local.contributor.department	Faculty of Medicine/Department of Biostatistics
local.indexed.at	WOS
local.indexed.at	Scopus
relation.isAuthorOfPublication	faabfe30-a620-4cbe-8b6d-3db71b10ce0e
relation.isAuthorOfPublication	02b3cfbb-e8e7-4a95-b025-294888ae9a91
relation.isAuthorOfPublication	8ff963e8-284c-49e2-99b9-a46777690e8c
relation.isAuthorOfPublication	1613225c-2f43-4052-9f82-210c854edcf4
relation.isAuthorOfPublication	0075f2ae-ae8a-4690-bd46-128775e8efac
relation.isAuthorOfPublication.latestForDiscovery	faabfe30-a620-4cbe-8b6d-3db71b10ce0e