Comparative evaluation of six large language models in transfusion medicine: Addressing language and domain-specific challenges
  • Lee, Jong Kwon
  • Park, Sholhui
  • Hwang, Sang-Hyun
  • Lee, Jaejoon
  • Cho, Duck
  • and 1 other
Citations
Web of Science: 2
Scopus: 2

Abstract

Background and Objectives: Large language models (LLMs) such as GPT-4 are increasingly utilized in clinical and educational settings; however, their validity in subspecialized domains like transfusion medicine remains insufficiently characterized. This study assessed the performance of six LLMs on transfusion-related questions from Korean national licensing examinations for medical doctors (MDs) and medical technologists (MTs).

Materials and Methods: A total of 23 MD and 67 MT questions (2020-2023) were extracted from publicly available sources. All items were originally written in Korean and subsequently translated into English to evaluate cross-linguistic performance. Each model received standardized multiple-choice prompts (five options), and correctness was determined by explicit answer selection. Accuracy was calculated as the proportion of correct responses, with 0.75 designated as the performance threshold. Chi-square tests were employed to analyse language-based differences.

Results: GPT-4 and GPT-4o consistently surpassed the 0.75 threshold across both languages and examination types. GPT-3.5 demonstrated reasonable accuracy in English but showed a marked decline in Korean, suggesting limitations in multilingual generalization. Gemini 1.5 outperformed Gemini 1, particularly in Korean, though both exhibited variability across technical subdomains. Clova X showed inconsistent results across settings. All models demonstrated limited performance in legal and ethical scenarios.

Conclusion: GPT-4 and GPT-4o exhibited robust and reliable performance across a range of transfusion medicine topics. Nonetheless, inter-model and inter-language variability highlights the need for targeted fine-tuning, particularly in the context of local regulatory and ethical frameworks, to support safe and context-appropriate implementation in clinical practice.
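The Materials and Methods describe the scoring only in prose: a response counts as correct when it explicitly selects the right option, accuracy is the proportion of correct responses, 0.75 is the pass threshold, and chi-square tests compare Korean and English performance. The Python sketch below illustrates that arithmetic under stated assumptions; it is not the authors' code. The helper names (`extract_choice`, `accuracy`, `meets_threshold`, `chi_square_2x2`), the answer-extraction pattern, treating an accuracy of exactly 0.75 as meeting the threshold, and the use of an uncorrected Pearson statistic on a 2x2 correct/incorrect table are all hypothetical choices.

```python
import math
import re
from typing import Optional

# Pass threshold stated in the study's Materials and Methods.
THRESHOLD = 0.75

# Hypothetical answer-extraction pattern: matches replies such as "Answer: 3",
# "The answer is (2)" or the Korean "정답은 3번" ("the answer is number 3").
# Replies with no explicit option 1-5 yield None and are scored as incorrect.
ANSWER_RE = re.compile(r"(?:answer|정답)(?:\s+is|은|는)?\s*:?\s*\(?([1-5])\)?", re.IGNORECASE)


def extract_choice(reply: str) -> Optional[int]:
    """Return the explicitly selected option (1-5), or None if none is stated."""
    match = ANSWER_RE.search(reply)
    return int(match.group(1)) if match else None


def accuracy(choices: list[Optional[int]], key: list[int]) -> float:
    """Proportion of items whose extracted choice matches the answer key."""
    if len(choices) != len(key):
        raise ValueError("choices and key must have the same length")
    correct = sum(1 for choice, answer in zip(choices, key) if choice == answer)
    return correct / len(key)


def meets_threshold(acc: float) -> bool:
    # Assumption: an accuracy exactly equal to 0.75 counts as meeting the threshold.
    return acc >= THRESHOLD


def chi_square_2x2(correct_a: int, total_a: int, correct_b: int, total_b: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on a 2x2 table of
    correct/incorrect counts for two conditions, e.g. Korean vs English prompts.
    Returns (statistic, p_value) with 1 degree of freedom."""
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = total_a + total_b
    if 0 in col_totals:
        # All responses correct (or all incorrect) in both conditions: nothing to compare.
        return 0.0, 1.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Chi-square with 1 df is the square of a standard normal, so
    # P(X >= stat) = P(|Z| >= sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With purely hypothetical counts of 15/20 correct in one language and 5/20 in the other, `chi_square_2x2(15, 20, 5, 20)` gives a statistic of 10.0 (p ≈ 0.0016). With question banks as small as the 23 MD items, expected cell counts can fall below 5, where Fisher's exact test is a common alternative. Note also that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected version.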

Keywords

artificial intelligence; clinical decision support; large language models; medical licensing examination; patient safety; transfusion medicine
Authors
Lee, Jong Kwon; Park, Sholhui; Hwang, Sang-Hyun; Lee, Jaejoon; Cho, Duck; Choi, Sooin
DOI
10.1111/vox.70050
Publication date
2026-04
Type
Article; Early Access
Journal
Vox Sanguinis
Volume 121, Issue 4
Pages
496-504