Details
- Ali, Muhammad Umair;
- Zafar, Amad;
- Kim, Seonghan;
- Kim, Kwang Su;
- Lee, Seung Won
Abstract
Integrating vision-language models (VLMs) into medical imaging is driving a paradigm shift from task-specific systems toward generalist foundation models (FMs) capable of zero-shot and few-shot reasoning across diverse clinical domains. This review presents a comprehensive model-centric taxonomy that categorizes over 135 studies into three developmental stages: (1) task-specific VLMs, (2) modular/adapter-based/prompt-tuned VLMs, and (3) foundation models. We systematically assess each category in terms of architectural innovations, learning paradigms, clinical applications, and evaluation metrics. Our analysis shows that recent advances in multimodal contrastive learning, prompt engineering, and scalable transformer-based architectures significantly enhance generalizability, data efficiency, and multimodal interpretability in medical AI. Furthermore, we synthesize bibliometric trends and delineate methodological transitions through a PRISMA-based systematic review. The article concludes with a discussion of open challenges and a roadmap for developing clinically reliable, data-efficient, and versatile VLMs, highlighting their transformative potential to improve diagnostic accuracy, workflow automation, and decision support in healthcare.
Keywords
- Title
- From task-specific to foundation models: A paradigm shift in medical vision-language analysis
- Authors
- Ali, Muhammad Umair; Zafar, Amad; Kim, Seonghan; Kim, Kwang Su; Lee, Seung Won
- Publication date
- 2026-02
- Type
- Article
- Volume
- 59