From task-specific to foundation models: A paradigm shift in medical vision-language analysis
  • Ali, Muhammad Umair
  • Zafar, Amad
  • Kim, Seonghan
  • Kim, Kwang Su
  • Lee, Seung Won
Citations
Web of Science: 0
Scopus: 0
Abstract

Integrating vision-language models (VLMs) into medical imaging drives a paradigm shift from task-specific systems toward generalist foundation models (FMs) capable of zero-shot and few-shot reasoning across diverse clinical domains. This review presents a comprehensive model-centric taxonomy, categorizing over 135 studies into three key developmental stages: (1) task-specific VLMs, (2) modular/adapter-based/prompt-tuned VLMs, and (3) foundation models. We systematically assess each category regarding architectural innovations, learning paradigms, clinical applications, and evaluation metrics. Our analysis reveals that recent advances in multimodal contrastive learning, prompt engineering, and scalable transformer-based architectures significantly enhance generalizability, data efficiency, and multimodal interpretability in medical AI. Furthermore, we synthesize bibliometric trends and delineate methodological transitions through a PRISMA-based systematic review. The review concludes with a discussion of the challenges and provides a roadmap for developing clinically reliable, data-efficient, and versatile VLMs, highlighting their transformative potential for improving diagnostic accuracy, workflow automation, and decision support in healthcare.

Keywords

Vision-language models; Modality fusion; Medical image analysis; Foundation models; REPORT GENERATION; DEEP
DOI
10.1016/j.cosrev.2025.100831
Publication date
2026-02
Type
Article
Journal
Computer Science Review
59