Abstract
Recent vision–language models (VLMs) demonstrate impressive performance but often hallucinate when essential knowledge is not encoded in their parameters. Retrieval-augmented generation (RAG), which integrates external knowledge bases (KBs), has thus been widely adopted in computer vision. Beyond VLMs, retrieval augmentation has also been applied independently to improve task performance. However, no systematic analysis has investigated which types of KBs are most suitable for particular vision tasks. In this study, we present the first comprehensive survey of more than sixty studies (2021–2025) with a novel focus on KB types. We propose a taxonomy consisting of six categories: unstructured text (V-UT), ontology (V-OT), image (V-IM), image–text pairs (V-IT), structured graphs (V-SG), and domain-specific data (V-DM). For each category, we review retrieval pipelines, downstream tasks, representative datasets, and indexing strategies under consistent criteria. Our analysis shows that, regardless of KB type, most systems converge on dense encoders with vector databases, reflecting a mature technical stack. Nevertheless, this convergence often underutilizes KB-specific structures, highlighting significant opportunities for future studies. Finally, we provide practical guidelines for KB selection and retrieval design in vision and emphasize the need for KB-specific retrieval methods and standardized benchmarks.
- Title: A Taxonomy of Knowledge Bases for Retrieval-Augmented Methods in Vision: A Comprehensive Survey
- Authors: Kim, Geonwoo; Lee, Dong-Hwan; Yoo, Jang-Hee
- Publication date: 2026
- Type: Article
- Journal: IEEE Access
- Volume: 14
- Pages: 32736–32754