- Zhang, Youjia;
- Liu, Huiling;
- Kim, Youngeun;
- Hong, Sungeun
WEB OF SCIENCE: 0
SCOPUS: 0
Abstract
Prompt tuning has emerged as an effective method for adapting pre-trained vision-language models (VLMs) to diverse downstream tasks. However, it often struggles with generalization to unseen domains due to its dependence on labeled data. Unlike traditional approaches that rely on fixed prompts or parameters learned during training, Test-time Prompt Tuning (TPT) dynamically refines learnable prompts for individual samples at test time. Nevertheless, existing TPT methods frequently overlook alignment between visual and textual embeddings and lack mechanisms to ensure intra-modal diversity. In this work, we introduce CAT-TPT (Class-Agnostic Text-based Test-time Prompt Tuning), a novel approach that integrates attribute-guided augmentation, improved visual-textual alignment, and label-free adaptation for VLMs. By leveraging class-agnostic attributes generated by a large language model, CAT-TPT jointly optimizes both vision and language modalities, promoting enhanced intra-class diversity and seamless adaptation at test time. Extensive experiments demonstrate that CAT-TPT consistently outperforms state-of-the-art methods in zero-shot generalization, achieving an average improvement of 6.66% over existing TPT methods on out-of-distribution (OOD) data across five benchmarks, 3.17% in cross-dataset evaluations across ten fine-grained datasets, and 4.04% under fifteen diverse and challenging corruption types. Code is available at https://github.com/AIM-SKKU/CAT-TPT.
Keywords
- Title: CAT-TPT: Class-Agnostic Text-based Test-time Prompt Tuning for Vision-Language Models
- Authors: Zhang, Youjia; Liu, Huiling; Kim, Youngeun; Hong, Sungeun
- Publication date: 2025-10
- Type: Article; Early Access
- Volume: 133
- Issue: 10
- Pages: 6930-6952