CAT-TPT: Class-Agnostic Text-based Test-time Prompt Tuning for Vision-Language Models

Abstract

Prompt tuning has emerged as an effective method for adapting pre-trained vision-language models (VLMs) to diverse downstream tasks. However, it often struggles with generalization to unseen domains due to its dependence on labeled data. Unlike traditional approaches that rely on fixed prompts or parameters learned during training, Test-time Prompt Tuning (TPT) dynamically refines learnable prompts for individual samples at test time. Nevertheless, existing TPT methods frequently overlook alignment between visual and textual embeddings and lack mechanisms to ensure intra-modal diversity. In this work, we introduce CAT-TPT (Class-Agnostic Text-based Test-time Prompt Tuning), a novel approach that integrates attribute-guided augmentation, improved visual-textual alignment, and label-free adaptation for VLMs. By leveraging class-agnostic attributes generated by a large language model, CAT-TPT jointly optimizes both vision and language modalities, promoting enhanced intra-class diversity and seamless adaptation at test time. Extensive experiments demonstrate that CAT-TPT consistently outperforms state-of-the-art methods in zero-shot generalization, achieving an average improvement of 6.66% over existing TPT methods on out-of-distribution (OOD) data across five benchmarks, 3.17% in cross-dataset evaluations across ten fine-grained datasets, and 4.04% under fifteen diverse and challenging corruption types. Code is available at https://github.com/AIM-SKKU/CAT-TPT.
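The abstract describes test-time prompt tuning only at a high level. For reference, the following is a minimal NumPy sketch of the confidence-selected marginal-entropy objective used in the original TPT formulation (Shu et al., 2022). It is not CAT-TPT's objective, which additionally uses LLM-generated class-agnostic attributes and joint vision-language optimization that are not shown here. The function names, the `rho=0.1` default, and the `(num_views, num_classes)` input shape are illustrative assumptions. In practice this loss would be minimized with respect to the learnable prompt tokens using an autodiff framework. This sketch only computes the loss value for one test image.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) along the last axis."""
    return -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=-1)


def tpt_marginal_entropy(view_logits: np.ndarray, rho: float = 0.1) -> float:
    """Confidence-selected marginal entropy over augmented views (hypothetical helper).

    view_logits: (num_views, num_classes) image-text similarity logits,
                 one row per augmented view of a single test image.
    rho: fraction of the lowest-entropy (most confident) views to keep.
    """
    probs = softmax(view_logits)
    per_view_h = entropy(probs)
    k = max(1, int(round(rho * len(probs))))
    keep = np.argsort(per_view_h)[:k]    # indices of the k most confident views
    marginal = probs[keep].mean(axis=0)  # average prediction over the kept views
    return float(entropy(marginal))      # entropy of the averaged prediction


# Example: 64 augmented views, 10 classes, random logits.
rng = np.random.default_rng(0)
view_logits = rng.normal(size=(64, 10))
print(tpt_marginal_entropy(view_logits, rho=0.1))
```

Averaging the probabilities before taking the entropy, instead of averaging per-view entropies, rewards agreement across views. Two confident views that predict different classes therefore still produce a high loss (about ln 2 for two classes).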

Keywords

Test-time prompt tuning; Vision-language models; Class-agnostic attributes; Data augmentation; Out-of-distribution
Authors
Zhang, Youjia; Liu, Huiling; Kim, Youngeun; Hong, Sungeun
DOI
10.1007/s11263-025-02508-1
Publication date
2025-10
Type
Article; Early Access
Journal
International Journal of Computer Vision
Volume
133
Issue
10
Pages
6930-6952