- Lee, Sangeun;
- Lee, Yubeen;
- Park, Eunil;
- Chae, Wonseok
SCOPUS
Abstract
Understanding human emotions from images is a challenging yet essential task for vision-language models. While recent efforts have fine-tuned vision-language models to enhance emotional awareness, most approaches rely on global visual representations and fail to capture the nuanced and multifaceted nature of emotional cues. Furthermore, most existing approaches adopt instruction tuning, which requires costly dataset construction and involves training a large number of parameters, thereby limiting their scalability and efficiency. To address these challenges, we propose MASP, a novel framework for Multi-Aspect guided emotion reasoning with Soft Prompt tuning in vision-language models. MASP explicitly separates emotion-relevant visual cues via multi-aspect cross-attention modules and guides the language model using soft prompts, enabling efficient and scalable task adaptation without modifying the base model. Our method achieves state-of-the-art performance on various emotion recognition benchmarks, demonstrating that the explicit modeling of multi-aspect emotional cues with soft prompt tuning leads to more accurate and interpretable emotion reasoning in vision-language models.
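The two mechanisms named in the abstract, aspect-specific cross-attention over visual features and soft prompts fed to a frozen language model, can be illustrated with a minimal sketch. This is not the authors' implementation: the number of aspects, the feature dimension, the single-head attention, and the way aspect tokens are concatenated with the soft prompts are all assumptions made for illustration, and the names `cross_attention` and `build_prefix` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, patches):
    """Single-head scaled dot-product attention of one query over patch features.

    Returns the attention-pooled feature vector and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * p for q, p in zip(query, patch)) / math.sqrt(d)
        for patch in patches
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * patch[i] for w, patch in zip(weights, patches))
        for i in range(d)
    ]
    return pooled, weights


def build_prefix(aspect_queries, patches, soft_prompts):
    """Assumed design: one pooled token per aspect, appended after the soft prompts.

    In a soft-prompt setup, only `aspect_queries` and `soft_prompts` would be
    trained; the vision encoder and the language model would stay frozen.
    """
    aspect_tokens = [cross_attention(q, patches)[0] for q in aspect_queries]
    return soft_prompts + aspect_tokens


# Toy data standing in for frozen vision-encoder outputs (hypothetical sizes).
rng = random.Random(0)
dim = 4
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
aspect_queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
soft_prompts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]

prefix = build_prefix(aspect_queries, patches, soft_prompts)
```

Here `prefix` holds two soft-prompt vectors followed by three aspect tokens. In a real system these would be prepended to the language model's input embeddings, with no change to the base model's weights.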
- Title
- MASP: Multi-Aspect Guided Emotion Reasoning with Soft Prompt Tuning in Vision-Language Models
- Authors
- Lee, Sangeun; Lee, Yubeen; Park, Eunil; Chae, Wonseok
- Publication date
- 2026
- Type
- Conference Paper
- Journal
- Proceedings of the AAAI Conference on Artificial Intelligence
- Volume
- 40
- Issue
- 3
- Pages
- 1882-1890