- Lee, Yubeen;
- Lee, Sangeun;
- Park, Chaewon;
- Cha, Junyeop;
- Park, Eunil
Web of Science citations: 1
Scopus citations: 1
Abstract
Valence-arousal estimation in multimodal emotion recognition struggles to capture dynamic emotional transitions and maintain temporal coherence across audio-visual modalities. To address this challenge, we introduce TAGF, a Time-aware Gated Fusion framework for multimodal emotion recognition. TAGF adaptively modulates recursive attention outputs by learning temporal importance patterns through a BiLSTM-based gating mechanism. This approach enables the selective integration of cross-modal features across multiple recursive steps, emphasizing emotionally salient moments while suppressing temporal noise. By incorporating temporal awareness into the recursive fusion process, TAGF effectively captures the sequential evolution of emotional expressions and the complex interplay between the modalities. Experimental results on the Aff-Wild2 dataset demonstrate that TAGF achieves competitive performance compared to existing recursive attention-based models in the 9th Affective Behavior Analysis in-the-Wild Competition. Furthermore, TAGF exhibits superior robustness under cross-modal misalignment conditions and maintains stable inference performance in real-world scenarios characterized by frequent dynamic emotional transitions. The code is available at https://github.com/leeyubin10/9th-ABAW.git.
- Title: Dynamic Temporal Gating Networks for Cross-Modal Valence-Arousal Estimation
- Authors: Lee, Yubeen; Lee, Sangeun; Park, Chaewon; Cha, Junyeop; Park, Eunil
- Publication date: 2025
- Type: Proceedings Paper
- Journal: Proceedings - 2025 IEEE/CVF International Conference on Computer Vision Workshops, ICCV-W 2025
- Pages: 61-70