MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Zhicheng | - |
dc.contributor.author | Zhao, Pancheng | - |
dc.contributor.author | Park, Eunil | - |
dc.contributor.author | Yang, Jufeng | - |
dc.date.accessioned | 2025-01-21T00:30:22Z | - |
dc.date.available | 2025-01-21T00:30:22Z | - |
dc.date.issued | 2024-01 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.issn | 2575-7075 | - |
dc.identifier.uri | https://scholarx.skku.edu/handle/2021.sw.skku/119853 | - |
dc.description.abstract | Limited training data is a long-standing problem for video emotion analysis (VEA). Existing works leverage large-scale image datasets for transfer learning but fail to capture the temporal correlation of affective cues in video. Inspired by psychological research and empirical theory, we verify that the degree of emotion may vary across the segments of a video, and we thus introduce sentiment complementarity and emotion intrinsicality among temporal segments. We propose MART, an MAE-style method for learning robust affective representations of videos via masking. First, we extract affective cues from a lexicon and verify them by computing their matching score with the video content, in terms of sentiment and emotion scores along the temporal dimension. Then, with the verified cues, we propose masked affective modeling to recover the temporal emotion distribution. We present temporal affective complementary learning, which pulls together the complementary parts of masked multimodal features and pushes apart the intrinsic ones; the constraint is imposed via cross-modal attention among features to mask the video and recover the degree of emotion across segments. Extensive experiments on five benchmarks show the superiority of our method in video sentiment analysis, video emotion recognition, multimodal sentiment analysis, and multimodal emotion recognition. (A hedged code sketch of the masking and pull/push scheme follows the table below.) | - |
dc.format.extent | 11 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | IEEE COMPUTER SOC | - |
dc.title | MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation | - |
dc.type | Article | - |
dc.publisher.location | USA | - |
dc.identifier.doi | 10.1109/CVPR52733.2024.01219 | - |
dc.identifier.scopusid | 2-s2.0-85189212796 | - |
dc.identifier.wosid | 001342442404020 | - |
dc.identifier.bibliographicCitation | 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), pp 12830 - 12840 | - |
dc.citation.title | 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | - |
dc.citation.startPage | 12830 | - |
dc.citation.endPage | 12840 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.subject.keywordPlus | WHITE | - |
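
The abstract outlines two mechanisms: affect-guided temporal masking and a pull/push ("complementary" vs. "intrinsic") objective on masked multimodal features. The paper's code is not part of this record, so the following is only a minimal PyTorch sketch of those two ideas under stated assumptions: `mask_low_affect_segments`, `pull_push_loss`, and all shapes, scores, and the 0.5 mask ratio are hypothetical illustrations, not MART's actual implementation.

```python
# Hypothetical illustration only -- not the authors' MART implementation.
# Sketches (1) affect-guided temporal masking and (2) a pull/push
# contrastive objective on pooled multimodal features.
import torch
import torch.nn.functional as F


def mask_low_affect_segments(feats, affect_scores, mask_ratio=0.5):
    """Zero out the segments with the lowest affect scores.

    feats:         (B, T, D) per-segment video features (assumed shape)
    affect_scores: (B, T)    per-segment emotion intensity (assumed input)
    """
    B, T, _ = feats.shape
    k = int(T * mask_ratio)
    low_idx = affect_scores.argsort(dim=1)[:, :k]          # k lowest per clip
    mask = torch.zeros(B, T, dtype=torch.bool, device=feats.device)
    mask.scatter_(1, low_idx, True)
    return feats.masked_fill(mask.unsqueeze(-1), 0.0), mask


def pull_push_loss(comp, intr, temperature=0.1):
    """InfoNCE-style stand-in for "pull the complementary part, push the
    intrinsic one": matched rows are pulled together, mismatched pushed apart.

    comp, intr: (B, D) pooled features from two modalities (assumed inputs)
    """
    comp = F.normalize(comp, dim=-1)
    intr = F.normalize(intr, dim=-1)
    logits = comp @ intr.t() / temperature                 # (B, B) similarity
    targets = torch.arange(comp.size(0), device=comp.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    feats = torch.randn(2, 8, 256)         # 2 clips, 8 segments, 256-d
    scores = torch.rand(2, 8)              # toy per-segment affect scores
    masked, mask = mask_low_affect_segments(feats, scores)
    loss = pull_push_loss(masked.mean(1), feats.mean(1))
    print(mask.sum(dim=1), loss.item())    # 4 masked segments per clip
```

The InfoNCE form is a generic stand-in: the abstract specifies only that complementary parts are pulled together and intrinsic ones pushed apart under a cross-modal attention constraint, so any contrastive loss with that pull/push structure serves the same illustrative purpose here.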