MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion (open access)

Authors
Khan, Mustaqeem; Tran, Phuong-Nam; Pham, Nhat Truong; El Saddik, Abdulmotaleb; Othmani, Alice
Issue Date
14-Feb-2025
Publisher
Nature Research
Keywords
Cross-modal transformer; Deep learning; Feature fusion; Multimodal emotion recognition; Speech emotion recognition
Citation
Scientific Reports, v.15, no.1
Indexed
SCIE
SCOPUS
Journal Title
Scientific Reports
Volume
15
Number
1
URI
https://scholarx.skku.edu/handle/2021.sw.skku/121124
DOI
10.1038/s41598-025-89202-x
ISSN
2045-2322
Abstract
Speech emotion recognition has seen a surge in transformer models, which excel at understanding the overall message by analyzing long-term patterns in speech. However, these models come at a computational cost. In contrast, convolutional neural networks are faster but struggle with capturing these long-range relationships. Our proposed system, MemoCMT, tackles this challenge using a novel “cross-modal transformer” (CMT). This CMT can effectively analyze local and global speech features and their corresponding text. To boost efficiency, MemoCMT leverages recent advancements in pre-trained models: HuBERT extracts meaningful features from the audio, while BERT analyzes the text. The core innovation lies in how the CMT component utilizes and integrates these audio and text features. After this integration, different fusion techniques are applied before final emotion classification. Experiments show that MemoCMT achieves impressive performance, with the CMT using min aggregation achieving the highest unweighted accuracy (UW-Acc) of 81.33% and 91.93%, and weighted accuracy (W-Acc) of 81.85% and 91.84% respectively on benchmark IEMOCAP and ESD corpora. The results of our system demonstrate the generalization capacity and robustness for real-world industrial applications. Moreover, the implementation details of MemoCMT are publicly available at https://github.com/tpnam0901/MemoCMT/ for reproducibility purposes. © The Author(s) 2025.
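The abstract reports both W-Acc and UW-Acc without defining them. The sketch below assumes the convention common in speech emotion recognition work: W-Acc is the overall fraction of correct predictions, and UW-Acc is the mean of per-class recall, so that every emotion class counts equally regardless of how many samples it has. The function names and the toy labels are illustrative only and are not taken from the MemoCMT repository.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by total samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall: each class contributes equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = defaultdict(int)  # samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy, made-up labels with class imbalance (not data from the paper).
y_true = ["neutral"] * 6 + ["angry"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"]

w_acc = weighted_accuracy(y_true, y_pred)     # 7 of 8 correct
uw_acc = unweighted_accuracy(y_true, y_pred)  # mean of recalls 1.0 and 0.5
print(f"W-Acc = {w_acc:.3f}, UW-Acc = {uw_acc:.3f}")
```

On imbalanced data the two metrics diverge, as in the toy example, which is why results on corpora such as IEMOCAP are usually reported with both.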