Simin Hong

2024

pdf bib abs
Amanda: Adaptively Modality-Balanced Domain Adaptation for Multimodal Emotion Recognition
Xinxin Zhang | Jun Sun | Simin Hong | Taihao Li
Findings of the Association for Computational Linguistics: ACL 2024

This paper investigates unsupervised multimodal domain adaptation for multimodal emotion recognition, which is a solution for data scarcity yet remains under studied. Due to the varying distribution discrepancies of different modalities between source and target domains, the primary challenge lies in how to balance the domain alignment across modalities to guarantee they are all well aligned. To achieve this, we first develop our model based on the information bottleneck theory to learn optimal representation for each modality independently. Then, we align the domains via matching the label distributions and the representations. In order to balance the representation alignment, we propose to minimize a surrogate of the alignment losses, which is equivalent to adaptively adjusting the weights of the modalities throughout training, thus achieving balanced domain alignment across modalities. Overall, the proposed approach features Adaptively modality-balanced domain adaptation, dubbed Amanda, for multimodal emotion recognition. Extensive empirical results on commonly used benchmark datasets demonstrate that Amanda significantly outperforms competing approaches. The code is available at https://github.com/sunjunaimer/Amanda.

pdf bib abs
DetectiveNN: Imitating Human Emotional Reasoning with a Recall-Detect-Predict Framework for Emotion Recognition in Conversations
Simin Hong | Jun Sun | Taihao Li
Findings of the Association for Computational Linguistics: EMNLP 2024

Emotion Recognition in conversations (ERC) involves an internal cognitive process that interprets emotional cues by using a collection of past emotional experiences. However, many existing methods struggle to decipher emotional cues in dialogues since they are insufficient in understanding the rich historical emotional context. In this work, we introduce an innovative Detective Network (DetectiveNN), a novel model that is grounded in the cognitive theory of emotion and utilizes a “recall-detect-predict” framework to imitate human emotional reasoning. This process begins by ‘recalling’ past interactions of a specific speaker to collect emotional cues. It then ‘detects’ relevant emotional patterns by interpreting these cues in the context of the ongoing conversation. Finally, it ‘predicts’ the speaker’s current emotional state. Tested on three benchmark datasets, our approach significantly outperforms existing methods. This highlights the advantages of incorporating cognitive factors into deep learning for ERC, enhancing task efficacy and prediction accuracy.

Co-authors

Venues

findings2

Fix data