ESCP: Enhancing Emotion Recognition in Conversation with Speech and Contextual Prefixes

Xiujuan Xu, Xiaoxiao Shi, Zhehuan Zhao, Yu Liu


Abstract
Emotion Recognition in Conversation (ERC) aims to analyze the speaker’s emotional state in a conversation. Fully exploiting the information in multimodal and historical utterances is crucial to model performance. However, recent work in ERC focuses on modeling historical utterances and generally concatenates multimodal features directly, which neglects deeper multimodal information and introduces redundancy. To address these shortcomings, we propose a novel model, termed Enhancing Emotion Recognition in Conversation with Speech and Contextual Prefixes (ESCP). ESCP employs a directed acyclic graph (DAG) to model historical utterances in a conversation and incorporates a contextual prefix containing the sentiment and semantics of historical utterances. By adding speech and contextual prefixes, ESCP efficiently models inter- and intra-modal emotion information using the prior knowledge of the large-scale pre-trained model. Experiments conducted on several public benchmarks demonstrate that the proposed approach achieves state-of-the-art (SOTA) performance. These results affirm the effectiveness of ESCP and underscore the significance of incorporating speech and contextual prefixes to guide the pre-trained model.
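
As a rough illustration of the idea of modeling historical utterances with a DAG, the sketch below builds one over a list of utterances. The abstract does not specify how ESCP constructs its graph, so the edge rule here is an assumption, borrowed from a common ERC convention: each utterance receives edges from earlier utterances, walking back until a fixed number of utterances by the same speaker have been seen. The function name build_conversation_dag and the window parameter are hypothetical.

```python
from typing import List, Tuple


def build_conversation_dag(speakers: List[str], window: int = 1) -> List[Tuple[int, int]]:
    """Return directed edges (j, i), with j < i, over utterance indices.

    ASSUMPTION: this edge rule is illustrative and is not taken from the
    ESCP paper. For each utterance i we walk backwards through the history
    and link every earlier utterance j to i, stopping once `window`
    utterances by the same speaker as i have been linked.
    """
    edges: List[Tuple[int, int]] = []
    for i, speaker in enumerate(speakers):
        seen_same = 0
        for j in range(i - 1, -1, -1):
            edges.append((j, i))  # edges always point forward in time
            if speakers[j] == speaker:
                seen_same += 1
                if seen_same == window:
                    break
    return edges


# Example: a four-turn dialogue alternating between two speakers.
edges = build_conversation_dag(["A", "B", "A", "B"], window=1)
```

Because every edge runs from an earlier index to a later one, the graph cannot contain a cycle, which is the property a DAG-based context encoder relies on.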
Anthology ID:
2024.lrec-main.555
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
6278–6287
URL:
https://aclanthology.org/2024.lrec-main.555
Cite (ACL):
Xiujuan Xu, Xiaoxiao Shi, Zhehuan Zhao, and Yu Liu. 2024. ESCP: Enhancing Emotion Recognition in Conversation with Speech and Contextual Prefixes. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6278–6287, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ESCP: Enhancing Emotion Recognition in Conversation with Speech and Contextual Prefixes (Xu et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.555.pdf