Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection

Maya Srikanth; Run Chen (陈润); Julia Hirschberg

Abstract

Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimodal and multimodal predictions diverge. Using fine-tuned models for text, audio, and video, along with a gated fusion model, we find that such disagreements often reflect underlying ambiguity, as evidenced by annotator uncertainty. Our analysis shows that dominant signals in one modality can mislead fusion when unsupported by others. We also observe that humans, like models, do not consistently benefit from multimodal input. These insights position disagreement as a useful diagnostic signal for identifying challenging examples and improving empathy system robustness.

Anthology ID: 2025.findings-ijcnlp.124
Volume: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month: December
Year: 2025
Address: Mumbai, India
Editors: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venue: Findings
Publisher: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages: 1978–1991
URL: https://aclanthology.org/2025.findings-ijcnlp.124/
Cite (ACL): Maya Srikanth, Run Chen, and Julia Hirschberg. 2025. Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1978–1991, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal): Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection (Srikanth et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-ijcnlp.124.pdf
