Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection

Zhanpeng Chen, Zhihong Zhu, Xianwei Zhuang, Zhiqi Huang, Yuexian Zou


Abstract
Multimodal intent detection leverages diverse modalities to build a comprehensive understanding of user intentions in real-world scenarios, and thus plays a critical role in modern task-oriented dialogue systems. Existing methods have made great progress in modal alignment and fusion; however, they neglect two vital limitations: (I) the close entanglement of multimodal semantics with modal structures; (II) insufficient learning, under end-to-end training, of the causal effects that semantic and modality-specific information have on the final predictions. To alleviate these limitations, we introduce the Dual-oriented Disentangled Network with Counterfactual Intervention (DuoDN), which disentangles and exploits both modality-specific and multimodal semantic information. The model consists of a Dual-oriented Disentangled Encoder, which decouples semantics-oriented and modality-oriented representations, and a Counterfactual Intervention Module, which applies causal inference and injects confounders to learn these causal effects. Experiments on three benchmark datasets show that DuoDN outperforms existing methods, and extensive analyses validate its advantages.
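The abstract names two mechanisms (decoupling semantics-oriented from modality-oriented representations, and estimating causal effects by injecting confounders) without giving their formulas. The sketch below illustrates only the generic ideas behind them and is not DuoDN's actual formulation. The additive fusion head, the zero-vector confounder, the squared-dot-product disentanglement penalty, and every function name here (`fused_logits`, `counterfactual_effect`, `orthogonality_penalty`) are hypothetical assumptions chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (num_classes x dim) weight matrix by a dim-length vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def orthogonality_penalty(sem: Sequence[float], mod: Sequence[float]) -> float:
    """Squared dot product of the two branches; 0 when they are orthogonal.

    Hypothetical disentanglement regularizer, not necessarily the paper's loss.
    """
    dot = sum(a * b for a, b in zip(sem, mod))
    return dot * dot


def fused_logits(w_sem: Matrix, w_mod: Matrix,
                 sem: Sequence[float], mod: Sequence[float]) -> Vector:
    """Toy additive fusion head (assumption): logits = W_sem @ sem + W_mod @ mod."""
    return [a + b for a, b in zip(matvec(w_sem, sem), matvec(w_mod, mod))]


def counterfactual_effect(w_sem: Matrix, w_mod: Matrix,
                          sem: Sequence[float], mod: Sequence[float],
                          confounder: Sequence[float]) -> Vector:
    """Effect of the semantic branch on the logits.

    Computes factual logits minus counterfactual logits, where the
    counterfactual replaces the semantic representation with a confounder
    vector while keeping the modality-oriented representation fixed.
    """
    factual = fused_logits(w_sem, w_mod, sem, mod)
    counterfactual = fused_logits(w_sem, w_mod, confounder, mod)
    return [f - c for f, c in zip(factual, counterfactual)]


if __name__ == "__main__":
    w_sem = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical 2-class head, 2-dim inputs
    w_mod = [[0.5, 0.5], [0.5, -0.5]]
    sem, mod = [2.0, 1.0], [1.0, -2.0]
    print(counterfactual_effect(w_sem, w_mod, sem, mod, [0.0, 0.0]))
    print(orthogonality_penalty(sem, mod))
```

With a purely additive head and a zero confounder, the effect collapses to `W_sem @ sem`; the factual-minus-counterfactual difference only becomes informative once the head mixes the two branches non-linearly, which is where an intervention of this kind is typically used.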
Anthology ID:
2024.emnlp-main.972
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17554–17567
URL:
https://aclanthology.org/2024.emnlp-main.972
Cite (ACL):
Zhanpeng Chen, Zhihong Zhu, Xianwei Zhuang, Zhiqi Huang, and Yuexian Zou. 2024. Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17554–17567, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection (Chen et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.972.pdf