Representation Learning for Conversational Data using Discourse Mutual Information Maximization

Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal


Abstract
Although many pretrained models exist for text or images, there have been relatively few attempts to train representations specifically for dialog understanding. Prior works usually relied on finetuned representations based on generic text representation models like BERT or GPT-2. But such language modeling pretraining objectives do not take the structural information of conversational text into consideration. Although generative dialog models can learn structural features too, we argue that structure-unaware word-by-word generation is not suitable for effective conversation modeling. We empirically demonstrate that such representations do not perform consistently across various dialog understanding tasks. Hence, we propose DMI (Discourse Mutual Information), a structure-aware, mutual-information-based loss function for training dialog-representation models that additionally captures the inherent uncertainty in response prediction. Extensive evaluation on nine diverse dialog modeling tasks shows that our proposed DMI-based models outperform strong baselines by significant margins.
Anthology ID:
2022.naacl-main.124
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1718–1734
URL:
https://aclanthology.org/2022.naacl-main.124
DOI:
10.18653/v1/2022.naacl-main.124
Cite (ACL):
Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, and Pawan Goyal. 2022. Representation Learning for Conversational Data using Discourse Mutual Information Maximization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1718–1734, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Representation Learning for Conversational Data using Discourse Mutual Information Maximization (Santra et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.124.pdf
Data
DailyDialog, DailyDialog++, MuTual