Think Beyond Words: Exploring Context-Relevant Visual Commonsense for Diverse Dialogue Generation

Yiting Liu; Liang Li; Beichen Zhang; Qingming Huang

doi:10.18653/v1/2022.findings-emnlp.226

Abstract

Commonsense knowledge has been widely considered for building intelligent open-domain dialogue agents, with the aim of generating meaningful and diverse responses. Previous work in this field usually lacks the ability to effectively obtain and utilize auxiliary commonsense from the external visual world. In this paper, we argue that exploiting logical information in images related to the context can effectively enrich and steer the generation process. In view of this, we propose VICTOR, a context-relevant VIsual Commonsense enhanced dialogue generaTOR for generating coherent and informative responses. To obtain the associated visual commonsense, we devise a novel approach that expands topic words on the knowledge graph and maps them into daily scenarios. During generation, the model adopts a multimodal fusion mechanism to integrate visual and textual information, and adaptively combines their decoding distributions for better response generation. Experimental results on two public datasets show that our proposed method outperforms the latest competitive methods in terms of coherence and diversity.

Anthology ID: 2022.findings-emnlp.226
Volume: Findings of the Association for Computational Linguistics: EMNLP 2022
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3106–3117
URL: https://aclanthology.org/2022.findings-emnlp.226/
DOI: 10.18653/v1/2022.findings-emnlp.226
Cite (ACL): Yiting Liu, Liang Li, Beichen Zhang, and Qingming Huang. 2022. Think Beyond Words: Exploring Context-Relevant Visual Commonsense for Diverse Dialogue Generation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3106–3117, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Think Beyond Words: Exploring Context-Relevant Visual Commonsense for Diverse Dialogue Generation (Liu et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-emnlp.226.pdf
