Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer

Gi-Cheon Kang; Junseok Park; Hwaran Lee; Byoung-Tak Zhang; Jin-Hwa Kim

doi:10.18653/v1/2021.findings-emnlp.31

Abstract

Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. Next, we introduce a Knowledge Transfer (KT) method that extracts the answer predictions from the teacher model and uses them as pseudo labels. We propose KT to remedy the shortcomings of single ground-truth labels, which severely limit the ability of a model to obtain multiple reasonable answers. As a result, our proposed model significantly improves reasoning capability compared to baseline methods and outperforms the state-of-the-art approaches on the VisDial v1.0 dataset. The source code is available at https://github.com/gicheonkang/SGLKT-VisDial.
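
The pseudo-labeling idea behind KT can be illustrated with a minimal sketch. The abstract does not specify how the teacher's predictions are turned into labels or which loss is used, so everything below is an assumption for illustration: the teacher's scores over candidate answers are normalized with a softmax into soft pseudo labels, and a student is trained with a soft cross-entropy against them. The function names (`teacher_pseudo_labels`, `soft_cross_entropy`) and the toy scores are hypothetical and are not taken from the authors' code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def teacher_pseudo_labels(teacher_logits):
    """Assumed scheme: use the teacher's softmax distribution over the
    candidate answers as soft pseudo labels."""
    return [math.exp(lp) for lp in log_softmax(teacher_logits)]


def soft_cross_entropy(student_logits, targets):
    """Cross-entropy between soft targets and the student's predictions."""
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Toy example with three candidate answers (scores are made up).
teacher_logits = [2.0, 1.5, -1.0]   # teacher finds two answers plausible
student_logits = [0.5, 0.1, 0.3]

pseudo = teacher_pseudo_labels(teacher_logits)
one_hot = [1.0, 0.0, 0.0]           # single ground-truth label

loss_soft = soft_cross_entropy(student_logits, pseudo)
loss_hard = soft_cross_entropy(student_logits, one_hot)
print("soft-label loss:", loss_soft)
print("hard-label loss:", loss_hard)
```

With a one-hot label only the first candidate contributes to the loss, whereas the soft pseudo labels also reward the student for assigning probability to the second candidate, which is the kind of multiple-reasonable-answers signal the abstract motivates.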

Anthology ID: 2021.findings-emnlp.31
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 327–339
URL: https://aclanthology.org/2021.findings-emnlp.31
DOI: 10.18653/v1/2021.findings-emnlp.31
Cite (ACL): Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang, and Jin-Hwa Kim. 2021. Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 327–339, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer (Kang et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.31.pdf
Video: https://aclanthology.org/2021.findings-emnlp.31.mp4
Code: gicheonkang/sglkt-visdial
Data: VisDial
