%0 Conference Proceedings
%T Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer
%A Kang, Gi-Cheon
%A Park, Junseok
%A Lee, Hwaran
%A Zhang, Byoung-Tak
%A Kim, Jin-Hwa
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Findings of the Association for Computational Linguistics: EMNLP 2021
%D 2021
%8 November
%I Association for Computational Linguistics
%C Punta Cana, Dominican Republic
%F kang-etal-2021-reasoning-visual
%X Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. Next, we introduce a Knowledge Transfer (KT) method that extracts the answer predictions from the teacher model and uses them as pseudo labels. We propose KT to remedy the shortcomings of single ground-truth labels, which severely limit the ability of a model to obtain multiple reasonable answers. As a result, our proposed model significantly improves reasoning capability compared to baseline methods and outperforms the state-of-the-art approaches on the VisDial v1.0 dataset. The source code is available at https://github.com/gicheonkang/SGLKT-VisDial.
%R 10.18653/v1/2021.findings-emnlp.31
%U https://aclanthology.org/2021.findings-emnlp.31
%U https://doi.org/10.18653/v1/2021.findings-emnlp.31
%P 327-339