%0 Conference Proceedings
%T Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer
%A Kang, Gi-Cheon
%A Park, Junseok
%A Lee, Hwaran
%A Zhang, Byoung-Tak
%A Kim, Jin-Hwa
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Findings of the Association for Computational Linguistics: EMNLP 2021
%D 2021
%8 November
%I Association for Computational Linguistics
%C Punta Cana, Dominican Republic
%F kang-etal-2021-reasoning-visual
%X Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse dialog structures by incorporating binary and score edges and leveraging a new structural loss function. Next, we introduce a Knowledge Transfer (KT) method that extracts the answer predictions from the teacher model and uses them as pseudo labels. We propose KT to remedy the shortcomings of single ground-truth labels, which severely limit the ability of a model to obtain multiple reasonable answers. As a result, our proposed model significantly improves reasoning capability compared to baseline methods and outperforms the state-of-the-art approaches on the VisDial v1.0 dataset. The source code is available at https://github.com/gicheonkang/SGLKT-VisDial.
%R 10.18653/v1/2021.findings-emnlp.31
%U https://aclanthology.org/2021.findings-emnlp.31
%U https://doi.org/10.18653/v1/2021.findings-emnlp.31
%P 327-339