Region under Discussion for visual dialog

Mauricio Mazuecos, Franco M. Luque, Jorge Sánchez, Hernán Maina, Thomas Vadora, Luciana Benotti


Abstract
Visual Dialog is assumed to require the dialog history to generate correct responses during a dialog. However, it is not clear from previous work in what way dialog history is needed for visual dialog. In this paper we define what it means for a visual question to require dialog history, and we release a subset of the GuessWhat?! questions whose dialog history completely changes their responses. We propose a novel interpretable representation that visually grounds dialog history: the Region under Discussion. It constrains the image’s spatial features according to a semantic representation of the history inspired by the information structure notion of Question under Discussion. We evaluate the architecture on task-specific multimodal models and the visual transformer model LXMERT.
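To make the idea of constraining spatial features concrete, the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that a positively answered spatial question (for example, "is it on the left?") narrows the region, and that an object's box coordinates are then normalized relative to that region instead of the full image. The names `Box`, `halve_region`, and `spatial_features` are hypothetical, as is the choice of halving the region and of scaling coordinates to [-1, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def halve_region(region: Box, side: str) -> Box:
    """Assumption: a 'yes' to a spatial question keeps the named half of the region."""
    x_mid = (region.x_min + region.x_max) / 2
    y_mid = (region.y_min + region.y_max) / 2
    if side == "left":
        return Box(region.x_min, region.y_min, x_mid, region.y_max)
    if side == "right":
        return Box(x_mid, region.y_min, region.x_max, region.y_max)
    if side == "top":
        return Box(region.x_min, region.y_min, region.x_max, y_mid)
    if side == "bottom":
        return Box(region.x_min, y_mid, region.x_max, region.y_max)
    raise ValueError(f"unknown side: {side!r}")


def spatial_features(obj: Box, region: Box) -> list[float]:
    """Scale the object's corners to [-1, 1] relative to the given region."""
    width = region.x_max - region.x_min
    height = region.y_max - region.y_min

    def norm_x(x: float) -> float:
        return 2 * (x - region.x_min) / width - 1

    def norm_y(y: float) -> float:
        return 2 * (y - region.y_min) / height - 1

    return [norm_x(obj.x_min), norm_y(obj.y_min),
            norm_x(obj.x_max), norm_y(obj.y_max)]


image = Box(0, 0, 100, 100)
candidate = Box(25, 40, 50, 60)

# Before any history: features are relative to the whole image.
features_image = spatial_features(candidate, image)

# After a positive answer to "is it on the left?": features are
# relative to the left half, so the same object now spans its right side.
region = halve_region(image, "left")
features_region = spatial_features(candidate, region)
```

In this sketch the same object yields different spatial features depending on the region implied by the history, which is how a follow-up question such as "the one on the right?" could be interpreted relative to earlier answers.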
Anthology ID:
2021.emnlp-main.390
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4745–4759
URL:
https://aclanthology.org/2021.emnlp-main.390
DOI:
10.18653/v1/2021.emnlp-main.390
Cite (ACL):
Mauricio Mazuecos, Franco M. Luque, Jorge Sánchez, Hernán Maina, Thomas Vadora, and Luciana Benotti. 2021. Region under Discussion for visual dialog. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4745–4759, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Region under Discussion for visual dialog (Mazuecos et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.390.pdf
Software:
 2021.emnlp-main.390.Software.zip
Video:
 https://aclanthology.org/2021.emnlp-main.390.mp4
Data:
GuessWhat?!, MS COCO, VisDial