Thomas Vadora


2021

pdf bib
Region under Discussion for visual dialog
Mauricio Mazuecos | Franco M. Luque | Jorge Sánchez | Hernán Maina | Thomas Vadora | Luciana Benotti
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Visual Dialog is assumed to require the dialog history to generate correct responses during a dialog. However, it is not clear from previous work how dialog history is needed for visual dialog. In this paper we define what it means for a visual question to require dialog history and we release a subset of the Guesswhat?! questions for which their dialog history completely changes their responses. We propose a novel interpretable representation that visually grounds dialog history: the Region under Discussion. It constrains the image’s spatial features according to a semantic representation of the history inspired by the information structure notion of Question under Discussion.We evaluate the architecture on task-specific multimodal models and the visual transformer model LXMERT.