Joon-Young Lee


2020

pdf bib
History for Visual Dialog: Do we really need it?
Shubham Agarwal | Trung Bui | Joon-Young Lee | Ioannis Konstas | Verena Rieser
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Visual Dialogue involves “understanding” the dialogue history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to accurately generate the correct response. In this paper, we show that co-attention models which explicitly encode dialoh history outperform models that don’t, achieving state-of-the-art performance (72 % NDCG on val set). However, we also expose shortcomings of the crowdsourcing dataset collection procedure, by showing that dialogue history is indeed only required for a small amount of the data, and that the current evaluation metric encourages generic replies. To that end, we propose a challenging subset (VisdialConv) of the VisdialVal set and the benchmark NDCG of 63%.