CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon


Abstract
Approaches to Grounded Language Learning are commonly focused on a single task-based final performance measure which may not depend on desirable properties of the learned hidden representations, such as their ability to predict object attributes or generalize to unseen situations. To remedy this, we present GroLLA, an evaluation framework for Grounded Language Learning with Attributes based on three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations, in particular with respect to attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with several attributes from resources such as VISA and ImSitu. We then compare several hidden state representations from current state-of-the-art approaches to Grounded Language Learning. By using diagnostic classifiers, we show that current models’ learned representations are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies nor representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06%).
Anthology ID:
2020.acl-main.682
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7625–7641
URL:
https://aclanthology.org/2020.acl-main.682
DOI:
10.18653/v1/2020.acl-main.682
Cite (ACL):
Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, and Oliver Lemon. 2020. CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7625–7641, Online. Association for Computational Linguistics.
Cite (Informal):
CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning (Suglia et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.682.pdf
Video:
http://slideslive.com/38929114
Data
CompGuessWhat?!, GQA, GuessWhat?!, MS COCO, NoCaps, Visual Genome, Visual Question Answering