ConceptBert: Concept-Aware Representation for Visual Question Answering

François Gardères, Maryam Ziaeefard, Baptiste Abeloos, Freddy Lecue


Abstract
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. A VQA model combines visual and textual features in order to answer questions grounded in an image. Current work in VQA focuses on questions that are answerable by direct analysis of the question and image alone. We present a concept-aware algorithm, ConceptBert, for questions that require common sense or basic factual knowledge from external structured content. Given an image and a question in natural language, ConceptBert requires visual elements of the image and a Knowledge Graph (KG) to infer the correct answer. We introduce a multi-modal representation that learns a joint Concept-Vision-Language embedding inspired by the popular BERT architecture. We exploit the ConceptNet KG to encode commonsense knowledge and evaluate our methodology on the Outside Knowledge-VQA (OK-VQA) and VQA datasets.
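The idea of a joint Concept-Vision-Language representation can be illustrated with a toy sketch. This is not the authors' implementation: the function names (attend, fuse, answer) are hypothetical, the vectors are hand-made rather than produced by BERT, an image encoder, or ConceptNet, and the fusion rule (question-guided attention followed by concatenation) is an assumption chosen only for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, items):
    """Question-guided attention: weighted average of the item vectors."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    return [sum(w * item[i] for w, item in zip(weights, items)) for i in range(dim)]

def fuse(question_vec, region_vecs, concept_vecs):
    """Hypothetical joint embedding: question + attended vision + attended concepts."""
    return (question_vec
            + attend(question_vec, region_vecs)
            + attend(question_vec, concept_vecs))

def answer(joint_vec, answer_vecs):
    """Return the candidate answer whose vector scores highest against the joint vector."""
    return max(answer_vecs, key=lambda name: dot(joint_vec, answer_vecs[name]))

# Toy inputs (all values are made up for illustration).
question_vec = [1.0, 0.0]                    # stands in for a language embedding
region_vecs = [[1.0, 0.0], [0.0, 1.0]]       # stands in for image-region features
concept_vecs = [[0.0, 1.0], [1.0, 0.0]]      # stands in for KG concept embeddings
answer_vecs = {
    "yes": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "no":  [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
}

joint_vec = fuse(question_vec, region_vecs, concept_vecs)
prediction = answer(joint_vec, answer_vecs)
```

In this sketch the question vector selects which image regions and which knowledge-graph concepts matter, and the three parts are concatenated before scoring candidate answers. A trained model would learn these interactions rather than use fixed dot products.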
Anthology ID:
2020.findings-emnlp.44
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
489–498
URL:
https://aclanthology.org/2020.findings-emnlp.44
DOI:
10.18653/v1/2020.findings-emnlp.44
Cite (ACL):
François Gardères, Maryam Ziaeefard, Baptiste Abeloos, and Freddy Lecue. 2020. ConceptBert: Concept-Aware Representation for Visual Question Answering. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 489–498, Online. Association for Computational Linguistics.
Cite (Informal):
ConceptBert: Concept-Aware Representation for Visual Question Answering (Gardères et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.44.pdf
Code
ThalesGroup/ConceptBERT
Data
ConceptNet, OK-VQA, Visual Question Answering