In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering

Peter Vickers, Nikolaos Aletras, Emilio Monti, Loïc Barrault

doi:10.18653/v1/2021.acl-short.60

Abstract

Visual Question Answering (VQA) methods aim at leveraging visual input to answer questions that may require complex reasoning over entities. Current models are trained on labelled data that may be insufficient to learn complex knowledge representations. In this paper, we propose a new method to enhance the reasoning capabilities of a multi-modal pretrained model (Vision+Language BERT) by integrating facts extracted from an external knowledge base. Evaluation on the KVQA dataset benchmark demonstrates that our method outperforms competitive baselines by 19%, achieving new state-of-the-art results. We also perform an extensive analysis highlighting the limitations of our best performing model through an ablation study.

Anthology ID: 2021.acl-short.60
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: August
Year: 2021
Address: Online
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 468–475
URL: https://aclanthology.org/2021.acl-short.60/
DOI: 10.18653/v1/2021.acl-short.60
Cite (ACL): Peter Vickers, Nikolaos Aletras, Emilio Monti, and Loïc Barrault. 2021. In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 468–475, Online. Association for Computational Linguistics.
Cite (Informal): In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering (Vickers et al., ACL-IJCNLP 2021)
PDF: https://aclanthology.org/2021.acl-short.60.pdf
Video: https://aclanthology.org/2021.acl-short.60.mp4
