Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models

Manas Jhalani, Annervaz K M, Pushpak Bhattacharyya


Abstract
In the realm of multimodal tasks, Visual Question Answering (VQA) plays a crucial role by addressing natural language questions grounded in visual content. Knowledge-Based Visual Question Answering (KBVQA) advances this concept by adding external knowledge along with images to respond to questions. We introduce an approach for KBVQA that augments the existing vision-language transformer encoder-decoder (OFA) model. Our main contribution is enhancing questions with relevant external knowledge extracted from knowledge graphs, using a dynamic triple extraction method.
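To illustrate the general idea described in the abstract (not the paper's actual implementation), the sketch below shows one way a question could be augmented with a small number of retrieved knowledge-graph triples before being passed to a vision-language encoder-decoder. All function names, the retrieval heuristic, and the serialization format are hypothetical assumptions for illustration only.

```python
# Illustrative sketch, not the authors' method: augment a VQA question with a
# few knowledge-graph triples before feeding it to a vision-language model.
# The retrieval heuristic and all names here are hypothetical.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def retrieve_triples(question: str, knowledge_graph: List[Triple], top_k: int = 3) -> List[Triple]:
    """Toy retrieval: keep triples whose subject or object appears in the question."""
    q = question.lower()
    hits = [t for t in knowledge_graph if t[0].lower() in q or t[2].lower() in q]
    return hits[:top_k]  # keep only the most relevant few; excess triples distract

def augment_question(question: str, triples: List[Triple]) -> str:
    """Serialize the retrieved triples and append them to the question as context."""
    context = " ".join(f"{s} {r} {o}." for s, r, o in triples)
    return f"{question} context: {context}" if context else question

if __name__ == "__main__":
    kg = [("eiffel tower", "located in", "paris"),
          ("paris", "capital of", "france"),
          ("louvre", "located in", "paris")]
    q = "Which country is the eiffel tower located in?"
    print(augment_question(q, retrieve_triples(q, kg)))
    # -> "Which country is the eiffel tower located in? context: eiffel tower located in paris."
```

The augmented question string would then be paired with the image and given to the encoder-decoder; keeping the retrieved triples few and relevant reflects the paper's premise that precise knowledge helps while excess knowledge distracts.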
Anthology ID: 2024.icon-1.3
Volume: Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month: December
Year: 2024
Address: AU-KBC Research Centre, Chennai, India
Editors: Sobha Lalitha Devi, Karunesh Arora
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 21–36
URL: https://aclanthology.org/2024.icon-1.3/
Cite (ACL): Manas Jhalani, Annervaz K M, and Pushpak Bhattacharyya. 2024. Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 21–36, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal): Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models (Jhalani et al., ICON 2024)
PDF: https://aclanthology.org/2024.icon-1.3.pdf