eaVQA: An Experimental Analysis on Visual Question Answering Models

Souvik Chowdhury, Badal Soni


Abstract
Visual Question Answering (VQA) has recently become a popular research area. The VQA problem lies at the boundary of the Computer Vision and Natural Language Processing research domains. In VQA research, the dataset is a very important aspect because datasets vary in image type, i.e. natural or synthetic, and in question-answer source, i.e. human-authored or computer-generated. Various details about each dataset are given in this paper, which can help future researchers to a great extent. In this paper, we discuss and compare the experimental performance of the Stacked Attention Network Model (SANM) and of bidirectional LSTM- and MUTAN-based fusion models. As per the experimental results, MUTAN achieves 29% accuracy with a loss of 3.5. The SANM model gives 55% accuracy and a loss of 2.2, whereas the VQA model gives 59% accuracy and a loss of 1.9.
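The MUTAN fusion named in the abstract combines a question embedding and an image feature through a rank-constrained bilinear (Tucker) interaction rather than simple concatenation. Below is a minimal PyTorch sketch of that idea; it is not the authors' implementation, and all dimensions (q_dim, v_dim, t_dim, rank, n_answers) are illustrative assumptions.

import torch
import torch.nn as nn

class MutanFusion(nn.Module):
    # Sketch of MUTAN-style fusion: each modality is projected into `rank`
    # low-dimensional factors, and the bilinear interaction is approximated
    # as a sum of rank-one elementwise products.
    def __init__(self, q_dim=2400, v_dim=2048, t_dim=510, rank=5, n_answers=1000):
        super().__init__()
        self.rank = rank
        self.q_proj = nn.Linear(q_dim, t_dim * rank)   # question factors
        self.v_proj = nn.Linear(v_dim, t_dim * rank)   # image factors
        self.classifier = nn.Linear(t_dim, n_answers)  # answer scores

    def forward(self, q, v):
        # q: (batch, q_dim) question embedding; v: (batch, v_dim) image feature
        b = q.size(0)
        qf = torch.tanh(self.q_proj(q)).view(b, self.rank, -1)
        vf = torch.tanh(self.v_proj(v)).view(b, self.rank, -1)
        fused = (qf * vf).sum(dim=1)           # sum over rank-one interactions
        return self.classifier(torch.tanh(fused))

model = MutanFusion()
scores = model(torch.randn(4, 2400), torch.randn(4, 2048))  # -> (4, 1000)

The rank parameter trades expressiveness of the bilinear interaction against parameter count, which is the core idea distinguishing MUTAN-style fusion from a full (and prohibitively large) bilinear layer.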
Anthology ID:
2021.icon-main.67
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
550–554
URL:
https://aclanthology.org/2021.icon-main.67
Cite (ACL):
Souvik Chowdhury and Badal Soni. 2021. eaVQA: An Experimental Analysis on Visual Question Answering Models. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 550–554, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
eaVQA: An Experimental Analysis on Visual Question Answering Models (Chowdhury & Soni, ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.67.pdf
Data
Visual Question Answering