Souvik Chowdhury


2021

eaVQA: An Experimental Analysis on Visual Question Answering Models
Souvik Chowdhury | Badal Soni
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Visual Question Answering (VQA) has recently become a popular research area. The VQA problem lies at the boundary of the Computer Vision and Natural Language Processing research domains. In VQA research, the dataset is a very important aspect because datasets vary in image type (natural or synthetic) and in question-answer source (human-generated or computer-generated). Details about each dataset are given in this paper, which can help future researchers. In this paper, we discuss and compare the experimental performance of the Stacked Attention Network Model (SANM) and bidirectional LSTM and MUTAN-based fusion models. In the experimental results, MUTAN achieves 29% accuracy with a loss of 3.5, SANM achieves 55% accuracy with a loss of 2.2, and the VQA model achieves 59% accuracy with a loss of 1.9.