A Unified Framework for Multilingual and Code-Mixed Visual Question Answering

Deepak Gupta; Pabitra Lenka; Asif Ekbal; Pushpak Bhattacharyya

doi:10.18653/v1/2020.aacl-main.90

Abstract

In this paper, we propose an effective deep learning framework for multilingual and code-mixed visual question answering. The proposed model is capable of predicting answers from the questions in Hindi, English or Code-mixed (Hinglish: Hindi-English) languages. The majority of the existing techniques on Visual Question Answering (VQA) focus on English questions only. However, many applications such as medical imaging, tourism, visual assistants require a multilinguality-enabled module for their widespread usages. As there is no available dataset in English-Hindi VQA, we firstly create Hindi and Code-mixed VQA datasets by exploiting the linguistic properties of these languages. We propose a robust technique capable of handling the multilingual and code-mixed question to provide the answer against the visual information (image). To better encode the multilingual and code-mixed questions, we introduce a hierarchy of shared layers. We control the behaviour of these shared layers by an attention-based soft layer sharing mechanism, which learns how shared layers are applied in different ways for the different languages of the question. Further, our model uses bi-linear attention with a residual connection to fuse the language and image features. We perform extensive evaluation and ablation studies for English, Hindi and Code-mixed VQA. The evaluation shows that the proposed multilingual model achieves state-of-the-art performance in all these settings.

Anthology ID: 2020.aacl-main.90
Volume: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Month: December
Year: 2020
Address: Suzhou, China
Editors: Kam-Fai Wong, Kevin Knight, Hua Wu
Venue: AACL
Publisher: Association for Computational Linguistics
Pages: 900–913
URL: https://aclanthology.org/2020.aacl-main.90/
DOI: 10.18653/v1/2020.aacl-main.90
Cite (ACL): Deepak Gupta, Pabitra Lenka, Asif Ekbal, and Pushpak Bhattacharyya. 2020. A Unified Framework for Multilingual and Code-Mixed Visual Question Answering. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 900–913, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): A Unified Framework for Multilingual and Code-Mixed Visual Question Answering (Gupta et al., AACL 2020)
PDF: https://aclanthology.org/2020.aacl-main.90.pdf
Data: MCVQA, MS COCO, Visual Question Answering
