Anshul Nasery
2021
MIMOQA: Multimodal Input Multimodal Output Question Answering
Hrituraj Singh | Anshul Nasery | Denil Mehta | Aishwarya Agarwal | Jatin Lamba | Balaji Vasan Srinivasan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Multimodal research has picked up significantly in the space of question answering, with the task being extended to visual question answering, chart question answering, and multimodal input question answering. However, all these explorations produce a unimodal textual output as the answer. In this paper, we propose a novel task, MIMOQA (Multimodal Input Multimodal Output Question Answering), in which the output is also multimodal. Through human experiments, we empirically show that such multimodal outputs provide better cognitive understanding of the answers. We also propose a novel multimodal question-answering framework, MExBERT, that incorporates joint textual and visual attention towards producing such a multimodal output. Our method relies on a novel multimodal dataset curated for this problem from publicly available unimodal datasets. We show the superior performance of MExBERT against strong baselines on both automatic and human metrics.
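The abstract mentions joint textual and visual attention as the core of MExBERT. The snippet below is only a minimal, hypothetical sketch of such cross-modal attention, assuming BERT-like token embeddings and image-region features projected to the same hidden dimension; it is not the authors' MExBERT implementation, and all class and dimension names are illustrative.

```python
# Hypothetical sketch of joint text-image attention (not the MExBERT code from the paper).
import torch
import torch.nn as nn


class JointTextImageAttention(nn.Module):
    """Text tokens attend over image-region features; dimensions are illustrative."""

    def __init__(self, hidden_dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, num_tokens, hidden_dim)
        # image_feats: (batch, num_regions, hidden_dim)
        attended, _ = self.cross_attn(query=text_feats, key=image_feats, value=image_feats)
        # Residual connection plus layer norm over the visually grounded text representation.
        return self.norm(text_feats + attended)


# Toy usage with random tensors standing in for real token and region embeddings.
fusion = JointTextImageAttention()
text = torch.randn(2, 32, 768)
images = torch.randn(2, 36, 768)
fused = fusion(text, images)
print(fused.shape)  # torch.Size([2, 32, 768])
```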
Rule Augmented Unsupervised Constituency Parsing
Atul Sahay | Anshul Nasery | Ayush Maheshwari | Ganesh Ramakrishnan | Rishabh Iyer
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021