MMCoQA: Conversational Question Answering over Text, Tables, and Images

Yongqi Li, Wenjie Li, Liqiang Nie


Abstract
The rapid development of conversational assistants has accelerated the study of conversational question answering (QA). However, existing conversational QA systems usually answer users’ questions from a single knowledge source, e.g., paragraphs or a knowledge graph, and overlook important visual cues, let alone multiple knowledge sources of different modalities. In this paper, we therefore define a novel research task, multimodal conversational question answering (MMCoQA), which aims to answer users’ questions with multimodal knowledge sources via multi-turn conversations. This new task brings a series of research challenges, including but not limited to the priority, consistency, and complementarity of multimodal knowledge. To facilitate data-driven approaches in this area, we construct the first multimodal conversational QA dataset, named MMConvQA. Questions are fully annotated not only with natural language answers but also with the corresponding evidence and valuable decontextualized, self-contained questions. Meanwhile, we introduce an end-to-end baseline model, which divides this complex task into three stages: question understanding, multimodal evidence retrieval, and answer extraction. Moreover, we report a set of benchmarking results, which indicate that there is ample room for improvement.
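The abstract's decomposition of the baseline into question understanding, multimodal evidence retrieval, and answer extraction can be pictured as a simple pipeline. The Python sketch below is a minimal, hypothetical illustration of that three-stage flow; the class MMCoQAPipeline, the Evidence record, and all stub logic are placeholders of ours, not the API of the authors' released code.

# Hypothetical sketch of the three-stage MMCoQA baseline pipeline.
# The class names and stub logic are illustrative placeholders,
# not the authors' released implementation (liyongqi67/mmcoqa).

from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    modality: str  # "text", "table", or "image"
    content: str   # serialized evidence (passage, linearized table, or caption)
    score: float   # retrieval score


class MMCoQAPipeline:
    def answer(self, question: str, history: List[str]) -> str:
        # 1. Question understanding: rewrite the conversational question
        #    into a decontextualized, self-contained query using history.
        query = self.rewrite(question, history)
        # 2. Multimodal evidence retrieval: rank text, table, and image
        #    items from the knowledge collection against the query.
        evidence = self.retrieve(query, top_k=5)
        # 3. Answer extraction: read the query and retrieved evidence
        #    to produce a natural language answer.
        return self.extract(query, evidence)

    # --- Stubs standing in for the learned components ---

    def rewrite(self, question: str, history: List[str]) -> str:
        # A real system would use a trained rewriter; here we simply
        # prepend the history as crude conversational context.
        return " ".join(history + [question])

    def retrieve(self, query: str, top_k: int) -> List[Evidence]:
        # A real system would score a multimodal index against the query;
        # here we return fixed dummy items, one per modality.
        pool = [
            Evidence("text", "Dublin is the capital of Ireland.", 0.9),
            Evidence("table", "city | country || Dublin | Ireland", 0.7),
            Evidence("image", "photo captioned 'Dublin skyline'", 0.5),
        ]
        return sorted(pool, key=lambda e: e.score, reverse=True)[:top_k]

    def extract(self, query: str, evidence: List[Evidence]) -> str:
        # A real system would run a reader model over the evidence;
        # here we echo the top-scored item as the "answer".
        return evidence[0].content


if __name__ == "__main__":
    pipeline = MMCoQAPipeline()
    print(pipeline.answer("What country is it in?", ["Tell me about Dublin."]))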
Anthology ID:
2022.acl-long.290
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4220–4231
URL:
https://aclanthology.org/2022.acl-long.290
DOI:
10.18653/v1/2022.acl-long.290
Cite (ACL):
Yongqi Li, Wenjie Li, and Liqiang Nie. 2022. MMCoQA: Conversational Question Answering over Text, Tables, and Images. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4220–4231, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
MMCoQA: Conversational Question Answering over Text, Tables, and Images (Li et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.290.pdf
Software:
 2022.acl-long.290.software.zip
Code:
liyongqi67/mmcoqa
Data:
ManyModalQA, ORConvQA