Zero-shot cross-lingual open domain question answering

Sumit Agarwal, Suraj Tripathi, Teruko Mitamura, Carolyn Penstein Rose


Abstract
Speakers of different languages often search for information in a cross-lingual manner: they ask questions in their own language and expect answers in that language, even when the supporting evidence lies in another. In this paper, we present our approach to this task of cross-lingual open-domain question answering. Our proposed method employs a passage reranker, the fusion-in-decoder technique for answer generation, and a Wikidata entity-based post-processing step to address the model's inability to generate entities across all languages. Our end-to-end pipeline improves over the baseline CORA model by 3 F1 points and 4.6 EM points on the XOR-TyDi dataset. We also evaluate the effectiveness of our proposed techniques in the zero-shot setting using the MKQA dataset, showing improvements of 5 F1 points for high-resource and 3 F1 points for low-resource zero-shot languages. Our team CMUmQA's submission to the MIA shared task ranked 1st in the constrained setup on the dev set and 2nd on the test set.
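The three pipeline stages the abstract names can be sketched as follows. This is a minimal illustrative sketch only: the function names, the word-overlap scoring heuristic, and the entity lookup table are hypothetical placeholders, not the paper's trained neural reranker, fusion-in-decoder generator, or actual Wikidata machinery.

```python
def rerank(question, passages, top_k=2):
    """Toy passage reranker: score passages by word overlap with the
    question (stands in for the paper's trained reranker)."""
    q_tokens = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_tokens & set(p.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def fusion_in_decoder_inputs(question, passages):
    """Fusion-in-decoder encodes each (question, passage) pair
    independently; the decoder then attends over all encoded pairs
    jointly. Here we only build the per-passage encoder inputs."""
    return [f"question: {question} context: {p}" for p in passages]


def postprocess_entity(answer, entity_labels):
    """Map a generated answer to its target-language entity label when
    one is known (a stand-in for the Wikidata-based post-processing
    that handles entities the generator cannot produce in-language)."""
    return entity_labels.get(answer, answer)
```

Chained together, `rerank` narrows the retrieved passages, `fusion_in_decoder_inputs` prepares them for joint generation, and `postprocess_entity` rewrites entity answers into the question's language.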
Anthology ID:
2022.mia-1.9
Volume:
Proceedings of the Workshop on Multilingual Information Access (MIA)
Month:
July
Year:
2022
Address:
Seattle, USA
Venue:
MIA
Publisher:
Association for Computational Linguistics
Pages:
91–99
URL:
https://aclanthology.org/2022.mia-1.9
DOI:
10.18653/v1/2022.mia-1.9
Cite (ACL):
Sumit Agarwal, Suraj Tripathi, Teruko Mitamura, and Carolyn Penstein Rose. 2022. Zero-shot cross-lingual open domain question answering. In Proceedings of the Workshop on Multilingual Information Access (MIA), pages 91–99, Seattle, USA. Association for Computational Linguistics.
Cite (Informal):
Zero-shot cross-lingual open domain question answering (Agarwal et al., MIA 2022)
PDF:
https://aclanthology.org/2022.mia-1.9.pdf
Data
MKQA, Natural Questions, XQuAD