Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains

Alon Albalak; Sharon Levy; William Yang Wang

doi:10.18653/v1/2023.eacl-demo.1

Abstract

Open-retrieval question answering systems are generally trained and tested on large datasets in well-established domains. However, low-resource settings such as new and emerging domains would especially benefit from reliable question answering systems. Furthermore, multilingual and cross-lingual resources in emergent domains are scarce, leading to few or no such systems. In this paper, we demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19. Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable. To address the scarcity of cross-lingual training data in emergent domains, we present a method utilizing automatic translation, alignment, and filtering to produce English-to-all datasets. We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting. We illustrate the capabilities of our system with examples and release all code necessary to train and deploy such a system.

Anthology ID: 2023.eacl-demo.1
Volume: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Danilo Croce, Luca Soldaini
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 1–10
URL: https://aclanthology.org/2023.eacl-demo.1
DOI: 10.18653/v1/2023.eacl-demo.1
Cite (ACL): Alon Albalak, Sharon Levy, and William Yang Wang. 2023. Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 1–10, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains (Albalak et al., EACL 2023)
PDF: https://aclanthology.org/2023.eacl-demo.1.pdf
Video: https://aclanthology.org/2023.eacl-demo.1.mp4