Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering

Fahim Faisal, Antonios Anastasopoulos


Abstract
Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages. Hence, for information-seeking question answering (QA) systems to adequately serve speakers of all languages, they need to operate cross-lingually. In this work, we investigate the capabilities of multilingually pretrained language models on cross-lingual QA. We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance. We additionally investigate the effect of data size and language choice in this fine-tuning step, and we release a dataset for evaluating cross-lingual QA systems.
Anthology ID:
2021.mrqa-1.14
Volume:
Proceedings of the 3rd Workshop on Machine Reading for Question Answering
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
MRQA
Publisher:
Association for Computational Linguistics
Pages:
133–148
URL:
https://aclanthology.org/2021.mrqa-1.14
DOI:
10.18653/v1/2021.mrqa-1.14
Cite (ACL):
Fahim Faisal and Antonios Anastasopoulos. 2021. Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering. In Proceedings of the 3rd Workshop on Machine Reading for Question Answering, pages 133–148, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering (Faisal & Anastasopoulos, MRQA 2021)
PDF:
https://aclanthology.org/2021.mrqa-1.14.pdf
Data
MKQA, MLQA, SQuAD, TyDi QA, XQuAD