Cross-Lingual Training of Dense Retrievers for Document Retrieval

Peng Shi, Rui Zhang, He Bai, Jimmy Lin


Abstract
Dense retrieval has shown great success for passage ranking in English. However, its effectiveness for non-English languages remains unexplored due to limitations in training resources. In this work, we explore different transfer techniques for document ranking from English annotations to non-English languages. Our experiments reveal that zero-shot model-based transfer using mBERT improves search quality. We find that weakly-supervised target language transfer is competitive with generation-based target language transfer, which requires translation models.
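As a minimal illustration of the zero-shot model-based transfer described in the abstract (not the authors' code), the sketch below encodes an English query and non-English documents with a shared mBERT encoder and ranks documents by dot-product similarity. The checkpoint name, [CLS] pooling, and sample texts are assumptions for the sketch.

# Minimal sketch of zero-shot dense retrieval with mBERT (assumed setup,
# not the paper's implementation): one multilingual encoder scores
# English queries against non-English documents.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def encode(texts):
    """Return one [CLS] embedding per input text."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] token vectors

query = encode(["what is dense retrieval?"])            # English query
docs = encode(["La recherche dense encode les textes en vecteurs.",
               "Un index inversé repose sur des termes exacts."])  # French docs
scores = query @ docs.T                                 # dot-product ranking
print(scores.argsort(descending=True))                  # document ranking

In practice, such an encoder would first be fine-tuned on English relevance annotations (e.g., with an in-batch negative contrastive loss) and then applied directly to the target language, which is the zero-shot transfer setting the paper studies.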
Anthology ID:
2021.mrl-1.24
Volume:
Proceedings of the 1st Workshop on Multilingual Representation Learning
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Duygu Ataman, Alexandra Birch, Alexis Conneau, Orhan Firat, Sebastian Ruder, Gozde Gul Sahin
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
251–253
URL:
https://aclanthology.org/2021.mrl-1.24
DOI:
10.18653/v1/2021.mrl-1.24
Cite (ACL):
Peng Shi, Rui Zhang, He Bai, and Jimmy Lin. 2021. Cross-Lingual Training of Dense Retrievers for Document Retrieval. In Proceedings of the 1st Workshop on Multilingual Representation Learning, pages 251–253, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Training of Dense Retrievers for Document Retrieval (Shi et al., MRL 2021)
PDF:
https://aclanthology.org/2021.mrl-1.24.pdf