Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval

Nattapol Trijakwanich, Peerat Limkonchotiwat, Raheem Sarwar, Wannaphong Phatthiyaphaibun, Ekapol Chuangsuwanich, Sarana Nutanong


Abstract
Cross-lingual Sentence Retrieval (CLSR) aims to retrieve parallel sentence pairs, that is, sentences that are translations of each other, from a multilingual set of comparable documents. The retrieved pairs can be used in downstream NLP tasks such as machine translation and cross-lingual word sense disambiguation. We propose a CLSR framework called Robust Fragment-level Representation (RFR) to address Out-of-Domain (OOD) CLSR problems. In particular, we improve sentence retrieval robustness by representing each sentence as a collection of fragments, which changes the retrieval granularity from the sentence level to the fragment level. We performed CLSR experiments on three OOD datasets, four language pairs, and three well-known base sentence encoders: m-USE, LASER, and LaBSE. Experimental results show that RFR significantly improves the base encoders' performance in more than 85% of the cases.
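The abstract does not say how fragments are built or how fragment-level matches are combined, so the following is only a minimal sketch of the general idea of retrieving at fragment rather than sentence granularity. Everything in it is an assumption made for illustration: the sliding word-window fragmenting (`fragments`), the character-trigram `toy_encode` that stands in for a real multilingual encoder such as m-USE, LASER, or LaBSE, and the majority vote over nearest fragments in `retrieve`. It is not the RFR method itself.

```python
import math
from collections import Counter


def fragments(sentence, size=3, stride=1):
    """Split a sentence into overlapping word windows (assumed fragmenting scheme)."""
    words = sentence.split()
    if len(words) <= size:
        return [" ".join(words)]
    return [" ".join(words[i:i + size])
            for i in range(0, len(words) - size + 1, stride)]


def toy_encode(text):
    """Hypothetical stand-in for a sentence encoder: a bag of character trigrams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, candidates, encode=toy_encode):
    """Return the index of the candidate whose fragments collect the most votes."""
    # Index every candidate fragment together with its parent sentence index.
    cand_frags = [(idx, encode(frag))
                  for idx, cand in enumerate(candidates)
                  for frag in fragments(cand)]
    votes = Counter()
    for query_frag in fragments(query):
        query_vec = encode(query_frag)
        # Each query fragment votes for the sentence owning its nearest fragment.
        best_idx, _ = max(cand_frags, key=lambda item: cosine(query_vec, item[1]))
        votes[best_idx] += 1
    return votes.most_common(1)[0][0]
```

Because `toy_encode` only compares surface characters, it works for same-language toy input, not for real cross-lingual pairs. To try the sketch on actual translations, a multilingual encoder would have to be passed in through the `encode` parameter, together with a similarity function suited to its dense vectors.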
Anthology ID:
2021.findings-emnlp.80
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
935–944
URL:
https://aclanthology.org/2021.findings-emnlp.80
DOI:
10.18653/v1/2021.findings-emnlp.80
Cite (ACL):
Nattapol Trijakwanich, Peerat Limkonchotiwat, Raheem Sarwar, Wannaphong Phatthiyaphaibun, Ekapol Chuangsuwanich, and Sarana Nutanong. 2021. Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 935–944, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval (Trijakwanich et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.80.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.80.mp4
Data:
XQuAD