Parallel sentences mining with transfer learning in an unsupervised setting

Yu Sun, Shaolin Zhu, Feng Yifan, Chenggang Mi


Abstract
Parallel sentences are essential training data for constructing neural machine translation (NMT) systems, and both their quality and their quantity matter. However, these resources are not available for many low-resource language pairs. Many existing mining methods require strong supervision and are therefore not suitable for such pairs. Although there have been several attempts at developing unsupervised models, they ignore the language-invariant properties shared across languages. In this paper, we propose an approach based on transfer learning to mine parallel sentences in an unsupervised setting. With the help of bilingual corpora from rich-resource language pairs, we can mine parallel sentences for low-resource language pairs without bilingual supervision. Experiments show that our approach improves parallel sentence mining performance over previous methods. In particular, we achieve excellent results on two real-world low-resource language pairs.
Anthology ID:
2021.naacl-srw.17
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month:
June
Year:
2021
Address:
Online
Editors:
Esin Durmus, Vivek Gupta, Nelson Liu, Nanyun Peng, Yu Su
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
136–142
URL:
https://aclanthology.org/2021.naacl-srw.17
DOI:
10.18653/v1/2021.naacl-srw.17
Cite (ACL):
Yu Sun, Shaolin Zhu, Feng Yifan, and Chenggang Mi. 2021. Parallel sentences mining with transfer learning in an unsupervised setting. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 136–142, Online. Association for Computational Linguistics.
Cite (Informal):
Parallel sentences mining with transfer learning in an unsupervised setting (Sun et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-srw.17.pdf
Video:
https://aclanthology.org/2021.naacl-srw.17.mp4