Unsupervised Joint Training of Bilingual Word Embeddings

Benjamin Marie, Atsushi Fujita


Abstract
State-of-the-art methods for unsupervised bilingual word embeddings (BWE) learn a mapping function that projects pre-trained monolingual word embeddings into a bilingual space. Despite its remarkable results, unsupervised mapping is well known to be limited by the original dissimilarity between the word embedding spaces to be mapped. In this work, we propose a new approach that trains unsupervised BWE jointly on synthetic parallel data generated through unsupervised machine translation. We demonstrate that existing algorithms that jointly train BWE are very robust to noisy training data, and show that jointly trained unsupervised BWE significantly outperform mapped unsupervised BWE in several cross-lingual NLP tasks.
Anthology ID:
P19-1312
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3224–3230
URL:
https://aclanthology.org/P19-1312
DOI:
10.18653/v1/P19-1312
Cite (ACL):
Benjamin Marie and Atsushi Fujita. 2019. Unsupervised Joint Training of Bilingual Word Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3224–3230, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Joint Training of Bilingual Word Embeddings (Marie & Fujita, ACL 2019)
PDF:
https://aclanthology.org/P19-1312.pdf