A Simple Approach to Learning Unsupervised Multilingual Embeddings

Pratik Jawanpuria, Mayank Meghwanshi, Bamdev Mishra


Abstract
Recent progress on unsupervised cross-lingual embeddings in the bilingual setting has spurred work on learning a shared embedding space for several languages. A popular framework for this multilingual problem jointly solves two sub-problems: 1) learning unsupervised word alignment between several language pairs, and 2) learning how to map the monolingual embeddings of every language to a shared multilingual space. In contrast, we propose a simple approach that decouples these two sub-problems and solves them separately, one after the other, using existing techniques. We show that the proposed approach obtains surprisingly good performance on tasks such as bilingual lexicon induction, cross-lingual word similarity, multilingual document classification, and multilingual dependency parsing. When distant languages are involved, the proposed approach shows robust behavior and outperforms existing unsupervised multilingual word embedding approaches.
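The decoupled idea in the abstract can be illustrated with a minimal sketch. It assumes that step 1 (unsupervised word alignment) has already produced word-index pairs between each language and a pivot language, and it uses orthogonal Procrustes as a stand-in for step 2. The abstract does not name the specific techniques used, so the function names, the choice of a pivot language, and the Procrustes solver are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def procrustes(X, Y):
    """Return the orthogonal matrix W that minimizes ||X @ W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def map_to_shared_space(embeddings, dictionaries, pivot):
    """Step 2 (illustrative): map every language into the pivot's space.

    embeddings:   dict lang -> (n_words, d) array of monolingual embeddings
    dictionaries: dict lang -> list of (lang_word_idx, pivot_word_idx) pairs,
                  assumed to come from a separate step-1 alignment method
    pivot:        language whose space serves as the shared space (assumption)
    """
    d = embeddings[pivot].shape[1]
    maps = {pivot: np.eye(d)}  # the pivot language maps to itself
    for lang, pairs in dictionaries.items():
        src_idx, tgt_idx = map(list, zip(*pairs))
        X = embeddings[lang][src_idx]   # aligned words in the source language
        Y = embeddings[pivot][tgt_idx]  # their counterparts in the pivot
        maps[lang] = procrustes(X, Y)
    return maps
```

Under these assumptions, `embeddings[lang] @ maps[lang]` places the words of each language in the shared space, where nearest-neighbor search can serve tasks such as bilingual lexicon induction.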
Anthology ID:
2020.emnlp-main.240
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2995–3001
URL:
https://aclanthology.org/2020.emnlp-main.240
DOI:
10.18653/v1/2020.emnlp-main.240
Cite (ACL):
Pratik Jawanpuria, Mayank Meghwanshi, and Bamdev Mishra. 2020. A Simple Approach to Learning Unsupervised Multilingual Embeddings. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2995–3001, Online. Association for Computational Linguistics.
Cite (Informal):
A Simple Approach to Learning Unsupervised Multilingual Embeddings (Jawanpuria et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.240.pdf
Video:
https://slideslive.com/38938704