Mayank Meghwanshi
2020
Geometry-aware domain adaptation for unsupervised alignment of word embeddings
Pratik Jawanpuria
|
Mayank Meghwanshi
|
Bamdev Mishra
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
We propose a novel manifold based geometric approach for learning unsupervised alignment of word embeddings between the source and the target languages. Our approach formulates the alignment learning problem as a domain adaptation problem over the manifold of doubly stochastic matrices. This viewpoint arises from the aim to align the second order information of the two language spaces. The rich geometry of the doubly stochastic manifold allows to employ efficient Riemannian conjugate gradient algorithm for the proposed formulation. Empirically, the proposed approach outperforms state-of-the-art optimal transport based approach on the bilingual lexicon induction task across several language pairs. The performance improvement is more significant for distant language pairs.
A Simple Approach to Learning Unsupervised Multilingual Embeddings
Pratik Jawanpuria
|
Mayank Meghwanshi
|
Bamdev Mishra
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Recent progress on unsupervised cross-lingual embeddings in the bilingual setting has given the impetus to learning a shared embedding space for several languages. A popular framework to solve the latter problem is to solve the following two sub-problems jointly: 1) learning unsupervised word alignment between several language pairs, and 2) learning how to map the monolingual embeddings of every language to shared multilingual space. In contrast, we propose a simple approach by decoupling the above two sub-problems and solving them separately, one after another, using existing techniques. We show that this proposed approach obtains surprisingly good performance in tasks such as bilingual lexicon induction, cross-lingual word similarity, multilingual document classification, and multilingual dependency parsing. When distant languages are involved, the proposed approach shows robust behavior and outperforms existing unsupervised multilingual word embedding approaches.