Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces

Goran Glavaš, Ivan Vulić


Abstract
We present InstaMap, an instance-based method for learning projection-based cross-lingual word embeddings. Unlike prior work, it does not learn a single global linear projection. InstaMap is a non-parametric model that learns a non-linear projection by iteratively: (1) finding a globally optimal rotation of the source embedding space relying on the Kabsch algorithm, and then (2) moving each point along an instance-specific translation vector estimated from the translation vectors of the point’s nearest neighbours in the training dictionary. We report performance gains with InstaMap over four representative state-of-the-art projection-based models on bilingual lexicon induction across a set of 28 diverse language pairs. We note prominent improvements, especially for more distant language pairs (i.e., languages with non-isomorphic monolingual spaces).
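The two-step iteration described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions, not taken from the abstract: neighbours are found by cosine similarity among the rotated dictionary source vectors, their translation vectors are averaged uniformly, and the helper names `kabsch_rotation`, `knn_translations` and `instamap_sketch` as well as the parameters `k` and `n_iter` are hypothetical.

```python
import numpy as np


def kabsch_rotation(X, Y):
    """Proper rotation R (det = +1) minimising ||X @ R - Y||_F for row-aligned X, Y.

    Assumes the embeddings are already centred, as is common preprocessing.
    """
    H = X.T @ Y                              # d x d cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))       # -1 would indicate a reflection
    D = np.eye(X.shape[1])
    D[-1, -1] = d                            # flip the last axis to forbid reflections
    return U @ D @ Vt


def knn_translations(points, anchors, anchor_shifts, k):
    """Per-point translation: mean shift of the k most cosine-similar anchors (assumed scheme)."""
    P = points / np.linalg.norm(points, axis=1, keepdims=True)
    A = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    sims = P @ A.T                           # (n_points, n_anchors) cosine similarities
    idx = np.argsort(-sims, axis=1)[:, :k]   # indices of the k nearest anchors
    return anchor_shifts[idx].mean(axis=1)   # (n_points, d)


def instamap_sketch(src, tgt, dict_pairs, k=3, n_iter=2):
    """Hypothetical InstaMap-style mapping of the whole source space towards the target space.

    dict_pairs: list of (source_index, target_index) training translation pairs.
    """
    src = src.copy()
    s_idx = np.array([s for s, _ in dict_pairs])
    t_idx = np.array([t for _, t in dict_pairs])
    Y = tgt[t_idx]
    for _ in range(n_iter):
        # Step 1: one global rotation fitted on the dictionary pairs (Kabsch).
        R = kabsch_rotation(src[s_idx], Y)
        src = src @ R
        # Step 2: instance-specific translation from nearby dictionary points.
        X = src[s_idx]
        shifts = Y - X                       # translation vector of each dictionary pair
        src = src + knn_translations(src, X, shifts, k)
    return src
```

With `k=1`, every dictionary source word is its own nearest anchor and therefore lands exactly on its translation, while words outside the dictionary borrow the shift of their closest dictionary neighbour. This is the sense in which the mapping is instance-based and non-linear, even though each iteration starts from a single global rotation.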
Anthology ID:
2020.acl-main.675
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7548–7555
URL:
https://aclanthology.org/2020.acl-main.675
DOI:
10.18653/v1/2020.acl-main.675
Cite (ACL):
Goran Glavaš and Ivan Vulić. 2020. Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7548–7555, Online. Association for Computational Linguistics.
Cite (Informal):
Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces (Glavaš & Vulić, ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.675.pdf
Video:
http://slideslive.com/38928885