Analyzing the Limitations of Cross-lingual Word Embedding Mappings

Aitor Ormazabal; Mikel Artetxe; Gorka Labaka; Aitor Soroa; Eneko Agirre

doi:10.18653/v1/P19-1492

Abstract

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations. While several authors have questioned the underlying isomorphism assumption, which states that word embeddings in different languages have approximately the same structure, it is not clear whether this is an inherent limitation of mapping approaches or a more general issue when learning cross-lingual embeddings. To answer this question, we experiment with parallel corpora, which allow us to compare offline mapping to an extension of skip-gram that jointly learns both embedding spaces. We observe that, under these ideal conditions, joint learning yields more isomorphic embeddings, is less sensitive to hubness, and obtains stronger results in bilingual lexicon induction. We thus conclude that current mapping methods do have strong limitations, calling for further research to jointly learn cross-lingual embeddings with a weaker cross-lingual signal.

Anthology ID: P19-1492
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4990–4995
URL: https://aclanthology.org/P19-1492/
DOI: 10.18653/v1/P19-1492
Cite (ACL): Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, and Eneko Agirre. 2019. Analyzing the Limitations of Cross-lingual Word Embedding Mappings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990–4995, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Analyzing the Limitations of Cross-lingual Word Embedding Mappings (Ormazabal et al., ACL 2019)
PDF: https://aclanthology.org/P19-1492.pdf
Video: https://aclanthology.org/P19-1492.mp4
