Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces

Ivan Vulić; Anna Korhonen; Goran Glavaš

doi:10.18653/v1/2020.repl4nlp-1.7

Abstract

Work on projection-based induction of cross-lingual word embedding spaces (CLWEs) predominantly focuses on the improvement of the projection (i.e., mapping) mechanisms. In this work, in contrast, we show that a simple method for post-processing monolingual embedding spaces facilitates learning of the cross-lingual alignment and, in turn, substantially improves bilingual lexicon induction (BLI). The post-processing method we examine is grounded in the generalisation of first- and second-order monolingual similarities to the nth-order similarity. By post-processing monolingual spaces before the cross-lingual alignment, the method can be coupled with any projection-based method for inducing CLWE spaces. We demonstrate the effectiveness of this simple monolingual post-processing across a set of 15 typologically diverse languages (i.e., 15 × 14 = 210 BLI setups), and in combination with two different projection methods.
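The abstract does not spell out how the nth-order similarity is computed, so the following is only a minimal sketch under an assumed formulation, not the authors' implementation. It assumes the eigendecomposition X^T X = Q Λ Q^T and the transform X Q Λ^α, whose dot-product similarities equal (X X^T)^n when α = (n − 1)/2. The helper names `nth_order_transform` and `procrustes` are hypothetical, the data are random toy matrices, and orthogonal Procrustes is used merely as one example of a projection-based mapping that can follow the monolingual step.

```python
import numpy as np


def nth_order_transform(X: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical helper: return X @ Q @ diag(lam ** alpha).

    With X^T X = Q diag(lam) Q^T, the dot-product similarity matrix of the
    output is X Q diag(lam ** (2 * alpha)) Q^T X^T, which equals the
    n-th order similarity (X X^T)^n when alpha = (n - 1) / 2.
    Negative alpha assumes X has full column rank (no zero eigenvalues).
    """
    lam, Q = np.linalg.eigh(X.T @ X)  # X^T X is symmetric PSD: real eigenpairs
    lam = np.clip(lam, 0.0, None)     # clear tiny negative round-off
    return X @ Q @ np.diag(lam ** alpha)


def procrustes(Xs: np.ndarray, Xt: np.ndarray) -> np.ndarray:
    """Hypothetical helper: orthogonal W minimising ||Xs W - Xt||_F
    for row-aligned seed translation pairs (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(Xs.T @ Xt)
    return U @ Vt


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))  # toy monolingual space: 50 words, dim 8

# n = 2 corresponds to alpha = 0.5: similarities become (X X^T)^2.
Y = nth_order_transform(X, alpha=0.5)
S1 = X @ X.T
print(np.allclose(Y @ Y.T, S1 @ S1))  # True

# Post-processing happens per language, before alignment; any mapper follows.
hidden, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # toy orthogonal map
Xs, Xt = Y, Y @ hidden                                 # toy "target" space
W = procrustes(Xs, Xt)
print(np.allclose(Xs @ W, Xt))  # True
```

Setting alpha = 0 reduces the transform to a rotation, leaving first-order similarities unchanged, so alpha acts as a single knob on the monolingual space. Because the step touches only each monolingual space, it is independent of whichever projection method is trained afterwards.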

Anthology ID:: 2020.repl4nlp-1.7
Volume:: Proceedings of the 5th Workshop on Representation Learning for NLP
Month:: July
Year:: 2020
Address:: Online
Editors:: Spandana Gella, Johannes Welbl, Marek Rei, Fabio Petroni, Patrick Lewis, Emma Strubell, Minjoon Seo, Hannaneh Hajishirzi
Venue:: RepL4NLP
SIG:: SIGREP
Publisher:: Association for Computational Linguistics
Pages:: 45–54
URL:: https://aclanthology.org/2020.repl4nlp-1.7/
DOI:: 10.18653/v1/2020.repl4nlp-1.7
Cite (ACL):: Ivan Vulić, Anna Korhonen, and Goran Glavaš. 2020. Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces. In Proceedings of the 5th Workshop on Representation Learning for NLP, pages 45–54, Online. Association for Computational Linguistics.
Cite (Informal):: Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces (Vulić et al., RepL4NLP 2020)
PDF:: https://aclanthology.org/2020.repl4nlp-1.7.pdf
Video:: http://slideslive.com/38929773