On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning

Yerai Doval, Jose Camacho-Collados, Luis Espinosa Anke, Steven Schockaert


Abstract
Cross-lingual word embeddings are vector representations of words in different languages in which words with similar meanings are represented by similar vectors, regardless of language. Recent methods that construct these embeddings by aligning monolingual spaces have shown that accurate alignments can be obtained with little or no supervision, which usually comes in the form of bilingual dictionaries. However, evaluation has focused on a particular controlled scenario, and there is no strong evidence on how current state-of-the-art systems would fare with noisy text or with language pairs that have major linguistic differences. In this paper, we present an extensive evaluation of multiple cross-lingual embedding models, analyzing their strengths and limitations with respect to variables such as target language, training corpora, and amount of supervision. Our conclusions cast doubt on the view that high-quality cross-lingual embeddings can always be learned without much supervision.
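As a rough illustration of what "aligning monolingual spaces" with a bilingual dictionary can mean, the sketch below uses orthogonal Procrustes, one common way to fit a linear map between two embedding spaces. It is an assumption chosen for illustration and is not claimed to be the method of any model evaluated in the paper. The function name `procrustes_align` and the synthetic data are hypothetical: real inputs would be the vectors of the dictionary's source words and of their translations, stacked row by row.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of src and row i of tgt
    are the embeddings of one dictionary entry and its translation.
    """
    # The SVD of the cross-covariance matrix gives the closed-form solution.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Synthetic stand-in for a bilingual dictionary (illustrative assumption):
# the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
dim, n_pairs = 5, 50
src = rng.normal(size=(n_pairs, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ true_rotation

w = procrustes_align(src, tgt)
mapped = src @ w  # source vectors expressed in the target space
```

In this noise-free toy setting the learned map recovers the rotation exactly. With real embeddings, a small or noisy dictionary gives only an approximate alignment, which is the kind of sensitivity the abstract is concerned with.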
Anthology ID:
2020.lrec-1.495
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4013–4023
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.495
Cite (ACL):
Yerai Doval, Jose Camacho-Collados, Luis Espinosa Anke, and Steven Schockaert. 2020. On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4013–4023, Marseille, France. European Language Resources Association.
Cite (Informal):
On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning (Doval et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.495.pdf
Data:
XNLI