Should All Cross-Lingual Embeddings Speak English?

Antonios Anastasopoulos, Graham Neubig


Abstract
Most recent work in cross-lingual word embeddings is severely Anglocentric. The vast majority of lexicon induction evaluation dictionaries are between English and another language, and the English embedding space is selected by default as the hub when learning in a multilingual setting. With this work, however, we challenge these practices. First, we show that the choice of hub language can significantly impact downstream lexicon induction and zero-shot POS tagging performance. Second, we both expand a standard English-centered evaluation dictionary collection to include all language pairs using triangulation, and create new dictionaries for under-represented languages. Evaluating established methods over all these language pairs sheds light on their suitability for aligning embeddings from distant languages and presents new challenges for the field. Finally, in our analysis we identify general guidelines for strong cross-lingual embedding baselines that extend to language pairs that do not include English.
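The abstract mentions expanding English-centered dictionaries to all language pairs using triangulation. As a rough illustration only, the sketch below composes two bilingual dictionaries through a shared pivot language. The function name `triangulate`, the dict-of-sets representation, and the toy Greek/English/Italian entries are all assumptions made for this example; this is a naive composition, not the authors' pipeline, which may apply filtering that is not described here.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Naively compose two bilingual dictionaries through a pivot language.

    Both arguments map a word to a set of translations (hypothetical format).
    """
    src_to_tgt = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            # Every target translation of the pivot word becomes a candidate.
            src_to_tgt[src_word].update(pivot_to_tgt.get(pivot_word, ()))
    # Drop source words for which no translation could be reached.
    return {word: tgts for word, tgts in src_to_tgt.items() if tgts}


# Toy entries, invented for illustration (not taken from the paper's data).
el_en = {"γάτα": {"cat"}, "σκύλος": {"dog"}, "τράπεζα": {"bank"}}
en_it = {"cat": {"gatto"}, "dog": {"cane"}, "bank": {"banca", "riva"}}

el_it = triangulate(el_en, en_it)
for word, translations in el_it.items():
    print(word, sorted(translations))
```

In this toy case the ambiguous pivot word "bank" pairs τράπεζα with both "banca" and "riva", which shows why naively triangulated entries generally need some form of filtering before they are used for evaluation.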
Anthology ID:
2020.acl-main.766
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8658–8679
URL:
https://aclanthology.org/2020.acl-main.766
DOI:
10.18653/v1/2020.acl-main.766
Cite (ACL):
Antonios Anastasopoulos and Graham Neubig. 2020. Should All Cross-Lingual Embeddings Speak English? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8658–8679, Online. Association for Computational Linguistics.
Cite (Informal):
Should All Cross-Lingual Embeddings Speak English? (Anastasopoulos & Neubig, ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.766.pdf
Video:
 http://slideslive.com/38928883
Code:
antonisa/embeddings
Data:
Universal Dependencies, WikiMatrix