Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea


Abstract
Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features. This has implications for embedding use, particularly in research that uses them to study language trends.
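As a rough illustration of the stability notion mentioned in the abstract, the sketch below measures the overlap between a word's nearest neighbors in two embedding spaces. It assumes cosine similarity, a shared vocabulary, and non-zero vectors; the function names, the toy vectors, and the choice of `k` are hypothetical and are not taken from the paper or its code release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(space, word, k):
    """Return the set of the k words closest to `word` in `space`.

    `space` is a dict mapping each word to its vector (an assumed format).
    """
    target = space[word]
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(target, space[w]), reverse=True)
    return set(others[:k])


def stability(space_a, space_b, word, k=10):
    """Fraction of the k nearest neighbors of `word` shared by both spaces.

    Assumes each space contains at least k words other than `word`.
    """
    nn_a = nearest_neighbors(space_a, word, k)
    nn_b = nearest_neighbors(space_b, word, k)
    return len(nn_a & nn_b) / k


# Toy two-dimensional spaces over the same four-word vocabulary.
space_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
space_b = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.5, 0.5], "bus": [0.0, 1.0]}

print(stability(space_a, space_b, "cat", k=2))  # one of two neighbors is shared
```

In this toy case the two neighbors of "cat" are {dog, bus} in the first space and {dog, car} in the second, so half of the neighborhood is preserved. A word whose neighborhood is identical in both spaces would score 1.0.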
Anthology ID:
2021.emnlp-main.476
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5891–5901
URL:
https://aclanthology.org/2021.emnlp-main.476
DOI:
10.18653/v1/2021.emnlp-main.476
Cite (ACL):
Laura Burdick, Jonathan K. Kummerfeld, and Rada Mihalcea. 2021. Analyzing the Surprising Variability in Word Embedding Stability Across Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5891–5901, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Analyzing the Surprising Variability in Word Embedding Stability Across Languages (Burdick et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.476.pdf
Video:
https://aclanthology.org/2021.emnlp-main.476.mp4
Code:
laura-burdick/multilingual-stability