Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality

Thomas Pickard


Abstract
This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.
Anthology ID:
2020.mwe-1.12
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Month:
December
Year:
2020
Address:
online
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
95–100
URL:
https://aclanthology.org/2020.mwe-1.12
Cite (ACL):
Thomas Pickard. 2020. Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 95–100, online. Association for Computational Linguistics.
Cite (Informal):
Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality (Pickard, MWE 2020)
PDF:
https://aclanthology.org/2020.mwe-1.12.pdf