From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources

Ilia Kuznetsov, Iryna Gurevych


Abstract
Distributional word representations (often referred to as word embeddings) are omnipresent in modern NLP. Early work has focused on building representations for word types, and recent studies show that lemmatization and part of speech (POS) disambiguation of targets in isolation improve the performance of word embeddings on a range of downstream tasks. However, the reasons behind these improvements, the qualitative effects of these operations and the combined performance of lemmatized and POS disambiguated targets are less studied. This work aims to close this gap and puts previous findings into a general perspective. We examine the effect of lemmatization and POS typing on word embedding performance in a novel resource-based evaluation scenario, as well as on standard similarity benchmarks. We show that these two operations have complementary qualitative and vocabulary-level effects and are best used in combination. We find that the improvement is more pronounced for verbs and show how lemmatization and POS typing implicitly target some of the verb-specific issues. We claim that the observed improvement is a result of better conceptual alignment between word embeddings and lexical resources, stressing the need for conceptually plausible modeling of word embedding targets.
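The two operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Token` type, the `make_target` and `vocabulary` helpers, the `lemma.POS` target format, and the hand-annotated toy sentence are all assumptions made for illustration. The sketch only shows how the choice of target changes the vocabulary an embedding model would be trained on.

```python
from typing import Iterable, NamedTuple, Set


class Token(NamedTuple):
    form: str   # surface word form as it appears in the text
    lemma: str  # dictionary form (hand-annotated here, an assumption)
    pos: str    # coarse POS tag (hand-annotated here, an assumption)


def make_target(token: Token, lemmatize: bool, pos_typed: bool) -> str:
    """Build the embedding target for one token (hypothetical helper)."""
    base = token.lemma if lemmatize else token.form.lower()
    return f"{base}.{token.pos}" if pos_typed else base


def vocabulary(tokens: Iterable[Token], lemmatize: bool, pos_typed: bool) -> Set[str]:
    """Collect the distinct targets produced under one configuration."""
    return {make_target(t, lemmatize, pos_typed) for t in tokens}


# Toy sentence: "She runs daily and ran a long run yesterday"
TOKENS = [
    Token("She", "she", "PRON"),
    Token("runs", "run", "VERB"),
    Token("daily", "daily", "ADV"),
    Token("and", "and", "CCONJ"),
    Token("ran", "run", "VERB"),
    Token("a", "a", "DET"),
    Token("long", "long", "ADJ"),
    Token("run", "run", "NOUN"),
    Token("yesterday", "yesterday", "NOUN"),
]

if __name__ == "__main__":
    for lemmatize in (False, True):
        for pos_typed in (False, True):
            name = ("lemma" if lemmatize else "type") + (".POS" if pos_typed else "")
            vocab = vocabulary(TOKENS, lemmatize, pos_typed)
            print(f"{name:10s} {len(vocab)} targets: {sorted(vocab)}")
```

In this toy example, lemmatization alone merges "runs", "ran" and the noun "run" into a single target, while adding POS typing keeps the verb and noun readings of "run" apart. This is one way to picture the vocabulary-level effects of the two operations that the abstract refers to.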
Anthology ID:
C18-1020
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
233–244
URL:
https://aclanthology.org/C18-1020
Cite (ACL):
Ilia Kuznetsov and Iryna Gurevych. 2018. From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources. In Proceedings of the 27th International Conference on Computational Linguistics, pages 233–244, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources (Kuznetsov & Gurevych, COLING 2018)
PDF:
https://aclanthology.org/C18-1020.pdf
Code:
UKPLab/coling2018-wcs