Leveraging Contextual Embeddings and Idiom Principle for Detecting Idiomaticity in Potentially Idiomatic Expressions

Reyhaneh Hashempour, Aline Villavicencio


Abstract
Most studies on detecting idiomatic expressions have focused on discovering potentially idiomatic expressions, overlooking the context. However, many idioms, like blow the whistle, can be interpreted idiomatically or literally depending on the context. In this work, we leverage the Idiom Principle (Sinclair et al., 1991) and contextualized word embeddings (CWEs), focusing on Context2Vec (Melamud et al., 2016) and BERT (Devlin et al., 2019), to distinguish between the literal and idiomatic senses of such expressions in context. We also experiment with a non-contextualized word embedding baseline, Word2Vec (Mikolov et al., 2013), and compare its performance with that of the CWEs. The results show that CWEs outperform the non-contextualized baseline, especially when the Idiom Principle is applied, which improves the results by 6%. We further show that the Context2Vec model trained with the Idiom Principle can place potentially idiomatic expressions into distinct ‘sense’ (idiomatic/literal) regions of the embedding space, whereas Word2Vec and BERT seem to lack this capacity. The model can also produce suitable substitutes for ambiguous expressions in context, which is promising for downstream tasks such as text simplification.
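The abstract does not spell out how the Idiom Principle is operationalized during training. One common reading, assumed here, is that a potentially idiomatic expression is stored and processed as a single unit, so a corpus can be preprocessed to join each listed expression into one token before training embeddings such as Word2Vec or Context2Vec. The Python sketch that follows is a minimal illustration under that assumption. `apply_idiom_principle` and its `joiner` parameter are hypothetical names, not the authors' code, and the sketch matches only the exact listed word sequence; an inflected form such as blew the whistle would need lemmatization first.

```python
import re
from typing import Iterable


def apply_idiom_principle(sentence: str, expressions: Iterable[str], joiner: str = "_") -> str:
    """Hypothetical helper: rewrite each listed multiword expression as one token.

    Assumption: "applying the Idiom Principle" is approximated here by joining
    the words of each expression, so an embedding model learns one vector for it.
    """
    result = sentence
    # Try longer expressions first so a shorter one nested inside a longer one
    # cannot split it before the longer match is made.
    for expr in sorted(expressions, key=len, reverse=True):
        words = expr.split()
        replacement = joiner.join(words)
        # Whole-word match, any whitespace between words, case-insensitive.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        # A function as the replacement avoids backslash escapes being interpreted.
        result = re.sub(pattern, lambda _m: replacement, result, flags=re.IGNORECASE)
    return result


pies = ["blow the whistle"]
print(apply_idiom_principle("She decided to blow the whistle on the fraud.", pies))
# She decided to blow_the_whistle on the fraud.
print(apply_idiom_principle("The referee will blow the whistle at noon.", pies))
# The referee will blow_the_whistle at noon.
```

Both the idiomatic and the literal example map to the same token, so any distinction between the two senses has to come from the surrounding context that the embedding model sees.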
Anthology ID:
2020.cogalex-1.9
Volume:
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Month:
December
Year:
2020
Address:
Online
Editors:
Michael Zock, Emmanuele Chersoni, Alessandro Lenci, Enrico Santus
Venue:
CogALex
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
72–80
URL:
https://aclanthology.org/2020.cogalex-1.9
Cite (ACL):
Reyhaneh Hashempour and Aline Villavicencio. 2020. Leveraging Contextual Embeddings and Idiom Principle for Detecting Idiomaticity in Potentially Idiomatic Expressions. In Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, pages 72–80, Online. Association for Computational Linguistics.
Cite (Informal):
Leveraging Contextual Embeddings and Idiom Principle for Detecting Idiomaticity in Potentially Idiomatic Expressions (Hashempour & Villavicencio, CogALex 2020)
PDF:
https://aclanthology.org/2020.cogalex-1.9.pdf