Navnita Nandakumar
2019
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions
Navnita Nandakumar
|
Timothy Baldwin
|
Bahar Salehi
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embbedings suggest that Word2vec performs the best, followed by FastText and Infersent. Moreover, we find that recently-proposed contextualised embedding models such as Bert and ELMo are not adept at handling non-compositionality in multiword expressions.
2018
A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions
Navnita Nandakumar
|
Bahar Salehi
|
Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2018
In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions("MWEs"). Our experimental results suggest that character- and document-level models capture knowledge of MWE compositionality and are effective in modelling varying levels of compositionality, with the advantage over word-level models that they do not require token-level identification of MWEs in the training corpus.