Samin Fakharian


2021

pdf bib
Contextualized Embeddings Encode Monolingual and Cross-lingual Knowledge of Idiomaticity
Samin Fakharian | Paul Cook
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

Potentially idiomatic expressions (PIEs) are ambiguous between non-compositional idiomatic interpretations and transparent literal interpretations. For example, “hit the road” can have an idiomatic meaning corresponding to ‘start a journey’ or have a literal interpretation. In this paper we propose a supervised model based on contextualized embeddings for predicting whether usages of PIEs are idiomatic or literal. We consider monolingual experiments for English and Russian, and show that the proposed model outperforms previous approaches, including in the case that the model is tested on instances of PIE types that were not observed during training. We then consider cross-lingual experiments in which the model is trained on PIE instances in one language, English or Russian, and tested on the other language. We find that the model outperforms baselines in this setting. These findings suggest that contextualized embeddings are able to learn representations that encode knowledge of idiomaticity that is not restricted to specific expressions, nor to a specific language.

pdf bib
UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders
Milton King | Ali Hakimi Parizi | Samin Fakharian | Paul Cook
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.