Greta Smolenska


2021

CLULEX at SemEval-2021 Task 1: A Simple System Goes a Long Way
Greta Smolenska | Peter Kolb | Sinan Tang | Mironas Bitinis | Héctor Hernández | Elin Asklöv
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents the system we submitted to the first Lexical Complexity Prediction (LCP) Shared Task 2021. The Shared Task provides participants with a new English dataset that includes the context of each target word. We participate in the single-word complexity prediction sub-task and focus on feature engineering. Our best system is trained on linguistic features and word embeddings (Pearson correlation of 0.7942). We demonstrate, however, that a simpler feature set achieves comparable results, and we submit a model trained on 36 linguistic features (Pearson correlation of 0.7925).
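
The scores quoted in the abstract are Pearson correlations between predicted and gold complexity values. As a minimal sketch of how such a score is computed, the following uses only the Python standard library. The function name `pearson_r` and the toy numbers are illustrative assumptions; they are not taken from the paper or from the shared-task data.

```python
import math


def pearson_r(predicted, gold):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    # Sums of squared deviations (unnormalised variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_g)


# Toy values for illustration only (hypothetical, not shared-task data).
toy_predicted = [0.10, 0.25, 0.40, 0.55]
toy_gold = [0.12, 0.20, 0.45, 0.50]
score = pearson_r(toy_predicted, toy_gold)
```

A value close to 1 means the predictions rise and fall with the gold complexity scores. Because the measure is invariant to shifting and positive rescaling of the predictions, it rewards getting the relative ordering and spacing right rather than the absolute values.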