Greta Smolenska

2021

This paper presents the system we submitted to the first Lexical Complexity Prediction (LCP) Shared Task 2021. The Shared Task provides participants with a new English dataset that includes the context of each target word. We participate in the single-word complexity prediction sub-task and focus on feature engineering. Our best system is trained on linguistic features and word embeddings (Pearson’s score of 0.7942). We demonstrate, however, that a simpler feature set achieves comparable results and submit a model trained on 36 linguistic features (Pearson’s score of 0.7925).
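The scores quoted in the abstract are Pearson correlations between predicted and gold complexity values. As a minimal sketch of how such a score is computed, the snippet below implements the coefficient in plain Python. The function name `pearson` and the sample values are assumptions made for illustration; they are not taken from the paper or from the LCP dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; not taken from the LCP dataset.
gold = [0.10, 0.25, 0.40, 0.55, 0.70]
predicted = [0.15, 0.20, 0.45, 0.50, 0.75]
score = pearson(gold, predicted)
print(f"Pearson: {score:.4f}")
```

A value close to 1 means the predictions rank and scale the words much as the gold annotations do, which is how the two reported scores (0.7942 and 0.7925) can be compared directly.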

Co-authors

Elin Asklöv 1
Mironas Bitinis 1
Héctor Hernández 1
Peter Kolb 1
Sinan Tang 1

Venues

SemEval 1
