CLULEX at SemEval-2021 Task 1: A Simple System Goes a Long Way

Greta Smolenska, Peter Kolb, Sinan Tang, Mironas Bitinis, Héctor Hernández, Elin Asklöv


Abstract
This paper presents the system we submitted to the first Lexical Complexity Prediction (LCP) Shared Task 2021. The Shared Task provides participants with a new English dataset that includes the context of each target word. We participate in the single-word complexity prediction sub-task and focus on feature engineering. Our best system is trained on linguistic features and word embeddings (Pearson’s score of 0.7942). We demonstrate, however, that a simpler feature set achieves comparable results, and we submit a model trained on 36 linguistic features (Pearson’s score of 0.7925).
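The Pearson's scores quoted in the abstract measure the linear correlation between predicted and gold complexity values. The sketch below illustrates that metric only, assuming two hypothetical surface features (character length and vowel count) and a handful of invented words and scores. It does not reproduce the 36 linguistic features, the embeddings, or the model described in the paper, and the names `pearson` and `word_features` are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def word_features(word):
    """Hypothetical surface features: character length and vowel count."""
    vowels = sum(ch in "aeiouy" for ch in word.lower())
    return [len(word), vowels]


# Toy data invented for illustration; not taken from the shared-task dataset.
words = ["cat", "river", "synagogue", "photosynthesis"]
gold = [0.05, 0.15, 0.40, 0.60]

# Correlate a single feature (word length) with the toy gold scores.
lengths = [word_features(w)[0] for w in words]
r = pearson(lengths, gold)
print(f"Pearson correlation of word length with toy gold scores: {r:.4f}")
```

In a real system the first argument to `pearson` would be the model's predictions on held-out data rather than a single raw feature.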
Anthology ID:
2021.semeval-1.81
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP | SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
632–639
URL:
https://aclanthology.org/2021.semeval-1.81
DOI:
10.18653/v1/2021.semeval-1.81
PDF:
https://aclanthology.org/2021.semeval-1.81.pdf