Stanford MLab at SemEval-2021 Task 1: Tree-Based Modelling of Lexical Complexity using Word Embeddings

Erik Rozi, Niveditha Iyer, Gordon Chi, Enok Choe, Kathy J. Lee, Kevin Liu, Patrick Liu, Zander Lack, Jillian Tang, Ethan A. Chi


Abstract
This paper presents our system for the single- and multi-word lexical complexity prediction tasks of SemEval-2021 Task 1: Lexical Complexity Prediction. Comprehending a text depends on the reader’s ability to understand the words in it; estimating the lexical complexity of a text can therefore help readers find appropriate texts and help systems tailor a text to an audience’s needs. We present our model pipeline, which combines embedding-based and manually crafted features to predict lexical complexity on the CompLex English dataset using a variety of tree-based and linear models. Our system ranked 27th of 54 on single-word prediction and 14th of 37 on multi-word prediction.
Anthology ID: 2021.semeval-1.89
Volume: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month: August
Year: 2021
Address: Online
Venues: ACL | IJCNLP | SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 688–693
URL: https://aclanthology.org/2021.semeval-1.89
DOI: 10.18653/v1/2021.semeval-1.89
PDF: https://aclanthology.org/2021.semeval-1.89.pdf