UTFPR at SemEval-2021 Task 1: Complexity Prediction by Combining BERT Vectors and Classic Features

Gustavo Henrique Paetzold


Abstract
We describe the UTFPR systems submitted to the Lexical Complexity Prediction shared task of SemEval 2021. They perform complexity prediction by combining classic features, such as word frequency, n-gram frequency, word length, and number of senses, with BERT vectors. We test numerous feature combinations and machine learning models in our experiments and find that BERT vectors, even if not optimized for the task at hand, are a great complement to classic features. We also find that employing the principle of compositionality can potentially help in phrase complexity prediction. Our systems place 45th out of 55 for single words and 29th out of 38 for phrases.
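The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the helper names (`classic_features`, `combine`, `phrase_complexity`), the toy frequency and sense-count tables, and the use of a plain average as the compositional rule are all assumptions made for illustration, and the embedding is a placeholder list standing in for a BERT vector.

```python
from statistics import mean


def classic_features(word, freq_table, sense_counts):
    """Build hand-crafted features of the kind named in the abstract.

    freq_table and sense_counts are hypothetical lookup tables
    (word -> corpus frequency, word -> number of senses).
    """
    key = word.lower()
    return [
        float(freq_table.get(key, 0)),    # word frequency
        float(len(word)),                 # word length in characters
        float(sense_counts.get(key, 0)),  # number of senses
    ]


def combine(classic, embedding):
    """Concatenate classic features with an embedding vector.

    The embedding is assumed to be precomputed elsewhere (e.g. by a
    BERT model); here it is just a list of floats.
    """
    return list(classic) + list(embedding)


def phrase_complexity(word_scores):
    """Compositional estimate (assumed rule): average the per-word scores."""
    return mean(word_scores)


# Toy inputs, invented for this example only.
features = combine(
    classic_features("River", {"river": 120}, {"river": 3}),
    [0.1, -0.2],  # placeholder for a contextual embedding
)
phrase_score = phrase_complexity([0.25, 0.75])
```

The combined vector would then be passed to whatever regression model is being tested; the choice of model is left open here because the abstract only says that several were compared.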
Anthology ID:
2021.semeval-1.78
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venue:
SemEval
SIGs:
SIGSEM | SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
617–622
URL:
https://aclanthology.org/2021.semeval-1.78
DOI:
10.18653/v1/2021.semeval-1.78
Cite (ACL):
Gustavo Henrique Paetzold. 2021. UTFPR at SemEval-2021 Task 1: Complexity Prediction by Combining BERT Vectors and Classic Features. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 617–622, Online. Association for Computational Linguistics.
Cite (Informal):
UTFPR at SemEval-2021 Task 1: Complexity Prediction by Combining BERT Vectors and Classic Features (Paetzold, SemEval 2021)
PDF:
https://aclanthology.org/2021.semeval-1.78.pdf