Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022

Ulf A. Hamster


Abstract
The German Text Complexity Assessment Shared Task at KONVENS 2022 explores how to predict a complexity score for example sentences from the perspective of language learners. Our modeling approach for this shared task uses off-the-shelf NLP tools for feature engineering and a Random Forest regression model. We identified text length, more precisely the logarithm of a sentence’s string length, as the most important feature for predicting the complexity score. Further analysis showed that the Pearson correlation between text length and complexity score is 𝜌 ≈ 0.777. A sensitivity analysis on the loss function revealed that semantic SBert features affect the complexity score as well.
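The length feature and the correlation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the feature is the natural logarithm of the character count, which the abstract does not specify, and it uses invented toy sentences and scores rather than the TextComplexityDE data. The helper names `log_length` and `pearson` are hypothetical, and the sketch does not reproduce the paper's Random Forest model or its reported value.

```python
import math


def log_length(sentence: str) -> float:
    """Natural log of the character count (assumed form of the length feature)."""
    return math.log(len(sentence))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy examples; the scores are NOT taken from TextComplexityDE.
sentences = [
    "Der Hund schläft.",
    "Die Katze sitzt auf dem warmen Sofa.",
    "Obwohl es regnete, gingen die Kinder in den Park, um zu spielen.",
    "Die Verordnung, die nach langen Verhandlungen verabschiedet wurde, "
    "regelt die Zuständigkeiten der beteiligten Behörden neu.",
]
scores = [1.2, 2.5, 3.1, 4.8]

features = [log_length(s) for s in sentences]
rho = pearson(features, scores)
print(f"Pearson correlation on the toy data: {rho:.3f}")
```

On real data, the same two steps would be applied to every sentence and its annotated complexity score; the toy result says nothing about the correlation reported in the paper.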
Anthology ID:
2022.germeval-1.2
Volume:
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
Month:
September
Year:
2022
Address:
Potsdam, Germany
Editors:
Sebastian Möller, Salar Mohtaj, Babak Naderi
Venue:
GermEval
Publisher:
Association for Computational Linguistics
Pages:
10–14
URL:
https://aclanthology.org/2022.germeval-1.2
Cite (ACL):
Ulf A. Hamster. 2022. Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, pages 10–14, Potsdam, Germany. Association for Computational Linguistics.
Cite (Informal):
Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022 (Hamster, GermEval 2022)
PDF:
https://aclanthology.org/2022.germeval-1.2.pdf
Data
TextComplexityDE