TUM Social Computing at GermEval 2022: Towards the Significance of Text Statistics and Neural Embeddings in Text Complexity Prediction

Miriam Anschütz, Georg Groh


Abstract
In this paper, we describe our submission to the GermEval 2022 Shared Task on Text Complexity Assessment of German Text. The task addresses the problem of predicting the complexity of German sentences on a continuous scale. While many related works still rely on handcrafted statistical features, neural networks have emerged as the state of the art in other natural language processing tasks. Therefore, we investigate how both can complement each other and which features are most relevant for text complexity prediction in German. We propose a fine-tuned German DistilBERT model enriched with statistical text features that achieved fourth place in the shared task with an RMSE of 0.481 on the competition’s test data.
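For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a root mean squared error (RMSE) between predicted and gold complexity scores can be computed. The function name `rmse` and the example scores are assumptions made up for illustration. They are not taken from the paper or from the shared task data.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equally long score lists."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average the squares, take the root.
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical complexity scores, invented for this example only.
predicted = [2.0, 3.5, 5.0]
gold = [2.5, 3.0, 5.0]
score = rmse(predicted, gold)
print(f"RMSE: {score:.3f}")
```

A lower value means the predicted scores lie closer to the gold ratings on average, and larger individual errors are penalized more heavily because they are squared before averaging.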
Anthology ID:
2022.germeval-1.4
Volume:
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
Month:
September
Year:
2022
Address:
Potsdam, Germany
Editors:
Sebastian Möller, Salar Mohtaj, Babak Naderi
Venue:
GermEval
Publisher:
Association for Computational Linguistics
Pages:
21–26
URL:
https://aclanthology.org/2022.germeval-1.4
Cite (ACL):
Miriam Anschütz and Georg Groh. 2022. TUM Social Computing at GermEval 2022: Towards the Significance of Text Statistics and Neural Embeddings in Text Complexity Prediction. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, pages 21–26, Potsdam, Germany. Association for Computational Linguistics.
Cite (Informal):
TUM Social Computing at GermEval 2022: Towards the Significance of Text Statistics and Neural Embeddings in Text Complexity Prediction (Anschütz & Groh, GermEval 2022)
PDF:
https://aclanthology.org/2022.germeval-1.4.pdf