TUM sebis at GermEval 2022: A Hybrid Model Leveraging Gaussian Processes and Fine-Tuned XLM-RoBERTa for German Text Complexity Analysis

Juraj Vladika, Stephen Meisenbacher, Florian Matthes


Abstract
Quantifying the complexity of written language is an interesting task, particularly for the opportunity it offers to aid language learners. In this pursuit, the question of what exactly about natural language contributes to its complexity (or lack thereof) is a worthwhile point of investigation. We propose a hybrid approach that uses shallow models to capture linguistic features while leveraging a fine-tuned embedding model to encode the semantics of the input text. By combining these two methods, we achieve competitive scores on the given metric and demonstrate improvements over either method alone. In addition, we find that Gaussian processes are effective in the training of shallow models for text complexity analysis.
Anthology ID:
2022.germeval-1.9
Volume:
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
Month:
September
Year:
2022
Address:
Potsdam, Germany
Editors:
Sebastian Möller, Salar Mohtaj, Babak Naderi
Venue:
GermEval
Publisher:
Association for Computational Linguistics
Pages:
51–56
URL:
https://aclanthology.org/2022.germeval-1.9
Cite (ACL):
Juraj Vladika, Stephen Meisenbacher, and Florian Matthes. 2022. TUM sebis at GermEval 2022: A Hybrid Model Leveraging Gaussian Processes and Fine-Tuned XLM-RoBERTa for German Text Complexity Analysis. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, pages 51–56, Potsdam, Germany. Association for Computational Linguistics.
Cite (Informal):
TUM sebis at GermEval 2022: A Hybrid Model Leveraging Gaussian Processes and Fine-Tuned XLM-RoBERTa for German Text Complexity Analysis (Vladika et al., GermEval 2022)
PDF:
https://aclanthology.org/2022.germeval-1.9.pdf
Code:
sebischair/text-complexity-de-2022
Data:
TextComplexityDE