GermEval Shared Task (2022)

Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text

Sebastian Möller | Salar Mohtaj | Babak Naderi

Overview of the GermEval 2022 Shared Task on Text Complexity Assessment of German Text
Salar Mohtaj | Babak Naderi | Sebastian Möller

In this paper we present the GermEval 2022 shared task on Text Complexity Assessment of German text. Text forms an integral part of exchanging information and interacting with the world, correlating with quality and experience of life. Text complexity is one of the factors that affect a reader’s understanding of a text. The mapping of a body of text to a mathematical unit quantifying the degree of readability is the basis of complexity assessment. As readability might be influenced by representation, we only target the text complexity for readers in this task. We designed the task as text regression in which participants developed models to predict the complexity of pieces of text for a German learner in a range from 1 to 7. The shared task is organized in two phases: the development phase and the test phase. Among 24 participants who registered for the shared task, ten teams submitted their results on the test data.

Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022
Ulf A. Hamster

The German Text Complexity Assessment Shared Task in KONVENS 2022 explores how to predict a complexity score for sentence examples from a language learner’s perspective. Our modeling approach for this shared task utilizes off-the-shelf NLP tools for feature engineering and a Random Forest regression model. We identified text length, or more precisely the logarithm of a sentence’s string length, as the most important feature for predicting the complexity score. Further analysis showed that the Pearson correlation between text length and complexity score is 𝜌 ≈ 0.777. A sensitivity analysis on the loss function revealed that semantic SBert features impact the complexity score as well.
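As a rough illustration of the length feature described in this abstract, the following sketch computes the Pearson correlation between the logarithm of a sentence's string length and a complexity score. The sentences and scores are made up for the example and are not taken from the shared-task data; the helper name `pearson` is likewise only an assumption for illustration, not the author's code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up (sentence, complexity score) pairs, NOT from the shared-task data.
examples = [
    ("Das ist gut.", 1.2),
    ("Der Hund schläft im Garten.", 1.8),
    ("Die Regierung hat gestern ein neues Gesetz beschlossen.", 3.1),
    ("Trotz erheblicher Bedenken der Opposition wurde die umstrittene Reform verabschiedet.", 4.6),
]

# Feature: natural logarithm of the sentence's string length.
log_lengths = [math.log(len(sentence)) for sentence, _ in examples]
scores = [score for _, score in examples]
rho = pearson(log_lengths, scores)
```

On real data the coefficient would of course depend on the corpus; the toy values here only show the mechanics of the calculation.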

HIIG at GermEval 2022: Best of Both Worlds Ensemble for Automatic Text Complexity Assessment
Hadi Asghari | Freya Hewett

In this paper we explain HIIG’s contribution to the shared task Text Complexity DE Challenge 2022. Our best-performing model for the task of automatically determining the complexity level of a German-language sentence is a combination of a transformer model and a classic feature-based model, which achieves a mapped root mean squared error of 0.446 on the test data.

TUM Social Computing at GermEval 2022: Towards the Significance of Text Statistics and Neural Embeddings in Text Complexity Prediction
Miriam Anschütz | Georg Groh

In this paper, we describe our submission to the GermEval 2022 Shared Task on Text Complexity Assessment of German Text. It addresses the problem of predicting the complexity of German sentences on a continuous scale. While many related works still rely on handcrafted statistical features, neural networks have emerged as the state of the art in other natural language processing tasks. Therefore, we investigate how both can complement each other and which features are most relevant for text complexity prediction in German. We propose a fine-tuned German DistilBERT model enriched with statistical text features that achieved fourth place in the shared task with an RMSE of 0.481 on the competition’s test data.

HHUplexity at Text Complexity DE Challenge 2022
David Arps | Jan Kels | Florian Krämer | Yunus Renz | Regina Stodden | Wiebke Petersen

In this paper, we describe our submission to the ‘Text Complexity DE Challenge 2022’ shared task on predicting the complexity of German sentences. We compare the performance of different feature-based regression architectures and transformer language models. Our best candidate is a fine-tuned German DistilBERT model that ignores linguistic features of the sentences. Our model ranks 7th in the shared task.

Pseudo-Labels Are All You Need
Bogdan Kostić | Mathis Lucka | Julian Risch

Automatically estimating the complexity of texts for readers has a variety of applications, such as recommending texts with an appropriate complexity level to language learners or supporting the evaluation of text simplification approaches. In this paper, we present our submission to the Text Complexity DE Challenge 2022, a regression task where the goal is to predict the complexity of a German sentence for German learners at level B. Our approach relies on more than 220,000 pseudo-labels created from the German Wikipedia and other corpora to train Transformer-based models, and refrains from any feature engineering or any additional, labeled data. We find that the pseudo-label-based approach gives impressive results yet requires little to no adjustment to the specific task and therefore could be easily adapted to other domains and tasks.

Tackling Data Drift with Adversarial Validation: An Application for German Text Complexity Estimation
Alejandro Mosquera

This paper describes the winning approach in the first automated German text complexity assessment shared task as part of KONVENS 2022. To solve this difficult problem, the evaluated system relies on an ensemble of regression models that successfully combines both traditional feature engineering and pre-trained resources. Moreover, the use of adversarial validation is proposed as a method for countering the data drift identified during the development phase, thus helping to select relevant models and features and avoid leaderboard overfitting. The best submission reached 0.43 mapped RMSE on the test set during the final phase of the competition.
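The idea behind adversarial validation can be sketched as follows: label training rows 0 and test rows 1, then check how well the two sets can be told apart. An AUC near 0.5 suggests little drift, while an AUC near 0 or 1 suggests the feature distribution has shifted. The sketch below is a deliberately simplified, single-feature variant in which each feature's raw value serves as the classifier score; in practice a classifier is usually trained on all features together. It is not the system described in the abstract, and the feature names and values are hypothetical.

```python
def auc(neg_scores, pos_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def drift_per_feature(train_rows, test_rows):
    """Score each feature by how well its raw value separates train (0) from test (1).

    Returns 2 * |AUC - 0.5| per feature: 0 means indistinguishable,
    1 means the two sets are fully separable on that feature.
    """
    n_features = len(train_rows[0])
    drift = []
    for j in range(n_features):
        a = auc([row[j] for row in train_rows], [row[j] for row in test_rows])
        drift.append(abs(a - 0.5) * 2)
    return drift

# Hypothetical feature rows: (sentence length, mean word length).
train = [(10, 4.0), (12, 5.0), (11, 4.5), (13, 5.5)]
test = [(30, 4.2), (28, 5.1), (35, 4.4), (31, 5.6)]
drift = drift_per_feature(train, test)
```

In this toy data the first feature separates the two sets completely, so it would be a candidate for removal or re-weighting, whereas the second shows almost no drift.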

Text Complexity DE Challenge 2022 Submission Description: Pairwise Regression for Complexity Prediction
Leander Girrbach

This paper describes our submission to the Text Complexity DE Challenge 2022 (Mohtaj et al., 2022). We evaluate a pairwise regression model that predicts the relative difference in complexity of two sentences, instead of predicting a complexity score from a single sentence. As a consequence, the model returns samples of scores (as many as there are training sentences) instead of a point estimate. Due to an error in the submission, test set results are unavailable. However, we show by cross-validation that pairwise regression does not improve performance over standard regression models using sentence embeddings taken from pretrained language models as input. Furthermore, we do not find the distribution standard deviations to reflect differences in “uncertainty” of the model predictions in a useful way.
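To make the pairwise setup concrete, here is a minimal sketch of how a difference model yields one score sample per training sentence. It assumes a one-dimensional feature and a linear difference model fitted by least squares; both are stand-ins chosen for brevity, not the model used in the paper, and all names and numbers are hypothetical.

```python
import statistics

def fit_difference_model(features, scores):
    """Least-squares slope w for score_a - score_b ~ w * (feature_a - feature_b),
    fitted over all ordered pairs of training sentences."""
    num = 0.0
    den = 0.0
    for fa, ya in zip(features, scores):
        for fb, yb in zip(features, scores):
            num += (fa - fb) * (ya - yb)
            den += (fa - fb) ** 2
    return num / den

def predict_samples(w, features, scores, new_feature):
    """One score sample per training sentence: its known score plus the
    predicted difference between the new sentence and that sentence."""
    return [y + w * (new_feature - f) for f, y in zip(features, scores)]

# Hypothetical training data: one feature value and one score per sentence.
train_features = [2.0, 3.0, 4.0, 5.0]
train_scores = [1.5, 2.0, 3.5, 4.0]

w = fit_difference_model(train_features, train_scores)
samples = predict_samples(w, train_features, train_scores, new_feature=3.5)
point_estimate = statistics.mean(samples)  # collapse the samples to a single score
spread = statistics.stdev(samples)         # sample standard deviation of the scores
```

The list `samples` has as many entries as there are training sentences, which is what allows a standard deviation to be computed alongside the point estimate.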

TUM sebis at GermEval 2022: A Hybrid Model Leveraging Gaussian Processes and Fine-Tuned XLM-RoBERTa for German Text Complexity Analysis
Juraj Vladika | Stephen Meisenbacher | Florian Matthes

The task of quantifying the complexity of written language presents an interesting endeavor, particularly in the opportunity that it presents for aiding language learners. In this pursuit, the question of what exactly about natural language contributes to its complexity (or lack thereof) is an interesting point of investigation. We propose a hybrid approach, utilizing shallow models to capture linguistic features, while leveraging a fine-tuned embedding model to encode the semantics of input text. By harmonizing these two methods, we achieve competitive scores in the given metric, and we demonstrate improvements over either singular method. In addition, we uncover the effectiveness of Gaussian processes in the training of shallow models for text complexity analysis.

Automatic Readability Assessment of German Sentences with Transformer Ensembles
Patrick Gustav Blaneck | Tobias Bornheim | Niklas Grieger | Stephan Bialonski

Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language (such as GBERT and GPT-2-Wechsel) have become available, making it possible to develop deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated the dependence of prediction performance on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel performed better than ensembles of the same size consisting of only GBERT or GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on data of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.