Regina Stodden

2025

We introduce UniversalCEFR, a large-scale multilingual multidimensional dataset of texts annotated according to the CEFR (Common European Framework of Reference) scale in 13 languages. To enable open research in both automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modeling across tasks and languages. To demonstrate its utility, we conduct benchmark experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results further support using linguistic features and fine-tuning pretrained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution in language proficiency research by standardising dataset formats and promoting their accessibility to the global research community.

pdf bib
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
Matthew Shardlow | Fernando Alva-Manchego | Kai North | Regina Stodden | Horacio Saggion | Nouran Khallaf | Akio Hayakawa
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

pdf bib abs
Findings of the TSAR 2025 Shared Task on Readability-Controlled Text Simplification
Fernando Alva-Manchego | Regina Stodden | Joseph Marvin Imperial | Abdullah Barayan | Kai North | Harish Tayyar Madabushi
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

This paper presents the findings of the first Shared Task on Readability-Controlled Text Simplification at TSAR 2025. The task required systems to simplify English texts to specific target readability levels of the Common European Framework of Reference for Languages (CEFR). We received 48 submissions from 20 participating teams, with approaches predominantly based on large language models (LLMs), which included iterative refinement, multi-agent setups, and LLM-as-a-judge pipelines. For this shared task, we developed a new dataset of pedagogical texts and evaluated submissions using a weighted combination of semantic similarity and CEFR-level accuracy. The results of the participating teams demonstrate that while LLMs can perform substantially well on this task, dependable and controlled simplification often requires complex, multi-iterative processes. Our findings also suggest that the capabilities of current systems are beginning to saturate existing automatic evaluation metrics, underscoring the need for reevaluation and practicality.

2024

pdf bib abs
Using Discourse Connectives to Test Genre Bias in Masked Language Models
Heidrun Dorgeloh | Lea Kawaletz | Simon Stein | Regina Stodden | Stefan Conrad
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

This paper presents evidence for an effect of genre on the use of discourse connectives in argumentation. Drawing from discourse processing research on reasoning based structures, we use fill-mask computation to measure genre-induced expectations of argument realisation, and beta regression to model the probabilities of these realisations against a set of predictors. Contrasting fill-mask probabilities for the presence or absence of a discourse connective in baseline and finetuned language models reveals that genre introduces biases for the realisation of argument structure. These outcomes suggest that cross-domain discourse processing, but also argument mining, should take into account generalisations about specific features, such as connectives, and their probability related to the genre context.

pdf bib abs
Can Text Simplification Help to Increase the Acceptance of E-participation?
Regina Stodden | Phillip Nguyen
Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024

This study investigated the effect of text simplification (with and without artificial intelligence support) and the role of participants (author or reader) on the acceptance of e-participation processes. Therefore, a near-realistic experimental study with 276 participants was conducted simulating a participatory budgeting process. The results of our study show, on the one hand, that text simplification and the role of participants has no direct influence on the intention to use e-participation. Although a higher level of participation cannot be achieved by text simplification, our results also show that no negative consequences for usage intention can be expected from text simplification. On the other hand, the results show that people with reading and writing difficulties prefer text simplification for proposals in e-participation.

pdf bib abs
Reproduction & Benchmarking of German Text Simplification Systems
Regina Stodden
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

The paper investigates the reproducibility of various approaches to automatically simplify German texts and identifies key challenges in the process. We reproduce eight sentence simplification systems including rules-based models, fine-tuned models, and prompting of autoregressive models. We highlight three main issues of reproducibility: the impossibility of reproduction due to missing details, code, or restricted access to data/models; variations in reproduction, hindering meaningful comparisons; and discrepancies in evaluation scores between reported and reproduced models. To enhance reproducibility and facilitate model comparison, we recommend the publication of model-related details, including checkpoints, code, and training methodologies. Our study also emphasizes the importance of releasing system generations, when possible, for thorough analysis and better understanding of original works. In our effort to compare reproduced models, we also create a German sentence simplification benchmark of the eight models across six test sets. Overall, the study underscores the significance of transparency, documentation, and diverse training data for advancing reproducibility and meaningful model comparison in automated German text simplification.

pdf bib
Proceedings of GermEval 2024 Shared Task on Statement Segmentation in German Easy Language (StaGE)
Thorben Schomacker | Miriam Anschütz | Regina Stodden
Proceedings of GermEval 2024 Shared Task on Statement Segmentation in German Easy Language (StaGE)

pdf bib
Overview of the GermEval 2024 Shared Task on Statement Segmentation in German Easy Language (StaGE)
Thorben Schomacker | Miriam Anschütz | Regina Stodden | Georg Groh | Marina Tropmann-Frick
Proceedings of GermEval 2024 Shared Task on Statement Segmentation in German Easy Language (StaGE)

pdf bib
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
Matthew Shardlow | Horacio Saggion | Fernando Alva-Manchego | Marcos Zampieri | Kai North | Sanja Štajner | Regina Stodden
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

pdf bib abs
EASSE-DE & EASSE-multi: Easier Automatic Sentence Simplification Evaluation for German & Multiple Languages
Regina Stodden
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

In this work, we propose EASSE-multi, a framework for easier automatic sentence evaluation for languages other than English. Compared to the original EASSE framework, EASSE-multi does not focus only on English.It contains tokenizers and versions of text simplification evaluation metrics which are suitable for multiple languages. In this paper, we exemplify the usage of EASSE-multi for German TS resulting in EASSE-DE. Further, we compare text simplification results when evaluating with different language or tokenization settings of the metrics. Based on this, we formulate recommendations on how to make the evaluation of (German) TS models more transparent and better comparable. Additionally, we present a benchmark on German TS evaluated with EASSE-DE and make its resources (i.e., test sets, system outputs, and evaluation reports) available. The code of EASSE-multi and its German specialisation (EASSE-DE) can be found at https://github.com/rstodden/easse-multi and https://github.com/rstodden/easse-de.

2023

pdf bib abs
DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification
Regina Stodden | Omar Momen | Laura Kallmeyer
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text simplification is an intralingual translation task in which documents, or sentences of a complex source text are simplified for a target audience. The success of automatic text simplification systems is highly dependent on the quality of parallel data used for training and evaluation. To advance sentence simplification and document simplification in German, this paper presents DEplain, a new dataset of parallel, professionally written and manually aligned simplifications in plain German “plain DE” or in German: “Einfache Sprache”. DEplain consists of a news-domain (approx. 500 document pairs, approx. 13k sentence pairs) and a web-domain corpus (approx. 150 aligned documents, approx. 2k aligned sentence pairs). In addition, we are building a web harvester and experimenting with automatic alignment methods to facilitate the integration of non-aligned and to be-published parallel documents. Using this approach, we are dynamically increasing the web-domain corpus, so it is currently extended to approx. 750 document pairs and approx. 3.5k aligned sentence pairs. We show that using DEplain to train a transformer-based seq2seq text simplification model can achieve promising results. We make available the corpus, the adapted alignment methods for German, the web harvester and the trained models here: https://github.com/rstodden/DEPlain.

pdf bib abs
Using Masked Language Model Probabilities of Connectives for Stance Detection in English Discourse
Regina Stodden | Laura Kallmeyer | Lea Kawaletz | Heidrun Dorgeloh
Proceedings of the 10th Workshop on Argument Mining

This paper introduces an approach which operationalizes the role of discourse connectives for detecting argument stance. Specifically, the study investigates the utility of masked language model probabilities of discourse connectives inserted between a claim and a premise that supports or attacks it. The research focuses on a range of connectives known to signal support or attack, such as because, but, so, or although. By employing a LightGBM classifier, the study reveals promising results in stance detection in English discourse. While the proposed system does not aim to outperform state-of-the-art architectures, the classification accuracy is surprisingly high, highlighting the potential of these features to enhance argument mining tasks, including stance detection.

2022

pdf bib abs
TS-ANNO: An Annotation Tool to Build, Annotate and Evaluate Text Simplification Corpora
Regina Stodden | Laura Kallmeyer
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We introduce TS-ANNO, an open-source web application for manual creation and for evaluation of parallel corpora for text simplification. TS-ANNO can be used for i) sentence–wise alignment, ii) rating alignment pairs (e.g., w.r.t. grammaticality, meaning preservation, ...), iii) annotating alignment pairs w.r.t. simplification transformations (e.g., lexical substitution, sentence splitting, ...), and iv) manual simplification of complex documents. For evaluation, TS-ANNO calculates inter-annotator agreement of alignments (i) and annotations (ii).

In this paper, we describe our submission to the ‘Text Complexity DE Challenge 2022’ shared task on predicting the complexity of German sentences. We compare performance of different feature-based regression architectures and transformer language models. Our best candidate is a fine-tuned German Distilbert model that ignores linguistic features of the sentences. Our model ranks 7th place in the shared task.

2021

pdf bib abs
RS_GV at SemEval-2021 Task 1: Sense Relative Lexical Complexity Prediction
Regina Stodden | Gayatri Venugopal
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

We present the technical report of the system called RS_GV at SemEval-2021 Task 1 on lexical complexity prediction of English words. RS_GV is a neural network using hand-crafted linguistic features in combination with character and word embeddings to predict target words’ complexity. For the generation of the hand-crafted features, we set the target words in relation to their senses. RS_GV predicts the complexity well of biomedical terms but it has problems with the complexity prediction of very complex and very simple target words.

2020

pdf bib abs
Do you Feel Certain about your Annotation? A Web-based Semantic Frame Annotation Tool Considering Annotators’ Concerns and Behaviors
Regina Stodden | Behrang QasemiZadeh | Laura Kallmeyer
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this system demonstration paper, we present an open-source web-based application with a responsive design for modular semantic frame annotation (SFA). Besides letting experienced and inexperienced users do suggestion-based and slightly-controlled annotations, the system keeps track of the time and changes during the annotation process and stores the users’ confidence with the current annotation. This collected metadata can be used to get insights regarding the difficulty of an annotation with the same type or frame or can be used as an input of an annotation cost measurement for an active learning algorithm. The tool was already used to build a manually annotated corpus with semantic frames and its arguments for task 2 of SemEval 2019 regarding unsupervised lexical frame induction (QasemiZadeh et al., 2019). Although English sentences from the Wall Street Journal corpus of the Penn Treebank were annotated for this task, it is also possible to use the proposed tool for the annotation of sentences in other languages.

pdf bib abs
A multi-lingual and cross-domain analysis of features for text simplification
Regina Stodden | Laura Kallmeyer
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

In text simplification and readability research, several features have been proposed to estimate or simplify a complex text, e.g., readability scores, sentence length, or proportion of POS tags. These features are however mainly developed for English. In this paper, we investigate their relevance for Czech, German, English, Spanish, and Italian text simplification corpora. Our multi-lingual and multi-domain corpus analysis shows that the relevance of different features for text simplification is different per corpora, language, and domain. For example, the relevance of the lexical complexity is different across all languages, the BLEU score across all domains, and 14 features within the web domain corpora. Overall, the negative statistical tests regarding the other features across and within domains and languages lead to the assumption that text simplification models may be transferable between different domains or different languages.

2019

pdf bib abs
SemEval-2019 Task 2: Unsupervised Lexical Frame Induction
Behrang QasemiZadeh | Miriam R. L. Petruck | Regina Stodden | Laura Kallmeyer | Marie Candito
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents Unsupervised Lexical Frame Induction, Task 2 of the International Workshop on Semantic Evaluation in 2019. Given a set of prespecified syntactic forms in context, the task requires that verbs and their arguments be clustered to resemble semantic frame structures. Results are useful in identifying polysemous words, i.e., those whose frame structures are not easily distinguished, as well as discerning semantic relations of the arguments. Evaluation of unsupervised frame induction methods fell into two tracks: Task A) Verb Clustering based on FrameNet 1.7; and B) Argument Clustering, with B.1) based on FrameNet’s core frame elements, and B.2) on VerbNet 3.2 semantic roles. The shared task attracted nine teams, of whom three reported promising results. This paper describes the task and its data, reports on methods and resources that these systems used, and offers a comparison to human annotation.

pdf bib abs
A Neural Graph-based Approach to Verbal MWE Identification
Jakub Waszczuk | Rafael Ehren | Regina Stodden | Laura Kallmeyer
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

We propose to tackle the problem of verbal multiword expression (VMWE) identification using a neural graph parsing-based approach. Our solution involves encoding VMWE annotations as labellings of dependency trees and, subsequently, applying a neural network to model the probabilities of different labellings. This strategy can be particularly effective when applied to discontinuous VMWEs and, thanks to dense, pre-trained word vector representations, VMWEs unseen during training. Evaluation of our approach on three PARSEME datasets (German, French, and Polish) shows that it allows to achieve performance on par with the previous state-of-the-art (Al Saied et al., 2018).

2018

pdf bib abs
TRAPACC and TRAPACCS at PARSEME Shared Task 2018: Neural Transition Tagging of Verbal Multiword Expressions
Regina Stodden | Behrang QasemiZadeh | Laura Kallmeyer
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

We describe the TRAPACC system and its variant TRAPACCS that participated in the closed track of the PARSEME Shared Task 2018 on labeling verbal multiword expressions (VMWEs). TRAPACC is a modified arc-standard transition system based on Constant and Nivre’s (2016) model of joint syntactic and lexical analysis in which the oracle is approximated using a classifier. For TRAPACC, the classifier consists of a data-independent dimension reduction and a convolutional neural network (CNN) for learning and labelling transitions. TRAPACCS extends TRAPACC by replacing the softmax layer of the CNN with a support vector machine (SVM). We report the results obtained for 19 languages, for 8 of which our system yields the best results compared to other participating systems in the closed-track of the shared task.

Venues

law1