Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024

Rodrigo Wilkens, Rémi Cardon, Amalia Todirascu, Núria Gala (Editors)

Anthology ID:
Torino, Italia
Bib Export formats:

pdf bib
Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024
Rodrigo Wilkens | Rémi Cardon | Amalia Todirascu | Núria Gala

pdf bib
Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation
Liam Cripwell | Joël Legrand | Claire Gardent

Text simplification intends to make a text easier to read while preserving its core meaning. Intuitively and as shown in previous works, these two dimensions (simplification and meaning preservation) are often-times inversely correlated. An overly conservative text will fail to simplify sufficiently, whereas extreme simplification will degrade meaning preservation. Yet, popular evaluation metrics either aggregate meaning preservation and simplification into a single score (SARI, LENS), or target meaning preservation alone (BERTScore, QuestEval). Moreover, these metrics usually require a set of references and most previous work has only focused on sentence-level simplification. In this paper, we focus on the evaluation of document-level text simplification and compare existing models using distinct metrics for meaning preservation and simplification. We leverage existing metrics from similar tasks and introduce a reference-less metric variant for simplicity, showing that models are mostly biased towards either simplification or meaning preservation, seldom performing well on both dimensions. Making use of the fact that the metrics we use are all reference-less, we also investigate the performance of existing models when applied to unseen data (where reference simplifications are unavailable).

pdf bib
Malmon: A Crowd-Sourcing Platform for Simple Language
Helgi Björn Hjartarson | Steinunn Rut Friðriksdóttir

This paper presents a crowd-sourcing platform designed to address the need for parallel corpora in the field of Automatic Text Simplification (ATS). ATS aims to automatically reduce the linguistic complexity of text to aid individuals with reading difficulties, such as those with cognitive disorders, dyslexia, children, and non-native speakers. ATS does not only facilitate improved reading comprehension among these groups but can also enhance the preprocessing stage for various NLP tasks through summarization, contextual simplification, and paraphrasing. Our work introduces a language independent, openly accessible platform that crowdsources training data for ATS models, potentially benefiting low-resource languages where parallel data is scarce. The platform can efficiently aid in the collection of parallel corpora by providing a user-friendly data-collection environment. Furthermore, using human crowd-workers for the data collection process offers a potential resource for linguistic research on text simplification practices. The paper discusses the platform’s architecture, built with modern web technologies, and its user-friendly interface designed to encourage widespread participation. Through gamification and a robust admin panel, the platform incentivizes high-quality data collection and engagement from crowdworkers.

pdf bib
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models
Andreas Säuberli | Simon Clematide

Reading comprehension tests are used in a variety of applications, reaching from education to assessing the comprehensibility of simplified texts. However, creating such tests manually and ensuring their quality is difficult and time-consuming. In this paper, we explore how large language models (LLMs) can be used to generate and evaluate multiple-choice reading comprehension items. To this end, we compiled a dataset of German reading comprehension items and developed a new protocol for human and automatic evaluation, including a metric we call text informativity, which is based on guessability and answerability. We then used this protocol and the dataset to evaluate the quality of items generated by Llama 2 and GPT-4. Our results suggest that both models are capable of generating items of acceptable quality in a zero-shot setting, but GPT-4 clearly outperforms Llama 2. We also show that LLMs can be used for automatic evaluation by eliciting item reponses from them. In this scenario, evaluation results with GPT-4 were the most similar to human annotators. Overall, zero-shot generation with LLMs is a promising approach for generating and evaluating reading comprehension test items, in particular for languages without large amounts of available data.

pdf bib
An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework
Matthew Shardlow | Fernando Alva-Manchego | Riza Batista-Navarro | Stefan Bott | Saul Calderon Ramirez | Rémi Cardon | Thomas François | Akio Hayakawa | Andrea Horbach | Anna Huelsing | Yusuke Ide | Joseph Marvin Imperial | Adam Nohejl | Kai North | Laura Occhipinti | Nelson Peréz Rojas | Nishat Raihan | Tharindu Ranasinghe | Martin Solis Salazar | Marcos Zampieri | Horacio Saggion

We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. This dataset currently comprises of 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol in support of the contribution of future datasets and (2) present summary statistics on the existing data that we have gathered. Multilingual lexical simplification can be used to support low-ability readers to engage with otherwise difficult texts in their native, often low-resourced, languages.

pdf bib
SIERA: An Evaluation Metric for Text Simplification using the Ranking Model and Data Augmentation by Edit Operations
Hikaru Yamanaka | Takenobu Tokunaga

Automatic evaluation metrics are indispensable for text simplification (TS) research. The past TS research adopts three evaluation aspects: fluency, meaning preservation and simplicity. However, there is little consensus on a metric to measure simplicity, a unique aspect of TS compared with other text generation tasks. In addition, many of the existing metrics require reference simplified texts for evaluation. Thus, the cost of collecting reference texts is also an issue. This study proposes a new automatic evaluation metric, SIERA, for sentence simplification. SIERA employs a ranking model for the order relation of simplicity, which is trained by pairs of the original and simplified sentences. It does not require reference sentences for either training or evaluation. The sentence pairs for training are further augmented by the proposed method that utlizes edit operations to generate intermediate sentences with the simplicity between the original and simplified sentences. Using three evaluation datasets for text simplification, we compare SIERA with other metrics by calculating the correlations between metric values and human ratings. The results showed SIERA’s superiority over other metrics with a reservation that the quality of evaluation sentences is consistent with that of the training data.

pdf bib
Transfer Learning for Russian Legal Text Simplification
Mark Athugodage | Olga Mitrofanove | Vadim Gudkov

We present novel results in legal text simplification for Russian. We introduce the first dataset for such a task in Russian - a parallel corpus based on the data extracted from “Rossiyskaya Gazeta Legal Papers”. In this study we discuss three approaches for text simplification which involve T5 and GPT model architectures. We evaluate the proposed models on a set of metrics: ROUGE, SARI and BERTScore. We also analysed the models’ results on such readability indices as Flesch-Kinkaid Grade Level and Gunning Fog Index. And, finally, we performed human evaluation of simplified texts generated by T5 and GPT models; expertise was carried out by native speakers of Russian and Russian lawyers. In this research we compared T5 and GPT architectures for text simplification task and found out that GPT handles better when it is fine-tuned on dataset of coped texts. Our research makes a big step in improving Russian legal text readability and accessibility for common people.

pdf bib
Accessible Communication: a systematic review and comparative analysis of official English Easy-to-Understand (E2U) language guidelines
Andreea Maria Deleanu | Constantin Orasan | Sabine Braun

Easy-to-Understand (E2U) language varieties have been recognized by the United Nation’s Convention on the Rights of Persons with Disabilities (2006) as a means to guarantee the fundamental right to Accessible Communication. Increased awareness has driven changes in European (European Commission, 2015, 2021; European Parliament, 2016) and International legislation (ODI, 2010), prompting public-sector and other institutions to offer domain-specific content into E2U language to prevent communicative exclusion of those facing cognitive barriers (COGA, 2017; Maaß, 2020; Perego, 2020). However, guidance on what it is that makes language actually ‘easier to understand’ is still fragmented and vague. For this reason, we carried out a systematic review of official guidelines for English Plain Language and Easy Language to identify the most effective lexical, syntactic and adaptation strategies that can reduce complexity in verbal discourse according to official bodies. This article will present the methods and preliminary results of the guidelines analysis.

pdf bib
LanguageTool as a CAT tool for Easy-to-Read in Spanish
Margot Madina | Itziar Gonzalez-Dios | Melanie Siegel

Easy-to-Read (E2R) is an approach to content creation that emphasizes simplicity and clarity in language to make texts more accessible to readers with cognitive challenges or learning disabilities. The Spanish version of E2R is called Lectura Fácil (LF). E2R and its variants, such as LF, focus on straightforward language and structure to enhance readability. The manual production of such texts is both time and resource expensive. In this work, we have developed LFWriteAssist, an authoring support tool that aligns with the guidelines of LF. It is underpinned by the functionalities of LanguageTool, a free and open source grammar, style and spelling checker. Our tool assists in ensuring compliance with LF standard, provides definitions for complex, polysemic, or infrequently used terms, and acronym extensions. The tool is primarily targeted at LF creators, as it serves as an authoring aid, identifying any rule infringements and assisting with language simplifications. However, it can be used by anyone who seek to enhance text readability and inclusivity. The tool’s code is made available as open source, thereby contributing to the wider effort of creating inclusive and comprehensible content.

pdf bib
Paying attention to the words: explaining readability prediction for French as a foreign language
Rodrigo Wilkens | Patrick Watrin | Thomas François

Automatic text Readability Assessment (ARA) has been seen as a way of helping people with reading difficulties. Recent advancements in Natural Language Processing have shifted ARA from linguistic-based models to more precise black-box models. However, this shift has weakened the alignment between ARA models and the reading literature, potentially leading to inaccurate predictions based on unintended factors. In this paper, we investigate the explainability of ARA models, inspecting the relationship between attention mechanism scores, ARA features, and CEFR level predictions made by the model. We propose a method for identifying features associated with the predictions made by a model through the use of the attention mechanism. Exploring three feature families (i.e., psycho-linguistic, work frequency and graded lexicon), we associated features with the model’s attention heads. Finally, while not fully explanatory of the model’s performance, the correlations of these associations surpass those between features and text readability levels.