Aura Cristina Udrea


2024

pdf bib
Towards Building the LEMI Readability Platform for Children’s Literature in the Romanian Language
Madalina Chitez | Mihai Dascalu | Aura Cristina Udrea | Cosmin Strilețchi | Karla Csürös | Roxana Rogobete | Alexandru Oravițan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Readability is a crucial characteristic of texts, greatly influencing comprehension and reading efficacy. Unfortunately, limited research is available for less-resourced languages, especially for young populations where its impact is even higher. This paper introduces a new readability tool for children’s literature in the Romanian language, explicitly targeting primary school students aged 7-11. The tool consists of a digital repository of school reading texts (self-compiled corpus) and a text analysis interface that generates automatic readability reports for uploaded short texts. The methodology involves extracting, testing, and calibrating a readability formula for Romanian using the children’s literature corpus. Related work on readability and readability tools is discussed, followed by a description of the children’s literature corpus and the platform functionalities. The first steps are presented towards validating the readability formula for children’s literature in Romanian using the ReaderBench framework, while calibration variables relevant to the Romanian language and children’s literature are examined. Currently, no existing platform integrates a research-based readability formula for the Romanian language, making this tool unique. Overall, this research contributes to applied corpus linguistics and Digital Humanities studies and offers a valuable resource for educators, parents, and children in accessing age-appropriate and readable texts.