Karla Csürös


2024

Towards Building the LEMI Readability Platform for Children’s Literature in the Romanian Language
Madalina Chitez | Mihai Dascalu | Aura Cristina Udrea | Cosmin Strilețchi | Karla Csürös | Roxana Rogobete | Alexandru Oravițan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Readability is a crucial characteristic of texts, greatly influencing comprehension and reading efficacy. Unfortunately, limited research is available for less-resourced languages, especially for young populations where its impact is even higher. This paper introduces a new readability tool for children’s literature in the Romanian language, explicitly targeting primary school students aged 7-11. The tool consists of a digital repository of school reading texts (self-compiled corpus) and a text analysis interface that generates automatic readability reports for uploaded short texts. The methodology involves extracting, testing, and calibrating a readability formula for Romanian using the children’s literature corpus. Related work on readability and readability tools is discussed, followed by a description of the children’s literature corpus and the platform functionalities. The first steps are presented towards validating the readability formula for children’s literature in Romanian using the ReaderBench framework, while calibration variables relevant to the Romanian language and children’s literature are examined. Currently, no existing platform integrates a research-based readability formula for the Romanian language, making this tool unique. Overall, this research contributes to applied corpus linguistics and Digital Humanities studies and offers a valuable resource for educators, parents, and children in accessing age-appropriate and readable texts.

2022

Users Hate Blondes: Detecting Sexism in User Comments on Online Romanian News
Andreea Moldovan | Karla Csürös | Ana-maria Bucur | Loredana Bercuci
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Romania ranks almost last in Europe when it comes to gender equality in political representation, with about 10% fewer women in politics than the E.U. average. We proceed from the assumption that this underrepresentation is also influenced by the sexism and verbal abuse female politicians face in the public sphere, especially in online media. We collect a novel dataset of sexist comments in the Romanian language from newspaper articles about Romanian female politicians and propose baselines using classical machine learning models and fine-tuned pretrained transformer models for the classification of sexist language in the online medium.