Mădălina Chitez

Also published as: Madalina Chitez


2024

pdf bib
Towards a Romanian Phrasal Academic Lexicon
Madalina Chitez | Ana-Maria Bucur | Andreea Dinca | Roxana Rogobete
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

The lack of NLP based research studies on academic writing in Romania results in an unbalanced development of automatic support tools in Romanian compared to other languages, such as English. For this study, we use Romanian subsets of two bilingual academic writing corpora: the ROGER corpus, consisting of university student papers, and the EXPRES corpus, composed of expert research articles. Working with the Romanian Academic Word List / RoAWL, we present two phrase extraction phases: (i) use Ro-AWL words as node words to extract collocations according to the thresholds of statistical measures and (ii) classify extracted phrases into general versus domain-specific multi-word units. We show how manual rhetorical function annotation of resulting phrases can be combined with automatic function detection. The comparison between academic phrases in ROGER and EXPRES validates the final phrase list. The Romanian phrasal academic lexicon (ROPAL), similar to the Oxford Phrasal Academic Lexicon (OPAL), is a written academic phrase lexicon for Romanian language made available for academic use and further research or applications.

pdf bib
Towards Building the LEMI Readability Platform for Children’s Literature in the Romanian Language
Madalina Chitez | Mihai Dascalu | Aura Cristina Udrea | Cosmin Strilețchi | Karla Csürös | Roxana Rogobete | Alexandru Oravițan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Readability is a crucial characteristic of texts, greatly influencing comprehension and reading efficacy. Unfortunately, limited research is available for less-resourced languages, especially for young populations where its impact is even higher. This paper introduces a new readability tool for children’s literature in the Romanian language, explicitly targeting primary school students aged 7-11. The tool consists of a digital repository of school reading texts (self-compiled corpus) and a text analysis interface that generates automatic readability reports for uploaded short texts. The methodology involves extracting, testing, and calibrating a readability formula for Romanian using the children’s literature corpus. Related work on readability and readability tools is discussed, followed by a description of the children’s literature corpus and the platform functionalities. The first steps are presented towards validating the readability formula for children’s literature in Romanian using the ReaderBench framework, while calibration variables relevant to the Romanian language and children’s literature are examined. Currently, no existing platform integrates a research-based readability formula for the Romanian language, making this tool unique. Overall, this research contributes to applied corpus linguistics and Digital Humanities studies and offers a valuable resource for educators, parents, and children in accessing age-appropriate and readable texts.

2023

pdf bib
Automatic Extraction of the Romanian Academic Word List: Data and Methods
Ana-Maria Bucur | Andreea Dincă | Madalina Chitez | Roxana Rogobete
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

This paper presents the methodology and data used for the automatic extraction of the Romanian Academic Word List (Ro-AWL). Academic Word Lists are useful in both L2 and L1 teaching contexts. For the Romanian language, no such resource exists so far. Ro-AWL has been generated by combining methods from corpus and computational linguistics with L2 academic writing approaches. We use two types of data: (a) existing data, such as the Romanian Frequency List based on the ROMBAC corpus, and (b) self-compiled data, such as the expert academic writing corpus EXPRES. For constructing the academic word list, we follow the methodology for building the Academic Vocabulary List for the English language. The distribution of Ro-AWL features (general distribution, POS distribution) into four disciplinary datasets is in line with previous research. Ro-AWL is freely available and can be used for teaching, research and NLP applications.

2022

pdf bib
EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing
Ana-Maria Bucur | Madalina Chitez | Valentina Muresan | Andreea Dinca | Roxana Rogobete
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Field Specific Expert Scientific Writing in English as a Lingua Franca is essential for the effective research networking and dissemination worldwide. Extracting the linguistic profile of the research articles written in L2 English can help young researchers and expert scholars in various disciplines adapt to the scientific writing norms of their communities of practice. In this exploratory study, we present and test an automated linguistic assessment model that includes features relevant for the cross-disciplinary second language framework: Text Complexity Analysis features, such as Syntactic and Lexical Complexity, and Field Specific Academic Word Lists. We analyse how these features vary across four disciplinary fields (Economics, IT, Linguistics and Political Science) in a corpus of L2-English Expert Scientific Writing, part of the EXPRES corpus (Corpus of Expert Writing in Romanian and English). The variation in field specific writing is also analysed in groups of linguistic features extracted from the higher visibility (Hv) versus lower visibility (Lv) journals. After applying lexical sophistication, lexical variation and syntactic complexity formulae, significant differences between disciplines were identified, mainly that research articles from Lv journals have higher lexical complexity, but lower syntactic complexity than articles from Hv journals; while academic vocabulary proved to have discipline specific variation.

pdf bib
ParlaMint-RO: Chamber of the Eternal Future
Petru Rebeja | Mădălina Chitez | Roxana Rogobete | Andreea Dincă | Loredana Bercuci
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

The present paper aims to describe the collection of ParlaMint-RO corpus and to analyse several trends in parliamentary debates (plenary sessions of the Lower House) held in between 2000 and 2020). After a short description of the data collection (of existing transcripts), the workflow of data processing (text extraction, conversion, encoding, linguistic annotation), and an overview of the corpus, the paper will move on to a multi-layered linguistic analysis to validate interdisciplinary perspectives. We use computational methods and corpus linguistics approaches to scrutinize the future tense forms used by Romanian speakers, in order to create a data-supported profile of the parliamentary group strategies and planning.