Integrating Archaic and Regional Lexicons to Improve the Readability of Old Romanian Texts

Madalina Chitez, Roxana Rogobete, Cristina Aura Udrea, Karla Csürös, Ana-Maria Bucur, Mihai Dascalu


Abstract
Access to age-appropriate texts is critical for young readers’ literacy acquisition. For low-resource languages such as Romanian, this area remains under-researched. We present ongoing work on improving the readability of old Romanian texts using Large Language Models (LLMs). First, we compiled and cleaned a comprehensive list of archaic and regional terms from lexicographic sources, including DEX online and printed dictionaries. The cleaning process involved duplicate removal, orthographic normalization, context-based filtering, and manual review. Key challenges included distinguishing archaic forms from rare or poetic ones, resolving polysemous entries, and managing inconsistent labeling across sources. Second, LLMs were used to validate the archaic and regional nature of the identified terms and to replace them with modern equivalents, while also determining the appropriate reading level for both the original and the modified versions. Results show that replacing archaic and regional terms lowers the appropriate reading age of the modified texts by approximately 0.5 years, for texts extracted from textbooks and canonical writings.
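The abstract names the cleaning steps (duplicate removal, orthographic normalization, flagging inconsistent labels for manual review) without giving implementation details. The following is a minimal Python sketch of what such a merge could look like, not the authors' pipeline. The function names `normalize` and `merge_lexicons`, the `(term, label, source)` entry shape, the source names, and the labels in the toy data are all assumptions made for illustration. The only Romanian-specific step shown is mapping legacy cedilla characters (ş, ţ) to the standard comma-below forms (ș, ț).

```python
import unicodedata
from collections import defaultdict

# Legacy cedilla forms -> standard Romanian comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({"ş": "ș", "Ş": "Ș", "ţ": "ț", "Ţ": "Ț"})


def normalize(term: str) -> str:
    """NFC-normalize, unify diacritics, collapse whitespace, lowercase."""
    term = unicodedata.normalize("NFC", term).translate(CEDILLA_TO_COMMA)
    return " ".join(term.split()).lower()


def merge_lexicons(entries):
    """Merge (term, label, source) triples, deduplicating on the normalized form.

    Returns {term: {"labels": set, "sources": set, "needs_review": bool}}.
    A term is flagged for manual review when sources disagree on its label.
    """
    merged = defaultdict(lambda: {"labels": set(), "sources": set()})
    for term, label, source in entries:
        key = normalize(term)
        if not key:  # skip empty headwords
            continue
        merged[key]["labels"].add(label)
        merged[key]["sources"].add(source)
    for info in merged.values():
        info["needs_review"] = len(info["labels"]) > 1
    return dict(merged)


# Toy data: labels and source names are invented for illustration only.
entries = [
    ("Şezătoare", "regional", "source_a"),
    ("șezătoare ", "archaic", "source_b"),
    ("hrisov", "archaic", "source_a"),
    ("hrisov", "archaic", "source_b"),
]
lexicon = merge_lexicons(entries)
for term, info in sorted(lexicon.items()):
    print(term, sorted(info["labels"]), info["needs_review"])
```

Keying on the normalized form makes spelling variants of the same headword collapse into one entry, while the `needs_review` flag keeps conflicting labels visible for a human reviewer rather than resolving them automatically.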
Anthology ID: 2025.ranlp-1.29
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 240–246
URL: https://aclanthology.org/2025.ranlp-1.29/
Cite (ACL): Madalina Chitez, Roxana Rogobete, Cristina Aura Udrea, Karla Csürös, Ana-Maria Bucur, and Mihai Dascalu. 2025. Integrating Archaic and Regional Lexicons to Improve the Readability of Old Romanian Texts. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 240–246, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Integrating Archaic and Regional Lexicons to Improve the Readability of Old Romanian Texts (Chitez et al., RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.29.pdf