GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs

Marius Dumitran; Angela Dumitran; Alexandra Mihaela Danila

Abstract

Large language models (LLMs) have revolutionised NLP, yet their pedagogical value for low-resource languages remains unclear. We present GRILE, the first open benchmark of 1,151 multiple-choice questions harvested from Romanian high-stakes exams (National Evaluation, Baccalaureate, university admissions). GRILE enables us to probe two complementary abilities of seven state-of-the-art multilingual and Romanian-specific LLMs: (i) selecting the correct answer, and (ii) producing linguistically faithful explanations. While Gemini 2.5 Pro reaches 83% accuracy, most open-weight models stay below 65%, and 48% of their explanations contain factual or pedagogical flaws according to expert review. A detailed error analysis pinpoints systematic weaknesses in morphology and in applying the latest DOOM 3 orthographic norms. All data, code and a public web demo are released to catalyse future research. Our findings expose open challenges for trustworthy educational NLP in low-resource settings and establish GRILE as a new test-bed for controllable explanation generation and evaluation.

Anthology ID: 2025.ranlp-1.39
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 316–324
URL: https://aclanthology.org/2025.ranlp-1.39/
Cite (ACL): Marius Dumitran, Angela Dumitran, and Alexandra Mihaela Danila. 2025. GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 316–324, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs (Dumitran et al., RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.39.pdf
