LLM-as-a-Judge for Low-Resource Languages: Adapting Ragas and Comparative Ranking for Romanian

Claudiu Creanga, Liviu P Dinu


Abstract
Evaluating Retrieval-Augmented Generation (RAG) systems remains challenging for Low-Resource Languages (LRLs), where standard reference-based metrics fall short. This paper investigates the viability of the "LLM-as-a-Judge" paradigm for Romanian by adapting the Ragas framework to recent models (Gemini 2.5 and Gemini 3). We introduce AdminRo-Eval, a curated dataset of Romanian administrative documents annotated by native speakers, which serves as ground truth for benchmarking automated evaluators. We compare three evaluation methodologies (direct scoring, comparative ranking, and granular decomposition) across three metrics: Faithfulness, Answer Relevance, and Context Relevance. Our findings show that the evaluation strategy must be chosen per metric: granular decomposition achieves the highest human alignment for Faithfulness (96% with Gemini 2.5 Pro), while comparative ranking performs best for Answer Relevance (90%). Furthermore, we demonstrate that while lightweight models struggle with complex reasoning in LRLs, Gemini 2.5 Pro establishes a robust, transferable baseline for automated Romanian RAG evaluation.
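The abstract reports scores without spelling out how they are computed. The sketch below is illustrative only and rests on two assumptions: that Faithfulness under granular decomposition follows the Ragas-style definition (the answer is split into claims, and the score is the share of claims the judge marks as supported by the retrieved context), and that "human alignment" is simple percent agreement between judge labels and native-speaker labels. The paper may use a different agreement measure. The LLM calls that produce the verdicts are omitted, and the function names are hypothetical.

```python
from typing import Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Granular decomposition (Ragas-style): fraction of answer claims that the
    judge marked as supported by the retrieved context."""
    if not claim_supported:
        # Assumption: an answer with no extractable claims scores 0.
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def percent_agreement(judge: Sequence[int], human: Sequence[int]) -> float:
    """Hypothetical alignment measure: percentage of items where the judge's
    label equals the human annotator's label."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(j == h for j, h in zip(judge, human))
    return 100.0 * matches / len(judge)


if __name__ == "__main__":
    # Hypothetical judge verdicts for one answer decomposed into 4 claims.
    print(faithfulness_score([True, True, False, True]))  # 0.75
    # Hypothetical pairwise preferences from comparative ranking
    # (0 = answer A preferred, 1 = answer B preferred).
    print(percent_agreement([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # 80.0
```

The same agreement function applies to direct scoring (comparing discretized scores) and to comparative ranking (comparing which candidate each side preferred), which is one simple way to put the three strategies on a common alignment scale.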
Anthology ID:
2026.loreslm-1.15
Volume:
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:
LoResLM
Publisher:
Association for Computational Linguistics
Pages:
157–167
URL:
https://aclanthology.org/2026.loreslm-1.15/
Cite (ACL):
Claudiu Creanga and Liviu P Dinu. 2026. LLM-as-a-Judge for Low-Resource Languages: Adapting Ragas and Comparative Ranking for Romanian. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 157–167, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
LLM-as-a-Judge for Low-Resource Languages: Adapting Ragas and Comparative Ranking for Romanian (Creanga & Dinu, LoResLM 2026)
PDF:
https://aclanthology.org/2026.loreslm-1.15.pdf