GreekBarBench: A Challenging Benchmark for Free-Text Legal Reasoning and Citations

Odysseas S. Chlapanis, Dimitris Galanis, Nikolaos Aletras, Ion Androutsopoulos


Abstract
We introduce GreekBarBench, a benchmark that evaluates LLMs on legal questions from the Greek Bar exams across five legal areas, requiring citations to statutory articles and case facts. To tackle the challenges of free-text evaluation, we propose a three-dimensional scoring system combined with an LLM-as-a-judge approach. We also develop a meta-evaluation benchmark to assess how well LLM-judges correlate with human expert evaluations, revealing that simple, span-based rubrics improve this alignment. Our extensive evaluation of 13 proprietary and open-weight LLMs shows that although the top models exhibit impressive performance, they remain susceptible to critical errors, most notably a failure to identify the correct statutory articles.
Anthology ID:
2025.findings-emnlp.1368
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
25099–25119
URL:
https://aclanthology.org/2025.findings-emnlp.1368/
Cite (ACL):
Odysseas S. Chlapanis, Dimitris Galanis, Nikolaos Aletras, and Ion Androutsopoulos. 2025. GreekBarBench: A Challenging Benchmark for Free-Text Legal Reasoning and Citations. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25099–25119, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
GreekBarBench: A Challenging Benchmark for Free-Text Legal Reasoning and Citations (Chlapanis et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1368.pdf
Checklist:
2025.findings-emnlp.1368.checklist.pdf