ELAB: Extensive LLM Alignment Benchmark in Persian Language

Zahra Pourbahman; Fatemeh Rajabi; Mohammadhossein Sadeghi; Omid Ghahroodi; Somayeh Bakhshaei; Arash Amini; Reza Kazemi; Mahdieh Soleymani Baghshah

Abstract

This paper presents a comprehensive framework for evaluating how well Persian Large Language Models (LLMs) align with critical ethical dimensions, including safety, fairness, and social norms. It addresses gaps in existing LLM evaluation frameworks by adapting them to Persian linguistic and cultural contexts. The work builds three types of Persian-language datasets: (i) translated data, (ii) new data generated synthetically, and (iii) new naturally collected data. We translate Anthropic Red Teaming data, AdvBench, HarmBench, and DecodingTrust into Persian. Furthermore, we create ProhibiBench-fa, SafeBench-fa, FairBench-fa, and SocialBench-fa as new datasets to address content that is harmful or prohibited in the local culture. Moreover, we collect an extensive dataset, GuardBench-fa, to cover Persian cultural norms. By combining these datasets, our work establishes a unified framework for evaluating Persian LLMs, offering a new approach to culturally grounded alignment evaluation. We systematically evaluate Persian LLMs across three alignment aspects: safety (avoiding harmful content), fairness (mitigating biases), and social norms (adhering to culturally accepted behaviors). We also present a publicly available leaderboard that benchmarks Persian LLMs on these three aspects.

Anthology ID: 2025.gem-1.40
Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month: July
Year: 2025
Address: Vienna, Austria and virtual meeting
Editors: Ofir Arviv, Miruna Clinciu, Kaustubh Dhole, Rotem Dror, Sebastian Gehrmann, Eliya Habba, Itay Itzhak, Simon Mille, Yotam Perlitz, Enrico Santus, João Sedoc, Michal Shmueli Scheuer, Gabriel Stanovsky, Oyvind Tafjord
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 458–470
URL: https://aclanthology.org/2025.gem-1.40/
Cite (ACL): Zahra Pourbahman, Fatemeh Rajabi, Mohammadhossein Sadeghi, Omid Ghahroodi, Somayeh Bakhshaei, Arash Amini, Reza Kazemi, and Mahdieh Soleymani Baghshah. 2025. ELAB: Extensive LLM Alignment Benchmark in Persian Language. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 458–470, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal): ELAB: Extensive LLM Alignment Benchmark in Persian Language (Pourbahman et al., GEM 2025)
PDF: https://aclanthology.org/2025.gem-1.40.pdf
