Saudi-Alignment Benchmark: Assessing LLMs Alignment with Cultural Norms and Domain Knowledge in the Saudi Context

Manal Alhassoun; Imaan Mohammed Alkhanen; Nouf Alshalawi; Ibtehal Baazeem; Waleed Alsanie

doi:10.18653/v1/2025.arabicnlp-main.11

Abstract

For effective use in specific countries, Large Language Models (LLMs) need a strong grasp of local culture and core knowledge to ensure socially appropriate, context-aware, and factually correct responses. Existing Arabic and Saudi benchmarks are limited, focusing mainly on dialects or lifestyle, with little attention to deeper cultural or domain-specific alignment grounded in authoritative sources. To address this gap and the challenge LLMs face with non-Western cultural nuance, this study introduces the Saudi-Alignment Benchmark. It consists of 874 manually curated questions across two core cultural dimensions: Saudi Cultural and Ethical Norms, and Saudi Domain Knowledge. These questions span multiple subcategories, are drawn from verified sources, and use three formats, each targeting a different assessment goal. Our evaluation reveals significant variance in LLM alignment. GPT-4 achieved the highest overall accuracy (83.3%), followed by ALLaM-7B (81.8%) and Llama-3.3-70B (81.6%), whereas Jais-30B exhibited a pronounced shortfall at 21.9%. Furthermore, multilingual LLMs excelled on norms, whereas ALLaM-7B excelled on domain knowledge. Regarding question format, LLMs generally performed well on selected-response formats but showed weaker results on generative tasks, indicating that recognition-based benchmarks alone may overestimate cultural and contextual alignment. These findings highlight the need for tailored benchmarks and reveal LLMs’ limitations in achieving cultural grounding, particularly in underrepresented contexts like Saudi Arabia.

Anthology ID: 2025.arabicnlp-main.11
Volume: Proceedings of The Third Arabic Natural Language Processing Conference
Month: November
Year: 2025
Address: Suzhou, China
Editors: Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
Venue: ArabicNLP
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 130–147
URL: https://aclanthology.org/2025.arabicnlp-main.11/
DOI: 10.18653/v1/2025.arabicnlp-main.11
Cite (ACL): Manal Alhassoun, Imaan Mohammed Alkhanen, Nouf Alshalawi, Ibtehal Baazeem, and Waleed Alsanie. 2025. Saudi-Alignment Benchmark: Assessing LLMs Alignment with Cultural Norms and Domain Knowledge in the Saudi Context. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 130–147, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Saudi-Alignment Benchmark: Assessing LLMs Alignment with Cultural Norms and Domain Knowledge in the Saudi Context (Alhassoun et al., ArabicNLP 2025)
PDF: https://aclanthology.org/2025.arabicnlp-main.11.pdf
