BALSAM: A Platform for Benchmarking Arabic Large Language Models
Rawan Nasser Almatham, Kareem Mohamed Darwish, Raghad Al-Rasheed, Waad Thuwaini Alshammari, Muneera Alhoshan, Amal Almazrua, Asma Al Wazrah, Mais Alheraki, Firoj Alam, Preslav Nakov, Norah A. Alzahrani, Eman Albilali, Nizar Habash, Abdelrahman Mustafa El-Sheikh, Muhammad Elmallah, Hamdy Mubarak, Zaid Alyafeai, Mohamed Anwar, Haonan Li, Ahmed Abdelali, Nora Altwairesh, Maram Hasanain, Abdulmohsen Al-Thubaity, Shady Shehata, Bashar Alhafni, Injy Hamed, Go Inoue, Khalid N. Elmadani, Ossama Obeid, Fatima Haouari, Tamer Elsayed, Emad A. Alghamdi, Khalid Almubarak, Saied Alshahrani, Ola Aljareh, Safa Alajlan, Areej Alshaqarawi, Maryam Alshihri, Sultana Alghurabi, Atikah Alzeghayer, Afrah Altamimi, Abdullah Alfaifi, Abdulrahman M Alosaimy
Correct Metadata for
Abstract
The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, due to data scarcity, linguistic diversity of Arabic and its dialects, morphological complexity, etc. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities.- Anthology ID:
- 2025.arabicnlp-main.21
- Volume:
- Proceedings of The Third Arabic Natural Language Processing Conference
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
- Venue:
- ArabicNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 258–277
- Language:
- URL:
- https://aclanthology.org/2025.arabicnlp-main.21/
- DOI:
- 10.18653/v1/2025.arabicnlp-main.21
- Bibkey:
- Cite (ACL):
- Rawan Nasser Almatham, Kareem Mohamed Darwish, Raghad Al-Rasheed, Waad Thuwaini Alshammari, Muneera Alhoshan, Amal Almazrua, Asma Al Wazrah, Mais Alheraki, Firoj Alam, Preslav Nakov, Norah A. Alzahrani, Eman Albilali, Nizar Habash, Abdelrahman Mustafa El-Sheikh, Muhammad Elmallah, Hamdy Mubarak, Zaid Alyafeai, Mohamed Anwar, Haonan Li, Ahmed Abdelali, Nora Altwairesh, Maram Hasanain, Abdulmohsen Al-Thubaity, Shady Shehata, Bashar Alhafni, Injy Hamed, Go Inoue, Khalid N. Elmadani, Ossama Obeid, Fatima Haouari, Tamer Elsayed, Emad A. Alghamdi, Khalid Almubarak, Saied Alshahrani, Ola Aljareh, Safa Alajlan, Areej Alshaqarawi, Maryam Alshihri, Sultana Alghurabi, Atikah Alzeghayer, Afrah Altamimi, Abdullah Alfaifi, and Abdulrahman M Alosaimy. 2025. BALSAM: A Platform for Benchmarking Arabic Large Language Models. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 258–277, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- BALSAM: A Platform for Benchmarking Arabic Large Language Models (Almatham et al., ArabicNLP 2025)
- Copy Citation:
- PDF:
- https://aclanthology.org/2025.arabicnlp-main.21.pdf
Export citation
@inproceedings{almatham-etal-2025-balsam,
title = "{BALSAM}: A Platform for Benchmarking {A}rabic Large Language Models",
author = "Almatham, Rawan Nasser and
Darwish, Kareem Mohamed and
Al-Rasheed, Raghad and
Alshammari, Waad Thuwaini and
Alhoshan, Muneera and
Almazrua, Amal and
Al Wazrah, Asma and
Alheraki, Mais and
Alam, Firoj and
Nakov, Preslav and
Alzahrani, Norah A. and
Albilali, Eman and
Habash, Nizar and
El-Sheikh, Abdelrahman Mustafa and
Elmallah, Muhammad and
Mubarak, Hamdy and
Alyafeai, Zaid and
Anwar, Mohamed and
Li, Haonan and
Abdelali, Ahmed and
Altwairesh, Nora and
Hasanain, Maram and
Al-Thubaity, Abdulmohsen and
Shehata, Shady and
Alhafni, Bashar and
Hamed, Injy and
Inoue, Go and
Elmadani, Khalid N. and
Obeid, Ossama and
Haouari, Fatima and
Elsayed, Tamer and
Alghamdi, Emad A. and
Almubarak, Khalid and
Alshahrani, Saied and
Aljareh, Ola and
Alajlan, Safa and
Alshaqarawi, Areej and
Alshihri, Maryam and
Alghurabi, Sultana and
Alzeghayer, Atikah and
Altamimi, Afrah and
Alfaifi, Abdullah and
Alosaimy, Abdulrahman M",
editor = "Darwish, Kareem and
Ali, Ahmed and
Abu Farha, Ibrahim and
Touileb, Samia and
Zitouni, Imed and
Abdelali, Ahmed and
Al-Ghamdi, Sharefah and
Alkhereyf, Sakhar and
Zaghouani, Wajdi and
Khalifa, Salam and
AlKhamissi, Badr and
Almatham, Rawan and
Hamed, Injy and
Alyafeai, Zaid and
Alowisheq, Areeb and
Inoue, Go and
Mrini, Khalil and
Alshammari, Waad",
booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.arabicnlp-main.21/",
doi = "10.18653/v1/2025.arabicnlp-main.21",
pages = "258--277",
ISBN = "979-8-89176-352-4",
abstract = "The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, due to data scarcity, linguistic diversity of Arabic and its dialects, morphological complexity, etc. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="almatham-etal-2025-balsam">
<titleInfo>
<title>BALSAM: A Platform for Benchmarking Arabic Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Rawan</namePart>
<namePart type="given">Nasser</namePart>
<namePart type="family">Almatham</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kareem</namePart>
<namePart type="given">Mohamed</namePart>
<namePart type="family">Darwish</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Raghad</namePart>
<namePart type="family">Al-Rasheed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Waad</namePart>
<namePart type="given">Thuwaini</namePart>
<namePart type="family">Alshammari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Muneera</namePart>
<namePart type="family">Alhoshan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amal</namePart>
<namePart type="family">Almazrua</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Asma</namePart>
<namePart type="family">Al Wazrah</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mais</namePart>
<namePart type="family">Alheraki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Firoj</namePart>
<namePart type="family">Alam</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Preslav</namePart>
<namePart type="family">Nakov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Norah</namePart>
<namePart type="given">A</namePart>
<namePart type="family">Alzahrani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Eman</namePart>
<namePart type="family">Albilali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nizar</namePart>
<namePart type="family">Habash</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abdelrahman</namePart>
<namePart type="given">Mustafa</namePart>
<namePart type="family">El-Sheikh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Muhammad</namePart>
<namePart type="family">Elmallah</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hamdy</namePart>
<namePart type="family">Mubarak</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zaid</namePart>
<namePart type="family">Alyafeai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohamed</namePart>
<namePart type="family">Anwar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Haonan</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Abdelali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nora</namePart>
<namePart type="family">Altwairesh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maram</namePart>
<namePart type="family">Hasanain</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abdulmohsen</namePart>
<namePart type="family">Al-Thubaity</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shady</namePart>
<namePart type="family">Shehata</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bashar</namePart>
<namePart type="family">Alhafni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Injy</namePart>
<namePart type="family">Hamed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Go</namePart>
<namePart type="family">Inoue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Khalid</namePart>
<namePart type="given">N</namePart>
<namePart type="family">Elmadani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ossama</namePart>
<namePart type="family">Obeid</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fatima</namePart>
<namePart type="family">Haouari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tamer</namePart>
<namePart type="family">Elsayed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Emad</namePart>
<namePart type="given">A</namePart>
<namePart type="family">Alghamdi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Khalid</namePart>
<namePart type="family">Almubarak</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Saied</namePart>
<namePart type="family">Alshahrani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ola</namePart>
<namePart type="family">Aljareh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Safa</namePart>
<namePart type="family">Alajlan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Areej</namePart>
<namePart type="family">Alshaqarawi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maryam</namePart>
<namePart type="family">Alshihri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sultana</namePart>
<namePart type="family">Alghurabi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Atikah</namePart>
<namePart type="family">Alzeghayer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Afrah</namePart>
<namePart type="family">Altamimi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abdullah</namePart>
<namePart type="family">Alfaifi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Abdulrahman</namePart>
<namePart type="given">M</namePart>
<namePart type="family">Alosaimy</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of The Third Arabic Natural Language Processing Conference</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kareem</namePart>
<namePart type="family">Darwish</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Ali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ibrahim</namePart>
<namePart type="family">Abu Farha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Samia</namePart>
<namePart type="family">Touileb</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Imed</namePart>
<namePart type="family">Zitouni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Abdelali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sharefah</namePart>
<namePart type="family">Al-Ghamdi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sakhar</namePart>
<namePart type="family">Alkhereyf</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wajdi</namePart>
<namePart type="family">Zaghouani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Salam</namePart>
<namePart type="family">Khalifa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Badr</namePart>
<namePart type="family">AlKhamissi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rawan</namePart>
<namePart type="family">Almatham</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Injy</namePart>
<namePart type="family">Hamed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zaid</namePart>
<namePart type="family">Alyafeai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Areeb</namePart>
<namePart type="family">Alowisheq</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Go</namePart>
<namePart type="family">Inoue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Khalil</namePart>
<namePart type="family">Mrini</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Waad</namePart>
<namePart type="family">Alshammari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-352-4</identifier>
</relatedItem>
<abstract>The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, due to data scarcity, linguistic diversity of Arabic and its dialects, morphological complexity, etc. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities.</abstract>
<identifier type="citekey">almatham-etal-2025-balsam</identifier>
<identifier type="doi">10.18653/v1/2025.arabicnlp-main.21</identifier>
<location>
<url>https://aclanthology.org/2025.arabicnlp-main.21/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>258</start>
<end>277</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings %T BALSAM: A Platform for Benchmarking Arabic Large Language Models %A Almatham, Rawan Nasser %A Darwish, Kareem Mohamed %A Al-Rasheed, Raghad %A Alshammari, Waad Thuwaini %A Alhoshan, Muneera %A Almazrua, Amal %A Al Wazrah, Asma %A Alheraki, Mais %A Alam, Firoj %A Nakov, Preslav %A Alzahrani, Norah A. %A Albilali, Eman %A Habash, Nizar %A El-Sheikh, Abdelrahman Mustafa %A Elmallah, Muhammad %A Mubarak, Hamdy %A Alyafeai, Zaid %A Anwar, Mohamed %A Li, Haonan %A Abdelali, Ahmed %A Altwairesh, Nora %A Hasanain, Maram %A Al-Thubaity, Abdulmohsen %A Shehata, Shady %A Alhafni, Bashar %A Hamed, Injy %A Inoue, Go %A Elmadani, Khalid N. %A Obeid, Ossama %A Haouari, Fatima %A Elsayed, Tamer %A Alghamdi, Emad A. %A Almubarak, Khalid %A Alshahrani, Saied %A Aljareh, Ola %A Alajlan, Safa %A Alshaqarawi, Areej %A Alshihri, Maryam %A Alghurabi, Sultana %A Alzeghayer, Atikah %A Altamimi, Afrah %A Alfaifi, Abdullah %A Alosaimy, Abdulrahman M. %Y Darwish, Kareem %Y Ali, Ahmed %Y Abu Farha, Ibrahim %Y Touileb, Samia %Y Zitouni, Imed %Y Abdelali, Ahmed %Y Al-Ghamdi, Sharefah %Y Alkhereyf, Sakhar %Y Zaghouani, Wajdi %Y Khalifa, Salam %Y AlKhamissi, Badr %Y Almatham, Rawan %Y Hamed, Injy %Y Alyafeai, Zaid %Y Alowisheq, Areeb %Y Inoue, Go %Y Mrini, Khalil %Y Alshammari, Waad %S Proceedings of The Third Arabic Natural Language Processing Conference %D 2025 %8 November %I Association for Computational Linguistics %C Suzhou, China %@ 979-8-89176-352-4 %F almatham-etal-2025-balsam %X The impressive advancement of Large Language Models (LLMs) in English has not been matched across all languages. In particular, LLM performance in Arabic lags behind, due to data scarcity, linguistic diversity of Arabic and its dialects, morphological complexity, etc. Progress is further hindered by the quality of Arabic benchmarks, which typically rely on static, publicly available data, lack comprehensive task coverage, or do not provide dedicated platforms with blind test sets. This makes it challenging to measure actual progress and to mitigate data contamination. Here, we aim to bridge these gaps. In particular, we introduce BALSAM, a comprehensive, community-driven benchmark aimed at advancing Arabic LLM development and evaluation. It includes 78 NLP tasks from 14 broad categories, with 52K examples divided into 37K test and 15K development, and a centralized, transparent platform for blind evaluation. We envision BALSAM as a unifying platform that sets standards and promotes collaborative research to advance Arabic LLM capabilities. %R 10.18653/v1/2025.arabicnlp-main.21 %U https://aclanthology.org/2025.arabicnlp-main.21/ %U https://doi.org/10.18653/v1/2025.arabicnlp-main.21 %P 258-277
Markdown (Informal)
[BALSAM: A Platform for Benchmarking Arabic Large Language Models](https://aclanthology.org/2025.arabicnlp-main.21/) (Almatham et al., ArabicNLP 2025)
- BALSAM: A Platform for Benchmarking Arabic Large Language Models (Almatham et al., ArabicNLP 2025)
ACL
- Rawan Nasser Almatham, Kareem Mohamed Darwish, Raghad Al-Rasheed, Waad Thuwaini Alshammari, Muneera Alhoshan, Amal Almazrua, Asma Al Wazrah, Mais Alheraki, Firoj Alam, Preslav Nakov, Norah A. Alzahrani, Eman Albilali, Nizar Habash, Abdelrahman Mustafa El-Sheikh, Muhammad Elmallah, Hamdy Mubarak, Zaid Alyafeai, Mohamed Anwar, Haonan Li, Ahmed Abdelali, Nora Altwairesh, Maram Hasanain, Abdulmohsen Al-Thubaity, Shady Shehata, Bashar Alhafni, Injy Hamed, Go Inoue, Khalid N. Elmadani, Ossama Obeid, Fatima Haouari, Tamer Elsayed, Emad A. Alghamdi, Khalid Almubarak, Saied Alshahrani, Ola Aljareh, Safa Alajlan, Areej Alshaqarawi, Maryam Alshihri, Sultana Alghurabi, Atikah Alzeghayer, Afrah Altamimi, Abdullah Alfaifi, and Abdulrahman M Alosaimy. 2025. BALSAM: A Platform for Benchmarking Arabic Large Language Models. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 258–277, Suzhou, China. Association for Computational Linguistics.