Adapting Falcon3-7B Language Model for Arabic: Methods, Challenges, and Outcomes
Basma El Amel Boussaha, Mohammed Alyafeai, Ahmed Alzubaidi, Leen Al Qadi, Shaikha Alsuwaidi, Hakim Hacid
Abstract
Under-represented languages suffer from a lack of data, and as a result, there are few LLMs that support them. Extending an existing LLM to a new language is a practical option for startups, university labs, and organizations with limited budgets. This process involves several steps. In this paper, we describe how we adapted the Falcon3-7B model to Arabic, covering everything from data collection and training to evaluation. Falcon-Arabic was trained exclusively on native data to better capture the cultural and linguistic aspects of the language. Our evaluations show that Falcon-Arabic achieves state-of-the-art results on a range of Arabic benchmarks.
- Anthology ID: 2025.arabicnlp-main.1
- Volume: Proceedings of The Third Arabic Natural Language Processing Conference
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
- Venue: ArabicNLP
- Publisher: Association for Computational Linguistics
- Pages: 1–15
- URL: https://aclanthology.org/2025.arabicnlp-main.1/
- Bibkey: boussaha-etal-2025-adapting
- Cite (ACL): Basma El Amel Boussaha, Mohammed Alyafeai, Ahmed Alzubaidi, Leen Al Qadi, Shaikha Alsuwaidi, and Hakim Hacid. 2025. Adapting Falcon3-7B Language Model for Arabic: Methods, Challenges, and Outcomes. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 1–15, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Adapting Falcon3-7B Language Model for Arabic: Methods, Challenges, and Outcomes (Boussaha et al., ArabicNLP 2025)
- PDF: https://aclanthology.org/2025.arabicnlp-main.1.pdf
Export citation
@inproceedings{boussaha-etal-2025-adapting,
title = "Adapting Falcon3-7{B} Language Model for {A}rabic: Methods, Challenges, and Outcomes",
author = "Boussaha, Basma El Amel and
Alyafeai, Mohammed and
Alzubaidi, Ahmed and
Al Qadi, Leen and
Alsuwaidi, Shaikha and
Hacid, Hakim",
editor = "Darwish, Kareem and
Ali, Ahmed and
Abu Farha, Ibrahim and
Touileb, Samia and
Zitouni, Imed and
Abdelali, Ahmed and
Al-Ghamdi, Sharefah and
Alkhereyf, Sakhar and
Zaghouani, Wajdi and
Khalifa, Salam and
AlKhamissi, Badr and
Almatham, Rawan and
Hamed, Injy and
Alyafeai, Zaid and
Alowisheq, Areeb and
Inoue, Go and
Mrini, Khalil and
Alshammari, Waad",
booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.arabicnlp-main.1/",
pages = "1--15",
ISBN = "979-8-89176-352-4",
abstract = "Under-represented languages suffer from a lack of data, and as a result, there are few LLMs that support them. Extending an existing LLM to a new language is a practical option for startups, university labs, and organizations with limited budgets. This process involves several steps. In this paper, we describe how we adapted the Falcon3-7B model to Arabic, covering everything from data collection and training to evaluation. Falcon-Arabic was trained exclusively on native data to better capture the cultural and linguistic aspects of the language. Our evaluations show that Falcon-Arabic achieves state-of-the-art results on a range of Arabic benchmarks."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="boussaha-etal-2025-adapting">
<titleInfo>
<title>Adapting Falcon3-7B Language Model for Arabic: Methods, Challenges, and Outcomes</title>
</titleInfo>
<name type="personal">
<namePart type="given">Basma</namePart>
<namePart type="given">El</namePart>
<namePart type="given">Amel</namePart>
<namePart type="family">Boussaha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammed</namePart>
<namePart type="family">Alyafeai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Alzubaidi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Leen</namePart>
<namePart type="family">Al Qadi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shaikha</namePart>
<namePart type="family">Alsuwaidi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hakim</namePart>
<namePart type="family">Hacid</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of The Third Arabic Natural Language Processing Conference</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kareem</namePart>
<namePart type="family">Darwish</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Ali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ibrahim</namePart>
<namePart type="family">Abu Farha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Samia</namePart>
<namePart type="family">Touileb</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Imed</namePart>
<namePart type="family">Zitouni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Abdelali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sharefah</namePart>
<namePart type="family">Al-Ghamdi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sakhar</namePart>
<namePart type="family">Alkhereyf</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wajdi</namePart>
<namePart type="family">Zaghouani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Salam</namePart>
<namePart type="family">Khalifa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Badr</namePart>
<namePart type="family">AlKhamissi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rawan</namePart>
<namePart type="family">Almatham</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Injy</namePart>
<namePart type="family">Hamed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zaid</namePart>
<namePart type="family">Alyafeai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Areeb</namePart>
<namePart type="family">Alowisheq</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Go</namePart>
<namePart type="family">Inoue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Khalil</namePart>
<namePart type="family">Mrini</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Waad</namePart>
<namePart type="family">Alshammari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-352-4</identifier>
</relatedItem>
<abstract>Under-represented languages suffer from a lack of data, and as a result, there are few LLMs that support them. Extending an existing LLM to a new language is a practical option for startups, university labs, and organizations with limited budgets. This process involves several steps. In this paper, we describe how we adapted the Falcon3-7B model to Arabic, covering everything from data collection and training to evaluation. Falcon-Arabic was trained exclusively on native data to better capture the cultural and linguistic aspects of the language. Our evaluations show that Falcon-Arabic achieves state-of-the-art results on a range of Arabic benchmarks.</abstract>
<identifier type="citekey">boussaha-etal-2025-adapting</identifier>
<location>
<url>https://aclanthology.org/2025.arabicnlp-main.1/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>1</start>
<end>15</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Adapting Falcon3-7B Language Model for Arabic: Methods, Challenges, and Outcomes
%A Boussaha, Basma El Amel
%A Alyafeai, Mohammed
%A Alzubaidi, Ahmed
%A Al Qadi, Leen
%A Alsuwaidi, Shaikha
%A Hacid, Hakim
%Y Darwish, Kareem
%Y Ali, Ahmed
%Y Abu Farha, Ibrahim
%Y Touileb, Samia
%Y Zitouni, Imed
%Y Abdelali, Ahmed
%Y Al-Ghamdi, Sharefah
%Y Alkhereyf, Sakhar
%Y Zaghouani, Wajdi
%Y Khalifa, Salam
%Y AlKhamissi, Badr
%Y Almatham, Rawan
%Y Hamed, Injy
%Y Alyafeai, Zaid
%Y Alowisheq, Areeb
%Y Inoue, Go
%Y Mrini, Khalil
%Y Alshammari, Waad
%S Proceedings of The Third Arabic Natural Language Processing Conference
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-352-4
%F boussaha-etal-2025-adapting
%X Under-represented languages suffer from a lack of data, and as a result, there are few LLMs that support them. Extending an existing LLM to a new language is a practical option for startups, university labs, and organizations with limited budgets. This process involves several steps. In this paper, we describe how we adapted the Falcon3-7B model to Arabic, covering everything from data collection and training to evaluation. Falcon-Arabic was trained exclusively on native data to better capture the cultural and linguistic aspects of the language. Our evaluations show that Falcon-Arabic achieves state-of-the-art results on a range of Arabic benchmarks.
%U https://aclanthology.org/2025.arabicnlp-main.1/
%P 1-15
Markdown (Informal)
[Adapting Falcon3-7B Language Model for Arabic: Methods, Challenges, and Outcomes](https://aclanthology.org/2025.arabicnlp-main.1/) (Boussaha et al., ArabicNLP 2025)