@inproceedings{ahmadi-etal-2025-parme,
title = "{PARME}: Parallel Corpora for Low-Resourced {M}iddle {E}astern Languages",
author = {Ahmadi, Sina and
Sennrich, Rico and
Karami, Erfan and
Marani, Ako and
Fekrazad, Parviz and
Baghban, Gholamreza Akbarzadeh and
Hadi, Hanah and
Heidari, Semko and
Dogan, Mah{\^i}r and
Asadi, Pedram and
Bashir, Dashne and
Ghodrati, Mohammad Amin and
Amini, Kourosh and
Ashourinezhad, Zeynab and
Baladi, Mana and
Ezzati, Farshid and
Ghasemifar, Alireza and
Hosseinpour, Daryoush and
Abbaszadeh, Behrooz and
Hassanpour, Amin and
Hamaamin, Bahaddin Jalal and
Hama, Saya Kamal and
Mousavi, Ardeshir and
Hussein, Sarko Nazir and
Nejadgholi, Isar and
{\"O}lmez, Mehmet and
Osmanpour, Horam and
Ramezani, Rashid Roshan and
Aziz, Aryan Sediq and
Salehi Sheikhalikelayeh, Ali and
Yadegari, Mohammadreza and
Yadegari, Kewyar and
Roodsari, Sedighe Zamani},
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1451/",
doi = "10.18653/v1/2025.acl-long.1451",
pages = "30032--30053",
ISBN = "979-8-89176-251-0",
abstract = "The Middle East is characterized by remarkable linguistic diversity, with over 400 million inhabitants speaking more than 60 languages across multiple language families. This study presents a pioneering work in developing the first parallel corpora for eight severely under-resourced varieties in the region{--}PARME, addressing fundamental challenges in low-resource scenarios including non-standardized writing and dialectal complexity. Through an extensive community-driven initiative, volunteers contributed to the creation of over 36,000 translated sentences, marking a significant milestone in resource development. We evaluate machine translation capabilities through zero-shot approaches and fine-tuning experiments with pretrained machine translation models and provide a comprehensive analysis of limitations. Our findings reveal significant gaps in existing technologies for processing the selected languages, highlighting critical areas for improvement in language technology for Middle Eastern languages."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ahmadi-etal-2025-parme">
<titleInfo>
<title>PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sina</namePart>
<namePart type="family">Ahmadi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rico</namePart>
<namePart type="family">Sennrich</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Erfan</namePart>
<namePart type="family">Karami</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ako</namePart>
<namePart type="family">Marani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Parviz</namePart>
<namePart type="family">Fekrazad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gholamreza</namePart>
<namePart type="given">Akbarzadeh</namePart>
<namePart type="family">Baghban</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hanah</namePart>
<namePart type="family">Hadi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Semko</namePart>
<namePart type="family">Heidari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mahîr</namePart>
<namePart type="family">Dogan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pedram</namePart>
<namePart type="family">Asadi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dashne</namePart>
<namePart type="family">Bashir</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Amin</namePart>
<namePart type="family">Ghodrati</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kourosh</namePart>
<namePart type="family">Amini</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zeynab</namePart>
<namePart type="family">Ashourinezhad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mana</namePart>
<namePart type="family">Baladi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Farshid</namePart>
<namePart type="family">Ezzati</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alireza</namePart>
<namePart type="family">Ghasemifar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Daryoush</namePart>
<namePart type="family">Hosseinpour</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Behrooz</namePart>
<namePart type="family">Abbaszadeh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amin</namePart>
<namePart type="family">Hassanpour</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bahaddin</namePart>
<namePart type="given">Jalal</namePart>
<namePart type="family">Hamaamin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Saya</namePart>
<namePart type="given">Kamal</namePart>
<namePart type="family">Hama</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ardeshir</namePart>
<namePart type="family">Mousavi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sarko</namePart>
<namePart type="given">Nazir</namePart>
<namePart type="family">Hussein</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Isar</namePart>
<namePart type="family">Nejadgholi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mehmet</namePart>
<namePart type="family">Ölmez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Horam</namePart>
<namePart type="family">Osmanpour</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rashid</namePart>
<namePart type="given">Roshan</namePart>
<namePart type="family">Ramezani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aryan</namePart>
<namePart type="given">Sediq</namePart>
<namePart type="family">Aziz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ali</namePart>
<namePart type="family">Salehi Sheikhalikelayeh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammadreza</namePart>
<namePart type="family">Yadegari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kewyar</namePart>
<namePart type="family">Yadegari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sedighe</namePart>
<namePart type="given">Zamani</namePart>
<namePart type="family">Roodsari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>The Middle East is characterized by remarkable linguistic diversity, with over 400 million inhabitants speaking more than 60 languages across multiple language families. This study presents a pioneering work in developing the first parallel corpora for eight severely under-resourced varieties in the region–PARME, addressing fundamental challenges in low-resource scenarios including non-standardized writing and dialectal complexity. Through an extensive community-driven initiative, volunteers contributed to the creation of over 36,000 translated sentences, marking a significant milestone in resource development. We evaluate machine translation capabilities through zero-shot approaches and fine-tuning experiments with pretrained machine translation models and provide a comprehensive analysis of limitations. Our findings reveal significant gaps in existing technologies for processing the selected languages, highlighting critical areas for improvement in language technology for Middle Eastern languages.</abstract>
<identifier type="citekey">ahmadi-etal-2025-parme</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.1451</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.1451/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>30032</start>
<end>30053</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages
%A Ahmadi, Sina
%A Sennrich, Rico
%A Karami, Erfan
%A Marani, Ako
%A Fekrazad, Parviz
%A Baghban, Gholamreza Akbarzadeh
%A Hadi, Hanah
%A Heidari, Semko
%A Dogan, Mahîr
%A Asadi, Pedram
%A Bashir, Dashne
%A Ghodrati, Mohammad Amin
%A Amini, Kourosh
%A Ashourinezhad, Zeynab
%A Baladi, Mana
%A Ezzati, Farshid
%A Ghasemifar, Alireza
%A Hosseinpour, Daryoush
%A Abbaszadeh, Behrooz
%A Hassanpour, Amin
%A Hamaamin, Bahaddin Jalal
%A Hama, Saya Kamal
%A Mousavi, Ardeshir
%A Hussein, Sarko Nazir
%A Nejadgholi, Isar
%A Ölmez, Mehmet
%A Osmanpour, Horam
%A Ramezani, Rashid Roshan
%A Aziz, Aryan Sediq
%A Salehi Sheikhalikelayeh, Ali
%A Yadegari, Mohammadreza
%A Yadegari, Kewyar
%A Roodsari, Sedighe Zamani
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F ahmadi-etal-2025-parme
%X The Middle East is characterized by remarkable linguistic diversity, with over 400 million inhabitants speaking more than 60 languages across multiple language families. This study presents a pioneering work in developing the first parallel corpora for eight severely under-resourced varieties in the region–PARME, addressing fundamental challenges in low-resource scenarios including non-standardized writing and dialectal complexity. Through an extensive community-driven initiative, volunteers contributed to the creation of over 36,000 translated sentences, marking a significant milestone in resource development. We evaluate machine translation capabilities through zero-shot approaches and fine-tuning experiments with pretrained machine translation models and provide a comprehensive analysis of limitations. Our findings reveal significant gaps in existing technologies for processing the selected languages, highlighting critical areas for improvement in language technology for Middle Eastern languages.
%R 10.18653/v1/2025.acl-long.1451
%U https://aclanthology.org/2025.acl-long.1451/
%U https://doi.org/10.18653/v1/2025.acl-long.1451
%P 30032-30053
Markdown (Informal)
[PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages](https://aclanthology.org/2025.acl-long.1451/) (Ahmadi et al., ACL 2025)
ACL
- Sina Ahmadi, Rico Sennrich, Erfan Karami, Ako Marani, Parviz Fekrazad, Gholamreza Akbarzadeh Baghban, Hanah Hadi, Semko Heidari, Mahîr Dogan, Pedram Asadi, Dashne Bashir, Mohammad Amin Ghodrati, Kourosh Amini, Zeynab Ashourinezhad, Mana Baladi, Farshid Ezzati, Alireza Ghasemifar, Daryoush Hosseinpour, Behrooz Abbaszadeh, Amin Hassanpour, Bahaddin Jalal Hamaamin, Saya Kamal Hama, Ardeshir Mousavi, Sarko Nazir Hussein, Isar Nejadgholi, Mehmet Ölmez, Horam Osmanpour, Rashid Roshan Ramezani, Aryan Sediq Aziz, Ali Salehi Sheikhalikelayeh, Mohammadreza Yadegari, Kewyar Yadegari, and Sedighe Zamani Roodsari. 2025. PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30032–30053, Vienna, Austria. Association for Computational Linguistics.