APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages

Sadegh Jafari, Tara Azin, Farhad Roodi, Zahra Dehghani Tafti, Mehrdad Ghadrdan, Elham Vatankhahan Esfahani, Aylin Naebzadeh, Mohammadhadi Shahhosseini, Ghafoor Khan, Kazem Forghani, Danial Namazi, Seyed Mohammad Hossein Hashemi, Farhan Farsi, Mohammad Osoolian, Maede Mohammadi, Mohammad Erfan Zare, Muhammad Hasnain Khan, Muhammad Hussain, Nooreen Zaki, Joma Mohammadi, Shayan Bali, Mohammad Javad Ranjbar, Els Lefever, Veronique Hoste


Abstract
The Iranic language family includes many underrepresented languages and dialects that remain largely unexplored in modern NLP research. We introduce APARSIN, a multi-variety benchmark covering 14 Iranic languages, dialects, and accents, designed for sentiment analysis and machine translation. The dataset includes both high- and low-resource varieties, several of which are endangered, and captures the linguistic variation across them. We evaluate a set of instruction-tuned large language models (LLMs) on these tasks and analyze their performance across the varieties. Our results highlight substantial performance gaps between standard Persian and other Iranic languages and dialects, demonstrating the need for more inclusive multilingual and dialectally diverse NLP benchmarks.
Anthology ID:
2026.silkroadnlp-1.9
Volume:
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Rayyan Merchant, Karine Megerdoomian
Venues:
SilkRoadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
83–97
URL:
https://aclanthology.org/2026.silkroadnlp-1.9/
Cite (ACL):
Sadegh Jafari, Tara Azin, Farhad Roodi, Zahra Dehghani Tafti, Mehrdad Ghadrdan, Elham Vatankhahan Esfahani, Aylin Naebzadeh, Mohammadhadi Shahhosseini, Ghafoor Khan, Kazem Forghani, Danial Namazi, Seyed Mohammad Hossein Hashemi, Farhan Farsi, Mohammad Osoolian, Maede Mohammadi, Mohammad Erfan Zare, Muhammad Hasnain Khan, Muhammad Hussain, Nooreen Zaki, Joma Mohammadi, Shayan Bali, Mohammad Javad Ranjbar, Els Lefever, and Veronique Hoste. 2026. APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages. In The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, pages 83–97, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages (Jafari et al., SilkRoadNLP 2026)
PDF:
https://aclanthology.org/2026.silkroadnlp-1.9.pdf