Tara Azin
2026
APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages
Sadegh Jafari | Tara Azin | Farhad Roodi | Zahra Dehghani Tafti | Mehrdad Ghadrdan | Elham Vatankhahan Esfahani | Aylin Naebzadeh | Mohammadhadi Shahhosseini | Ghafoor Khan | Kazem Forghani | Danial Namazi | Seyed Mohammad Hossein Hashemi | Farhan Farsi | Mohammad Osoolian | Maede Mohammadi | Mohammad Erfan Zare | Muhammad Hasnain Khan | Muhammad Hussain | Nooreen Zaki | Joma Mohammadi | Shayan Bali | Mohammad Javad Ranjbar | Els Lefever | Veronique Hoste
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
The Iranic language family includes many underrepresented languages and dialects that remain largely unexplored in modern NLP research. We introduce APARSIN, a multi-variety benchmark covering 14 Iranic languages, dialects, and accents, designed for sentiment analysis and machine translation. The dataset includes both high- and low-resource varieties, several of which are endangered, capturing linguistic variation across them. We evaluate a set of instruction-tuned Large Language Models (LLMs) on these tasks and analyze their performance across the varieties. Our results highlight substantial performance gaps between standard Persian and other Iranic languages and dialects, demonstrating the need for more inclusive multilingual and dialectally diverse NLP benchmarks.
2024
Persian Abstract Meaning Representation: Annotation Guidelines and Gold Standard Dataset
Reza Takhshid | Tara Azin | Razieh Shojaei | Mohammad Bahrani
Proceedings of the 2024 UMR Parsing Workshop
This paper introduces the Persian Abstract Meaning Representation (AMR) guidelines, a detailed guide for annotating Persian sentences with AMR, focusing on the necessary adaptations to fit Persian’s unique syntactic structures. We discuss the development process of a Persian AMR gold standard dataset consisting of 1562 sentences created following the guidelines. By examining the language specifications and nuances that distinguish AMR annotations of a low-resource language like Persian, we shed light on the challenges and limitations of developing a universal meaning representation framework. The guidelines and the dataset introduced in this study highlight such challenges, aiming to advance the field.
Co-authors
- Mohammad Bahrani 1
- Shayan Bali 1
- Elham Vatankhahan Esfahani 1
- Farhan Farsi 1
- Kazem Forghani 1
- Mehrdad Ghadrdan 1
- Seyed Mohammad Hossein Hashemi 1
- Veronique Hoste 1
- Muhammad Hussain 1
- Sadegh Jafari 1
- Ghafoor Khan 1
- Muhammad Hasnain Khan 1
- Els Lefever 1
- Maede Mohammadi 1
- Joma Mohammadi 1
- Aylin Naebzadeh 1
- Danial Namazi 1
- Mohammad Osoolian 1
- Mohammad Javad Ranjbar 1
- Farhad Roodi 1
- Mohammadhadi Shahhosseini 1
- Razieh Shojaei 1
- Zahra Dehghani Tafti 1
- Reza Takhshid 1
- Nooreen Zaki 1
- Mohammad Erfan Zare 1