Benchmarking Offensive Language Detection in Persian and Pashto

Zahra Bokaei, Bonnie Webber, Walid Magdy


Abstract
Offensive language detection and target identification are essential for maintaining respectful online environments. While these tasks have been widely studied for English, comparatively less attention has been given to other languages, including Persian and Pashto, and the effectiveness of recent large language models for these languages remains underexplored. To address this gap, we created a comprehensive benchmark of diverse modeling approaches in Persian and Pashto. Our evaluation covers zero-shot, fine-tuned, and cross-lingual transfer settings, and analyzes when detection succeeds or fails across different modeling approaches. This study provides one of the first systematic analyses of offensive language detection and cross-lingual transfer between these languages.
Anthology ID:
2026.silkroadnlp-1.2
Volume:
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Rayyan Merchant, Karine Megerdoomian
Venues:
SilkRoadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
13–23
URL:
https://aclanthology.org/2026.silkroadnlp-1.2/
Cite (ACL):
Zahra Bokaei, Bonnie Webber, and Walid Magdy. 2026. Benchmarking Offensive Language Detection in Persian and Pashto. In The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, pages 13–23, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Benchmarking Offensive Language Detection in Persian and Pashto (Bokaei et al., SilkRoadNLP 2026)
PDF:
https://aclanthology.org/2026.silkroadnlp-1.2.pdf