Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks

Farhan Farsi, Shahriar Shariati Motlagh, Shayan Bali, Sadra Sabouri, Saeedeh Momtazi


Abstract
This study introduces a novel framework for evaluating Large Language Models (LLMs) and Vision-Language Models (VLMs) in Persian, a low-resource language. We develop comprehensive datasets to assess reasoning, linguistic understanding, and multimodal capabilities. Our datasets include Persian-OCR-QA for optical character recognition, Persian-VQA for visual question answering, Persian world-image puzzle for multimodal integration, Visual-Abstraction-Reasoning for abstract reasoning, and Iran-places for visual knowledge of Iranian figures and locations. We evaluate models such as GPT-4o, Claude 3.5 Sonnet, and Llama 3.2 90B Vision, revealing their strengths and weaknesses in processing Persian. This research contributes to inclusive language processing by addressing the unique challenges of low-resource language evaluation.
Anthology ID:
2025.evalmg-1.5
Volume:
Proceedings of the First Workshop of Evaluation of Multi-Modal Generation
Month:
Jan
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Wei Emma Zhang, Xiang Dai, Desmond Elliot, Byron Fang, Mongyuan Sim, Haojie Zhuang, Weitong Chen
Venues:
EvalMG | WS
Publisher:
Association for Computational Linguistics
Pages:
52–56
URL:
https://aclanthology.org/2025.evalmg-1.5/
Cite (ACL):
Farhan Farsi, Shahriar Shariati Motlagh, Shayan Bali, Sadra Sabouri, and Saeedeh Momtazi. 2025. Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks. In Proceedings of the First Workshop of Evaluation of Multi-Modal Generation, pages 52–56, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks (Farsi et al., EvalMG 2025)
PDF:
https://aclanthology.org/2025.evalmg-1.5.pdf