PMWP: A Benchmark for Math Word Problem Solving in Persian

Marzieh Abdolmaleki; Mehrnoush Shamsfard; Veronique Hoste; Els Lefever

Abstract

Mathematical reasoning captures fundamental aspects of human cognitive ability. Although recent advances in large language models (LLMs) have substantially improved automated mathematical problem solving, most existing benchmarks remain focused on English. As a result, robust mathematical reasoning remains a challenging and insufficiently explored capability for underrepresented languages, including Persian. To address this gap, we introduce PMWP, the first dataset of 15K elementary-level Persian math word problems that supports both supervised training and evaluation of reasoning models. By expanding mathematical reasoning resources beyond English, PMWP contributes to the development of multilingual AI systems with stronger reasoning capabilities. We systematically evaluate the Persian math word problem solving capabilities of several state-of-the-art LLMs. Our results indicate that DeepSeek-V3 exhibits reduced language bias when problem texts are translated into English, while Gemini-2.5-Flash achieves the highest equation value accuracy (72.02%) in Persian. In addition, we investigate parameter-efficient adaptation for equation generation by applying LoRA-based fine-tuning to LLaMA-3-8B and Qwen-2.5-7B. After fine-tuning, these open-weight models achieve 91.65% and 92.53% exact equation match accuracy, respectively. Overall, our findings provide insights into the comparative strengths and limitations of proprietary and open-weight models for mathematical reasoning in Persian.

Anthology ID: 2026.silkroadnlp-1.8
Volume: The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Rayyan Merchant, Karine Megerdoomian
Venues: SilkRoadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 74–82
URL: https://aclanthology.org/2026.silkroadnlp-1.8/
Cite (ACL): Marzieh Abdolmaleki, Mehrnoush Shamsfard, Veronique Hoste, and Els Lefever. 2026. PMWP: A Benchmark for Math Word Problem Solving in Persian. In The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, pages 74–82, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): PMWP: A Benchmark for Math Word Problem Solving in Persian (Abdolmaleki et al., SilkRoadNLP 2026)
PDF: https://aclanthology.org/2026.silkroadnlp-1.8.pdf
