Swakkhar Shatabda

2026

MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning
Mahbub E Sobhani | Md. Faiyaz Abdullah Sayeedi | Tasnim Mohiuddin | Md Mofijul Islam | Swakkhar Shatabda
Findings of the Association for Computational Linguistics: EACL 2026

Mathematical reasoning remains one of the most challenging domains for large language models (LLMs), requiring not only linguistic understanding but also structured logical deduction and numerical precision. While recent LLMs demonstrate strong general-purpose reasoning abilities, their mathematical competence across diverse languages remains underexplored. Existing benchmarks primarily focus on English or a narrow subset of high-resource languages, leaving significant gaps in assessing multilingual and cross-lingual mathematical reasoning. To address this, we introduce MathMist, a parallel multilingual benchmark for mathematical problem solving and reasoning. MathMist encompasses 2,890 parallel Bangla-English gold standard artifacts, totaling ≈30K aligned question–answer pairs across thirteen languages, representing an extensive coverage of high-, medium-, and low-resource linguistic settings. The dataset captures linguistic variety, multiple types of problem settings, and solution synthesizing capabilities. We systematically evaluate a diverse suite of models, including open-source small and medium LLMs, proprietary systems, and multilingual-reasoning-focused models under zero-shot, chain-of-thought (CoT), perturbed reasoning, and code-switched reasoning paradigms. Our results reveal persistent deficiencies in LLMs’ ability to perform consistent and interpretable mathematical reasoning across languages, with pronounced degradation in low-resource settings. All code and data are available on GitHub: https://github.com/mahbubhimel/MathMist

Do Multi-Agents Solve Better Than Single? Evaluating Agentic Frameworks for Diagram-Grounded Geometry Problem Solving and Reasoning
Mahbub E Sobhani | Md. Faiyaz Abdullah Sayeedi | Mohammad Nehad Alam | Proma Hossain Progga | Swakkhar Shatabda
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Diagram-grounded geometry problem solving is a critical benchmark for multimodal large language models (MLLMs), yet the benefits of multi-agent design over single-agent remain unclear. We systematically compare single-agent and multi-agent pipelines on four visual math benchmarks: Geometry3K, MathVerse, OlympiadBench, and We-Math. For open-source models, multi-agent consistently improves performance. For example, Qwen-2.5-VL (7B) gains +6.8 points and Qwen-2.5-VL (32B) gains +3.3 on Geometry3K, and both Qwen-2.5-VL variants see further gains on OlympiadBench and We-Math. In contrast, the closed-source Gemini-2.0-Flash generally performs better in single-agent mode on classic benchmarks, while multi-agent yields only modest improvements on the newer We-Math dataset. These findings show that multi-agent pipelines provide clear benefits for open-source models and can assist strong proprietary systems on newer, less familiar benchmarks, but agentic decomposition is not universally optimal. All code, data, and reasoning files are available at https://github.com/faiyazabdullah/Interpreter-Solver

2023

Advancing Bangla Punctuation Restoration by a Monolingual Transformer-Based Method and a Large-Scale Corpus
Mehedi Hasan Bijoy | Mir Fatema Afroz Faria | Mahbub E Sobhani | Tanzid Ferdoush | Swakkhar Shatabda
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Punctuation restoration is the endeavor of reinstating and rectifying missing or improper punctuation marks within a text, thereby eradicating ambiguity in written discourse. The Bangla punctuation restoration task has received little attention and exploration, despite the rising popularity of textual communication in the language. The primary hindrances in the advancement of the task revolve around the utilization of transformer-based methods and an openly accessible extensive corpus, challenges that we discovered remained unresolved in earlier efforts. In this study, we propose a baseline by introducing a monolingual transformer-based method named Jatikarok, where the effectiveness of transfer learning has been meticulously scrutinized, and a large-scale corpus containing 1.48M source-target pairs to resolve the previous issues. The Jatikarok attains accuracy rates of 95.2%, 85.13%, and 91.36% on the BanglaPRCorpus, Prothom-Alo Balanced, and BanglaOPUS corpora, thereby establishing itself as the state-of-the-art method through its superior performance compared to BanglaT5 and T5-Small. Jatikarok and BanglaPRCorpus are publicly available at: https://github.com/mehedihasanbijoy/Jatikarok-and-BanglaPRCorpus

Co-authors

Tanzid Ferdoush 1

Md Mofijul Islam 1

Muhammad Tasnim Mohiuddin 1

Proma Hossain Progga 1
