@inproceedings{mehak-etal-2026-enabling,
title = "Enabling Structured Reasoning in {S}indhi with Culturally Grounded Instruction Tuning",
author = "Mehak, Mehak and
Zeinalipour, Kamyar and
Soomro, Pireh and
Chesi, Cristiano and
Gori, Marco and
Maggini, Marco",
editor = "Hettiarachchi, Hansi and
Ranasinghe, Tharindu and
Plum, Alistair and
Rayson, Paul and
Mitkov, Ruslan and
Gaber, Mohamed and
Premasiri, Damith and
Tan, Fiona Anting and
Uyangodage, Lasitha",
booktitle = "Proceedings of the Second Workshop on Language Models for Low-Resource Languages ({L}o{R}es{LM} 2026)",
month = mar,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.loreslm-1.22/",
pages = "239--258",
ISBN = "979-8-89176-377-7",
abstract = "While Large Language Models (LLMs) excel in high-resource contexts, reasoning capabilities in low-resource languages (LRLs) like Sindhi remain limited. To bridge this gap, we introduce Sindhi-Reasoning-Instruct, the first culturally grounded Sindhi instruction corpus. We fine-tuned six LLaMA and Mistral models (1B{--}24B) to evaluate if parameter-efficient tuning enables deductive, inductive, and causal reasoning. Results demonstrate that linguistically authentic data is the decisive factor. Fine-tuning effectively restored Sindhi{'}s Perso-Arabic orthography and SOV structure, with the Mistral-Small-24B model achieving a massive 141{\%} relative improvement in human quality ratings over its base version. Furthermore, structured reasoning capabilities were found to scale with model size; while smaller models achieved high fluency, Mistral-Small-24B achieved top performance across logical categories, reaching 83{\%} on inductive reasoning tasks. This study provides empirical evidence that expert-curated, native instruction data allows LRL models to move beyond simple translation toward robust, structured reasoning. The dataset and models are publicly available."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="mehak-etal-2026-enabling">
    <titleInfo>
      <title>Enabling Structured Reasoning in Sindhi with Culturally Grounded Instruction Tuning</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Mehak</namePart>
      <namePart type="family">Mehak</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Kamyar</namePart>
      <namePart type="family">Zeinalipour</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Pireh</namePart>
      <namePart type="family">Soomro</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Cristiano</namePart>
      <namePart type="family">Chesi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Marco</namePart>
      <namePart type="family">Gori</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Marco</namePart>
      <namePart type="family">Maggini</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2026-03</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Hansi</namePart>
        <namePart type="family">Hettiarachchi</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Tharindu</namePart>
        <namePart type="family">Ranasinghe</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Alistair</namePart>
        <namePart type="family">Plum</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Paul</namePart>
        <namePart type="family">Rayson</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ruslan</namePart>
        <namePart type="family">Mitkov</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Mohamed</namePart>
        <namePart type="family">Gaber</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Damith</namePart>
        <namePart type="family">Premasiri</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Fiona</namePart>
        <namePart type="given">Anting</namePart>
        <namePart type="family">Tan</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lasitha</namePart>
        <namePart type="family">Uyangodage</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Rabat, Morocco</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-377-7</identifier>
    </relatedItem>
    <abstract>While Large Language Models (LLMs) excel in high-resource contexts, reasoning capabilities in low-resource languages (LRLs) like Sindhi remain limited. To bridge this gap, we introduce Sindhi-Reasoning-Instruct, the first culturally grounded Sindhi instruction corpus. We fine-tuned six LLaMA and Mistral models (1B–24B) to evaluate if parameter-efficient tuning enables deductive, inductive, and causal reasoning. Results demonstrate that linguistically authentic data is the decisive factor. Fine-tuning effectively restored Sindhi’s Perso-Arabic orthography and SOV structure, with the Mistral-Small-24B model achieving a massive 141% relative improvement in human quality ratings over its base version. Furthermore, structured reasoning capabilities were found to scale with model size; while smaller models achieved high fluency, Mistral-Small-24B achieved top performance across logical categories, reaching 83% on inductive reasoning tasks. This study provides empirical evidence that expert-curated, native instruction data allows LRL models to move beyond simple translation toward robust, structured reasoning. The dataset and models are publicly available.</abstract>
    <identifier type="citekey">mehak-etal-2026-enabling</identifier>
    <location>
      <url>https://aclanthology.org/2026.loreslm-1.22/</url>
    </location>
    <part>
      <date>2026-03</date>
      <extent unit="page">
        <start>239</start>
        <end>258</end>
      </extent>
    </part>
  </mods>
</modsCollection>

%0 Conference Proceedings
%T Enabling Structured Reasoning in Sindhi with Culturally Grounded Instruction Tuning
%A Mehak, Mehak
%A Zeinalipour, Kamyar
%A Soomro, Pireh
%A Chesi, Cristiano
%A Gori, Marco
%A Maggini, Marco
%Y Hettiarachchi, Hansi
%Y Ranasinghe, Tharindu
%Y Plum, Alistair
%Y Rayson, Paul
%Y Mitkov, Ruslan
%Y Gaber, Mohamed
%Y Premasiri, Damith
%Y Tan, Fiona Anting
%Y Uyangodage, Lasitha
%S Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-377-7
%F mehak-etal-2026-enabling
%X While Large Language Models (LLMs) excel in high-resource contexts, reasoning capabilities in low-resource languages (LRLs) like Sindhi remain limited. To bridge this gap, we introduce Sindhi-Reasoning-Instruct, the first culturally grounded Sindhi instruction corpus. We fine-tuned six LLaMA and Mistral models (1B–24B) to evaluate if parameter-efficient tuning enables deductive, inductive, and causal reasoning. Results demonstrate that linguistically authentic data is the decisive factor. Fine-tuning effectively restored Sindhi’s Perso-Arabic orthography and SOV structure, with the Mistral-Small-24B model achieving a massive 141% relative improvement in human quality ratings over its base version. Furthermore, structured reasoning capabilities were found to scale with model size; while smaller models achieved high fluency, Mistral-Small-24B achieved top performance across logical categories, reaching 83% on inductive reasoning tasks. This study provides empirical evidence that expert-curated, native instruction data allows LRL models to move beyond simple translation toward robust, structured reasoning. The dataset and models are publicly available.
%U https://aclanthology.org/2026.loreslm-1.22/
%P 239-258

Markdown (Informal)

[Enabling Structured Reasoning in Sindhi with Culturally Grounded Instruction Tuning](https://aclanthology.org/2026.loreslm-1.22/) (Mehak et al., LoResLM 2026)

ACL

Mehak Mehak, Kamyar Zeinalipour, Pireh Soomro, Cristiano Chesi, Marco Gori, and Marco Maggini. 2026. Enabling Structured Reasoning in Sindhi with Culturally Grounded Instruction Tuning. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 239–258, Rabat, Morocco. Association for Computational Linguistics.
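
For LaTeX users, a minimal sketch of citing this record via its citekey. The filename `anthology.bib` and the `plain` bibliography style are assumptions for illustration, not part of the record; substitute your own .bib file and style (e.g., the ACL template's style file) as needed.

```latex
% Minimal document citing the entry above with plain BibTeX.
% Assumes the BibTeX record is saved as anthology.bib (hypothetical filename).
\documentclass{article}
\begin{document}
Culturally grounded instruction tuning enables structured reasoning
in Sindhi \cite{mehak-etal-2026-enabling}.
\bibliographystyle{plain}  % assumption: swap in your venue's .bst file
\bibliography{anthology}
\end{document}
```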