Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language

Prathamesh Devadiga, Paras Chopra


Abstract
Can large language models converse in languages virtually absent from their training data? We investigate this question through a case study on Tulu, a Dravidian language with over two million speakers but minimal digital presence. Rather than fine-tuning, we examine whether structured prompt engineering alone can elicit basic conversational ability under extreme data scarcity. Our framework combines explicit grammar documentation, negative constraints to suppress high-probability tokens from related languages, romanization standardization, and quality-controlled synthetic data generation via self-play. Evaluated on a manually curated held-out set across three LLMs (Gemini 2.0 Flash, GPT-4o, and Llama 3.1 70B) and validated by native speakers, our approach reduces vocabulary contamination from 80% to 5% while achieving 85% grammatical accuracy. Cross-model analysis shows that negative constraints provide consistent improvements (12–18 percentage points), while the effectiveness of grammar documentation varies by model architecture (8–22 points). These results demonstrate that structured in-context learning can meaningfully extend LLM capabilities to extremely low-resource languages without parameter updates.
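The abstract's two central ingredients, explicit grammar documentation and negative constraints against vocabulary from related Dravidian languages, can be pictured as a single assembled system prompt. The sketch below is a minimal illustration of that idea, not the paper's actual prompt: the grammar notes, the blocklist entries, and the function names are all hypothetical placeholders standing in for the documented materials the authors curated.

# Illustrative sketch only: composes grammar documentation and negative
# constraints into one system prompt, as the abstract describes. All Tulu
# grammar excerpts and blocklist entries are placeholders, not the paper's
# actual prompt text.

GRAMMAR_NOTES = """\
Tulu grammar notes (illustrative excerpt):
- Word order is SOV: subject, then object, then verb.
- Use one fixed romanization scheme consistently for every reply.
"""

# Negative constraints: high-probability words from related languages
# (e.g., Kannada) that models tend to substitute for Tulu. Real entries
# would come from observed contamination in model outputs.
CONTAMINATION_BLOCKLIST = ["<kannada-word-1>", "<kannada-word-2>"]

def build_system_prompt(grammar_notes: str, blocklist: list[str]) -> str:
    """Assemble grammar documentation plus negative constraints."""
    negative_rules = "\n".join(
        f"- Never use the non-Tulu word '{w}'." for w in blocklist
    )
    return (
        "You are a Tulu conversation partner. Reply only in romanized Tulu.\n\n"
        f"{grammar_notes}\n"
        "Hard constraints:\n"
        f"{negative_rules}\n"
        "- Do not borrow vocabulary from Kannada or other related languages.\n"
    )

if __name__ == "__main__":
    # Inspect the assembled prompt; in practice this string would be sent
    # as the system message to the model under evaluation.
    print(build_system_prompt(GRAMMAR_NOTES, CONTAMINATION_BLOCKLIST))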
Anthology ID:
2026.loreslm-1.5
Volume:
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:
LoResLM
Publisher:
Association for Computational Linguistics
Pages:
50–61
URL:
https://aclanthology.org/2026.loreslm-1.5/
Cite (ACL):
Prathamesh Devadiga and Paras Chopra. 2026. Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 50–61, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language (Devadiga & Chopra, LoResLM 2026)
PDF:
https://aclanthology.org/2026.loreslm-1.5.pdf