Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language

Prathamesh Devadiga; Paras Chopra

Abstract

Can large language models converse in languages virtually absent from their training data? We investigate this question through a case study on Tulu, a Dravidian language with over two million speakers but minimal digital presence. Rather than fine-tuning, we examine whether structured prompt engineering alone can elicit basic conversational ability under extreme data scarcity. Our framework combines explicit grammar documentation, negative constraints to suppress high-probability tokens from related languages, romanization standardization, and quality-controlled synthetic data generation via self-play. Evaluated on a manually curated held-out set across three LLMs (Gemini 2.0 Flash, GPT-4o, and Llama 3.1 70B) and validated by native speakers, our approach reduces vocabulary contamination from 80% to 5% while achieving 85% grammatical accuracy. Cross-model analysis shows that negative constraints provide consistent improvements (12–18 percentage points), while the effectiveness of grammar documentation varies by model architecture (8–22 points). These results demonstrate that structured in-context learning can meaningfully extend LLM capabilities to extremely low-resource languages without parameter updates.
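The prompt components named in the abstract (grammar documentation, negative constraints, romanization rules) and the vocabulary-contamination measure can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `PromptSpec`, `build_prompt`, and `contamination_rate`, the prompt wording, and the token-level definition of contamination are hypothetical, not the authors' implementation. The word lists are placeholders, not real Tulu or Kannada vocabulary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the structured prompt components."""
    grammar_notes: list[str]
    banned_words: set[str]  # negative constraints: words from related languages
    romanization_rules: list[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec, user_message: str) -> str:
    """Assemble one plain-text prompt from the components (assumed layout)."""
    sections = [
        "Reply only in romanized Tulu.",
        "Grammar notes:",
        *[f"- {note}" for note in spec.grammar_notes],
        "Romanization rules:",
        *[f"- {rule}" for rule in spec.romanization_rules],
        "Do NOT use these words from related languages:",
        *[f"- {word}" for word in sorted(spec.banned_words)],
        f"User: {user_message}",
    ]
    return "\n".join(sections)


def contamination_rate(response: str, banned_words: set[str]) -> float:
    """Fraction of alphabetic tokens found in the banned lexicon.

    This token-level definition is an assumption; the paper may measure
    contamination differently.
    """
    tokens = re.findall(r"[^\W\d_]+", response.lower())
    if not tokens:
        return 0.0
    banned = {word.lower() for word in banned_words}
    hits = sum(1 for token in tokens if token in banned)
    return hits / len(tokens)


# Placeholder data only: these strings stand in for real lexicon entries.
spec = PromptSpec(
    grammar_notes=["placeholder grammar note"],
    banned_words={"xword", "yword"},
    romanization_rules=["placeholder romanization rule"],
)
prompt = build_prompt(spec, "hello")
rate = contamination_rate("alpha xword beta yword", spec.banned_words)
```

In this sketch the banned list serves two roles: it is written into the prompt as a negative constraint, and it is reused afterwards to score model output, so the same lexicon drives both suppression and measurement.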

Anthology ID: 2026.loreslm-1.5
Volume: Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue: LoResLM
Publisher: Association for Computational Linguistics
Pages: 50–61
URL: https://aclanthology.org/2026.loreslm-1.5/
Cite (ACL): Prathamesh Devadiga and Paras Chopra. 2026. Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 50–61, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language (Devadiga & Chopra, LoResLM 2026)
PDF: https://aclanthology.org/2026.loreslm-1.5.pdf
