Representation-Aware Prompting for Zero-Shot Marathi Text Classification: IPA, Romanization, Repetition

Van-Hien Tran, Huy Hien Vu, Hideki Tanaka, Masao Utiyama


Abstract
Large language models (LLMs) often underperform in zero-shot text classification for low-resource, non-Latin languages due to script and tokenization mismatches. We propose representation-aware prompting for Marathi that augments the original script with International Phonetic Alphabet (IPA) transcriptions, romanization, or a repetition-based fallback when external converters are unavailable. Experiments with two instruction-tuned LLMs on Marathi sentiment analysis and hate detection show consistent gains over script-only prompting (up to +2.6 accuracy points). We further find that the most effective augmentation is model-dependent, and that combining all variants is not consistently beneficial, suggesting that concise, targeted cues are preferable in zero-shot settings.
Anthology ID:
2026.loreslm-1.37
Volume:
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:
LoResLM
Publisher:
Association for Computational Linguistics
Pages:
436–443
URL:
https://aclanthology.org/2026.loreslm-1.37/
Cite (ACL):
Van-Hien Tran, Huy Hien Vu, Hideki Tanaka, and Masao Utiyama. 2026. Representation-Aware Prompting for Zero-Shot Marathi Text Classification: IPA, Romanization, Repetition. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 436–443, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Representation-Aware Prompting for Zero-Shot Marathi Text Classification: IPA, Romanization, Repetition (Tran et al., LoResLM 2026)
PDF:
https://aclanthology.org/2026.loreslm-1.37.pdf