Ahrane Mahaganapathy

2026

Bridging Dialectal Variation: A Phonetic Transcription Tool for Tamil
Ahrane Mahaganapathy | Sumirtha Karunakaran | Kavitha Navakulan | Kengatharaiyer Sarveswaran
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects

Phonetic transcription is vital for speech processing and linguistic documentation, particularly for languages such as Tamil, which has complex phonology and substantial dialectal variation. Challenges such as consonant gemination, retroflexion, vowel length, and one-to-many grapheme-to-phoneme mappings are compounded by the limited data available on Sri Lankan Tamil dialects. We present a dialect-aware, rule-based transcription tool for Tamil that supports Indian and Jaffna Tamil, with extensions underway for other dialects. The tool uses a two-stage pipeline: it first converts Tamil script to Latin script, then converts the Latin form to IPA using context-sensitive rules that handle dialectal shifts. A real-time interface enables dialect selection. Evaluated on a 7,830-word corpus, the tool achieves 94.54% accuracy for Jaffna Tamil, outperforming other tools such as eSpeak NG, and advances linguistic preservation and accessible speech technology for Tamil communities.
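
The two-stage design described in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: the function names, the tiny character tables, and the `DIALECT_OVERRIDES` hook are all hypothetical, and the only context-sensitive rule shown is the general tendency for Tamil stops to be voiced after a nasal. The dialect tables are left empty because the abstract does not state the tool's actual dialect rules.

```python
# Toy two-stage transcription sketch (hypothetical, NOT the paper's rule set).
# Stage 1: Tamil script -> Latin tokens. Stage 2: Latin tokens -> IPA.

PULLI = "\u0bcd"  # Tamil virama: suppresses the inherent vowel "a"

# Deliberately tiny tables, just enough for the demonstration word.
CONSONANTS = {
    "\u0b95": "k",       # ka
    "\u0b99": "\u1e45",  # nga -> n with dot above
    "\u0ba4": "t",       # ta
    "\u0baa": "p",       # pa
    "\u0bae": "m",       # ma
}
VOWEL_SIGNS = {"\u0bbe": "\u0101", "\u0bbf": "i"}        # dependent aa, i
INDEPENDENT_VOWELS = {"\u0b85": "a", "\u0b86": "\u0101"}  # a, aa

BASE_IPA = {
    "k": "k", "\u1e45": "\u014b", "t": "t\u032a", "p": "p", "m": "m",
    "a": "a", "\u0101": "a\u02d0", "i": "i",
}
NASALS = {"\u1e45", "m"}
VOICED_AFTER_NASAL = {"k": "\u0261", "t": "d\u032a", "p": "b"}

# Placeholder hook: per-dialect overrides of BASE_IPA (assumed, left empty).
DIALECT_OVERRIDES = {"indian": {}, "jaffna": {}}


def tamil_to_latin(word):
    """Stage 1: return a list of Latin tokens for a Tamil-script word."""
    tokens, i = [], 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            tokens.append(CONSONANTS[ch])
            if nxt == PULLI:            # bare consonant
                i += 2
            elif nxt in VOWEL_SIGNS:    # consonant + explicit vowel sign
                tokens.append(VOWEL_SIGNS[nxt])
                i += 2
            else:                       # consonant + inherent "a"
                tokens.append("a")
                i += 1
        elif ch in INDEPENDENT_VOWELS:
            tokens.append(INDEPENDENT_VOWELS[ch])
            i += 1
        else:
            raise ValueError(f"character not covered by this sketch: {ch!r}")
    return tokens


def latin_to_ipa(tokens, dialect="indian"):
    """Stage 2: map tokens to IPA, voicing a stop that follows a nasal."""
    table = {**BASE_IPA, **DIALECT_OVERRIDES[dialect]}
    out = []
    for i, tok in enumerate(tokens):
        if tok in VOICED_AFTER_NASAL and i > 0 and tokens[i - 1] in NASALS:
            out.append(VOICED_AFTER_NASAL[tok])
        else:
            out.append(table[tok])
    return "".join(out)


def transcribe(word, dialect="indian"):
    return latin_to_ipa(tamil_to_latin(word), dialect)


if __name__ == "__main__":
    # The word for "gold": ta, nga + pulli, ka, ma + pulli
    print(transcribe("\u0ba4\u0b99\u0bcd\u0b95\u0bae\u0bcd", dialect="jaffna"))
```

Keeping the intermediate Latin form as a token list lets the second stage inspect neighbouring segments directly, which is what a context-sensitive rule needs. A dialect-aware version would presumably add context rules per dialect as well as symbol overrides.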
