Sumirtha Karunakaran
2026
Bridging Dialectal Variation: A Phonetic Transcription Tool for Tamil
Ahrane Mahaganapathy | Sumirtha Karunakaran | Kavitha Navakulan | Kengatharaiyer Sarveswaran
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Phonetic transcription is vital for speech processing and linguistic documentation, particularly for languages such as Tamil that have complex phonology and dialectal variation. Challenges such as consonant gemination, retroflexion, vowel length, and one-to-many grapheme-phoneme mappings are compounded by limited data on Sri Lankan Tamil dialects. We present a dialect-aware, rule-based transcription tool for Tamil that supports Indian and Jaffna Tamil, with extensions underway for other dialects. The tool uses a two-stage pipeline: it first converts Tamil script to Latin script, then converts the Latin form to IPA using context-sensitive rules that capture dialect-specific shifts. A real-time interface enables dialect selection. Evaluated on a 7,830-word corpus, the tool achieves 94.54% accuracy for Jaffna Tamil, higher than other tools such as eSpeak NG, advancing linguistic preservation and accessible speech technology for Tamil communities.
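To make the two-stage idea concrete, the following is a minimal Python sketch of a script-to-Latin step followed by a Latin-to-IPA step with context-sensitive, dialect-selectable rules. It is not the authors' implementation: the function names (`to_latin`, `to_ipa`, `transcribe`), the tiny grapheme table, and the rule lists are all assumptions made for illustration. In particular, the dialect-specific outputs for intervocalic "k" are placeholders chosen only to show how a dialect switch could work, not claims about how either dialect is pronounced.

```python
import re

# Stage 1 tables: a toy subset of Tamil graphemes (assumption; a real
# table would cover the whole script).
CONSONANTS = {"க": "k", "ட": "ṭ", "த": "t", "ப": "p", "ம": "m", "ல": "l"}
VOWEL_SIGNS = {"ா": "ā", "ி": "i", "ு": "u"}
INDEPENDENT_VOWELS = {"அ": "a", "ஆ": "ā", "இ": "i", "உ": "u"}
PULLI = "்"  # the pulli (virama) suppresses the inherent vowel "a"


def to_latin(word: str) -> str:
    """Stage 1: transliterate Tamil script to a Latin form."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == PULLI:
                i += 1  # bare consonant, no vowel
            elif nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1  # consonant + dependent vowel sign
            else:
                out.append("a")  # inherent vowel
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        else:
            raise ValueError(f"grapheme not in toy table: {ch!r}")
        i += 1
    return "".join(out)


# Stage 2 rules: (regex, replacement) pairs applied in order.
# PLACEHOLDER dialect rules: the outputs below are illustrative only.
DIALECT_RULES = {
    "indian": [(r"(?<=[aāiu])k(?=[aāiu])", "ɣ")],
    "jaffna": [(r"(?<=[aāiu])k(?=[aāiu])", "h")],
}
# Shared rules use standard IPA notation for length and retroflexion.
SHARED_RULES = [
    (r"([kṭtpml])\1", r"\1ː"),  # geminate consonant -> long consonant
    (r"ā", "aː"),               # long vowel
    (r"ṭ", "ʈ"),                # retroflex stop
]


def to_ipa(latin: str, dialect: str) -> str:
    """Stage 2: apply dialect rules first, then shared rules."""
    for pattern, repl in DIALECT_RULES[dialect] + SHARED_RULES:
        latin = re.sub(pattern, repl, latin)
    return latin


def transcribe(word: str, dialect: str) -> str:
    return to_ipa(to_latin(word), dialect)
```

Under these assumptions, `transcribe("பகல்", "indian")` and `transcribe("பகல்", "jaffna")` share the same Latin intermediate (`pakal`) and differ only in how the second stage rewrites the intervocalic consonant. Keeping the dialect rules in a separate table means a new dialect could be added without touching the transliteration stage.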