Bridging Dialectal Variation: A Phonetic Transcription Tool for Tamil

Ahrane Mahaganapathy, Sumirtha Karunakaran, Kavitha Navakulan, Kengatharaiyer Sarveswaran


Abstract
Phonetic transcription is vital for speech processing and linguistic documentation, particularly for languages such as Tamil that have complex phonology and dialectal variation. Challenges such as consonant gemination, retroflexion, vowel length, and one-to-many grapheme-phoneme mappings are compounded by limited data on Sri Lankan Tamil dialects. We present a dialect-aware, rule-based transcription tool for Tamil that supports Indian and Jaffna Tamil, with extensions underway for other dialects. The tool uses a two-stage pipeline, converting Tamil script to Latin and then to IPA with context-sensitive rules, to handle dialectal shifts. A real-time interface enables dialect selection. Evaluated on a 7,830-word corpus, the tool achieves 94.54% accuracy for Jaffna Tamil, higher than that of other tools such as eSpeak NG, advancing linguistic preservation and accessible speech technology for Tamil communities.
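The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tamil_to_latin`, `latin_to_ipa`, `transcribe`), the Latin token scheme, the tiny character tables, and the single context-sensitive rule (a single stop between vowels is voiced, a geminate stays voiceless) are all assumptions made for illustration. The abstract does not list the tool's dialect-specific rules, so both dialect entries below reuse the same toy rule table.

```python
# Toy two-stage transcription: Tamil script -> Latin tokens -> IPA.
# All tables and names here are hypothetical, not taken from the paper.

PULLI = "\u0bcd"  # Tamil virama: suppresses the inherent vowel "a"

# Stage 1 tables (small subset of the script, for illustration only).
CONSONANTS = {"க": "k", "ட": "T", "ப": "p", "ம": "m", "ல": "l"}
INDEPENDENT_VOWELS = {"அ": "a", "ஆ": "aa", "இ": "i"}
VOWEL_SIGNS = {"\u0bbe": "aa", "\u0bbf": "i", "\u0bc1": "u"}

# Stage 2 tables.
VOWELS = {"a", "aa", "i", "u"}
BASE_IPA = {"k": "k", "T": "ʈ", "p": "p", "m": "m", "l": "l",
            "a": "a", "aa": "aː", "i": "i", "u": "u"}

# Assumed rule table per dialect. Both entries are identical placeholders;
# real dialect-specific rules would replace or extend them.
SHARED_INTERVOCALIC = {"k": "ɡ", "T": "ɖ", "p": "b"}
INTERVOCALIC_STOPS = {"indian": SHARED_INTERVOCALIC,
                      "jaffna": SHARED_INTERVOCALIC}


def tamil_to_latin(word):
    """Stage 1: map Tamil script to a list of Latin tokens."""
    tokens = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            tokens.append(CONSONANTS[ch])
            if nxt == PULLI:            # bare consonant, no vowel
                i += 2
            elif nxt in VOWEL_SIGNS:    # consonant + explicit vowel sign
                tokens.append(VOWEL_SIGNS[nxt])
                i += 2
            else:                       # consonant + inherent "a"
                tokens.append("a")
                i += 1
        elif ch in INDEPENDENT_VOWELS:
            tokens.append(INDEPENDENT_VOWELS[ch])
            i += 1
        else:
            raise ValueError(f"character not covered by this sketch: {ch!r}")
    return tokens


def latin_to_ipa(tokens, dialect="indian"):
    """Stage 2: map Latin tokens to IPA using one context-sensitive rule."""
    if dialect not in INTERVOCALIC_STOPS:
        raise ValueError(f"unsupported dialect: {dialect}")
    voiced = INTERVOCALIC_STOPS[dialect]
    ipa = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # A single stop flanked by vowels is voiced; geminates fail this
        # test because one neighbour is a consonant, so they stay voiceless.
        if tok in voiced and prev in VOWELS and nxt in VOWELS:
            ipa.append(voiced[tok])
        else:
            ipa.append(BASE_IPA[tok])
    return "".join(ipa)


def transcribe(word, dialect="indian"):
    """Run both stages for the selected dialect."""
    return latin_to_ipa(tamil_to_latin(word), dialect)
```

Under these assumptions, `transcribe("படம்")` yields `paɖam`, while the geminated `transcribe("பட்டம்")` yields `paʈʈam`. Keeping the rule table keyed by dialect is one simple way to let an interface switch dialects without touching the script-to-Latin stage.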
Anthology ID:
2026.vardial-1.19
Volume:
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
234–241
URL:
https://aclanthology.org/2026.vardial-1.19/
Cite (ACL):
Ahrane Mahaganapathy, Sumirtha Karunakaran, Kavitha Navakulan, and Kengatharaiyer Sarveswaran. 2026. Bridging Dialectal Variation: A Phonetic Transcription Tool for Tamil. In Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 234–241, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Bridging Dialectal Variation: A Phonetic Transcription Tool for Tamil (Mahaganapathy et al., VarDial 2026)
PDF:
https://aclanthology.org/2026.vardial-1.19.pdf