Findings in Tamil Dialect Speech Recognition and Classification

Bharathi B; Bharathi Raja Chakravarthi; Shunmuga Priya Muthusamy Chinnan; Saranya S; Suhasini S

Abstract

As part of DravidianLangTech-2026, we provide an overview of the Shared Task on Dialect-based Speech Recognition and Classification in Tamil. The main goal of the shared task is to create reliable systems for Tamil dialect identification from audio signals and for dialect-aware Automatic Speech Recognition (ASR). The task comprises two subtasks: Dialect-based Tamil Speech Recognition and Tamil Dialect Classification from Speech. The training dataset consists of 5,134 audio recordings in four Tamil dialects (Southern, Northern, Western, and Central), spanning 9 hours and 22 minutes. The test set contains 579 audio samples, totaling almost two hours. A total of 17 teams took part in the shared task. The best results were a Word Error Rate (WER) of 0.51 for speech recognition and a macro F1-score of 0.79 for dialect classification. The findings highlight how dialectal diversity complicates Tamil speech recognition and lay a foundation for further study of low-resource, dialect-aware ASR systems.
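The two metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the shared task's official scorer: the function names `word_error_rate` and `macro_f1` are hypothetical, and whitespace tokenization with no text normalization is an assumption, since the abstract does not describe the scoring protocol.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared verbatim.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Under this definition a lower WER is better and a higher macro F1 is better; because macro F1 averages over classes without weighting, each of the four dialects would count equally regardless of how many samples it has.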

Anthology ID: 2026.dravidianlangtech-1.9
Volume: Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: July
Year: 2026
Address: Underline (Virtual)
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 71–79
URL: https://aclanthology.org/2026.dravidianlangtech-1.9/
Cite (ACL): Bharathi B, Bharathi Raja Chakravarthi, Shunmuga Priya Muthusamy Chinnan, Saranya S, and Suhasini S. 2026. Findings in Tamil Dialect Speech Recognition and Classification. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 71–79, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal): Findings in Tamil Dialect Speech Recognition and Classification (B et al., DravidianLangTech 2026)
PDF: https://aclanthology.org/2026.dravidianlangtech-1.9.pdf
