MunTTS: A Text-to-Speech System for Mundari

Varun Gumma, Rishav Hada, Aditya Yadavalli, Pamir Gogoi, Ishani Mondal, Vivek Seshadri, Kalika Bali


Abstract
We present MunTTS, an end-to-end text-to-speech (TTS) system built specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family. Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system. We begin by gathering a substantial dataset of Mundari text and speech and then train end-to-end speech models on it. We also describe the methods used to train our models so that they remain efficient and effective despite the data constraints. We evaluate our system with native speakers and with objective metrics, demonstrating its potential as a tool for preserving and promoting the Mundari language in the digital age.
Anthology ID:
2024.computel-1.11
Volume:
Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Sarah Moeller, Godfred Agyapong, Antti Arppe, Aditi Chaudhary, Shruti Rijhwani, Christopher Cox, Ryan Henke, Alexis Palmer, Daisy Rosenblum, Lane Schwartz
Venues:
ComputEL | WS
Publisher:
Association for Computational Linguistics
Pages:
76–82
URL:
https://aclanthology.org/2024.computel-1.11
Cite (ACL):
Varun Gumma, Rishav Hada, Aditya Yadavalli, Pamir Gogoi, Ishani Mondal, Vivek Seshadri, and Kalika Bali. 2024. MunTTS: A Text-to-Speech System for Mundari. In Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 76–82, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
MunTTS: A Text-to-Speech System for Mundari (Gumma et al., ComputEL-WS 2024)
PDF:
https://aclanthology.org/2024.computel-1.11.pdf
Video:
https://aclanthology.org/2024.computel-1.11.mp4