Towards Speech to Speech Machine Translation focusing on Indian Languages

Vandan Mujadia, Dipti Sharma


Abstract
We introduce an SSMT (Speech to Speech Machine Translation, also known as Speech to Speech Video Translation) pipeline (https://ssmt.iiit.ac.in/ssmtiiith), a web application for translating videos from one language to another by cascading multiple language modules. Our speech translation system combines highly accurate speech to text (ASR) for Indian English, pre-processing modules to bridge ASR-MT gaps such as spoken disfluency and punctuation, robust machine translation (MT) systems for multiple language pairs, an SRT module for translated text, a text to speech (TTS) module, and a module to render the translated, synthesized audio on the original video. It is a user-friendly, flexible, and easily accessible system. With it, we aim to provide a complete, configurable speech translation experience to users and researchers. It also supports human intervention: users can edit the outputs of different modules, and the edited output is then used for subsequent processing to improve overall output quality. By adopting a human-in-the-loop approach, we aim to configure the technology so that it assists humans and reduces the human effort involved in speech translation involving English and Indian languages. To the best of our knowledge, this is the first fully integrated system for English to Indian languages (Hindi, Telugu, Gujarati, Marathi and Punjabi) video translation. Our evaluation shows that the developed pipeline, with human intervention, achieves a mean opinion score (MOS) of 3.5 or higher for English to Hindi. A short video demonstrating our system is available at https://youtu.be/MVftzoeRg48.
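The cascaded, human-in-the-loop design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `run_cascade`, the stage names, and the toy stage functions are all hypothetical stand-ins, and each stage passes plain text where the real system would call ASR, MT, or TTS models. The only idea it shows is that a user's edit to one module's output replaces that output and is what later modules consume.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a named stage that maps the previous output to a new one.
Stage = Tuple[str, Callable[[str], str]]


def run_cascade(
    source: str,
    stages: List[Stage],
    human_edits: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Run stages in order; a human edit for a stage overrides its output."""
    human_edits = human_edits or {}
    outputs: Dict[str, str] = {}
    current = source
    for name, fn in stages:
        if name in human_edits:
            # The edited text is used for all subsequent processing.
            current = human_edits[name]
        else:
            current = fn(current)
        outputs[name] = current
    return outputs


# Toy stand-ins (assumptions for illustration, not real models).
FILLERS = {"um", "uh", "hmm"}


def toy_asr(audio_id: str) -> str:
    # Pretend transcript lookup in place of speech recognition.
    return {"clip1": "um hello uh world"}.get(audio_id, "")


def toy_cleanup(text: str) -> str:
    # Drop filler words, then add capitalization and a final period.
    words = [w for w in text.split() if w.lower() not in FILLERS]
    if not words:
        return ""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def toy_mt(text: str) -> str:
    # Tag the text instead of translating it; a real MT system goes here.
    return "[hi] " + text


STAGES: List[Stage] = [("asr", toy_asr), ("cleanup", toy_cleanup), ("mt", toy_mt)]

automatic = run_cascade("clip1", STAGES)
edited = run_cascade("clip1", STAGES, human_edits={"cleanup": "Hello, world!"})
```

In this sketch, `automatic["mt"]` is derived from the automatically cleaned transcript, while `edited["mt"]` is derived from the user's corrected text, mirroring how an edit at one module feeds every later module.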
Anthology ID:
2023.eacl-demo.19
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Danilo Croce, Luca Soldaini
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
161–168
URL:
https://aclanthology.org/2023.eacl-demo.19
DOI:
10.18653/v1/2023.eacl-demo.19
Cite (ACL):
Vandan Mujadia and Dipti Sharma. 2023. Towards Speech to Speech Machine Translation focusing on Indian Languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 161–168, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Towards Speech to Speech Machine Translation focusing on Indian Languages (Mujadia & Sharma, EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-demo.19.pdf
Video:
https://aclanthology.org/2023.eacl-demo.19.mp4