Speech-to-Speech Translation for a Real-world Unwritten Language

Peng-Jen Chen; Kevin Tran; Yilin Yang; Jingfei Du; Justine Kao; Yu-An Chung; Paden Tomasello; Paul-Ambroise Duquenne; Holger Schwenk; Hongyu Gong; Hirofumi Inaguma; Sravya Popuri; Changhan Wang; Juan Pino; Wei-Ning Hsu; Ann Lee

doi:10.18653/v1/2023.findings-acl.307

Abstract

We study speech-to-speech translation (S2ST), which translates speech in one language into speech in another, and focus on building systems for languages without a standard writing system. We use English-Taiwanese Hokkien as a case study and present an end-to-end solution covering training data collection, modeling choices, and benchmark dataset release. First, we present efforts on creating human-annotated data, automatically mining data from large unlabeled speech datasets, and adopting pseudo-labeling to produce weakly supervised data. On the modeling side, we take advantage of recent advances in using self-supervised discrete representations as prediction targets in S2ST, and we show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, during model training. Finally, we release an S2ST benchmark set to facilitate future research in this field.
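The abstract mentions using self-supervised discrete representations as prediction targets. As a generic illustration only (not the authors' code or configuration), such targets are commonly built by assigning each frame-level feature vector from a self-supervised speech encoder to its nearest k-means centroid, then collapsing runs of identical consecutive unit IDs. The sketch below assumes the features and centroids are already available as plain lists of floats; the function names `assign_units` and `reduce_units` and the toy values are hypothetical stand-ins for real encoder outputs and a trained clustering model.

```python
from itertools import groupby
from typing import List, Sequence


def assign_units(frames: Sequence[Sequence[float]],
                 centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature frame to the index of its nearest centroid (squared L2).

    Hypothetical helper: assumes centroids come from a k-means model fitted
    beforehand on self-supervised speech features.
    """
    units = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
                 for centroid in centroids]
        # Ties resolve to the lowest centroid index.
        units.append(min(range(len(centroids)), key=dists.__getitem__))
    return units


def reduce_units(units: Sequence[int]) -> List[int]:
    """Collapse runs of identical consecutive unit IDs into a single ID."""
    return [unit for unit, _ in groupby(units)]


if __name__ == "__main__":
    # Toy 2-D "features" and two centroids, standing in for real encoder outputs.
    frames = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.1], [0.05, 0.0]]
    centroids = [[0.0, 0.0], [1.0, 1.0]]
    units = assign_units(frames, centroids)
    print(units)                # [0, 0, 1, 1, 0]
    print(reduce_units(units))  # [0, 1, 0]
```

Collapsing repeats shortens the target sequence a model must predict, while the unit IDs themselves would later be turned back into audio by a separate unit-to-speech component, which this sketch does not cover.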

Anthology ID:: 2023.findings-acl.307
Volume:: Findings of the Association for Computational Linguistics: ACL 2023
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 4969–4983
URL:: https://aclanthology.org/2023.findings-acl.307/
DOI:: 10.18653/v1/2023.findings-acl.307
Cite (ACL):: Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, and Ann Lee. 2023. Speech-to-Speech Translation for a Real-world Unwritten Language. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4969–4983, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: Speech-to-Speech Translation for a Real-world Unwritten Language (Chen et al., Findings 2023)
PDF:: https://aclanthology.org/2023.findings-acl.307.pdf
