The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023

Minghan Wang; Yinglu Li; Jiaxin Guo; Zongyao Li; Hengchao Shang; Daimeng Wei; Min Zhang; Shimin Tao; Hao Yang

doi:10.18653/v1/2023.iwslt-1.25

Abstract

This paper describes our work on the IWSLT 2023 Speech-to-Speech task. Our proposed cascaded system consists of an ensemble of Conformer and S2T-Transformer-based ASR models, a Transformer-based MT model, and a diffusion-based TTS model. Our primary focus in this competition was to investigate the modeling ability of the diffusion model for TTS tasks in high-resource scenarios and the role of TTS in the overall S2S task. To this end, we proposed DTS, an end-to-end diffusion-based TTS model that takes raw text as input and generates a waveform by iteratively denoising pure Gaussian noise. Compared to previous TTS models, the speech generated by DTS is more natural and performs better in code-switching scenarios. Because DTS is trained end-to-end, its training process is relatively straightforward. Our experiments demonstrate that DTS outperforms other TTS models on the GigaS2S benchmark and also brings positive gains for the entire S2S system.
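The phrase "iteratively denoising pure Gaussian noise" can be illustrated with a generic DDPM-style reverse sampling loop. This is only a minimal sketch under stated assumptions, not the DTS sampler: the abstract does not specify the noise schedule, the number of steps, or the network, so the linear beta schedule, the step count, and the `predict_noise` callable (a hypothetical stand-in for a text-conditioned denoising network) are all illustrative choices.

```python
import math
import random


def ddpm_sample(predict_noise, length, num_steps=50, seed=0):
    """Generic DDPM ancestral sampling over a 1-D signal (illustrative only).

    predict_noise(x, t) is a hypothetical stand-in for a trained,
    text-conditioned network that estimates the noise present in x at step t.
    """
    rng = random.Random(seed)

    # Assumed linear beta schedule; the real schedule is not given in the source.
    betas = [1e-4 + (0.05 - 1e-4) * i / max(num_steps - 1, 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]

    # Cumulative products of alphas (alpha-bar in DDPM notation).
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from pure Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]

    # Walk the chain backwards, removing a little noise at each step.
    for t in reversed(range(num_steps)):
        eps_hat = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t])
                for xi, ei in zip(x, eps_hat)]
        if t > 0:
            # Re-inject a small amount of noise on all but the final step.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


# Usage with a dummy predictor that always estimates zero noise.
samples = ddpm_sample(lambda x, t: [0.0] * len(x), length=16, num_steps=10)
```

In a real TTS system the dummy predictor would be replaced by a trained network conditioned on the input text, and `length` would be the number of waveform samples to generate.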

Anthology ID: 2023.iwslt-1.25
Volume: Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
Month: July
Year: 2023
Address: Toronto, Canada (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue: IWSLT
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 277–282
URL: https://aclanthology.org/2023.iwslt-1.25/
DOI: 10.18653/v1/2023.iwslt-1.25
Cite (ACL): Minghan Wang, Yinglu Li, Jiaxin Guo, Zongyao Li, Hengchao Shang, Daimeng Wei, Min Zhang, Shimin Tao, and Hao Yang. 2023. The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 277–282, Toronto, Canada (in-person and online). Association for Computational Linguistics.
Cite (Informal): The HW-TSC’s Speech-to-Speech Translation System for IWSLT 2023 (Wang et al., IWSLT 2023)
PDF: https://aclanthology.org/2023.iwslt-1.25.pdf
