Duplex Diffusion Models Improve Speech-to-Speech Translation

Xianchao Wu


Abstract
Speech-to-speech translation is a typical sequence-to-sequence learning task that naturally has two directions. How can bidirectional supervision signals be leveraged effectively to produce high-fidelity audio in both directions? Existing approaches either train two separate models or a single multitask-learned model, with low efficiency and inferior performance. In this paper, we propose a duplex diffusion model that applies diffusion probabilistic models to both sides of a reversible duplex Conformer, so that either end can simultaneously input and output a distinct language’s speech. Our model enables reversible speech translation by simply flipping the input and output ends. Experiments show that our model achieves the first success in reversible speech translation, with significant improvements in ASR-BLEU scores over a list of state-of-the-art baselines.
Anthology ID:
2023.findings-acl.509
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8035–8047
URL:
https://aclanthology.org/2023.findings-acl.509
DOI:
10.18653/v1/2023.findings-acl.509
Cite (ACL):
Xianchao Wu. 2023. Duplex Diffusion Models Improve Speech-to-Speech Translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8035–8047, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Duplex Diffusion Models Improve Speech-to-Speech Translation (Wu, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.509.pdf
Video:
https://aclanthology.org/2023.findings-acl.509.mp4