@inproceedings{rao-etal-2023-length,
title = "Length-Aware {NMT} and Adaptive Duration for Automatic Dubbing",
author = "Rao, Zhiqiang and
Shang, Hengchao and
Yang, Jinlong and
Wei, Daimeng and
Li, Zongyao and
Guo, Jiaxin and
Li, Shaojun and
Yu, Zhengzhe and
Wu, Zhanglin and
Xie, Yuhao and
Wei, Bin and
Zheng, Jiawei and
Lei, Lizhi and
Yang, Hao",
editor = "Salesky, Elizabeth and
Federico, Marcello and
Carpuat, Marine",
booktitle = "Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)",
month = jul,
year = "2023",
address = "Toronto, Canada (in-person and online)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.iwslt-1.9",
doi = "10.18653/v1/2023.iwslt-1.9",
pages = "138--143",
abstract = "This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The Transformer is deep and multiple target-to-source length-ratio class labels are used to control target lengths. The variation predictor in FastSpeech2 is utilized to predict phoneme durations. To optimize the isochrony in dubbing, re-ranking and scaling are performed. The source audio duration is used as a reference to re-rank the translations of different length-ratio labels, and the one with minimum time deviation is preferred. Additionally, the phoneme duration outputs are scaled within a defined threshold to narrow the duration gap with the source audio.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="rao-etal-2023-length">
<titleInfo>
<title>Length-Aware NMT and Adaptive Duration for Automatic Dubbing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Zhiqiang</namePart>
<namePart type="family">Rao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hengchao</namePart>
<namePart type="family">Shang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jinlong</namePart>
<namePart type="family">Yang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Daimeng</namePart>
<namePart type="family">Wei</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zongyao</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiaxin</namePart>
<namePart type="family">Guo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shaojun</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhengzhe</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhanglin</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuhao</namePart>
<namePart type="family">Xie</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bin</namePart>
<namePart type="family">Wei</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiawei</namePart>
<namePart type="family">Zheng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lizhi</namePart>
<namePart type="family">Lei</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hao</namePart>
<namePart type="family">Yang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Elizabeth</namePart>
<namePart type="family">Salesky</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marcello</namePart>
<namePart type="family">Federico</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marine</namePart>
<namePart type="family">Carpuat</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Toronto, Canada (in-person and online)</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The Transformer is deep and multiple target-to-source length-ratio class labels are used to control target lengths. The variation predictor in FastSpeech2 is utilized to predict phoneme durations. To optimize the isochrony in dubbing, re-ranking and scaling are performed. The source audio duration is used as a reference to re-rank the translations of different length-ratio labels, and the one with minimum time deviation is preferred. Additionally, the phoneme duration outputs are scaled within a defined threshold to narrow the duration gap with the source audio.</abstract>
<identifier type="citekey">rao-etal-2023-length</identifier>
<identifier type="doi">10.18653/v1/2023.iwslt-1.9</identifier>
<location>
<url>https://aclanthology.org/2023.iwslt-1.9</url>
</location>
<part>
<date>2023-07</date>
<extent unit="page">
<start>138</start>
<end>143</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Length-Aware NMT and Adaptive Duration for Automatic Dubbing
%A Rao, Zhiqiang
%A Shang, Hengchao
%A Yang, Jinlong
%A Wei, Daimeng
%A Li, Zongyao
%A Guo, Jiaxin
%A Li, Shaojun
%A Yu, Zhengzhe
%A Wu, Zhanglin
%A Xie, Yuhao
%A Wei, Bin
%A Zheng, Jiawei
%A Lei, Lizhi
%A Yang, Hao
%Y Salesky, Elizabeth
%Y Federico, Marcello
%Y Carpuat, Marine
%S Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada (in-person and online)
%F rao-etal-2023-length
%X This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The Transformer is deep and multiple target-to-source length-ratio class labels are used to control target lengths. The variation predictor in FastSpeech2 is utilized to predict phoneme durations. To optimize the isochrony in dubbing, re-ranking and scaling are performed. The source audio duration is used as a reference to re-rank the translations of different length-ratio labels, and the one with minimum time deviation is preferred. Additionally, the phoneme duration outputs are scaled within a defined threshold to narrow the duration gap with the source audio.
%R 10.18653/v1/2023.iwslt-1.9
%U https://aclanthology.org/2023.iwslt-1.9
%U https://doi.org/10.18653/v1/2023.iwslt-1.9
%P 138-143
Markdown (Informal):
[Length-Aware NMT and Adaptive Duration for Automatic Dubbing](https://aclanthology.org/2023.iwslt-1.9) (Rao et al., IWSLT 2023)

ACL:
Zhiqiang Rao, Hengchao Shang, Jinlong Yang, Daimeng Wei, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhengzhe Yu, Zhanglin Wu, Yuhao Xie, Bin Wei, Jiawei Zheng, Lizhi Lei, and Hao Yang. 2023. Length-Aware NMT and Adaptive Duration for Automatic Dubbing. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 138–143, Toronto, Canada (in-person and online). Association for Computational Linguistics.
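The abstract describes two isochrony controls: re-ranking the translations produced under different target-to-source length-ratio labels by how little their predicted total duration deviates from the source audio duration, and scaling the predicted phoneme durations within a defined threshold to narrow the remaining gap. A minimal sketch of those two steps is given below; the data structures, function names, and the symmetric threshold value are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of duration-based re-ranking and threshold-limited scaling,
# as outlined in the paper's abstract. Names and the threshold are illustrative.

def rerank_by_duration(candidates, source_duration):
    """Pick the candidate translation whose predicted total phoneme duration
    deviates least from the source audio duration.

    `candidates` is assumed to be a list of (translation, phoneme_durations)
    pairs, one per target-to-source length-ratio label."""
    best, best_dev = None, float("inf")
    for translation, durations in candidates:
        deviation = abs(sum(durations) - source_duration)
        if deviation < best_dev:
            best, best_dev = (translation, durations), deviation
    return best


def scale_durations(durations, source_duration, max_scale=1.2):
    """Uniformly scale phoneme durations toward the source duration, clipping
    the scaling factor to a threshold (assumed symmetric, [1/max_scale, max_scale])."""
    total = sum(durations)
    if total == 0:
        return durations
    factor = source_duration / total
    factor = max(1.0 / max_scale, min(max_scale, factor))
    return [d * factor for d in durations]


# Example: three candidates from different length-ratio labels, 2.0 s of source audio.
candidates = [
    ("short translation", [0.5, 0.6, 0.4]),
    ("medium translation", [0.5, 0.6, 0.5, 0.5]),
    ("long translation", [0.5, 0.6, 0.5, 0.5, 0.4]),
]
translation, durations = rerank_by_duration(candidates, source_duration=2.0)
print(translation, scale_durations(durations, source_duration=2.0))
```

The example selects the candidate whose summed durations (2.1 s) lie closest to the 2.0 s source and then compresses its phoneme durations slightly, since the required factor (about 0.95) falls inside the assumed threshold.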