HW-TSC’s Submission to the IWSLT 2026 Cross-Lingual Voice Cloning Track

Yu He, Daimeng Wei, Jiaxin GUO, Yuanchang Luo, Hengchao Shang, Zongyao Li, Zhiqiang Rao, Jinlong Yang, Zhanglin Wu, Boqi Huang, Xiaoqing Lan


Abstract
This paper presents HW-TSC’s submission to the IWSLT 2026 Cross-Lingual Voice Cloning Track. The track covers three target languages: Arabic, Chinese, and French. We participate in two of them, Chinese and French. We use the multilingual Qwen3-TTS-12Hz-1.7B-Base model as the core voice cloning model. Because the original reference audio is often excessively long and its features are scattered, we design a sliding-window audio segmentation preprocessing method that continuously splits long audio into standardized short segments with overlapping redundancy. This avoids the feature attenuation caused by overly long audio, and the step overlap preserves as much complete timbre information as possible. To select the output with the highest timbre similarity among the many synthesized results, we perform voiceprint recognition with ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network), use cosine similarity as the core quantitative metric, and keep the result with the highest similarity as the final output.
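The two steps described in the abstract, overlapping segmentation of the reference audio and similarity-based selection of the synthesized output, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the window and hop values, and the handling of the final partial window are hypothetical, and the speaker embeddings are assumed to have been produced already by an ECAPA-TDNN encoder that is not shown here.

```python
import math


def sliding_windows(num_samples, window, hop):
    """Return (start, end) sample indices of overlapping windows.

    Assumption: hop < window gives the overlap; the abstract does not
    state the actual window length or step size.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples <= window:
        # Audio already short enough: keep it as a single segment.
        return [(0, num_samples)]
    spans = []
    start = 0
    while start + window <= num_samples:
        spans.append((start, start + window))
        start += hop
    # Assumption: cover any leftover tail with one end-aligned window.
    if spans[-1][1] < num_samples:
        spans.append((num_samples - window, num_samples))
    return spans


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_best(reference_embedding, candidate_embeddings):
    """Return (index, score) of the candidate closest to the reference."""
    scores = [cosine_similarity(reference_embedding, c)
              for c in candidate_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In this sketch, each window would serve as one short reference prompt for the voice cloning model, and `select_best` would then be applied to the embeddings of the resulting synthesized utterances. Comparing candidates against an embedding of the original reference speaker is an assumption; the abstract only states that the highest-similarity result is kept.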
Anthology ID:
2026.iwslt-1.11
Volume:
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month:
July
Year:
2026
Address:
San Diego, USA (in-person and online)
Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues:
IWSLT | WS
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
97–102
URL:
https://aclanthology.org/2026.iwslt-1.11/
Cite (ACL):
Yu He, Daimeng Wei, Jiaxin GUO, Yuanchang Luo, Hengchao Shang, Zongyao Li, Zhiqiang Rao, Jinlong Yang, Zhanglin Wu, Boqi Huang, and Xiaoqing Lan. 2026. HW-TSC’s Submission to the IWSLT 2026 Cross-Lingual Voice Cloning Track. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 97–102, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal):
HW-TSC’s Submission to the IWSLT 2026 Cross-Lingual Voice Cloning Track (He et al., IWSLT 2026)
PDF:
https://aclanthology.org/2026.iwslt-1.11.pdf