HW-TSC’s Speech to Text Translation System for IWSLT 2024 in Indic track

Bin Wei, Zongyao Li, Jiaxin Guo, Daimeng Wei, Zhanglin Wu, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Yuanchang Luo, Hengchao Shang, Hao Yang, Yanfei Jiang


Abstract
This article introduces HW-TSC's submission and results for the IWSLT 2024 Indic Track speech-to-text translation task. We designed a cascade system, consisting of an automatic speech recognition (ASR) model followed by a machine translation (MT) model, to translate speech in one language into text in another. For the ASR component, we directly use Whisper large-v3. Our main effort goes into optimizing the MT models for en2ta, en2hi, and en2bn (English to Tamil, Hindi, and Bengali). We first train a baseline model on bilingual corpora. We then use monolingual data to construct pseudo-parallel data, which further strengthens the baseline. Next, we filter the parallel corpus with LaBSE-based filtering and fine-tune the model again, which further improves the BLEU score. Finally, we select in-domain data from the bilingual corpus to fine-tune the previous model, which gives our best results.
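As a rough illustration of the LaBSE filtering step, the sketch below scores each sentence pair by the cosine similarity of its source and target sentence embeddings and keeps the pairs above a threshold. It assumes an `embed` function that maps sentences in any language into a shared vector space (in practice a LaBSE encoder; here a toy lookup table stands in). The function name `labse_style_filter` and the 0.8 threshold are hypothetical choices for illustration, not values reported by the authors.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def labse_style_filter(
    pairs: Iterable[Tuple[str, str]],
    embed: Callable[[str], Vector],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> List[Tuple[str, str, float]]:
    """Keep (src, tgt) pairs whose embeddings are similar enough.

    `embed` is assumed to return language-agnostic sentence vectors,
    e.g. from a LaBSE encoder; any callable with that shape works here.
    """
    kept = []
    for src, tgt in pairs:
        score = cosine(embed(src), embed(tgt))
        if score >= threshold:
            kept.append((src, tgt, score))
    return kept


if __name__ == "__main__":
    # Toy stand-in for a multilingual encoder (transliterated targets).
    toy_vectors = {
        "hello": [1.0, 0.0],
        "namaste": [0.9, 0.1],     # a plausible translation of "hello"
        "good night": [0.0, 1.0],
        "suprabhat": [1.0, 0.0],   # "good morning": a misaligned pair
    }
    corpus = [("hello", "namaste"), ("good night", "suprabhat")]
    # Keeps only the ("hello", "namaste") pair.
    for src, tgt, score in labse_style_filter(corpus, toy_vectors.__getitem__):
        print(f"{src} ||| {tgt} ||| {score:.3f}")
```

A real run would swap the toy table for an actual multilingual sentence encoder; raising the threshold keeps fewer but cleaner pairs, so it trades corpus size against alignment quality.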
Anthology ID: 2024.iwslt-1.8
Volume: Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Month: August
Year: 2024
Address: Bangkok, Thailand (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue: IWSLT
Publisher: Association for Computational Linguistics
Pages: 53–56
URL: https://aclanthology.org/2024.iwslt-1.8
Cite (ACL): Bin Wei, Zongyao Li, Jiaxin Guo, Daimeng Wei, Zhanglin Wu, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Yuanchang Luo, Hengchao Shang, Hao Yang, and Yanfei Jiang. 2024. HW-TSC’s Speech to Text Translation System for IWSLT 2024 in Indic track. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 53–56, Bangkok, Thailand (in-person and online). Association for Computational Linguistics.
Cite (Informal): HW-TSC’s Speech to Text Translation System for IWSLT 2024 in Indic track (Wei et al., IWSLT 2024)
PDF: https://aclanthology.org/2024.iwslt-1.8.pdf