Xiaosong Qiao


2022

pdf bib
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation
Yinglu Li | Minghan Wang | Jiaxin Guo | Xiaosong Qiao | Yuxia Wang | Daimeng Wei | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes the HW-TSC’s designation of the Offline Speech Translation System submitted for IWSLT 2022 Evaluation. We explored both cascade and end-to-end system on three language tracks (en-de, en-zh and en-ja), and we chose the cascade one as our primary submission. For the automatic speech recognition (ASR) model of cascade system, there are three ASR models including Conformer, S2T-Transformer and U2 trained on the mixture of five datasets. During inference, transcripts are generated with the help of domain controlled generation strategy. Context-aware reranking and ensemble based anti-interference strategy are proposed to produce better ASR outputs. For machine translation part, we pretrained three translation models on WMT21 dataset and fine-tuned them on in-domain corpora. Our cascade system shows competitive performance than the known offline systems in the industry and academia.

pdf bib
The HW-TSC’s Simultaneous Speech Translation System for IWSLT 2022 Evaluation
Minghan Wang | Jiaxin Guo | Yinglu Li | Xiaosong Qiao | Yuxia Wang | Zongyao Li | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our work in the participation of IWSLT 2022 simultaneous speech translation evaluation. For the track of text-to-text (T2T), we participate in three language pairs and build wait-k based simultaneous MT (SimulMT) model for the task. The model was pretrained on WMT21 news corpora, and was further improved with in-domain fine-tuning and self-training. For the speech-to-text (S2T) track, we designed both cascade and end-to-end form in three language pairs. The cascade system is composed of a chunking-based streaming ASR model and the SimulMT model used in the T2T track. The end-to-end system is a simultaneous speech translation (SimulST) model based on wait-k strategy, which is directly trained on a synthetic corpus produced by translating all texts of ASR corpora into specific target language with an offline MT model. It also contains a heuristic sentence breaking strategy, preventing it from finishing the translation before the the end of the speech. We evaluate our systems on the MUST-C tst-COMMON dataset and show that the end-to-end system is competitive to the cascade one. Meanwhile, we also demonstrate that the SimulMT model can be efficiently optimized by these approaches, resulting in the improvements of 1-2 BLEU points.

pdf bib
The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation
Jiaxin Guo | Yinglu Li | Minghan Wang | Xiaosong Qiao | Yuxia Wang | Hengchao Shang | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

The paper presents the HW-TSC’s pipeline and results of Offline Speech to Speech Translation for IWSLT 2022. We design a cascade system consisted of an ASR model, machine translation model and TTS model to convert the speech from one language into another language(en-de). For the ASR part, we find that better performance can be obtained by ensembling multiple heterogeneous ASR models and performing reranking on beam candidates. And we find that the combination of context-aware reranking strategy and MT model fine-tuned on the in-domain dataset is helpful to improve the performance. Because it can mitigate the problem that the inconsistency in transcripts caused by the lack of context. Finally, we use VITS model provided officially to reproduce audio files from the translation hypothesis.