Yuanchang Luo


2024

Improving the Quality of IWLST 2024 Cascade Offline Speech Translation and Speech-to-Speech Translation via Translation Hypothesis Ensembling with NMT models and Large Language Models
Zhanglin Wu | Jiaxin Guo | Daimeng Wei | Zhiqiang Rao | Zongyao Li | Hengchao Shang | Yuanchang Luo | Shaojun Li | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper presents HW-TSC’s submission to the IWSLT 2024 Offline Speech Translation Task and Speech-to-Speech Translation Task. The former covers three translation directions (English to German, English to Chinese, and English to Japanese), while the latter covers only English to Chinese. We participate in all three tracks of the offline speech translation task (constrained training, constrained training with large language models, and unconstrained training), using a cascade architecture. In the constrained track, we train an ASR model from scratch and then use R-Drop and domain data selection to train the NMT model. In the constrained track with large language models, we initialize the ASR model with Wav2vec 2.0 and mBART50, and then train a Llama2-7B-based MT model using continued training on sentence-aligned parallel data, supervised fine-tuning, and contrastive preference optimization. In the unconstrained track, we fine-tune the Whisper model for speech recognition and then ensemble the translation outputs of NMT models and LLMs to produce better translations. For the speech-to-speech translation task, we first use the offline speech translation system described above to generate the translated text. We then use the VITS model to synthesize the corresponding speech and the OpenVoice model for timbre cloning.
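
The abstract does not say how the NMT and LLM hypotheses are combined. One common way to ensemble complete translations is minimum Bayes risk (MBR) selection: keep the candidate that agrees most with all the others. The sketch below assumes that approach, with a toy unigram-F1 utility standing in for a real metric such as BLEU or COMET; the function names and example sentences are hypothetical, not taken from the paper.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Token-level F1 overlap: a toy stand-in for a real utility metric (e.g. BLEU or COMET)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: list[str]) -> str:
    """Pick the hypothesis with the highest average utility against all the others."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(unigram_f1(candidates[i], o) for o in others) / max(len(others), 1)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


hypotheses = [
    "the cat sat on the mat",       # e.g. output of an NMT model
    "the cat sat on a mat",         # e.g. output of an LLM
    "a cat is sitting on the mat",  # e.g. output of a second NMT model
]
print(mbr_select(hypotheses))  # prints "the cat sat on a mat"
```

MBR needs no extra training and treats every system symmetrically, which is why it suits mixing heterogeneous sources such as NMT models and LLMs; its cost grows quadratically with the number of candidates.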

HW-TSC’s Speech to Text Translation System for IWSLT 2024 in Indic track
Bin Wei | Zongyao Li | Jiaxin Guo | Daimeng Wei | Zhanglin Wu | Xiaoyu Chen | Zhiqiang Rao | Shaojun Li | Yuanchang Luo | Hengchao Shang | Hao Yang | Yanfei Jiang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper describes HW-TSC’s system and results for the IWSLT 2024 Indic Track speech-to-text translation task. We designed a cascade system consisting of an ASR model and a machine translation model to translate speech from one language to another. For the ASR component, we directly use Whisper large-v3. Our main task is to optimize the machine translation models (en2ta, en2hi, en2bn). We first train a baseline model on a bilingual corpus. We then use monolingual data to construct a pseudo-parallel corpus that further enhances the baseline model. Next, we filter the parallel corpus with LaBSE-based filtering and fine-tune the model again, which further improves the BLEU score. Finally, we select in-domain data from the bilingual corpus to fine-tune the previous model, which gives our best results.
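
LaBSE filtering generally means embedding both sides of each sentence pair with the LaBSE encoder and discarding pairs whose cosine similarity falls below a threshold. The sketch below shows only the thresholding step on precomputed embeddings; the 0.8 threshold, the function names, and the toy 2-d vectors are assumptions for illustration, since the abstract gives no settings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_pairs(pairs, src_embs, tgt_embs, threshold=0.8):
    """Keep sentence pairs whose source/target embeddings are at least `threshold` similar."""
    return [
        pair
        for pair, s, t in zip(pairs, src_embs, tgt_embs)
        if cosine(s, t) >= threshold
    ]


# Toy 2-d vectors standing in for real LaBSE sentence embeddings (hypothetical values).
pairs = [("src-1", "tgt-1"), ("src-2", "tgt-2")]
src_embs = [[1.0, 0.0], [1.0, 0.0]]
tgt_embs = [[0.9, 0.1], [0.0, 1.0]]
print(filter_pairs(pairs, src_embs, tgt_embs))  # prints [('src-1', 'tgt-1')]
```

In practice the threshold trades corpus size against cleanliness, so it is usually tuned on a held-out set rather than fixed in advance.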

HW-TSC’s Submissions To the IWSLT2024 Low-resource Speech Translation Tasks
Zheng Jiawei | Hengchao Shang | Zongyao Li | Zhanglin Wu | Daimeng Wei | Zhiqiang Rao | Shaojun Li | Jiaxin Guo | Bin Wei | Yuanchang Luo | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

In this work, we submitted our systems to the low-resource track of the IWSLT 2024 Speech Translation Campaign. Our systems tackled the unconstrained condition of the Dialectal Arabic North Levantine (ISO-3 code: apc) to English language pair. We proposed a cascaded solution consisting of an automatic speech recognition (ASR) model and a machine translation (MT) model. The ASR model used the pre-trained Whisper-large-v3 model to process the speech data, while the MT model adopted the Transformer architecture. To improve the quality of the MT model, our system used not only the data provided by the competition but also an additional 54 million parallel sentences. Our final system achieved a BLEU score of 24.7 for apc-to-English translation.

HW-TSC’s Simultaneous Speech Translation System for IWSLT 2024
Shaojun Li | Zhiqiang Rao | Bin Wei | Yuanchang Luo | Zhanglin Wu | Zongyao Li | Hengchao Shang | Jiaxin Guo | Daimeng Wei | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper outlines our submission to the IWSLT 2024 Simultaneous Speech-to-Text (SimulS2T) and Speech-to-Speech (SimulS2S) Translation competition. We participate in both the SimulS2T and SimulS2S tracks in all four language directions: English-German (EN-DE), English-Chinese (EN-ZH), English-Japanese (EN-JA), and Czech-English (CS-EN). For the S2T track, we build on last year’s system and further refine the cascade system composed of an ASR model and an MT model. We also introduce an end-to-end (E2E) system specifically for the CS-EN direction, which primarily employs the pre-trained SeamlessM4T model. For the SimulS2S track, we integrate a new TTS model into our SimulS2T system. Our final S2T submissions for EN-DE, EN-ZH, and EN-JA refine our winning system from last year, and incorporating the new TTS model into our SimulS2S system raises ASR-BLEU above last year’s best score.

HW-TSC’s submission to the IWSLT 2024 Subtitling track
Yuhao Xie | Yuanchang Luo | Zongyao Li | Zhanglin Wu | Xiaoyu Chen | Zhiqiang Rao | Shaojun Li | Hengchao Shang | Jiaxin Guo | Daimeng Wei | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper introduces HW-TSC’s submission to the IWSLT 2024 Subtitling track. For the automatic subtitling track, we use an unconstrained cascaded strategy whose main steps are ASR with word-level timestamps, sentence segmentation based on punctuation restoration, and alignment using either CTC or machine translation with a length penalty. For the subtitle compression track, we employ a strategy that combines machine translation models with extensive rewriting models. We identify the subtitles that need revision using the characters-per-second (CPS) metric, then use a translation model to obtain an English version of that text. We then obtain a shorter subtitle through length-controlled decoding. If this fails to compress the text sufficiently, we fall back to few-shot prompting of Llama2 for further compression.
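
CPS is the number of characters in a subtitle divided by its on-screen duration in seconds; subtitles that exceed a reading-speed limit are candidates for compression. The sketch below flags them, assuming a limit of 21 CPS and counting every character including spaces (conventions on what to count vary). The `Subtitle` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def cps(sub: Subtitle) -> float:
    """Characters per second; this sketch counts every character, spaces included."""
    duration = sub.end - sub.start
    return len(sub.text) / duration if duration > 0 else float("inf")


def needs_compression(subs: list[Subtitle], max_cps: float = 21.0) -> list[Subtitle]:
    """Return the subtitles whose reading speed exceeds the assumed CPS limit."""
    return [s for s in subs if cps(s) > max_cps]


subs = [
    Subtitle(0.0, 2.0, "Short line."),                                   # 11 chars / 2 s = 5.5 CPS
    Subtitle(2.0, 3.0, "This subtitle is far too long for one second"),  # 44 chars / 1 s = 44 CPS
]
for s in needs_compression(subs):
    print(s.text)
```

Only the flagged subtitles would then go through the translation, length-controlled decoding, and LLM fallback steps described above.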

2022

HW-TSC’s Submissions to the WMT 2022 General Machine Translation Shared Task
Daimeng Wei | Zhiqiang Rao | Zhanglin Wu | Shaojun Li | Yuanchang Luo | Yuhao Xie | Xiaoyu Chen | Hengchao Shang | Zongyao Li | Zhengzhe Yu | Jinlong Yang | Miaomiao Ma | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submissions of Huawei Translation Services Center (HW-TSC) to the WMT 2022 General Machine Translation Shared Task. We participate in 6 language pairs: Zh↔En, Ru↔En, Uk↔En, Hr↔En, Uk↔Cs, and Liv↔En. We use the Transformer architecture and obtain our best performance with several larger-parameter variants. We perform fine-grained pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. For medium- and high-resource languages, we mainly use data augmentation strategies, including back translation, self-training, ensemble knowledge distillation, and multilingual training. For low-resource languages such as Liv, we use pre-trained machine translation models and then continue training with Regularized Dropout (R-Drop), together with the data augmentation methods mentioned above. Our submissions obtain competitive results in the final evaluation.
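
R-Drop feeds each training example through the model twice with different dropout masks and adds a symmetric KL-divergence term between the two output distributions to the usual negative log-likelihood. The toy sketch below computes that objective for a single target token; the distributions, the `alpha` weight, and the function names are illustrative assumptions, not the settings used in these systems.

```python
import math


def kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(p1: list[float], p2: list[float], target: int, alpha: float = 1.0) -> float:
    """NLL of both dropout passes plus alpha times their symmetric KL divergence."""
    nll = -math.log(p1[target]) - math.log(p2[target])
    sym_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return nll + alpha * sym_kl


# Two output distributions for the same token, as if produced by two dropout passes.
p1 = [0.7, 0.2, 0.1]
p2 = [0.6, 0.3, 0.1]
print(r_drop_loss(p1, p2, target=0, alpha=1.0))
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to plain cross-entropy on both passes; the KL term penalizes the model for being sensitive to the dropout mask.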

HW-TSC Translation Systems for the WMT22 Biomedical Translation Task
Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Daimeng Wei | Xiaoyu Chen | Zongyao Li | Hengchao Shang | Shaojun Li | Ming Zhu | Yuanchang Luo | Yuhao Xie | Miaomiao Ma | Ting Zhu | Lizhi Lei | Song Peng | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the translation systems trained by Huawei Translation Services Center (HW-TSC) for the WMT22 biomedical translation task in five language pairs: English↔German (en↔de), English↔French (en↔fr), English↔Chinese (en↔zh), English↔Russian (en↔ru), and Spanish→English (es→en). Our primary systems are built on a deep Transformer with a large filter size. We also use R-Drop, data diversification, forward translation, back translation, data selection, fine-tuning, and ensembling to improve system performance. According to the official evaluation results on OCELoT or CodaLab, our unconstrained systems for en→de, de→en, en→fr, fr→en, en→zh, and es→en (clinical terminology sub-track) obtain the highest BLEU scores among all submissions to the WMT22 biomedical translation task.

HW-TSC Translation Systems for the WMT22 Chat Translation Task
Jinlong Yang | Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhengzhe Yu | Zhiqiang Rao | Shaojun Li | Zhanglin Wu | Yuhao Xie | Yuanchang Luo | Ting Zhu | Yanqing Zhao | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT22 chat translation shared task for English-German (en-de) in both directions, covering the zero-shot and few-shot tracks. We use a deep Transformer architecture with a larger parameter size. Our submissions to the WMT21 News Translation task serve as the baselines. We adopt strategies such as back translation, forward translation, domain transfer, data selection, and noisy forward translation, and achieve competitive results on the development set. We also test the effectiveness of document-level translation on the chat task; given the lack of chat data, results on the development set show that it is less effective than sentence-level translation models.

HW-TSC Systems for WMT22 Very Low Resource Supervised MT Task
Shaojun Li | Yuanchang Luo | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Yuhao Xie | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT22 Very Low Resource Supervised MT task. We participate in all 6 supervised tracks, covering all combinations of Upper/Lower Sorbian (Hsb/Dsb) and German (De). Our systems are built on a deep Transformer with a large filter size. We use multilingual transfer with German-Czech (De-Cs) and German-Polish (De-Pl) parallel data. We also use regularized dropout (R-Drop), back translation, fine-tuning, and ensembling to improve system performance. According to the official evaluation results on OCELoT, our supervised systems obtain the highest BLEU scores among all submissions in all 6 language directions. Our pre-trained multilingual model for unsupervised De2Dsb and Dsb2De translation also achieves the highest BLEU scores.

HW-TSC’s Submissions to the WMT22 Word-Level Auto Completion Task
Hao Yang | Hengchao Shang | Zongyao Li | Daimeng Wei | Xianghui He | Xiaoyu Chen | Zhengzhe Yu | Jiaxin Guo | Jinlong Yang | Shaojun Li | Yuanchang Luo | Yuhao Xie | Lizhi Lei | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submissions of Huawei Translation Services Center (HW-TSC) to the WMT 2022 Word-Level AutoCompletion Task. We propose an end-to-end autoregressive model with bi-context based on the Transformer to solve the task. The model uses a mixture of subword and character encoding units to jointly encode the human input, the target-side context, and the decoded sequence, which ensures full use of the available information. We use a single model to handle all four types of data structures in the task. During training, we try using a machine translation model as the pre-trained model and fine-tune it for the task. We also add BERT-style MLM data at the fine-tuning stage to improve model performance. We participate in the zhen, ende, and deen directions and win first place in all three tracks. In particular, we outperform the second-place system by more than 5% in accuracy on the zhen and ende tracks. Human evaluations also support this result, demonstrating the effectiveness of our model.