Hengchao Shang


2022

The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation
Jiaxin Guo | Yinglu Li | Minghan Wang | Xiaosong Qiao | Yuxia Wang | Hengchao Shang | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

The paper presents HW-TSC’s pipeline and results for the Offline Speech to Speech Translation task at IWSLT 2022. We design a cascade system consisting of an ASR model, a machine translation model and a TTS model to convert speech from one language into another (en-de). For the ASR part, we find that better performance can be obtained by ensembling multiple heterogeneous ASR models and reranking beam candidates. We also find that combining a context-aware reranking strategy with an MT model fine-tuned on the in-domain dataset helps improve performance, because it mitigates the inconsistency in transcripts caused by the lack of context. Finally, we use the officially provided VITS model to produce audio files from the translation hypotheses.

HW-TSC’s Participation in the IWSLT 2022 Isometric Spoken Language Translation
Zongyao Li | Jiaxin Guo | Daimeng Wei | Hengchao Shang | Minghan Wang | Ting Zhu | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task. We participate in all three language pairs (English-German, English-French, English-Spanish) under the constrained setting, and submit an English-German result under the unconstrained setting. We use the standard Transformer model as the baseline and obtain the best performance via one of its variants that shares the decoder input and output embedding. We perform detailed pre-processing and filtering on the provided bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, R-Drop, Average Checkpoint, and Ensemble. We investigate three methods for biasing the output length: i) conditioning the output on a given target-source length-ratio class; ii) enriching the Transformer positional embedding with length information; and iii) length control decoding for non-autoregressive translation. Our submissions achieve 30.7, 41.6 and 36.7 BLEU on the tst-COMMON test sets for the English-German, English-French and English-Spanish tasks respectively, and comply 100% with the length requirements.

Diformer: Directional Transformer for Neural Machine Translation
Minghan Wang | Jiaxin Guo | Yuxia Wang | Daimeng Wei | Hengchao Shang | Yinglu Li | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Autoregressive (AR) and Non-autoregressive (NAR) models each have their own advantages in performance and latency, so combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. Masked Language Model. However, this generalization can harm performance due to the gap between the training objective and inference. In this paper, we aim to close the gap by preserving the original objectives of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer), which jointly models AR and NAR as three generation directions (left-to-right, right-to-left and straight) with a newly introduced direction variable that controls the prediction of each token to have specific dependencies under that direction. The unification achieved by direction successfully preserves the original dependency assumptions used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current unified-modelling works by more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive with state-of-the-art independent AR and NAR models.

2021

HW-TSC’s Participation in the WMT 2021 News Translation Shared Task
Daimeng Wei | Zongyao Li | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 News Translation Shared Task. We participate in 7 language pairs, including Zh/En, De/En, Ja/En, Ha/En, Is/En, Hi/Bn, and Xh/Zu, in both directions under the constrained condition. We use the Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several common strategies are used to train our models, such as Back Translation, Forward Translation, Multilingual Translation, Ensemble Knowledge Distillation, etc. Our submission obtains competitive results in the final evaluation.

HW-TSC’s Participation in the WMT 2021 Triangular MT Shared Task
Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Zhengzhe Yu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Service Center (HW-TSC) to the WMT 2021 Triangular MT Shared Task. We participate in the Russian-to-Chinese task under the constrained condition. We use the Transformer architecture and obtain the best performance via a variant with a larger parameter size. We perform detailed data pre-processing and filtering on the provided large-scale bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, Data Denoising, Average Checkpoint, Ensemble, Fine-tuning, etc. Our system obtains 32.5 BLEU on the dev set and 27.7 BLEU on the test set, the highest score among all submissions.

HW-TSC’s Participation in the WMT 2021 Large-Scale Multilingual Translation Task
Zhengzhe Yu | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Large-Scale Multilingual Translation Task. We participate in Small Track #2, which covers 6 languages: Javanese (Jv), Indonesian (Id), Malay (Ms), Tagalog (Tl), Tamil (Ta) and English (En), with 30 directions under the constrained condition. We use the Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We train a single multilingual model to translate all 30 directions. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several common strategies are used to train our models, such as Back Translation, Forward Translation, Ensemble Knowledge Distillation and Adapter Fine-tuning. Our model obtains competitive results in the final evaluation.

HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | ZhengZhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | ShiMin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Efficiency Shared Task. We explore the sentence-level teacher-student distillation technique and train several small-size models that find a balance between efficiency and quality. Our models feature a deep encoder, a shallow decoder and a light-weight RNN with SSRU layers. We use Huawei Noah’s Bolt, an efficient and light-weight library for on-device inference. Leveraging INT8 quantization, a self-defined General Matrix Multiplication (GEMM) operator, shortlists, greedy search and caching, we submit four small-size and efficient translation models with high translation quality for the one-CPU-core latency track.

HW-TSC’s Submissions to the WMT21 Biomedical Translation Task
Hao Yang | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Daimeng Wei | Zongyao Li | Hengchao Shang | Minghan Wang | Jiaxin Guo | Lizhi Lei | Chuanfei Xu | Min Zhang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission of Huawei Translation Service Center (HW-TSC) to the WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (our registered team name is HuaweiTSC). Technical details are introduced in this paper, including the model framework, the data pre-processing method and the model enhancement strategies. In addition, using the WMT20 OK-aligned biomedical test set, we compare and analyze system performance under different strategies. On the WMT21 biomedical translation task, our systems in the English→Chinese and English→German directions get the highest BLEU scores among all submissions according to the official evaluation results.

How Length Prediction Influence the Performance of Non-Autoregressive Translation?
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Hengchao Shang | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Length prediction is a special task in a series of NAT models where the target length has to be determined before generation. However, the performance of length prediction and its influence on translation quality have seldom been discussed. In this paper, we present comprehensive analyses of the length prediction task of NAT, aiming to find the factors that influence its performance, as well as how it is associated with translation quality. We mainly perform experiments based on the Conditional Masked Language Model (CMLM) (Ghazvininejad et al., 2019), a representative NAT model, and evaluate it on two language pairs, En-De and En-Ro. We draw two conclusions: 1) The performance of length prediction is mainly influenced by properties of language pairs such as alignment pattern, word order or intrinsic length ratio, and is also affected by the use of knowledge-distilled data. 2) There is a positive correlation between the performance of length prediction and the BLEU score.

2020

The HW-TSC Video Speech Translation System at IWSLT 2020
Minghan Wang | Hao Yang | Yao Deng | Ying Qin | Lizhi Lei | Daimeng Wei | Hengchao Shang | Ning Xie | Xiaochun Li | Jiaxian Guo
Proceedings of the 17th International Conference on Spoken Language Translation

The paper presents details of our system in the IWSLT Video Speech Translation evaluation. The system works in a cascade form and contains three modules: 1) a proprietary ASR system; 2) a disfluency correction system that aims to remove interregnums or other disfluent expressions with a fine-tuned BERT and a series of rule-based algorithms; 3) an NMT system based on the Transformer and trained on a massive publicly available corpus.

HW-TSC’s Participation in the WAT 2020 Indic Languages Multilingual Task
Zhengzhe Yu | Zhanglin Wu | Xiaoyu Chen | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Minghan Wang | Liangyou Li | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 7th Workshop on Asian Translation

This paper describes our work in the WAT 2020 Indic Multilingual Translation Task. We participated in all 7 language pairs (En<->Bn/Hi/Gu/Ml/Mr/Ta/Te) in both directions under the constrained condition, using only the officially provided data. Using the Transformer as a baseline, our Multi->En and En->Multi translation systems achieve the best performance. Detailed data filtering and data domain selection are the keys to performance enhancement in our experiments, with an average improvement of 2.6 BLEU points for each language pair in the En->Multi system and an average improvement of 4.6 BLEU points in the Multi->En system. In addition, we employed a language-independent adapter to further improve system performance. Our submission obtains competitive results in the final evaluation.

HW-TSC’s Participation in the WMT 2020 News Translation Shared Task
Daimeng Wei | Hengchao Shang | Zhanglin Wu | Zhengzhe Yu | Liangyou Li | Jiaxin Guo | Minghan Wang | Hao Yang | Lizhi Lei | Ying Qin | Shiliang Sun
Proceedings of the Fifth Conference on Machine Translation

This paper presents our work in the WMT 2020 News Translation Shared Task. We participate in 3 language pairs, Zh/En, Km/En, and Ps/En, in both directions under the constrained condition. We use the standard Transformer-Big model as the baseline and obtain the best performance via two variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several common strategies are used to train our models, such as Back Translation, Ensemble Knowledge Distillation, etc. We also conduct experiments with similar language augmentation, which lead to positive results, although this is not used in our submission. Our submission obtains remarkable results in the final evaluation.

HW-TSC’s Participation at WMT 2020 Automatic Post Editing Shared Task
Hao Yang | Minghan Wang | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen
Proceedings of the Fifth Conference on Machine Translation

The paper presents the submission by HW-TSC in the WMT 2020 Automatic Post Editing Shared Task. We participate in the English-German and English-Chinese language pairs. Our system is built on the Transformer pre-trained on the WMT 2019 and WMT 2020 News Translation corpora, and fine-tuned on the APE corpus. Bottleneck Adapter Layers are integrated into the model to prevent over-fitting. We further collect external translations as augmented MT candidates to improve the performance. The experiments demonstrate that pre-trained NMT models are effective when fine-tuned on an APE corpus of limited size, and that the performance can be further improved with external MT augmentation. Our system achieves competitive results in both directions in the final evaluation.

HW-TSC’s Participation at WMT 2020 Quality Estimation Shared Task
Minghan Wang | Hao Yang | Hengchao Shang | Daimeng Wei | Jiaxin Guo | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen | Liangyou Li
Proceedings of the Fifth Conference on Machine Translation

This paper presents our work in the WMT 2020 Word and Sentence-Level Post-Editing Quality Estimation (QE) Shared Task. Our system follows the standard Predictor-Estimator architecture, with a pre-trained Transformer as the Predictor, and specific classifiers and regressors as Estimators. We integrate Bottleneck Adapter Layers in the Predictor to improve the transfer learning efficiency and prevent over-fitting. At the same time, we jointly train the word- and sentence-level tasks in a unified model with multitask learning. Pseudo-PE assisted QE (PEAQE) is proposed, resulting in significant performance improvements. Our submissions achieve competitive results in the word- and sentence-level sub-tasks for both the En-De and En-Zh language pairs.