Zhengzhe Yu

Also published as: ZhengZhe Yu


2023

pdf bib
Text Style Transfer Back-Translation
Daimeng Wei | Zhanglin Wu | Hengchao Shang | Zongyao Li | Minghan Wang | Jiaxin Guo | Xiaoyu Chen | Zhengzhe Yu | Hao Yang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Back Translation (BT) is widely used in the field of machine translation, as it has been proved effective for enhancing translation quality. However, BT mainly improves the translation of inputs that share a similar style (to be more specific, translation-liked inputs), since the source side of BT data is machine-translated. For natural inputs, BT brings only slight improvements and sometimes even adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer to modify the source side of BT data. By making the style of source-side text more natural, we aim to improve the translation of natural inputs. Our experiments on various language pairs, including both high-resource and low-resource ones, demonstrate that TST BT significantly improves translation performance against popular BT benchmarks. In addition, TST BT is proved to be effective in domain adaptation so this strategy can be regarded as a generalized data augmentation method. Our training code and text style transfer model are open-sourced.

pdf bib
Treating General MT Shared Task as a Multi-Domain Adaptation Problem: HW-TSC’s Submission to the WMT23 General MT Shared Task
Zhanglin Wu | Daimeng Wei | Zongyao Li | Zhengzhe Yu | Shaojun Li | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Yuhao Xie | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT23 general machine translation (MT) shared task, in which we participate in Chinese↔English (zh↔en) language pair. We use Transformer architecture and obtain the best performance via a variant with larger parameter size. We perform fine-grained pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. We mainly use model enhancement strategies, including Regularized Dropout, Bidirectional Training, Data Diversification, Forward Translation, Back Translation, Alternated Training, Curriculum Learning and Transductive Ensemble Learning. Our submissions obtain competitive results in the final evaluation.

pdf bib
The Path to Continuous Domain Adaptation Improvements by HW-TSC for the WMT23 Biomedical Translation Shared Task
Zhanglin Wu | Daimeng Wei | Zongyao Li | Zhengzhe Yu | Shaojun Li | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Yuhao Xie | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper presents the domain adaptation methods adopted by Huawei Translation Service Center (HW-TSC) to train the neural machine translation (NMT) system on the English↔German (en↔de) language pair of the WMT23 biomedical translation task. Our NMT system is built on deep Transformer with larger parameter sizes. Based on the biomedical NMT system trained last year, we leverage Curriculum Learning, Data Diversification, Forward translation, Back translation, and Transductive Ensemble Learning to further improve system performance. Overall, we believe our submission can achieve highly competitive result in the official final evaluation.

pdf bib
Length-Aware NMT and Adaptive Duration for Automatic Dubbing
Zhiqiang Rao | Hengchao Shang | Jinlong Yang | Daimeng Wei | Zongyao Li | Jiaxin Guo | Shaojun Li | Zhengzhe Yu | Zhanglin Wu | Yuhao Xie | Bin Wei | Jiawei Zheng | Lizhi Lei | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The Transformer is deep and multiple target-to-source length-ratio class labels are used to control target lengths. The variation predictor in FastSpeech2 is utilized to predict phoneme durations. To optimize the isochrony in dubbing, re-ranking and scaling are performed. The source audio duration is used as a reference to re-rank the translations of different length-ratio labels, and the one with minimum time deviation is preferred. Additionally, the phoneme duration outputs are scaled within a defined threshold to narrow the duration gap with the source audio.

pdf bib
Improving Neural Machine Translation Formality Control with Domain Adaptation and Reranking-based Transductive Learning
Zhanglin Wu | Zongyao Li | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Xiaoyu Chen | Zhiqiang Rao | Zhengzhe Yu | Jinlong Yang | Shaojun Li | Yuhao Xie | Bin Wei | Jiawei Zheng | Ming Zhu | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents Huawei Translation Service Center (HW-TSC)’s submission on the IWSLT 2023 formality control task, which provides two training scenarios: supervised and zero-shot, each containing two language pairs, and sets constrained and unconstrained conditions. We train the formality control models for these four language pairs under these two conditions respectively, and submit the corresponding translation results. Our efforts are divided into two fronts: enhancing general translation quality and improving formality control capability. According to the different requirements of the formality control task, we use a multi-stage pre-training method to train a bilingual or multilingual neural machine translation (NMT) model as the basic model, which can improve the general translation quality of the base model to a relatively high level. Then, under the premise of affecting the general translation quality of the basic model as little as possible, we adopt domain adaptation and reranking-based transductive learning methods to improve the formality control capability of the model.

pdf bib
HW-TSC at IWSLT2023: Break the Quality Ceiling of Offline Track via Pre-Training and Domain Adaptation
Zongyao Li | Zhanglin Wu | Zhiqiang Rao | Xie YuHao | Guo JiaXin | Daimeng Wei | Hengchao Shang | Wang Minghan | Xiaoyu Chen | Zhengzhe Yu | Li ShaoJun | Lei LiZhi | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents HW-TSC’s submissions to the IWSLT 2023 Offline Speech Translation task, including speech translation of talks from English to German, Chinese, and Japanese, respectively. We participate in all three conditions (constrained training, constrained with large language models training, and unconstrained training) with models of cascaded architectures. We use data enhancement, pre-training models and other means to improve the ASR quality, and R-Drop, deep model, domain data selection, etc. to improve the translation quality. Compared with last year’s best results, we achieve 2.1 BLEU improvement on the MuST-C English-German test set.

pdf bib
The HW-TSC’s Simultaneous Speech-to-Text Translation System for IWSLT 2023 Evaluation
Jiaxin Guo | Daimeng Wei | Zhanglin Wu | Zongyao Li | Zhiqiang Rao | Minghan Wang | Hengchao Shang | Xiaoyu Chen | Zhengzhe Yu | Shaojun Li | Yuhao Xie | Lizhi Lei | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Text Translation competition. Our participation involves three language directions: English-German, English-Chinese, and English-Japanese. Our proposed solution is a cascaded incremental decoding system that comprises an ASR model and an MT model. The ASR model is based on the U2++ architecture and can handle both streaming and offline speech scenarios with ease. Meanwhile, the MT model adopts the Deep-Transformer architecture. To improve performance, we explore methods to generate a confident partial target text output that guides the next MT incremental decoding process. In our experiments, we demonstrate that our simultaneous strategies achieve low latency while maintaining a loss of no more than 2 BLEU points when compared to offline systems.

pdf bib
The HW-TSC’s Simultaneous Speech-to-Speech Translation System for IWSLT 2023 Evaluation
Hengchao Shang | Zhiqiang Rao | Zongyao Li | Zhanglin Wu | Jiaxin Guo | Minghan Wang | Daimeng Wei | Shaojun Li | Zhengzhe Yu | Xiaoyu Chen | Lizhi Lei | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Speech Translation competition. Our participation involves three language directions: English-German, English-Chinese, and English-Japanese. Our solution is a cascaded incremental decoding system, consisting of an ASR model, an MT model, and a TTS model. By adopting the strategies used in the Speech-to-Text track, we have managed to generate a more confident target text for each audio segment input, which can guide the next MT incremental decoding process. Additionally, we have integrated the TTS model to seamlessly reproduce audio files from the translation hypothesis. To enhance the effectiveness of our experiment, we have utilized a range of methods to reduce error conditions in the TTS input text and improve the smoothness of the TTS output audio.

2022

pdf bib
HW-TSC’s Submissions to the WMT 2022 General Machine Translation Shared Task
Daimeng Wei | Zhiqiang Rao | Zhanglin Wu | Shaojun Li | Yuanchang Luo | Yuhao Xie | Xiaoyu Chen | Hengchao Shang | Zongyao Li | Zhengzhe Yu | Jinlong Yang | Miaomiao Ma | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submissions of Huawei Translate Services Center (HW-TSC) to the WMT 2022 General Machine Translation Shared Task. We participate in 6 language pairs, including Zh↔En, Ru↔En, Uk↔En, Hr↔En, Uk↔Cs and Liv↔En. We use Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We perform fine-grained pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. For medium and highresource languages, we mainly use data augmentation strategies, including Back Translation, Self Training, Ensemble Knowledge Distillation, Multilingual, etc. For low-resource languages such as Liv, we use pre-trained machine translation models, and then continue training with Regularization Dropout (R-Drop). The previous mentioned data augmentation methods are also used. Our submissions obtain competitive results in the final evaluation.

pdf bib
Exploring Robustness of Machine Translation Metrics: A Study of Twenty-Two Automatic Metrics in the WMT22 Metric Task
Xiaoyu Chen | Daimeng Wei | Hengchao Shang | Zongyao Li | Zhanglin Wu | Zhengzhe Yu | Ting Zhu | Mengli Zhu | Ning Xie | Lizhi Lei | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

Contextual word embeddings extracted from pre-trained models have become the basis for many downstream NLP tasks, including machine translation automatic evaluations. Metrics that leverage embeddings claim better capture of synonyms and changes in word orders, and thus better correlation with human ratings than surface-form matching metrics (e.g. BLEU). However, few studies have been done to examine robustness of these metrics. This report uses a challenge set to uncover the brittleness of reference-based and reference-free metrics. Our challenge set1 aims at examining metrics’ capability to correlate synonyms in different areas and to discern catastrophic errors at both word- and sentence-levels. The results show that although embedding-based metrics perform relatively well on discerning sentence-level negation/affirmation errors, their performances on relating synonyms are poor. In addition, we find that some metrics are susceptible to text styles so their generalizability compromised.

pdf bib
HW-TSC’s Submission for the WMT22 Efficiency Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Xianzhi Yu | Jianfei Feng | Ting Zhu | Lizhi Lei | Shimin Tao | Hao Yang | Ying Qin | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to WMT 2022 Efficiency Shared Task. For this year’s task, we still apply sentence-level distillation strategy to train small models with different configurations. Then, we integrate the average attention mechanism into the lightweight RNN model to pursue more efficient decoding. We tried adding a retrain step to our 8-bit and 4-bit models to achieve a balance between model size and quality. We still use Huawei Noah’s Bolt for INT8 inference and 4-bit storage. Coupled with Bolt’s support for batch inference and multi-core parallel computing, we finally submit models with different configurations to the CPU latency and throughput tracks to explore the Pareto frontiers.

pdf bib
HW-TSC Translation Systems for the WMT22 Biomedical Translation Task
Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Daimeng Wei | Xiaoyu Chen | Zongyao Li | Hengchao Shang | Shaojun Li | Ming Zhu | Yuanchang Luo | Yuhao Xie | Miaomiao Ma | Ting Zhu | Lizhi Lei | Song Peng | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the translation systems trained by Huawei translation services center (HW-TSC) for the WMT22 biomedical translation task in five language pairs: English↔German (en↔de), English↔French (en↔fr), English↔Chinese (en↔zh), English↔Russian (en↔ru) and Spanish→English (es→en). Our primary systems are built on deep Transformer with a large filter size. We also utilize R-Drop, data diversification, forward translation, back translation, data selection, finetuning and ensemble to improve the system performance. According to the official evaluation results in OCELoT or CodaLab, our unconstrained systems in en→de, de→en, en→fr, fr→en, en→zh and es→en (clinical terminology sub-track) get the highest BLEU scores among all submissions for the WMT22 biomedical translation task.

pdf bib
HW-TSC Translation Systems for the WMT22 Chat Translation Task
Jinlong Yang | Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhengzhe Yu | Zhiqiang Rao | Shaojun Li | Zhanglin Wu | Yuhao Xie | Yuanchang Luo | Ting Zhu | Yanqing Zhao | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to WMT22 chat translation shared task on English-Germany (en-de) bidirection with results of zore-shot and few-shot tracks. We use the deep transformer architecture with a lager parameter size. Our submissions to the WMT21 News Translation task are used as the baselines. We adopt strategies such as back translation, forward translation, domain transfer, data selection, and noisy forward translation in task, and achieve competitive results on the development set. We also test the effectiveness of document translation on chat tasks. Due to the lack of chat data, the results on the development set show that it is not as effective as sentence-level translation models.

pdf bib
HW-TSC Systems for WMT22 Very Low Resource Supervised MT Task
Shaojun Li | Yuanchang Luo | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Yuhao Xie | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the submissions of Huawei translation services center (HW-TSC) to the WMT22 Very Low Resource Supervised MT task. We participate in all 6 supervised tracks including all combinations between Upper/Lower Sorbian (Hsb/Dsb) and German (De). Our systems are build on deep Transformer with a large filter size. We use multilingual transfer with German-Czech (De-Cs) and German-Polish (De-Pl) parallel data. We also utilize regularized dropout (R-Drop), back translation, fine-tuning and ensemble to improve the system performance. According to the official evaluation results on OCELoT, our supervised systems of all 6 language directions get the highest BLEU scores among all submissions. Our pre-trained multilingual model for unsupervised De2Dsb and Dsb2De translation also gain highest BLEU.

pdf bib
HW-TSC’s Submissions to the WMT22 Word-Level Auto Completion Task
Hao Yang | Hengchao Shang | Zongyao Li | Daimeng Wei | Xianghui He | Xiaoyu Chen | Zhengzhe Yu | Jiaxin Guo | Jinlong Yang | Shaojun Li | Yuanchang Luo | Yuhao Xie | Lizhi Lei | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submissions of Huawei Translation Services Center (HW-TSC) to WMT 2022 Word-Level AutoCompletion Task. We propose an end-to-end autoregressive model with bi-context based on Transformer to solve current task. The model uses a mixture of subword and character encoding units to realize the joint encoding of human input, the context of the target side and the decoded sequence, which ensures full utilization of information. We uses one model to solve four types of data structures in the task. During training, we try using a machine translation model as the pre-trained model and fine-tune it for the task. We also add BERT-style MLM data at the fine-tuning stage to improve model performance. We participate in zhen, ende, and deen directions and win the first place in all the three tracks. Particularly, we outperform the second place by more than 5% in terms of accuracy on the zhen and ende tracks. The result is buttressed by human evaluations as well, demonstrating the effectiveness of our model.

pdf bib
HW-TSC’s Participation in the IWSLT 2022 Isometric Spoken Language Translation
Zongyao Li | Jiaxin Guo | Daimeng Wei | Hengchao Shang | Minghan Wang | Ting Zhu | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task. We participate in all three language pairs (English-German, English-French, English-Spanish) under the constrained setting, and submit an English-German result under the unconstrained setting. We use the standard Transformer model as the baseline and obtain the best performance via one of its variants that shares the decoder input and output embedding. We perform detailed pre-processing and filtering on the provided bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, R-Drop, Average Checkpoint, and Ensemble. We investigate three methods for biasing the output length: i) conditioning the output to a given target-source length-ratio class; ii) enriching the transformer positional embedding with length information and iii) length control decoding for non-autoregressive translation etc. Our submissions achieve 30.7, 41.6 and 36.7 BLEU respectively on the tst-COMMON test sets for English-German, English-French, English-Spanish tasks and 100% comply with the length requirements.

2021

pdf bib
HW-TSC’s Participation in the WMT 2021 News Translation Shared Task
Daimeng Wei | Zongyao Li | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT 2021 News Translation Shared Task. We participate in 7 language pairs, including Zh/En, De/En, Ja/En, Ha/En, Is/En, Hi/Bn, and Xh/Zu in both directions under the constrained condition. We use Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several commonly used strategies are used to train our models, such as Back Translation, Forward Translation, Multilingual Translation, Ensemble Knowledge Distillation, etc. Our submission obtains competitive results in the final evaluation.

pdf bib
HW-TSC’s Participation in the WMT 2021 Triangular MT Shared Task
Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Zhengzhe Yu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Service Center (HW-TSC) to WMT 2021 Triangular MT Shared Task. We participate in the Russian-to-Chinese task under the constrained condition. We use Transformer architecture and obtain the best performance via a variant with larger parameter sizes. We perform detailed data pre-processing and filtering on the provided large-scale bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, Data Denoising, Average Checkpoint, Ensemble, Fine-tuning, etc. Our system obtains 32.5 BLEU on the dev set and 27.7 BLEU on the test set, the highest score among all submissions.

pdf bib
HW-TSC’s Participation in the WMT 2021 Large-Scale Multilingual Translation Task
Zhengzhe Yu | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Large-Scale Multilingual Translation Task. We participate in Samll Track #2, including 6 languages: Javanese (Jv), Indonesian (Id), Malay (Ms), Tagalog (Tl), Tamil (Ta) and English (En) with 30 directions under the constrained condition. We use Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We train a single multilingual model to translate all the 30 directions. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several commonly used strategies are used to train our models, such as Back Translation, Forward Translation, Ensemble Knowledge Distillation, Adapter Fine-tuning. Our model obtains competitive results in the end.

pdf bib
HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | ZhengZhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | ShiMin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to WMT 2021 Efficiency Shared Task. We explore the sentence-level teacher-student distillation technique and train several small-size models that find a balance between efficiency and quality. Our models feature deep encoder, shallow decoder and light-weight RNN with SSRU layer. We use Huawei Noah’s Bolt, an efficient and light-weight library for on-device inference. Leveraging INT8 quantization, self-defined General Matrix Multiplication (GEMM) operator, shortlist, greedy search and caching, we submit four small-size and efficient translation models with high translation quality for the one CPU core latency track.

pdf bib
HW-TSC’s Submissions to the WMT21 Biomedical Translation Task
Hao Yang | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Daimeng Wei | Zongyao Li | Hengchao Shang | Minghan Wang | Jiaxin Guo | Lizhi Lei | Chuanfei Xu | Min Zhang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission of Huawei Translation Service Center (HW-TSC) to WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (Our registered team name is HuaweiTSC). Technical details are introduced in this paper, including model framework, data pre-processing method and model enhancement strategies. In addition, using the wmt20 OK-aligned biomedical test set, we compare and analyze system performances under different strategies. On WMT21 biomedical translation task, Our systems in English→Chinese and English→German directions get the highest BLEU scores among all submissions according to the official evaluation results.

2020

pdf bib
HW-TSC’s Participation in the WAT 2020 Indic Languages Multilingual Task
Zhengzhe Yu | Zhanglin Wu | Xiaoyu Chen | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Minghan Wang | Liangyou Li | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 7th Workshop on Asian Translation

This paper describes our work in the WAT 2020 Indic Multilingual Translation Task. We participated in all 7 language pairs (En<->Bn/Hi/Gu/Ml/Mr/Ta/Te) in both directions under the constrained condition—using only the officially provided data. Using transformer as a baseline, our Multi->En and En->Multi translation systems achieve the best performances. Detailed data filtering and data domain selection are the keys to performance enhancement in our experiment, with an average improvement of 2.6 BLEU scores for each language pair in the En->Multi system and an average improvement of 4.6 BLEU scores regarding the Multi->En. In addition, we employed language independent adapter to further improve the system performances. Our submission obtains competitive results in the final evaluation.

pdf bib
HW-TSC’s Participation in the WMT 2020 News Translation Shared Task
Daimeng Wei | Hengchao Shang | Zhanglin Wu | Zhengzhe Yu | Liangyou Li | Jiaxin Guo | Minghan Wang | Hao Yang | Lizhi Lei | Ying Qin | Shiliang Sun
Proceedings of the Fifth Conference on Machine Translation

This paper presents our work in the WMT 2020 News Translation Shared Task. We participate in 3 language pairs including Zh/En, Km/En, and Ps/En and in both directions under the constrained condition. We use the standard Transformer-Big model as the baseline and obtain the best performance via two variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual dataset. Several commonly used strategies are used to train our models such as Back Translation, Ensemble Knowledge Distillation, etc. We also conduct experiment with similar language augmentation, which lead to positive results, although not used in our submission. Our submission obtains remarkable results in the final evaluation.