Shaojun Li - ACL Anthology

Shaojun Li

2025

Enhancing Large Language Models for Document-Level Translation Post-Editing Using Monolingual Data
Zongyao Li | Zhiqiang Rao | Hengchao Shang | Jiaxin Guo | Shaojun Li | Daimeng Wei | Hao Yang
Proceedings of the 31st International Conference on Computational Linguistics

Neural machine translation (NMT) models based on the encoder-decoder framework have very strong translation capabilities. Although Large Language Models (LLMs) have achieved remarkable results in many tasks, they have not reached state-of-the-art performance in NMT. However, traditional NMT still faces significant challenges in document translation, in areas such as context consistency, tense, and pronoun resolution, where LLMs inherently possess substantial advantages. Instead of directly using LLMs for translation, employing them for Automatic Post-Editing (APE) to post-edit NMT outputs proves to be a viable option. However, document-level bilingual data is extremely scarce. This paper proposes a method that can effectively leverage the capabilities of LLMs to optimize document translation using only monolingual data. By employing two NMT models in opposite directions (Source-to-Target and Target-to-Source), we generate pseudo-document training data for the training of APE. We have identified and resolved the inconsistency between training and inference modes brought about by the pseudo-document training data. The final experimental results demonstrate that by using only document-level monolingual data, we can significantly improve the quality of NMT and greatly alleviate issues such as reference and contextual consistency in NMT.

Generative Annotation for ASR Named Entity Correction
Yuanchang Luo | Daimeng Wei | Shaojun Li | Hengchao Shang | Jiaxin Guo | Zongyao Li | Zhanglin Wu | Xiaoyu Chen | Zhiqiang Rao | Jinlong Yang | Hao Yang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

End-to-end automatic speech recognition systems often fail to transcribe domain-specific named entities, causing catastrophic failures in downstream tasks. Numerous fast and lightweight named entity correction (NEC) models have been proposed in recent years. These models, mainly leveraging phonetic-level edit distance algorithms, have shown impressive performance. However, when the forms of the wrongly transcribed word(s) and the ground-truth entity are significantly different, these methods often fail to locate the wrongly transcribed words in the hypothesis, thus limiting their usage. We propose a novel NEC method that utilizes speech sound features to retrieve candidate entities. With speech sound features and candidate entities, we innovatively design a generative method to annotate entity errors in ASR transcripts and replace the text with correct entities. This method is effective in scenarios of word form difference. We test our method using open-source and self-constructed test sets. The results demonstrate that our NEC method can bring significant improvement to entity accuracy. We will open source our self-constructed test set and training data.

2024

Improving the Quality of IWSLT 2024 Cascade Offline Speech Translation and Speech-to-Speech Translation via Translation Hypothesis Ensembling with NMT models and Large Language Models
Zhanglin Wu | Jiaxin Guo | Daimeng Wei | Zhiqiang Rao | Zongyao Li | Hengchao Shang | Yuanchang Luo | Shaojun Li | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper presents HW-TSC’s submission to the IWSLT 2024 Offline Speech Translation Task and Speech-to-Speech Translation Task. The former includes three translation directions: English to German, English to Chinese, and English to Japanese, while the latter only includes the translation direction of English to Chinese. We participate in all three tracks (Constrained training, Constrained with Large Language Models training, and Unconstrained training) of the offline speech translation task, using the cascade model architecture. Under the constrained training track, we train an ASR model from scratch, and then employ R-Drop and domain data selection to train the NMT model. In the constrained with Large Language Models training track, we use Wav2vec 2.0 and mBART50 for ASR model training initialization, and then train the Llama2-7B-based MT model using continuous training with sentence-aligned parallel data, supervised fine-tuning, and contrastive preference optimization. In the unconstrained training track, we fine-tune the Whisper model for speech recognition, and then ensemble the translation results of NMT models and LLMs to produce superior translation output. For the speech-to-speech translation task, we initially employ the offline speech translation system described above to generate the translated text. Then, we utilize the VITS model to generate the corresponding speech and employ the OpenVoice model for timbre cloning.

HW-TSC’s Speech to Text Translation System for IWSLT 2024 in Indic track
Bin Wei | Zongyao Li | Jiaxin Guo | Daimeng Wei | Zhanglin Wu | Xiaoyu Chen | Zhiqiang Rao | Shaojun Li | Yuanchang Luo | Hengchao Shang | Hao Yang | Yanfei Jiang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This article introduces HW-TSC's approach to, and results in, the IWSLT 2024 Indic Track Speech to Text Translation task. We designed a cascade system consisting of an ASR model and a machine translation model to translate speech from one language to another. For the ASR part, we directly use Whisper large-v3 as our ASR model. Our main task is to optimize the machine translation model (en2ta, en2hi, en2bn). In the process of optimizing the translation model, we first use a bilingual corpus to train the baseline model. Then we use monolingual data to construct pseudo-corpus data to further enhance the baseline model. Finally, we filter the parallel corpus data with the LaBSE filtering method and fine-tune the model again, which further improves the BLEU score. We also selected domain data from the bilingual corpus to fine-tune the previous model to achieve the best results.

HW-TSC’s Submissions To the IWSLT2024 Low-resource Speech Translation Tasks
Zheng Jiawei | Hengchao Shang | Zongyao Li | Zhanglin Wu | Daimeng Wei | Zhiqiang Rao | Shaojun Li | Jiaxin Guo | Bin Wei | Yuanchang Luo | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

In this work, we submitted our systems to the low-resource track of the IWSLT 2024 Speech Translation Campaign. Our systems tackled the unconstrained condition of the Dialectal Arabic North Levantine (ISO-3 code: apc) to English language pair. We proposed a cascaded solution consisting of an automatic speech recognition (ASR) model and a machine translation (MT) model. The ASR model employed the pre-trained Whisper-large-v3 model to process the speech data, while the MT model adopted the Transformer architecture. To improve the quality of the MT model, our system utilized not only the data provided by the competition but also an additional 54 million parallel sentences. Our final system achieved a BLEU score of 24.7 for apc-to-English translation.

HW-TSC’s Simultaneous Speech Translation System for IWSLT 2024
Shaojun Li | Zhiqiang Rao | Bin Wei | Yuanchang Luo | Zhanglin Wu | Zongyao Li | Hengchao Shang | Jiaxin Guo | Daimeng Wei | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper outlines our submission for the IWSLT 2024 Simultaneous Speech-to-Text (SimulS2T) and Speech-to-Speech (SimulS2S) Translation competition. We have engaged in all four language directions and both the SimulS2T and SimulS2S tracks: English-German (EN-DE), English-Chinese (EN-ZH), English-Japanese (EN-JA), and Czech-English (CS-EN). For the S2T track, we have built upon our previous year’s system and further honed the cascade system composed of an ASR model and an MT model. Concurrently, we have introduced an end-to-end system specifically for the CS-EN direction. This end-to-end (E2E) system primarily employs the pre-trained SeamlessM4T model. For the SimulS2S track, we have integrated a novel TTS model into our SimulS2T system. The final submission for the S2T directions of EN-DE, EN-ZH, and EN-JA has been refined over our championship system from last year. Building upon this foundation, the incorporation of the new TTS into our SimulS2S system has resulted in the ASR-BLEU surpassing last year’s best score.

HW-TSC’s submission to the IWSLT 2024 Subtitling track
Yuhao Xie | Yuanchang Luo | Zongyao Li | Zhanglin Wu | Xiaoyu Chen | Zhiqiang Rao | Shaojun Li | Hengchao Shang | Jiaxin Guo | Daimeng Wei | Hao Yang
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper introduces HW-TSC’s submission to the IWSLT 2024 Subtitling track. For the automatic subtitling track, we use an unconstrained cascaded strategy, with the main steps being: ASR with word-level timestamps, sentence segmentation based on punctuation restoration, further alignment using CTC or using machine translation with length penalty. For the subtitle compression track, we employ a subtitle compression strategy that integrates machine translation models and extensive rewriting models. We acquire the subtitle text requiring revision through the CPS index, then utilize a translation model to obtain the English version of this text. Following this, we extract the compressed-length subtitle text through controlled decoding. If this method fails to compress the text successfully, we resort to the Llama2 few-shot model for further compression.

Choose the Final Translation from NMT and LLM Hypotheses Using MBR Decoding: HW-TSC’s Submission to the WMT24 General MT Shared Task
Zhanglin Wu | Daimeng Wei | Zongyao Li | Hengchao Shang | Jiaxin Guo | Shaojun Li | Zhiqiang Rao | Yuanchang Luo | Ning Xie | Hao Yang
Proceedings of the Ninth Conference on Machine Translation

This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT24 general machine translation (MT) shared task, where we participate in the English to Chinese (en→zh) language pair. Similar to previous years’ work, we use training strategies such as regularized dropout, bidirectional training, data diversification, forward translation, back translation, alternated training, curriculum learning, and transductive ensemble learning to train the neural machine translation (NMT) model based on the deep Transformer-big architecture. The difference is that we also use continued pre-training, supervised fine-tuning, and contrastive preference optimization to train the large language model (LLM) based MT model. By using Minimum Bayes Risk (MBR) decoding to select the final translation from multiple hypotheses for NMT and LLM-based MT models, our submission receives competitive results in the final evaluation.
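
The selection step described in this abstract can be illustrated with a minimal sketch of MBR decoding over a pooled hypothesis list. The utility function (a token-overlap F1), the function names, and the example hypotheses are assumptions made for illustration; the abstract does not state which utility metric or candidate pool the submission used.

```python
from collections import Counter


def overlap_f1(hyp: str, ref: str) -> float:
    """Toy utility: token-level F1 between two strings (an assumed stand-in metric)."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts tokens shared by both strings.
    common = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(hyp_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_select(hypotheses: list[str]) -> str:
    """Return the hypothesis with the highest average utility against all others."""
    if len(hypotheses) == 1:
        return hypotheses[0]

    def expected_utility(i: int) -> float:
        # Every other hypothesis acts as a pseudo-reference.
        others = [h for j, h in enumerate(hypotheses) if j != i]
        return sum(overlap_f1(hypotheses[i], o) for o in others) / len(others)

    best = max(range(len(hypotheses)), key=expected_utility)
    return hypotheses[best]


# Hypothetical pool mixing NMT and LLM outputs for one source sentence.
pool = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a feline rested upon the rug",
]
selected = mbr_select(pool)
```

In practice the toy utility would be swapped for a learned or string-based MT metric; the selection logic stays the same.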

Machine Translation Advancements of Low-Resource Indian Languages by Transfer Learning
Bin Wei | Zheng Jiawei | Zongyao Li | Zhanglin Wu | Jiaxin Guo | Daimeng Wei | Zhiqiang Rao | Shaojun Li | Yuanchang Luo | Hengchao Shang | Jinlong Yang | Yuhao Xie | Hao Yang
Proceedings of the Ninth Conference on Machine Translation

This paper introduces the submission by Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. To develop a reliable machine translation system for low-resource Indian languages, we employed two distinct knowledge transfer strategies, taking into account the characteristics of the language scripts and the support available from existing open-source models for Indian languages. For Assamese (as) and Manipuri (mn), we fine-tuned the existing IndicTrans2 open-source model to enable bidirectional translation between English and these languages. For Khasi (kh) and Mizo (mz), we trained a multilingual model as the baseline using bilingual data from these four language pairs as well as additional Bengali data, which shares the same language family. This was followed by fine-tuning to achieve bidirectional translation between English and Khasi, as well as English and Mizo. Our transfer learning experiments produced significant results: 23.5 BLEU for en→as, 31.8 BLEU for en→mn, 36.2 BLEU for as→en, and 47.9 BLEU for mn→en on their respective test sets. Similarly, the multilingual model transfer learning experiments yielded impressive outcomes, achieving 19.7 BLEU for en→kh, 32.8 BLEU for en→mz, 16.1 BLEU for kh→en, and 33.9 BLEU for mz→en on their respective test sets. These results not only highlight the effectiveness of transfer learning techniques for low-resource languages but also contribute to advancing machine translation capabilities for low-resource Indian languages.

Multilingual Transfer and Domain Adaptation for Low-Resource Languages of Spain
Yuanchang Luo | Zhanglin Wu | Daimeng Wei | Hengchao Shang | Zongyao Li | Jiaxin Guo | Zhiqiang Rao | Shaojun Li | Jinlong Yang | Yuhao Xie | Zheng Jiawei | Bin Wei | Hao Yang
Proceedings of the Ninth Conference on Machine Translation

This article introduces the submission of Huawei Translation Service Center (HW-TSC) to the Translation into Low-Resource Languages of Spain task at WMT 2024. We participated in three translation tasks: Spanish to Aragonese (es2arg), Spanish to Aranese (es2arn), and Spanish to Asturian (es2ast). For these three translation tasks, we use training strategies such as multilingual transfer, regularized dropout, forward translation and back translation, LaBSE denoising, and transductive ensemble learning to train a neural machine translation (NMT) model based on the deep Transformer-big architecture. By using these enhancement strategies, our submission achieved a competitive result in the final evaluation.

Context-aware and Style-related Incremental Decoding Framework for Discourse-Level Literary Translation
Yuanchang Luo | Jiaxin Guo | Daimeng Wei | Hengchao Shang | Zongyao Li | Zhanglin Wu | Zhiqiang Rao | Shaojun Li | Jinlong Yang | Hao Yang
Proceedings of the Ninth Conference on Machine Translation

This report outlines our approach for the WMT24 Discourse-Level Literary Translation Task, focusing on the Chinese-English language pair in the Constrained Track. Translating literary texts poses significant challenges due to the nuanced meanings, idiomatic expressions, and intricate narrative structures inherent in such works. To address these challenges, we leveraged the Chinese-Llama2 model, specifically enhanced for this task through a combination of Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT). Our methodology includes a novel Incremental Decoding framework, which ensures that each sentence is translated with consideration of its broader context, maintaining coherence and consistency throughout the text. This approach allows the model to capture long-range dependencies and stylistic elements, producing translations that faithfully preserve the original literary quality. Our experiments demonstrate significant improvements in both sentence-level and document-level BLEU scores, underscoring the effectiveness of our proposed framework in addressing the complexities of document-level literary translation.

Exploring the Traditional NMT Model and Large Language Model for Chat Translation
Jinlong Yang | Hengchao Shang | Daimeng Wei | Jiaxin Guo | Zongyao Li | Zhanglin Wu | Zhiqiang Rao | Shaojun Li | Yuhao Xie | Yuanchang Luo | Zheng Jiawei | Bin Wei | Hao Yang
Proceedings of the Ninth Conference on Machine Translation

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT24 chat translation shared task on English↔German (en-de) in both directions. The experiments involved fine-tuning models using chat data and exploring various strategies, including Minimum Bayes Risk (MBR) decoding and self-training. The results show significant performance improvements in certain directions, with the MBR self-training method achieving the best results. The paper also discusses the challenges and potential avenues for further research in the field of chat translation.

2023

Length-Aware NMT and Adaptive Duration for Automatic Dubbing
Zhiqiang Rao | Hengchao Shang | Jinlong Yang | Daimeng Wei | Zongyao Li | Jiaxin Guo | Shaojun Li | Zhengzhe Yu | Zhanglin Wu | Yuhao Xie | Bin Wei | Jiawei Zheng | Lizhi Lei | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents the submission of Huawei Translation Services Center for the IWSLT 2023 dubbing task in the unconstrained setting. The proposed solution consists of a Transformer-based machine translation model and a phoneme duration predictor. The Transformer is deep and multiple target-to-source length-ratio class labels are used to control target lengths. The variation predictor in FastSpeech2 is utilized to predict phoneme durations. To optimize the isochrony in dubbing, re-ranking and scaling are performed. The source audio duration is used as a reference to re-rank the translations of different length-ratio labels, and the one with minimum time deviation is preferred. Additionally, the phoneme duration outputs are scaled within a defined threshold to narrow the duration gap with the source audio.
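
The re-ranking and scaling steps described in this abstract can be sketched as follows. This is a minimal sketch, assuming each candidate translation comes with predicted phoneme durations in seconds; the function names, the candidate structure, and the 10% scaling threshold are hypothetical and are not taken from the paper.

```python
def pick_by_duration(candidates: dict[str, list[float]], source_duration: float) -> str:
    """Pick the length-ratio label whose predicted total duration is closest to the source."""
    return min(candidates, key=lambda label: abs(sum(candidates[label]) - source_duration))


def scale_durations(
    durations: list[float], source_duration: float, max_change: float = 0.1
) -> list[float]:
    """Scale phoneme durations toward the source duration, clamped to +/- max_change."""
    total = sum(durations)
    if total == 0:
        return list(durations)
    factor = source_duration / total
    # Keep the scaling factor inside the allowed band (assumed threshold).
    factor = max(1.0 - max_change, min(1.0 + max_change, factor))
    return [d * factor for d in durations]


# Hypothetical candidates: length-ratio label -> predicted phoneme durations (seconds).
candidates = {
    "short": [0.2, 0.3, 0.25],
    "normal": [0.3, 0.4, 0.35],
    "long": [0.5, 0.5, 0.45],
}
source_duration = 1.0
best_label = pick_by_duration(candidates, source_duration)
scaled = scale_durations(candidates[best_label], source_duration)
```

Clamping the factor means a badly mismatched candidate is only partially corrected, which is why re-ranking comes first in this sketch.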

This paper presents Huawei Translation Service Center (HW-TSC)’s submission to the IWSLT 2023 formality control task, which provides two training scenarios: supervised and zero-shot, each containing two language pairs, and sets constrained and unconstrained conditions. We train the formality control models for these four language pairs under these two conditions respectively, and submit the corresponding translation results. Our efforts are divided into two fronts: enhancing general translation quality and improving formality control capability. According to the different requirements of the formality control task, we use a multi-stage pre-training method to train a bilingual or multilingual neural machine translation (NMT) model as the base model, which can improve the general translation quality of the base model to a relatively high level. Then, while affecting the general translation quality of the base model as little as possible, we adopt domain adaptation and reranking-based transductive learning methods to improve the formality control capability of the model.

The HW-TSC’s Simultaneous Speech-to-Text Translation System for IWSLT 2023 Evaluation
Jiaxin Guo | Daimeng Wei | Zhanglin Wu | Zongyao Li | Zhiqiang Rao | Minghan Wang | Hengchao Shang | Xiaoyu Chen | Zhengzhe Yu | Shaojun Li | Yuhao Xie | Lizhi Lei | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Text Translation competition. Our participation involves three language directions: English-German, English-Chinese, and English-Japanese. Our proposed solution is a cascaded incremental decoding system that comprises an ASR model and an MT model. The ASR model is based on the U2++ architecture and can handle both streaming and offline speech scenarios with ease. Meanwhile, the MT model adopts the Deep-Transformer architecture. To improve performance, we explore methods to generate a confident partial target text output that guides the next MT incremental decoding process. In our experiments, we demonstrate that our simultaneous strategies achieve low latency while maintaining a loss of no more than 2 BLEU points when compared to offline systems.

The HW-TSC’s Simultaneous Speech-to-Speech Translation System for IWSLT 2023 Evaluation
Hengchao Shang | Zhiqiang Rao | Zongyao Li | Zhanglin Wu | Jiaxin Guo | Minghan Wang | Daimeng Wei | Shaojun Li | Zhengzhe Yu | Xiaoyu Chen | Lizhi Lei | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

In this paper, we present our submission to the IWSLT 2023 Simultaneous Speech-to-Speech Translation competition. Our participation involves three language directions: English-German, English-Chinese, and English-Japanese. Our solution is a cascaded incremental decoding system, consisting of an ASR model, an MT model, and a TTS model. By adopting the strategies used in the Speech-to-Text track, we have managed to generate a more confident target text for each audio segment input, which can guide the next MT incremental decoding process. Additionally, we have integrated the TTS model to seamlessly reproduce audio files from the translation hypothesis. To enhance the effectiveness of our experiment, we have utilized a range of methods to reduce error conditions in the TTS input text and improve the smoothness of the TTS output audio.

Treating General MT Shared Task as a Multi-Domain Adaptation Problem: HW-TSC’s Submission to the WMT23 General MT Shared Task
Zhanglin Wu | Daimeng Wei | Zongyao Li | Zhengzhe Yu | Shaojun Li | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Yuhao Xie | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT23 general machine translation (MT) shared task, in which we participate in the Chinese↔English (zh↔en) language pair. We use the Transformer architecture and obtain the best performance via a variant with a larger parameter size. We perform fine-grained pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. We mainly use model enhancement strategies, including Regularized Dropout, Bidirectional Training, Data Diversification, Forward Translation, Back Translation, Alternated Training, Curriculum Learning and Transductive Ensemble Learning. Our submissions obtain competitive results in the final evaluation.

The Path to Continuous Domain Adaptation Improvements by HW-TSC for the WMT23 Biomedical Translation Shared Task
Zhanglin Wu | Daimeng Wei | Zongyao Li | Zhengzhe Yu | Shaojun Li | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Yuhao Xie | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper presents the domain adaptation methods adopted by Huawei Translation Service Center (HW-TSC) to train the neural machine translation (NMT) system on the English↔German (en↔de) language pair of the WMT23 biomedical translation task. Our NMT system is built on a deep Transformer with larger parameter sizes. Based on the biomedical NMT system trained last year, we leverage Curriculum Learning, Data Diversification, Forward Translation, Back Translation, and Transductive Ensemble Learning to further improve system performance. Overall, we believe our submission can achieve a highly competitive result in the official final evaluation.

HW-TSC’s Submissions to the WMT23 Discourse-Level Literary Translation Shared Task
Yuhao Xie | Zongyao Li | Zhanglin Wu | Daimeng Wei | Xiaoyu Chen | Zhiqiang Rao | Shaojun Li | Hengchao Shang | Jiaxin Guo | Lizhi Lei | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper introduces HW-TSC’s submission to the WMT23 Discourse-Level Literary Translation shared task. We use a standard sentence-level Transformer as a baseline, and perform domain adaptation and discourse modeling to enhance discourse-level capabilities. Regarding domain adaptation, we employ Back-Translation, Forward-Translation and Data Diversification. For discourse modeling, we apply strategies such as Multi-resolutional Document-to-Document Translation and TrAining Data Augmentation.

2022

This paper presents the submissions of Huawei Translate Services Center (HW-TSC) to the WMT 2022 General Machine Translation Shared Task. We participate in 6 language pairs, including Zh↔En, Ru↔En, Uk↔En, Hr↔En, Uk↔Cs and Liv↔En. We use the Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We perform fine-grained pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. For medium- and high-resource languages, we mainly use data augmentation strategies, including Back Translation, Self Training, Ensemble Knowledge Distillation, Multilingual, etc. For low-resource languages such as Liv, we use pre-trained machine translation models, and then continue training with Regularized Dropout (R-Drop). The previously mentioned data augmentation methods are also used. Our submissions obtain competitive results in the final evaluation.

This paper describes the translation systems trained by Huawei Translation Services Center (HW-TSC) for the WMT22 biomedical translation task in five language pairs: English↔German (en↔de), English↔French (en↔fr), English↔Chinese (en↔zh), English↔Russian (en↔ru) and Spanish→English (es→en). Our primary systems are built on a deep Transformer with a large filter size. We also utilize R-Drop, data diversification, forward translation, back translation, data selection, fine-tuning and ensembling to improve the system performance. According to the official evaluation results in OCELoT or CodaLab, our unconstrained systems in en→de, de→en, en→fr, fr→en, en→zh and es→en (clinical terminology sub-track) get the highest BLEU scores among all submissions for the WMT22 biomedical translation task.

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT22 chat translation shared task on English-German (en-de) in both directions, with results for the zero-shot and few-shot tracks. We use the deep Transformer architecture with a larger parameter size. Our submissions to the WMT21 News Translation task are used as the baselines. We adopt strategies such as back translation, forward translation, domain transfer, data selection, and noisy forward translation in this task, and achieve competitive results on the development set. We also test the effectiveness of document translation on chat tasks. Due to the lack of chat data, the results on the development set show that it is not as effective as sentence-level translation models.

HW-TSC Systems for WMT22 Very Low Resource Supervised MT Task
Shaojun Li | Yuanchang Luo | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu | Yuhao Xie | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the submissions of Huawei Translation Services Center (HW-TSC) to the WMT22 Very Low Resource Supervised MT task. We participate in all 6 supervised tracks including all combinations between Upper/Lower Sorbian (Hsb/Dsb) and German (De). Our systems are built on a deep Transformer with a large filter size. We use multilingual transfer with German-Czech (De-Cs) and German-Polish (De-Pl) parallel data. We also utilize regularized dropout (R-Drop), back translation, fine-tuning and ensembling to improve the system performance. According to the official evaluation results on OCELoT, our supervised systems in all 6 language directions get the highest BLEU scores among all submissions. Our pre-trained multilingual model for unsupervised De2Dsb and Dsb2De translation also achieves the highest BLEU.

HW-TSC’s Submissions to the WMT22 Word-Level Auto Completion Task
Hao Yang | Hengchao Shang | Zongyao Li | Daimeng Wei | Xianghui He | Xiaoyu Chen | Zhengzhe Yu | Jiaxin Guo | Jinlong Yang | Shaojun Li | Yuanchang Luo | Yuhao Xie | Lizhi Lei | Ying Qin
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the submissions of Huawei Translation Services Center (HW-TSC) to the WMT 2022 Word-Level AutoCompletion Task. We propose an end-to-end autoregressive model with bi-context based on the Transformer to solve the current task. The model uses a mixture of subword and character encoding units to realize the joint encoding of human input, the context of the target side and the decoded sequence, which ensures full utilization of information. We use one model to solve four types of data structures in the task. During training, we try using a machine translation model as the pre-trained model and fine-tune it for the task. We also add BERT-style MLM data at the fine-tuning stage to improve model performance. We participate in the zh→en, en→de, and de→en directions and win first place in all three tracks. In particular, we outperform the second place by more than 5% in terms of accuracy on the zh→en and en→de tracks. The result is buttressed by human evaluations as well, demonstrating the effectiveness of our model.

2021

HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | ZhengZhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | ShiMin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Efficiency Shared Task. We explore the sentence-level teacher-student distillation technique and train several small-size models that find a balance between efficiency and quality. Our models feature a deep encoder, a shallow decoder and a light-weight RNN with SSRU layer. We use Huawei Noah’s Bolt, an efficient and light-weight library for on-device inference. Leveraging INT8 quantization, a self-defined General Matrix Multiplication (GEMM) operator, shortlist, greedy search and caching, we submit four small-size and efficient translation models with high translation quality for the one CPU core latency track.
