Jiaxin Guo


2022

pdf bib
Capture Human Disagreement Distributions by Calibrated Networks for Natural Language Inference
Yuxia Wang | Minghan Wang | Yimeng Chen | Shimin Tao | Jiaxin Guo | Chang Su | Min Zhang | Hao Yang
Findings of the Association for Computational Linguistics: ACL 2022

Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels due to its subjectivity. Several recent efforts have been made to acknowledge and embrace the existence of ambiguity, and explore how to capture the human disagreement distribution. In contrast with directly learning from gold ambiguity labels, relying on special resource, we argue that the model has naturally captured the human ambiguity distribution as long as it’s calibrated, i.e. the predictive probability can reflect the true correctness likelihood. Our experiments show that when model is well-calibrated, either by label smoothing or temperature scaling, it can obtain competitive performance as prior work, on both divergence scores between predictive probability and the true human opinion distribution, and the accuracy. This reveals the overhead of collecting gold ambiguity labels can be cut, by broadly solving how to calibrate the NLI network.

pdf bib
The HW-TSC’s Offline Speech Translation System for IWSLT 2022 Evaluation
Yinglu Li | Minghan Wang | Jiaxin Guo | Xiaosong Qiao | Yuxia Wang | Daimeng Wei | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes the HW-TSC’s designation of the Offline Speech Translation System submitted for IWSLT 2022 Evaluation. We explored both cascade and end-to-end system on three language tracks (en-de, en-zh and en-ja), and we chose the cascade one as our primary submission. For the automatic speech recognition (ASR) model of cascade system, there are three ASR models including Conformer, S2T-Transformer and U2 trained on the mixture of five datasets. During inference, transcripts are generated with the help of domain controlled generation strategy. Context-aware reranking and ensemble based anti-interference strategy are proposed to produce better ASR outputs. For machine translation part, we pretrained three translation models on WMT21 dataset and fine-tuned them on in-domain corpora. Our cascade system shows competitive performance than the known offline systems in the industry and academia.

pdf bib
The HW-TSC’s Simultaneous Speech Translation System for IWSLT 2022 Evaluation
Minghan Wang | Jiaxin Guo | Yinglu Li | Xiaosong Qiao | Yuxia Wang | Zongyao Li | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our work in the participation of IWSLT 2022 simultaneous speech translation evaluation. For the track of text-to-text (T2T), we participate in three language pairs and build wait-k based simultaneous MT (SimulMT) model for the task. The model was pretrained on WMT21 news corpora, and was further improved with in-domain fine-tuning and self-training. For the speech-to-text (S2T) track, we designed both cascade and end-to-end form in three language pairs. The cascade system is composed of a chunking-based streaming ASR model and the SimulMT model used in the T2T track. The end-to-end system is a simultaneous speech translation (SimulST) model based on wait-k strategy, which is directly trained on a synthetic corpus produced by translating all texts of ASR corpora into specific target language with an offline MT model. It also contains a heuristic sentence breaking strategy, preventing it from finishing the translation before the the end of the speech. We evaluate our systems on the MUST-C tst-COMMON dataset and show that the end-to-end system is competitive to the cascade one. Meanwhile, we also demonstrate that the SimulMT model can be efficiently optimized by these approaches, resulting in the improvements of 1-2 BLEU points.

pdf bib
The HW-TSC’s Speech to Speech Translation System for IWSLT 2022 Evaluation
Jiaxin Guo | Yinglu Li | Minghan Wang | Xiaosong Qiao | Yuxia Wang | Hengchao Shang | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

The paper presents the HW-TSC’s pipeline and results of Offline Speech to Speech Translation for IWSLT 2022. We design a cascade system consisted of an ASR model, machine translation model and TTS model to convert the speech from one language into another language(en-de). For the ASR part, we find that better performance can be obtained by ensembling multiple heterogeneous ASR models and performing reranking on beam candidates. And we find that the combination of context-aware reranking strategy and MT model fine-tuned on the in-domain dataset is helpful to improve the performance. Because it can mitigate the problem that the inconsistency in transcripts caused by the lack of context. Finally, we use VITS model provided officially to reproduce audio files from the translation hypothesis.

pdf bib
HW-TSC’s Participation in the IWSLT 2022 Isometric Spoken Language Translation
Zongyao Li | Jiaxin Guo | Daimeng Wei | Hengchao Shang | Minghan Wang | Ting Zhu | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task. We participate in all three language pairs (English-German, English-French, English-Spanish) under the constrained setting, and submit an English-German result under the unconstrained setting. We use the standard Transformer model as the baseline and obtain the best performance via one of its variants that shares the decoder input and output embedding. We perform detailed pre-processing and filtering on the provided bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, R-Drop, Average Checkpoint, and Ensemble. We investigate three methods for biasing the output length: i) conditioning the output to a given target-source length-ratio class; ii) enriching the transformer positional embedding with length information and iii) length control decoding for non-autoregressive translation etc. Our submissions achieve 30.7, 41.6 and 36.7 BLEU respectively on the tst-COMMON test sets for English-German, English-French, English-Spanish tasks and 100% comply with the length requirements.

pdf bib
Diformer: Directional Transformer for Neural Machine Translation
Minghan Wang | Jiaxin Guo | Yuxia Wang | Daimeng Wei | Hengchao Shang | Yinglu Li | Chang Su | Yimeng Chen | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Autoregressive (AR) and Non-autoregressive (NAR) models have their own superiority on the performance and latency, combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. Masked Language Model. However, the generalization can be harmful on the performance due to the gap between training objective and inference. In this paper, we aim to close the gap by preserving the original objective of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer) by jointly modelling AR and NAR into three generation directions (left-to-right, right-to-left and straight) with a newly introduced direction variable, which works by controlling the prediction of each token to have specific dependencies under that direction. The unification achieved by direction successfully preserves the original dependency assumption used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current united-modelling works with more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive to the state-of-the-art independent AR and NAR models.

2021

pdf bib
HW-TSC’s Participation in the WMT 2021 News Translation Shared Task
Daimeng Wei | Zongyao Li | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Hengchao Shang | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the WMT 2021 News Translation Shared Task. We participate in 7 language pairs, including Zh/En, De/En, Ja/En, Ha/En, Is/En, Hi/Bn, and Xh/Zu in both directions under the constrained condition. We use Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several commonly used strategies are used to train our models, such as Back Translation, Forward Translation, Multilingual Translation, Ensemble Knowledge Distillation, etc. Our submission obtains competitive results in the final evaluation.

pdf bib
HW-TSC’s Participation in the WMT 2021 Triangular MT Shared Task
Zongyao Li | Daimeng Wei | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Zhengzhe Yu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Service Center (HW-TSC) to WMT 2021 Triangular MT Shared Task. We participate in the Russian-to-Chinese task under the constrained condition. We use Transformer architecture and obtain the best performance via a variant with larger parameter sizes. We perform detailed data pre-processing and filtering on the provided large-scale bilingual data. Several strategies are used to train our models, such as Multilingual Translation, Back Translation, Forward Translation, Data Denoising, Average Checkpoint, Ensemble, Fine-tuning, etc. Our system obtains 32.5 BLEU on the dev set and 27.7 BLEU on the test set, the highest score among all submissions.

pdf bib
HW-TSC’s Participation in the WMT 2021 Large-Scale Multilingual Translation Task
Zhengzhe Yu | Daimeng Wei | Zongyao Li | Hengchao Shang | Xiaoyu Chen | Zhanglin Wu | Jiaxin Guo | Minghan Wang | Lizhi Lei | Min Zhang | Hao Yang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Large-Scale Multilingual Translation Task. We participate in Samll Track #2, including 6 languages: Javanese (Jv), Indonesian (Id), Malay (Ms), Tagalog (Tl), Tamil (Ta) and English (En) with 30 directions under the constrained condition. We use Transformer architecture and obtain the best performance via multiple variants with larger parameter sizes. We train a single multilingual model to translate all the 30 directions. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual datasets. Several commonly used strategies are used to train our models, such as Back Translation, Forward Translation, Ensemble Knowledge Distillation, Adapter Fine-tuning. Our model obtains competitive results in the end.

pdf bib
HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | ZhengZhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | ShiMin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper presents the submission of Huawei Translation Services Center (HW-TSC) to WMT 2021 Efficiency Shared Task. We explore the sentence-level teacher-student distillation technique and train several small-size models that find a balance between efficiency and quality. Our models feature deep encoder, shallow decoder and light-weight RNN with SSRU layer. We use Huawei Noah’s Bolt, an efficient and light-weight library for on-device inference. Leveraging INT8 quantization, self-defined General Matrix Multiplication (GEMM) operator, shortlist, greedy search and caching, we submit four small-size and efficient translation models with high translation quality for the one CPU core latency track.

pdf bib
HW-TSC’s Submissions to the WMT21 Biomedical Translation Task
Hao Yang | Zhanglin Wu | Zhengzhe Yu | Xiaoyu Chen | Daimeng Wei | Zongyao Li | Hengchao Shang | Minghan Wang | Jiaxin Guo | Lizhi Lei | Chuanfei Xu | Min Zhang | Ying Qin
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission of Huawei Translation Service Center (HW-TSC) to WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (Our registered team name is HuaweiTSC). Technical details are introduced in this paper, including model framework, data pre-processing method and model enhancement strategies. In addition, using the wmt20 OK-aligned biomedical test set, we compare and analyze system performances under different strategies. On WMT21 biomedical translation task, Our systems in English→Chinese and English→German directions get the highest BLEU scores among all submissions according to the official evaluation results.

pdf bib
Make the Blind Translator See The World: A Novel Transfer Learning Solution for Multimodal Machine Translation
Minghan Wang | Jiaxin Guo | Yimeng Chen | Chang Su | Min Zhang | Shimin Tao | Hao Yang
Proceedings of Machine Translation Summit XVIII: Research Track

Based on large-scale pretrained networks and the liability to be easily overfitting with limited labelled training data of multimodal translation (MMT) is a critical issue in MMT. To this end and we propose a transfer learning solution. Specifically and 1) A vanilla Transformer is pre-trained on massive bilingual text-only corpus to obtain prior knowledge; 2) A multimodal Transformer named VLTransformer is proposed with several components incorporated visual contexts; and 3) The parameters of VLTransformer are initialized with the pre-trained vanilla Transformer and then being fine-tuned on MMT tasks with a newly proposed method named cross-modal masking which forces the model to learn from both modalities. We evaluated on the Multi30k en-de and en-fr dataset and improving up to 8% BLEU score compared with the SOTA performance. The experimental result demonstrates that performing transfer learning with monomodal pre-trained NMT model on multimodal NMT tasks can obtain considerable boosts.

2020

pdf bib
HW-TSC’s Participation in the WAT 2020 Indic Languages Multilingual Task
Zhengzhe Yu | Zhanglin Wu | Xiaoyu Chen | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Minghan Wang | Liangyou Li | Lizhi Lei | Hao Yang | Ying Qin
Proceedings of the 7th Workshop on Asian Translation

This paper describes our work in the WAT 2020 Indic Multilingual Translation Task. We participated in all 7 language pairs (En<->Bn/Hi/Gu/Ml/Mr/Ta/Te) in both directions under the constrained condition—using only the officially provided data. Using transformer as a baseline, our Multi->En and En->Multi translation systems achieve the best performances. Detailed data filtering and data domain selection are the keys to performance enhancement in our experiment, with an average improvement of 2.6 BLEU scores for each language pair in the En->Multi system and an average improvement of 4.6 BLEU scores regarding the Multi->En. In addition, we employed language independent adapter to further improve the system performances. Our submission obtains competitive results in the final evaluation.

pdf bib
HW-TSC’s Participation in the WMT 2020 News Translation Shared Task
Daimeng Wei | Hengchao Shang | Zhanglin Wu | Zhengzhe Yu | Liangyou Li | Jiaxin Guo | Minghan Wang | Hao Yang | Lizhi Lei | Ying Qin | Shiliang Sun
Proceedings of the Fifth Conference on Machine Translation

This paper presents our work in the WMT 2020 News Translation Shared Task. We participate in 3 language pairs including Zh/En, Km/En, and Ps/En and in both directions under the constrained condition. We use the standard Transformer-Big model as the baseline and obtain the best performance via two variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual dataset. Several commonly used strategies are used to train our models such as Back Translation, Ensemble Knowledge Distillation, etc. We also conduct experiment with similar language augmentation, which lead to positive results, although not used in our submission. Our submission obtains remarkable results in the final evaluation.

pdf bib
HW-TSC’s Participation at WMT 2020 Automatic Post Editing Shared Task
Hao Yang | Minghan Wang | Daimeng Wei | Hengchao Shang | Jiaxin Guo | Zongyao Li | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen
Proceedings of the Fifth Conference on Machine Translation

The paper presents the submission by HW-TSC in the WMT 2020 Automatic Post Editing Shared Task. We participate in the English-German and English-Chinese language pairs. Our system is built based on the Transformer pre-trained on WMT 2019 and WMT 2020 News Translation corpora, and fine-tuned on the APE corpus. Bottleneck Adapter Layers are integrated into the model to prevent over-fitting. We further collect external translations as the augmented MT candidates to improve the performance. The experiment demonstrates that pre-trained NMT models are effective when fine-tuning with the APE corpus of a limited size, and the performance can be further improved with external MT augmentation. Our system achieves competitive results on both directions in the final evaluation.

pdf bib
HW-TSC’s Participation at WMT 2020 Quality Estimation Shared Task
Minghan Wang | Hao Yang | Hengchao Shang | Daimeng Wei | Jiaxin Guo | Lizhi Lei | Ying Qin | Shimin Tao | Shiliang Sun | Yimeng Chen | Liangyou Li
Proceedings of the Fifth Conference on Machine Translation

This paper presents our work in the WMT 2020 Word and Sentence-Level Post-Editing Quality Estimation (QE) Shared Task. Our system follows standard Predictor-Estimator architecture, with a pre-trained Transformer as the Predictor, and specific classifiers and regressors as Estimators. We integrate Bottleneck Adapter Layers in the Predictor to improve the transfer learning efficiency and prevent from over-fitting. At the same time, we jointly train the word- and sentence-level tasks with a unified model with multitask learning. Pseudo-PE assisted QE (PEAQE) is proposed, resulting in significant improvements on the performance. Our submissions achieve competitive result in word/sentence-level sub-tasks for both of En-De/Zh language pairs.