2023
HW-TSC at IWSLT2023: Break the Quality Ceiling of Offline Track via Pre-Training and Domain Adaptation
Zongyao Li | Zhanglin Wu | Zhiqiang Rao | Xie YuHao | Guo JiaXin | Daimeng Wei | Hengchao Shang | Wang Minghan | Xiaoyu Chen | Zhengzhe Yu | Li ShaoJun | Lei LiZhi | Hao Yang
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This paper presents HW-TSC's submissions to the IWSLT 2023 Offline Speech Translation task, covering speech translation of talks from English to German, Chinese, and Japanese. We participate in all three conditions (constrained training, constrained with large language models training, and unconstrained training) with cascaded models. We use data enhancement, pre-trained models, and other means to improve ASR quality, and R-Drop, deep models, domain data selection, and related techniques to improve translation quality. Compared with last year's best results, we achieve a 2.1 BLEU improvement on the MuST-C English-German test set.
2021
How Length Prediction Influence the Performance of Non-Autoregressive Translation?
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Hengchao Shang | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Length prediction is a special task in a family of non-autoregressive translation (NAT) models, where the target length has to be determined before generation. However, the performance of length prediction and its influence on translation quality have seldom been discussed. In this paper, we present comprehensive analyses of the length prediction task in NAT, aiming to find the factors that influence its performance and how it relates to translation quality. We mainly perform experiments based on the Conditional Masked Language Model (CMLM) (Ghazvininejad et al., 2019), a representative NAT model, and evaluate it on two language pairs, En-De and En-Ro. We draw two conclusions: 1) The performance of length prediction is mainly influenced by properties of the language pair, such as alignment pattern, word order, or intrinsic length ratio, and is also affected by the use of knowledge-distilled data. 2) There is a positive correlation between the performance of length prediction and the BLEU score.
HI-CMLM: Improve CMLM with Hybrid Decoder Input
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Daimeng Wei | Min Zhang | Shimin Tao | Hao Yang
Proceedings of the 14th International Conference on Natural Language Generation
Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved stunning performance among non-autoregressive NMT models, but we find that predicting all of the target words from the hidden states of [MASK] alone is neither effective nor efficient in the initial iterations of refinement, resulting in ungrammatical repetitions and slow convergence. In this work, we mitigate this problem by combining the copied source with the embeddings of [MASK] in the decoder. Notably, it is not straightforward copying, which is shown to be useless, but a novel heuristic hybrid strategy called fence-mask. Experimental results show that it gains consistent boosts on both the WMT14 En<->De and WMT16 En<->Ro corpora, by 0.5 BLEU on average and by 1 BLEU for less-informative short sentences. This reveals that incorporating additional information through proper strategies helps improve CMLM, particularly the translation quality of short texts and the speed of early-stage convergence.
HW-TSC’s Participation at WMT 2021 Quality Estimation Shared Task
Yimeng Chen | Chang Su | Yingtao Zhang | Yuxia Wang | Xiang Geng | Hao Yang | Shimin Tao | Guo Jiaxin | Wang Minghan | Min Zhang | Yujia Liu | Shujian Huang
Proceedings of the Sixth Conference on Machine Translation
This paper presents our work for the WMT 2021 Quality Estimation (QE) Shared Task. We participated in all three sub-tasks, namely the Sentence-Level Direct Assessment (DA) task, the Word and Sentence-Level Post-editing Effort task, and the Critical Error Detection task, in all language pairs. Our systems employ the Predictor-Estimator framework, concretely with a pre-trained XLM-RoBERTa as the Predictor and a task-specific classifier or regressor as the Estimator. For all tasks, we improve our systems by incorporating post-edited sentences or additional high-quality translations, either through multitask learning or by encoding them with the predictor directly. Moreover, in the zero-shot setting, our data augmentation strategy based on Monte-Carlo Dropout brings significant improvement on the DA sub-task. Notably, our submissions achieve remarkable results across all tasks.