2022
HW-TSC’s Submission for the WMT22 Efficiency Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Xianzhi Yu | Jianfei Feng | Ting Zhu | Lizhi Lei | Shimin Tao | Hao Yang | Ying Qin | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2022 Efficiency Shared Task. For this year’s task, we again apply a sentence-level distillation strategy to train small models with different configurations. Then, we integrate the average attention mechanism into the lightweight RNN model to pursue more efficient decoding. We try adding a retraining step to our 8-bit and 4-bit models to achieve a balance between model size and quality. We still use Huawei Noah’s Bolt for INT8 inference and 4-bit storage. Coupled with Bolt’s support for batch inference and multi-core parallel computing, we finally submit models with different configurations to the CPU latency and throughput tracks to explore the Pareto frontiers.
2021
HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | ZhengZhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | ShiMin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Efficiency Shared Task. We explore the sentence-level teacher-student distillation technique and train several small models that balance efficiency and quality. Our models feature a deep encoder, a shallow decoder and a light-weight RNN with an SSRU layer. We use Huawei Noah’s Bolt, an efficient and light-weight library for on-device inference. Leveraging INT8 quantization, a self-defined General Matrix Multiplication (GEMM) operator, a shortlist, greedy search and caching, we submit four small and efficient translation models with high translation quality for the one-CPU-core latency track.
2020
Best Student Forcing: A Simple Training Mechanism in Adversarial Language Generation
Jonathan Sauder | Ting Hu | Xiaoyin Che | Goncalo Mordido | Haojin Yang | Christoph Meinel
Proceedings of the Twelfth Language Resources and Evaluation Conference
Language models trained with Maximum Likelihood Estimation (MLE) have been considered as a mainstream solution in Natural Language Generation (NLG) for years. Recently, various approaches with Generative Adversarial Nets (GANs) have also been proposed. While offering exciting new prospects, GANs in NLG by far are nevertheless reportedly suffering from training instability and mode collapse, and therefore outperformed by conventional MLE models. In this work, we propose techniques for improving GANs in NLG, namely Best Student Forcing (BSF), a novel yet simple adversarial training mechanism in which generated sequences of high quality are selected as temporary ground-truth to further train the generator. We also use an ensemble of discriminators to increase training stability and sample diversity. Evaluation shows that the combination of BSF and multiple discriminators consistently performs better than previous GAN approaches over various metrics, and outperforms a baseline MLE in terms of Fréchet Distance, a recently proposed metric capturing both sample quality and diversity.