Jianfei Feng
2022
HW-TSC’s Submission for the WMT22 Efficiency Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Xianzhi Yu | Jianfei Feng | Ting Zhu | Lizhi Lei | Shimin Tao | Hao Yang | Ying Qin | Jinlong Yang | Zhiqiang Rao | Zhengzhe Yu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2022 Efficiency Shared Task. For this year’s task, we again apply a sentence-level distillation strategy to train small models with different configurations. We then integrate the average attention mechanism into the lightweight RNN model for more efficient decoding. We add a retraining step to our 8-bit and 4-bit models to strike a balance between model size and quality. We continue to use Huawei Noah’s Bolt for INT8 inference and 4-bit storage. Coupled with Bolt’s support for batch inference and multi-core parallel computing, we submit models with different configurations to the CPU latency and throughput tracks to explore the Pareto frontiers.
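As a rough illustration of the quantization scheme the abstract describes, the sketch below shows symmetric INT8 quantization for inference and 4-bit packing for storage. This is a minimal numpy sketch, not HW-TSC’s implementation or Bolt’s API; all function names are hypothetical, and the retraining step the abstract mentions (to recover quality after quantization) is omitted.

```python
# Minimal sketch (hypothetical, not Bolt's API): symmetric INT8
# quantization for inference and 4-bit packing for compact storage.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = max(np.abs(w).max() / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def pack_int4(w: np.ndarray):
    """Quantize to 4 bits (range [-7, 7]) and pack two values per byte.
    Weights stored this way are unpacked and dequantized before use."""
    scale = max(np.abs(w).max() / 7.0, 1e-8)
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8) + 8  # shift to [1, 15]
    flat = q.flatten()
    if flat.size % 2:
        flat = np.append(flat, 8)  # pad with the zero point
    packed = (flat[0::2].astype(np.uint8) << 4) | flat[1::2].astype(np.uint8)
    return packed, scale

def unpack_int4(packed: np.ndarray, scale: float, n: int) -> np.ndarray:
    """Recover float weights from the packed 4-bit representation."""
    hi = (packed >> 4).astype(np.int8) - 8
    lo = (packed & 0x0F).astype(np.int8) - 8
    q = np.empty(hi.size + lo.size, dtype=np.int8)
    q[0::2], q[1::2] = hi, lo
    return q[:n].astype(np.float32) * scale

# Usage: 4-bit storage is lossy, hence the retraining step in the paper.
w = np.random.randn(4, 6).astype(np.float32)
packed, s = pack_int4(w)
w_hat = unpack_int4(packed, s, w.size).reshape(w.shape)
```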
2021
HW-TSC’s Participation in the WMT 2021 Efficiency Shared Task
Hengchao Shang | Ting Hu | Daimeng Wei | Zongyao Li | Jianfei Feng | Zhengzhe Yu | Jiaxin Guo | Shaojun Li | Lizhi Lei | Shimin Tao | Hao Yang | Jun Yao | Ying Qin
Proceedings of the Sixth Conference on Machine Translation
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Efficiency Shared Task. We explore sentence-level teacher-student distillation and train several small models that strike a balance between efficiency and quality. Our models feature a deep encoder, a shallow decoder, and a lightweight RNN with SSRU layers. We use Huawei Noah’s Bolt, an efficient and lightweight library for on-device inference. Leveraging INT8 quantization, a custom General Matrix Multiplication (GEMM) operator, a shortlist, greedy search, and caching, we submit four small, efficient translation models with high translation quality to the single-CPU-core latency track.
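To make the “lightweight RNN with SSRU layers” concrete, below is a minimal numpy sketch of the SSRU recurrence (Simpler Simple Recurrent Unit), following the formulation in Kim et al. (2019): a forget gate interpolates between the previous cell state and a linear projection of the input, with a ReLU output. This is an illustrative sketch, not the submission’s code; the weight names are hypothetical. Because only the cell state is carried across target positions, the layer pairs naturally with the greedy search and caching the abstract mentions.

```python
# Minimal SSRU sketch (hypothetical weights, not the submission's code).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ssru_step(x, c_prev, W, Wf, bf):
    """One SSRU decoding step.
    x: (d,) input; c_prev: (d,) cell state; W, Wf: (d, d); bf: (d,).
    Returns (layer output, new cell state)."""
    f = sigmoid(x @ Wf + bf)              # forget gate
    c = f * c_prev + (1.0 - f) * (x @ W)  # no reset gate, unlike SRU/LSTM
    return np.maximum(c, 0.0), c          # output is ReLU(c)

# Hypothetical greedy-decoding usage: the (d,)-sized cell state is the
# only per-layer cache carried across target positions.
d = 8
rng = np.random.default_rng(0)
W, Wf, bf = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)
c = np.zeros(d)
for x in rng.normal(size=(5, d)):  # five decoding steps
    y, c = ssru_step(x, c, W, Wf, bf)
```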