Shanbo Cheng


2021

The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
Lihua Qian | Yi Zhou | Zaixiang Zheng | Yaoming Zhu | Zehui Lin | Jiangtao Feng | Shanbo Cheng | Lei Li | Mingxuan Wang | Hao Zhou
Proceedings of the Sixth Conference on Machine Translation

This paper describes Volctrans' submission to the WMT21 news translation shared task for German→English translation. We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer, which enables fast and accurate parallel decoding, in contrast to the currently prevailing autoregressive models. To the best of our knowledge, this is the first parallel translation system that can be scaled to a practical scenario such as the WMT competition. More importantly, our parallel translation system achieves the best BLEU score (35.0) on the German→English translation task, outperforming all strong autoregressive counterparts.

Language Tags Matter for Zero-Shot Neural Machine Translation
Liwei Wu | Shanbo Cheng | Mingxuan Wang | Lei Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Learning Kernel-Smoothed Machine Translation with Retrieved Examples
Qingnan Jiang | Mingxuan Wang | Jun Cao | Shanbo Cheng | Shujian Huang | Lei Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

How can neural machine translation (NMT) models be effectively adapted to emerging cases without retraining? Despite the great success of neural machine translation, updating deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising, but they are prone to overfitting the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapting neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that, even without expensive retraining, KSTER achieves improvements of 1.1 to 1.5 BLEU points over the best existing online adaptation methods. The code and trained models are released at https://github.com/jiangqn/KSTER.

2020

Language-aware Interlingua for Multilingual Neural Machine Translation
Changfeng Zhu | Heng Yu | Shanbo Cheng | Weihua Luo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multilingual neural machine translation (NMT) has led to impressive accuracy improvements in low-resource scenarios by sharing common linguistic information across languages. However, the traditional multilingual model fails to capture the diversity and specificity of different languages, so it performs worse than sufficiently trained individual models. In this paper, we incorporate a language-aware interlingua into the Encoder-Decoder architecture. The interlingual network enables the model to learn a language-independent representation from the semantic spaces of different languages, while still allowing language-specific specialization for a particular language pair. Experiments show that our proposed method achieves remarkable improvements over state-of-the-art multilingual NMT baselines and performs comparably to strong individual models.

2018

Alibaba’s Neural Machine Translation Systems for WMT18
Yongchao Deng | Shanbo Cheng | Jun Lu | Kai Song | Jingang Wang | Shenglan Wu | Liang Yao | Guchun Zhang | Haibo Zhang | Pei Zhang | Changfeng Zhu | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes Alibaba's submission systems for the WMT18 shared news translation task. We participated in 5 translation directions: English ↔ Russian, English ↔ Turkish, and English → Chinese. Our systems are based on Google's Transformer model architecture, into which we integrated the most recent features from academic research. We also employed, at industrial scale, most of the techniques that have proven effective in past WMT years, such as BPE, back translation, data selection, model ensembling and reranking. For some morphologically rich languages, we also incorporated linguistic knowledge into our neural network. Our resulting systems achieved the best case-sensitive BLEU score in all 5 directions in which we participated. Notably, our English → Russian system outperformed the second-ranked system by 5 BLEU points.

2017

Sogou Neural Machine Translation Systems for WMT17
Yuguang Wang | Shanbo Cheng | Liyang Jiang | Jiajun Yang | Wei Chen | Muze Li | Lin Shi | Yanfeng Wang | Hongtao Yang
Proceedings of the Second Conference on Machine Translation

2016

PRIMT: A Pick-Revise Framework for Interactive Machine Translation
Shanbo Cheng | Shujian Huang | Huadong Chen | Xin-Yu Dai | Jiajun Chen
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies