Shanbo Cheng


pdf bib
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Zihan Liu | Zewei Sun | Shanbo Cheng | Shujian Huang | Mingxuan Wang
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Controlling Styles in Neural Machine Translation with Activation Prompt
Yifan Wang | Zewei Sun | Shanbo Cheng | Weiguo Zheng | Mingxuan Wang
Findings of the Association for Computational Linguistics: ACL 2023

Controlling styles in neural machine translation (NMT) has attracted wide attention, as it is crucial for enhancing user experience. Earlier studies on this topic typically concentrate on regulating the level of formality and achieve some progress in this area. However, they still encounter two major challenges. The first is the difficulty in style evaluation. The style comprises various aspects such as lexis, syntax, and others that provide abundant information. Nevertheless, only formality has been thoroughly investigated. The second challenge involves excessive dependence on incremental adjustments, particularly when new styles are necessary. To address both challenges, this paper presents a new benchmark and approach. A multiway stylized machine translation (MSMT) benchmark is introduced, incorporating diverse categories of styles across four linguistic domains. Then, we propose a method named style activation prompt (StyleAP) by retrieving prompts from stylized monolingual corpus, which does not require extra fine-tuning. Experiments show that StyleAP could effectively control the style of translation and achieve remarkable performance.

pdf bib
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Yaoming Zhu | Zewei Sun | Shanbo Cheng | Luyang Huang | Liwei Wu | Mingxuan Wang
Findings of the Association for Computational Linguistics: ACL 2023

Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems focus on better access and use of visual information and tend to validate their methods on image-related datasets. However, these studies face two challenges. First, they can only utilize a limited amount of data that is composed of bilingual texts and images (referred to as “triple data”), which is scarce. Second, current benchmarks for MMT are restricted and do not correspond to realistic scenarios. Therefore, this paper correspondingly establishes new methods and a new dataset for MMT. We propose a novel framework for MMT that addresses these challenges by utilizing large-scale non-triple data, such as monolingual image-text and parallel text-only data. Additionally, we construct a new e-commercial multimodal translation dataset, named EMMT, of which the test set is specifically designed to include ambiguous words that require visual context for accurate translation. Experiments show that our method is well-suited for real-world scenarios and can significantly improve translation performance with more non-triple data. In addition, our model also rivals or surpasses various SOTA models in conventional multimodal translation benchmarks.

pdf bib
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation
Liyan Kang | Luyang Huang | Ningxin Peng | Peihao Zhu | Zewei Sun | Shanbo Cheng | Mingxuan Wang | Degen Huang | Jinsong Su
Findings of the Association for Computational Linguistics: ACL 2023

We present a large-scale video subtitle translation dataset, *BigVideo*, to facilitate the study of multi-modality machine translation. Compared with the widely used *How2* and *VaTeX* datasets, *BigVideo* is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of videos. We also introduce two deliberately designed test sets to verify the necessity of visual information: *Ambiguous* with the presence of ambiguous words, and *Unambiguous* in which the text context is self-contained for translation. To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. Extensive experiments on the *BigVideo* shows that: a) Visual information consistently improves the NMT model in terms of BLEU, BLEURT and COMET on both Ambiguous and Unambiguous test sets. b) Visual information helps disambiguation, compared to the strong text baseline on terminology-targeted scores and human evaluation.


pdf bib
Learning Kernel-Smoothed Machine Translation with Retrieved Examples
Qingnan Jiang | Mingxuan Wang | Jun Cao | Shanbo Cheng | Shujian Huang | Lei Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

How to effectively adapt neural machine translation (NMT) models according to emerging cases without retraining? Despite the great success of neural machine translation, updating the deployed models online remains a challenge. Existing non-parametric approaches that retrieve similar examples from a database to guide the translation process are promising but are prone to overfit the retrieved examples. However, non-parametric methods are prone to overfit the retrieved examples. In this work, we propose to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online. Experiments on domain adaptation and multi-domain machine translation datasets show that even without expensive retraining, KSTER is able to achieve improvement of 1.1 to 1.5 BLEU scores over the best existing online adaptation methods. The code and trained models are released at

pdf bib
The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
Lihua Qian | Yi Zhou | Zaixiang Zheng | Yaoming Zhu | Zehui Lin | Jiangtao Feng | Shanbo Cheng | Lei Li | Mingxuan Wang | Hao Zhou
Proceedings of the Sixth Conference on Machine Translation

This paper describes the Volctrans’ submission to the WMT21 news translation shared task for German->English translation. We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer, which enables fast and accurate parallel decoding in contrast to the currently prevailing autoregressive models. To the best of our knowledge, this is the first parallel translation system that can be scaled to such a practical scenario like WMT competition. More importantly, our parallel translation system achieves the best BLEU score (35.0) on German->English translation task, outperforming all strong autoregressive counterparts.

pdf bib
Language Tags Matter for Zero-Shot Neural Machine Translation
Liwei Wu | Shanbo Cheng | Mingxuan Wang | Lei Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


pdf bib
Language-aware Interlingua for Multilingual Neural Machine Translation
Changfeng Zhu | Heng Yu | Shanbo Cheng | Weihua Luo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multilingual neural machine translation (NMT) has led to impressive accuracy improvements in low-resource scenarios by sharing common linguistic information across languages. However, the traditional multilingual model fails to capture the diversity and specificity of different languages, resulting in inferior performance compared with individual models that are sufficiently trained. In this paper, we incorporate a language-aware interlingua into the Encoder-Decoder architecture. The interlingual network enables the model to learn a language-independent representation from the semantic spaces of different languages, while still allowing for language-specific specialization of a particular language-pair. Experiments show that our proposed method achieves remarkable improvements over state-of-the-art multilingual NMT baselines and produces comparable performance with strong individual models.


pdf bib
Alibaba’s Neural Machine Translation Systems for WMT18
Yongchao Deng | Shanbo Cheng | Jun Lu | Kai Song | Jingang Wang | Shenglan Wu | Liang Yao | Guchun Zhang | Haibo Zhang | Pei Zhang | Changfeng Zhu | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission systems of Alibaba for WMT18 shared news translation task. We participated in 5 translation directions including English ↔ Russian, English ↔ Turkish in both directions and English → Chinese. Our systems are based on Google’s Transformer model architecture, into which we integrated the most recent features from the academic research. We also employed most techniques that have been proven effective during the past WMT years, such as BPE, back translation, data selection, model ensembling and reranking, at industrial scale. For some morphologically-rich languages, we also incorporated linguistic knowledge into our neural network. For the translation tasks in which we have participated, our resulting systems achieved the best case sensitive BLEU score in all 5 directions. Notably, our English → Russian system outperformed the second reranked system by 5 BLEU score.


pdf bib
Sogou Neural Machine Translation Systems for WMT17
Yuguang Wang | Shanbo Cheng | Liyang Jiang | Jiajun Yang | Wei Chen | Muze Li | Lin Shi | Yanfeng Wang | Hongtao Yang
Proceedings of the Second Conference on Machine Translation


pdf bib
PRIMT: A Pick-Revise Framework for Interactive Machine Translation
Shanbo Cheng | Shujian Huang | Huadong Chen | Xin-Yu Dai | Jiajun Chen
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies