Keqin Peng


2023

Towards Making the Most of ChatGPT for Machine Translation
Keqin Peng | Liang Ding | Qihuang Zhong | Li Shen | Xuebo Liu | Min Zhang | Yuanxin Ouyang | Dacheng Tao
Findings of the Association for Computational Linguistics: EMNLP 2023

ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves results comparable to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pair translation. However, these studies usually adopt simple prompts, which cannot fully elicit the capability of ChatGPT. In this report, we aim to further mine ChatGPT’s translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually achieves better performance; 2) Emphasizing the task information further improves ChatGPT’s performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT’s generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community. We also explore the effects of advanced in-context learning strategies and find a (negative but interesting) observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior, thus bringing significant translation degradation.

Token-Level Self-Evolution Training for Sequence-to-Sequence Learning
Keqin Peng | Liang Ding | Qihuang Zhong | Yuanxin Ouyang | Wenge Rong | Zhang Xiong | Dacheng Tao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Adaptive training approaches, widely used in sequence-to-sequence models, commonly reweigh the losses of different target tokens based on priors, e.g., word frequency. However, most of them do not consider how learning difficulty varies across training steps, and they overly emphasize the learning of difficult one-hot labels, making the learning deterministic and sub-optimal. In response, we present Token-Level Self-Evolution Training (SE), a simple and effective dynamic training method to fully and wisely exploit the knowledge from data. SE focuses on dynamically learning the under-explored tokens for each forward pass and adaptively regularizes the training by introducing a novel token-specific label smoothing approach. Empirically, SE yields consistent and significant improvements in three tasks, i.e., machine translation, summarization, and grammatical error correction. Encouragingly, we achieve an average improvement of +0.93 BLEU on three machine translation tasks. Analyses confirm that, besides improving lexical accuracy, SE enhances generation diversity and model generalization.

2022

Vega-MT: The JD Explore Academy Machine Translation System for WMT22
Changtong Zan | Keqin Peng | Liang Ding | Baopu Qiu | Boan Liu | Shwai He | Qingyu Lu | Zheng Zhang | Chuang Liu | Weifeng Liu | Yibing Zhan | Dacheng Tao
Proceedings of the Seventh Conference on Machine Translation (WMT)

We describe the JD Explore Academy’s submission to the WMT 2022 shared general translation task. We participated in all high-resource tracks and one medium-resource track, including Chinese-English, German-English, Czech-English, Russian-English, and Japanese-English. We push the limit of our previous work, bidirectional training for translation, by scaling up two main factors, i.e., language pairs and model sizes, which yields the Vega-MT system. As for language pairs, we scale the “bidirectional” setting up to the “multidirectional” setting, covering all participating languages, to exploit the common knowledge across languages and transfer it to the downstream bilingual tasks. As for model sizes, we scale the Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity of Vega-MT. We also adopt data augmentation strategies, e.g., cycle translation for monolingual data and bidirectional self-training for bilingual and monolingual data, to comprehensively exploit the bilingual and monolingual data. To adapt Vega-MT to the general domain test set, we design generalization tuning. Based on the official automatic scores of constrained systems, in terms of the sacreBLEU shown in Figure 1, we achieved 1st place on Zh-En (33.5), En-Zh (49.7), De-En (33.7), En-De (37.8), Cs-En (54.9), En-Cs (41.4) and En-Ru (32.7), 2nd place on Ru-En (45.1) and Ja-En (25.6), and 3rd place on En-Ja (41.5), respectively; with respect to COMET, we achieved 1st place on Zh-En (45.1), En-Zh (61.7), De-En (58.0), En-De (63.2), Cs-En (74.7), Ru-En (64.9), En-Ru (69.6) and En-Ja (65.1), and 2nd place on En-Cs (95.3) and Ja-En (40.6), respectively. Models will be released to facilitate the MT community through GitHub and OmniForce Platform.