Yongjing Yin


2022

pdf bib
Prompt-Driven Neural Machine Translation
Yafu Li | Yongjing Yin | Jing Li | Yue Zhang
Findings of the Association for Computational Linguistics: ACL 2022

Neural machine translation (NMT) has obtained significant performance improvement over the recent years. However, NMT models still face various challenges including fragility and lack of style flexibility. Moreover, current methods for instance-level constraints are limited in that they are either constraint-specific or model-specific. To this end, we propose prompt-driven neural machine translation to incorporate prompts for enhancing translation control and enriching flexibility. Empirical results demonstrate the effectiveness of our method in both prompt responding and translation quality. Through human evaluation, we further show the flexibility of prompt control and the efficiency in human-in-the-loop translation.

pdf bib
Task-guided Disentangled Tuning for Pretrained Language Models
Jiali Zeng | Yufan Jiang | Shuangzhi Wu | Yongjing Yin | Mu Li
Findings of the Association for Computational Linguistics: ACL 2022

Pretrained language models (PLMs) trained on large-scale unlabeled corpus are typically fine-tuned on task-specific downstream datasets, which have produced state-of-the-art results on various NLP tasks. However, the data discrepancy issue in domain and scale makes fine-tuning fail to efficiently capture task-specific patterns, especially in low data regime. To address this issue, we propose Task-guided Disentangled Tuning (TDT) for PLMs, which enhances the generalization of representations by disentangling task-relevant signals from the entangled representations. For a given task, we introduce a learnable confidence model to detect indicative guidance from context, and further propose a disentangled regularization to mitigate the over-reliance problem. Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs, and extensive analysis demonstrates the effectiveness and robustness of our method. Code is available at https://github.com/lemon0830/TDT.

2021

pdf bib
Recurrent Attention for Neural Machine Translation
Jiali Zeng | Shuangzhi Wu | Yongjing Yin | Yufan Jiang | Mu Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further in this research line and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN) . RAN directly learns attention weights without any token-to-token interaction and further improves their capacity by layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterpart in certain scenarios, with fewer parameters and inference time. Particularly, when apply RAN to the decoder of Transformer, there brings consistent improvements by about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on Turkish-English translation task. In addition, we conduct extensive analysis on the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative to build more effective and efficient NMT models.

pdf bib
On Compositional Generalization of Neural Machine Translation
Yafu Li | Yongjing Yin | Yulong Chen | Yue Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Modern neural machine translation (NMT) models have achieved competitive performance in standard benchmarks such as WMT. However, there still exist significant issues such as robustness, domain generalization, etc. In this paper, we study NMT models from the perspective of compositional generalization by building a benchmark dataset, CoGnition, consisting of 216k clean and consistent sentence pairs. We quantitatively analyze effects of various factors using compound translation error rate, then demonstrate that the NMT model fails badly on compositional generalization, although it performs remarkably well under traditional metrics.

2020

pdf bib
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin | Fandong Meng | Jinsong Su | Chulun Zhou | Zhengyuan Yang | Jie Zhou | Jiebo Luo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multi-modal neural machine translation (NMT) aims to translate source sentences into a target language paired with images. However, dominant multi-modal NMT models do not fully exploit fine-grained semantic correspondences between semantic units of different modalities, which have potential to refine multi-modal representation learning. To deal with this issue, in this paper, we propose a novel graph-based multi-modal fusion encoder for NMT. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, these representations provide an attention-based context vector for the decoder. We evaluate our proposed encoder on the Multi30K datasets. Experimental results and in-depth analysis show the superiority of our multi-modal NMT model.

2019

pdf bib
Iterative Dual Domain Adaptation for Neural Machine Translation
Jiali Zeng | Yang Liu | Jinsong Su | Yubing Ge | Yaojie Lu | Yongjing Yin | Jiebo Luo
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Previous studies on the domain adaptation for neural machine translation (NMT) mainly focus on the one-pass transferring out-of-domain translation knowledge to in-domain NMT model. In this paper, we argue that such a strategy fails to fully extract the domain-shared translation knowledge, and repeatedly utilizing corpora of different domains can lead to better distillation of domain-shared translation knowledge. To this end, we propose an iterative dual domain adaptation framework for NMT. Specifically, we first pretrain in-domain and out-of-domain NMT models using their own training corpora respectively, and then iteratively perform bidirectional translation knowledge transfer (from in-domain to out-of-domain and then vice versa) based on knowledge distillation until the in-domain NMT model convergences. Furthermore, we extend the proposed framework to the scenario of multiple out-of-domain training corpora, where the above-mentioned transfer is performed sequentially between the in-domain and each out-of-domain NMT models in the ascending order of their domain similarities. Empirical results on Chinese-English and English-German translation tasks demonstrate the effectiveness of our framework.

2018

pdf bib
Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination
Jiali Zeng | Jinsong Su | Huating Wen | Yang Liu | Jun Xie | Yongjing Yin | Jianqiang Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

With great practical value, the study of Multi-domain Neural Machine Translation (NMT) mainly focuses on using mixed-domain parallel sentences to construct a unified model that allows translation to switch between different domains. Intuitively, words in a sentence are related to its domain to varying degrees, so that they will exert disparate impacts on the multi-domain NMT modeling. Based on this intuition, in this paper, we devote to distinguishing and exploiting word-level domain contexts for multi-domain NMT. To this end, we jointly model NMT with monolingual attention-based domain classification tasks and improve NMT as follows: 1) Based on the sentence representations produced by a domain classifier and an adversarial domain classifier, we generate two gating vectors and use them to construct domain-specific and domain-shared annotations, for later translation predictions via different attention models; 2) We utilize the attention weights derived from target-side domain classifier to adjust the weights of target words in the training objective, enabling domain-related words to have greater impacts during model training. Experimental results on Chinese-English and English-French multi-domain translation tasks demonstrate the effectiveness of the proposed model. Source codes of this paper are available on Github https://github.com/DeepLearnXMU/WDCNMT.