Yun Chen


2022

pdf bib
Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation
Guanhua Chen | Shuming Ma | Yun Chen | Dongdong Zhang | Jia Pan | Wenping Wang | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper demonstrates that multilingual pretraining and multilingual fine-tuning are both critical for facilitating cross-lingual transfer in zero-shot translation, where the neural machine translation (NMT) model is tested on source languages unseen during supervised training. Following this idea, we present SixT+, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages. SixT+ initializes the decoder embedding and the full encoder with XLM-R large and then trains the encoder and decoder layers with a simple two-stage training strategy. SixT+ achieves impressive performance on many-to-English translation. It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with an average gain of 7.2 and 5.0 BLEU respectively. Additionally, SixT+ offers a set of model parameters that can be further fine-tuned to other unsupervised tasks. We demonstrate that adding SixT+ initialization outperforms state-of-the-art explicitly designed unsupervised NMT models on Si<->En and Ne<->En by over 1.2 average BLEU. When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.

pdf bib
Multitasking Framework for Unsupervised Simple Definition Generation
Cunliang Kong | Yun Chen | Hengyuan Zhang | Liner Yang | Erhong Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The definition generation task can help language learners by providing explanations for unfamiliar words. This task has attracted much attention in recent years. We propose a novel task of Simple Definition Generation (SDG) to help language learners and low literacy readers. A significant challenge of this task is the lack of learner’s dictionaries in many languages, and therefore the lack of data for supervised training. We explore this task and propose a multitasking framework SimpDefiner that only requires a standard dictionary with complex definitions and a corpus containing arbitrary simple texts. We disentangle the complexity factors from the text by carefully designing a parameter sharing scheme between two decoders. By jointly training these components, the framework can generate both complex and simple definitions simultaneously. We demonstrate that the framework can generate relevant, simple definitions for the target words through automatic and manual evaluations on English and Chinese datasets. Our method outperforms the baseline model by a 1.77 SARI score on the English dataset, and raises the proportion of the low level (HSK level 1-3) words in Chinese definitions by 3.87%.

2021

pdf bib
Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
Guanhua Chen | Shuming Ma | Yun Chen | Li Dong | Dongdong Zhang | Jia Pan | Wenping Wang | Furu Wei
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or improving the performance on supervised machine translation with BERT. However, it is under-explored that whether the MPE can help to facilitate the cross-lingual transferability of NMT model. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with parallel dataset of only one language pair and an off-the-shelf MPE, then it is directly tested on zero-shot language pairs. We propose SixT, a simple yet effective model for this task. SixT leverages the MPE with a two-stage training schedule and gets further improvement with a position disentangled encoder and a capacity-enhanced decoder. Using this method, SixT significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages. Furthermore, with much less training computation cost and training data, our model achieves better performance on 15 any-to-English test sets than CRISS and m2m-100, two strong multilingual NMT baselines.

2020

pdf bib
Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT
Zhiyong Wu | Yun Chen | Ben Kao | Qun Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

By introducing a small set of additional parameters, a probe learns to solve specific linguistic tasks (e.g., dependency parsing) in a supervised manner using feature representations (e.g., contextualized embeddings). The effectiveness of such probing tasks is taken as evidence that the pre-trained model encodes linguistic knowledge. However, this approach of evaluating a language model is undermined by the uncertainty of the amount of knowledge that is learned by the probe itself. Complementary to those works, we propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT). Our method does not require direct supervision from the probing tasks, nor do we introduce additional parameters to the probing process. Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines. We further feed the empirically induced dependency structures into a downstream sentiment classification task and find its improvement compatible with or even superior to a human-designed dependency schema.

pdf bib
Accurate Word Alignment Induction from Neural Machine Translation
Yun Chen | Yang Liu | Guanhua Chen | Xin Jiang | Qun Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite its original goal to jointly learn to align and translate, prior researches suggest that Transformer captures poor word alignments through its attention mechanism. In this paper, we show that attention weights do capture accurate word alignments and propose two novel word alignment induction methods Shift-Att and Shift-AET. The main idea is to induce alignments at the step when the to-be-aligned target token is the decoder input rather than the decoder output as in previous work. Shift-Att is an interpretation method that induces alignments from the attention weights of Transformer and does not require parameter update or architecture change. Shift-AET extracts alignments from an additional alignment module which is tightly integrated into Transformer and trained in isolation with supervision from symmetrized Shift-Att alignments. Experiments on three publicly available datasets demonstrate that both methods perform better than their corresponding neural baselines and Shift-AET significantly outperforms GIZA++ by 1.4-4.8 AER points.

pdf bib
A General Framework for Adaptation of Neural Machine Translation to Simultaneous Translation
Yun Chen | Liangyou Li | Xin Jiang | Xiao Chen | Qun Liu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Despite the success of neural machine translation (NMT), simultaneous neural machine translation (SNMT), the task of translating in real time before a full sentence has been observed, remains challenging due to the syntactic structure difference and simultaneity requirements. In this paper, we propose a general framework for adapting neural machine translation to translate simultaneously. Our framework contains two parts: prefix translation that utilizes a consecutive NMT model to translate source prefixes and a stopping criterion that determines when to stop the prefix translation. Experiments on three translation corpora and two language pairs show the efficacy of the proposed framework on balancing the quality and latency in adapting NMT to perform simultaneous translation.

2018

pdf bib
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Yun Chen | Victor O.K. Li | Kyunghyun Cho | Samuel Bowman
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel corpora and three architectures show that the method yields substantial improvements in translation quality and speed over each base system.

pdf bib
Meta-Learning for Low-Resource Neural Machine Translation
Jiatao Gu | Yong Wang | Yun Chen | Victor O. K. Li | Kyunghyun Cho
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML, Finn, et al., 2017) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem where we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation (Gu et al., 2018b) to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro,Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach (Zoph et al., 2016) and enables us to train a competitive NMT system with only a fraction of training examples. For instance, the proposed approach can achieve as high as 22.04 BLEU on Romanian-English WMT’16 by seeing only 16,000 translated words (~600 parallel sentences)

2017

pdf bib
A Teacher-Student Framework for Zero-Resource Neural Machine Translation
Yun Chen | Yang Liu | Yong Cheng | Victor O.K. Li
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on the assumption, our method is able to train a source-to-target NMT model (“student”) without parallel corpora available guided by an existing pivot-to-target NMT model (“teacher”) on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs.