%0 Conference Proceedings
%T Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation
%A Chen, Guanhua
%A Ma, Shuming
%A Chen, Yun
%A Zhang, Dongdong
%A Pan, Jia
%A Wang, Wenping
%A Wei, Furu
%Y Muresan, Smaranda
%Y Nakov, Preslav
%Y Villavicencio, Aline
%S Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F chen-etal-2022-towards
%X This paper demonstrates that multilingual pretraining and multilingual fine-tuning are both critical for facilitating cross-lingual transfer in zero-shot translation, where the neural machine translation (NMT) model is tested on source languages unseen during supervised training. Following this idea, we present SixT+, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages. SixT+ initializes the decoder embedding and the full encoder with XLM-R large and then trains the encoder and decoder layers with a simple two-stage training strategy. SixT+ achieves impressive performance on many-to-English translation. It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with an average gain of 7.2 and 5.0 BLEU respectively. Additionally, SixT+ offers a set of model parameters that can be further fine-tuned to other unsupervised tasks. We demonstrate that adding SixT+ initialization outperforms state-of-the-art explicitly designed unsupervised NMT models on Si<->En and Ne<->En by over 1.2 average BLEU. When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.
%R 10.18653/v1/2022.acl-long.12
%U https://aclanthology.org/2022.acl-long.12
%U https://doi.org/10.18653/v1/2022.acl-long.12
%P 142-157