On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao


Abstract
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains (and sometimes even hurts performance) on resource-rich NMT compared with its Random-Initialization (RI) counterpart. We take the first step to investigate the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves not the accuracy but the generalization, achieving flatter loss landscapes than RI; 2) PT improves not the confidence of lexical choice but the negative diversity, assigning smoother lexical probability distributions than RI. Based on these insights, we propose to combine their complementary strengths with a model fusion algorithm that uses optimal transport to align neurons between the PT and RI models. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI are nicely complementary, yielding substantial improvements in translation accuracy, generalization, and negative diversity. Probing tools and code are released at: https://github.com/zanchangtong/PTvsRI.
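The abstract describes fusing the PT and RI models after aligning their neurons with optimal transport. Below is a minimal, illustrative NumPy sketch of that general idea (entropy-regularized Sinkhorn transport between the neurons of two layers, followed by a weighted average of the aligned weights); it is not the authors' released implementation, and all function and variable names (sinkhorn_plan, fuse_layers, w_pt, w_ri, alpha) are hypothetical.

```python
# Hedged sketch of OT-based layer fusion between a pre-trained (PT) layer and a
# randomly-initialized (RI) layer. Assumptions: uniform mass over neurons,
# squared-Euclidean neuron cost, entropy-regularized Sinkhorn iterations.
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iters=200):
    """Entropy-regularized OT plan between uniform distributions over neurons."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)              # mass of each PT neuron
    b = np.full(m, 1.0 / m)              # mass of each RI neuron
    K = np.exp(-cost / reg)              # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan of shape (n, m)

def fuse_layers(w_pt, w_ri, alpha=0.5):
    """
    Fuse two weight matrices of shape (out_dim, in_dim).
    Each row is treated as one neuron's incoming weights.
    """
    # Pairwise squared Euclidean cost between PT and RI neurons.
    cost = ((w_pt[:, None, :] - w_ri[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn_plan(cost)
    # Barycentric projection: re-express RI neurons in the PT neuron ordering.
    w_ri_aligned = (plan @ w_ri) / plan.sum(axis=1, keepdims=True)
    # Convex combination of PT weights and aligned RI weights.
    return alpha * w_pt + (1.0 - alpha) * w_ri_aligned

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_pt = rng.normal(size=(8, 16))      # toy "pre-trained" layer
    w_ri = rng.normal(size=(8, 16))      # toy "randomly-initialized" layer
    print(fuse_layers(w_pt, w_ri).shape)  # -> (8, 16)
```

In practice such a fusion would be applied layer by layer across the two Transformer models; the paper's released code at the repository above is the authoritative reference for the exact procedure.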
Anthology ID:
2022.coling-1.445
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5029–5034
URL:
https://aclanthology.org/2022.coling-1.445
Cite (ACL):
Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, and Dacheng Tao. 2022. On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5029–5034, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation (Zan et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.445.pdf
Code
 zanchangtong/ptvsri