GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

Jian Yang; Shuming Ma; Li Dong; Shaohan Huang; Haoyang Huang; Yuwei Yin; Dongdong Zhang; Liqun Yang; Furu Wei; Zhoujun Li

doi:10.18653/v1/2023.acl-long.522

Abstract

Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training that introduces an auxiliary discriminator, unifying language understanding and generation in a single model. Our model, named GanLM, is trained with two pre-training objectives: replaced token detection and replaced token denoising. Specifically, given masked source sentences, the generator outputs the target distribution, and the discriminator predicts whether the target tokens sampled from that distribution are incorrect. The target sentence is replaced with misclassified tokens to construct a noisy previous context, which is used to generate the gold sentence. In general, both tasks improve the ability of language understanding and generation by selectively using the denoising data. Extensive experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models (PLMs) and achieves state-of-the-art performance.
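The two objectives named in the abstract can be pictured with a small toy sketch. It is not the authors' implementation: the function `build_training_signals` and the callables `sample_token` and `discriminator_says_original` are hypothetical stand-ins for the generator and the discriminator, and reading "misclassified tokens" as replaced tokens that the discriminator judged to be original is an assumption made here for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def build_training_signals(
    gold: Sequence[str],
    masked_positions: Sequence[int],
    sample_token: Callable[[int], str],                      # hypothetical generator stub
    discriminator_says_original: Callable[[int, str], bool], # hypothetical discriminator stub
) -> Tuple[List[str], List[int], List[str], List[str]]:
    """Toy illustration of replaced token detection and replaced token denoising."""
    # The generator proposes one sampled token for every masked target position.
    sampled = list(gold)
    for i in masked_positions:
        sampled[i] = sample_token(i)

    # Replaced token detection: 1 marks a sampled token that differs from gold.
    detection_labels = [int(s != g) for s, g in zip(sampled, gold)]

    # Assumption: only replaced tokens that fooled the discriminator are kept
    # as noise in the previous context fed to the decoder.
    noisy_context = list(gold)
    for i, (s, g) in enumerate(zip(sampled, gold)):
        if s != g and discriminator_says_original(i, s):
            noisy_context[i] = s

    # Replaced token denoising: the gold sentence is the prediction target.
    return sampled, detection_labels, noisy_context, list(gold)


if __name__ == "__main__":
    gold = ["the", "cat", "sat", "on", "the", "mat"]
    proposals = {1: "dog", 3: "on", 5: "rug"}
    result = build_training_signals(
        gold,
        masked_positions=[1, 3, 5],
        sample_token=lambda i: proposals[i],
        discriminator_says_original=lambda i, tok: i == 5,
    )
    for name, value in zip(("sampled", "labels", "noisy", "target"), result):
        print(name, value)
```

In this toy run the discriminator catches "dog" but misses "rug", so only "rug" ends up in the noisy context while the denoising target remains the gold sentence.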

Anthology ID: 2023.acl-long.522
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9394–9412
URL: https://aclanthology.org/2023.acl-long.522
DOI: 10.18653/v1/2023.acl-long.522
Cite (ACL): Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, and Zhoujun Li. 2023. GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9394–9412, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator (Yang et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.522.pdf
Video: https://aclanthology.org/2023.acl-long.522.mp4
