Rethinking Denoised Auto-Encoding in Language Pre-Training

Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu Sun, Songfang Huang, Fei Huang


Abstract
Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing. These models typically corrupt the given sequences with certain types of noise, such as masking, shuffling, or substitution, and then try to recover the original input. However, such pre-training approaches are prone to learning representations that are covariant with the noise, leading to a discrepancy between the pre-training and fine-tuning stages. To remedy this, we present ContrAstive Pre-Training (CAPT) to learn noise-invariant sequence representations. The proposed CAPT encourages consistency between the representations of the original sequence and its corrupted version via unsupervised instance-wise training signals. In this way, it not only alleviates the pretrain-finetune discrepancy induced by the noise of pre-training, but also helps the pre-trained model better capture the global semantics of the input via more effective sentence-level supervision. Unlike most prior work, which focuses on a particular modality, comprehensive empirical evidence on 11 natural language understanding and cross-modal tasks shows that CAPT is applicable to both language and vision-language tasks and obtains surprisingly consistent improvements, including a 0.6% absolute gain on the GLUE benchmark and a 0.8% absolute gain on NLVR2.
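
The instance-wise consistency objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an InfoNCE-style contrastive loss with cosine similarity, in-batch negatives, and a temperature of 0.1, none of which are specified in the abstract. The function name `contrastive_loss` and the toy vectors are hypothetical and stand in for real encoder outputs.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def contrastive_loss(original, corrupted, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact formula).

    original[i] and corrupted[i] are representations of the same sequence
    before and after noise. Each original is pulled toward its own corrupted
    version and pushed away from the corrupted versions of other sequences
    in the batch.
    """
    orig = [normalize(v) for v in original]
    corr = [normalize(v) for v in corrupted]
    total = 0.0
    for i, anchor in enumerate(orig):
        # Cosine similarity to every corrupted representation in the batch.
        logits = [dot(anchor, c) / temperature for c in corr]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching corrupted version.
        total += log_denominator - logits[i]
    return total / len(orig)


# Toy "sequence representations": random vectors stand in for encoder outputs.
random.seed(0)
batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
# Lightly perturbed copies play the role of corrupted-input representations.
noisy = [[x + random.gauss(0, 0.05) for x in v] for v in batch]
# Rotating the list pairs every original with the wrong corrupted version.
shuffled = noisy[1:] + noisy[:1]

aligned_loss = contrastive_loss(batch, noisy)
mismatched_loss = contrastive_loss(batch, shuffled)
```

In this toy setup the loss is low when each representation stays close to that of its own corrupted version and higher when the pairs are mismatched, which is the noise-invariance behavior the abstract describes.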
Anthology ID:
2021.emnlp-main.232
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2922–2932
URL:
https://aclanthology.org/2021.emnlp-main.232
DOI:
10.18653/v1/2021.emnlp-main.232
Cite (ACL):
Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu Sun, Songfang Huang, and Fei Huang. 2021. Rethinking Denoised Auto-Encoding in Language Pre-Training. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2922–2932, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Rethinking Denoised Auto-Encoding in Language Pre-Training (Luo et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.232.pdf
Video:
https://aclanthology.org/2021.emnlp-main.232.mp4
Data:
GLUE, GQA, QNLI