Pre-Training Transformers as Energy-Based Cloze Models

Kevin Clark, Minh-Thang Luong, Quoc Le, Christopher D. Manning


Abstract
We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, unlike BERT, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
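The noise-contrastive estimation objective mentioned in the abstract can be sketched as follows. This is a minimal standalone illustration, not the paper's implementation: it assumes a scalar energy E(x) per token (so the unnormalized model score is exp(-E(x))) and a noise distribution q from which k negative tokens are drawn; all function and parameter names are hypothetical.

```python
import math

def nce_loss(energy_data, q_data, energies_noise, q_noise, k):
    """NCE loss for an energy-based cloze model (illustrative sketch).

    Trains a binary classifier to distinguish the real data token
    (with energy `energy_data` and noise probability `q_data`) from
    k noise tokens sampled from the proposal distribution q.
    """
    def p_from_data(energy, q):
        # Posterior probability that a token came from the data
        # distribution rather than from the noise distribution q.
        score = math.exp(-energy)  # unnormalized model score exp(-E(x))
        return score / (score + k * q)

    # Data token should be classified as "data" ...
    loss = -math.log(p_from_data(energy_data, q_data))
    # ... and each of the k noise tokens as "noise".
    for e, q in zip(energies_noise, q_noise):
        loss += -math.log(1.0 - p_from_data(e, q))
    return loss
```

Minimizing this loss pushes the energy of observed tokens down and the energy of noise tokens up, so low energy comes to indicate a likely token in context, as the abstract describes.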
Anthology ID:
2020.emnlp-main.20
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
285–294
URL:
https://aclanthology.org/2020.emnlp-main.20
DOI:
10.18653/v1/2020.emnlp-main.20
Cite (ACL):
Kevin Clark, Minh-Thang Luong, Quoc Le, and Christopher D. Manning. 2020. Pre-Training Transformers as Energy-Based Cloze Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 285–294, Online. Association for Computational Linguistics.
Cite (Informal):
Pre-Training Transformers as Energy-Based Cloze Models (Clark et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.20.pdf
Video:
https://slideslive.com/38939095
Code
google-research/electra
Data
GLUE, LibriSpeech, OpenWebText, WebText