Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order

Yi Liao, Xin Jiang, Qun Liu


Abstract
Masked language models and autoregressive language models are two types of language models. While pretrained masked language models such as BERT dominate natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable in natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM). We implement a specific PMLM with a uniform prior distribution on the masking ratio, named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications beyond traditional unidirectional generation. In addition, the pretrained u-PMLM outperforms BERT on a set of downstream NLU tasks.
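The masking scheme described in the abstract can be illustrated with a minimal sketch. It assumes that a masking ratio is drawn from a uniform prior on [0, 1) and that each token is then masked independently with that probability. The names `u_pmlm_mask`, `generate_in_order`, and `predict` are hypothetical and do not come from the authors' released code; `predict` stands in for a trained model that fills one masked position.

```python
import random

MASK = "[MASK]"

def u_pmlm_mask(tokens, rng):
    """Draw a masking ratio from Uniform[0, 1), then mask each token
    independently with that probability (illustrative sketch only)."""
    ratio = rng.random()
    masked = [MASK if rng.random() < ratio else tok for tok in tokens]
    return ratio, masked

def generate_in_order(length, order, predict):
    """Start from an all-masked sequence and fill one position at a time,
    visiting positions in the caller-supplied order.

    `predict(seq, pos)` is a hypothetical callback standing in for a model
    that returns a token for position `pos` given the partial sequence."""
    if sorted(order) != list(range(length)):
        raise ValueError("order must be a permutation of range(length)")
    seq = [MASK] * length
    for pos in order:
        seq[pos] = predict(seq, pos)
    return seq

# Example: one training-style masking draw with a seeded generator.
rng = random.Random(0)
ratio, masked = u_pmlm_mask("the quick brown fox".split(), rng)
```

Because the ratio is resampled for every sequence, the model would see inputs ranging from almost fully visible to almost fully masked, which is the situation `generate_in_order` relies on when it fills positions in an arbitrary order.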
Anthology ID:
2020.acl-main.24
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
263–274
URL:
https://aclanthology.org/2020.acl-main.24
DOI:
10.18653/v1/2020.acl-main.24
Cite (ACL):
Yi Liao, Xin Jiang, and Qun Liu. 2020. Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 263–274, Online. Association for Computational Linguistics.
Cite (Informal):
Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order (Liao et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.24.pdf
Video:
http://slideslive.com/38929449
Code:
huawei-noah/Pretrained-Language-Model
Data:
GLUE