On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies

Tianyi Zhang, Tatsunori B. Hashimoto


Abstract
We study how masking and predicting tokens in an unsupervised fashion can give rise to linguistic structures and downstream performance gains. Recent theories have suggested that pretrained language models acquire useful inductive biases through masks that implicitly act as cloze reductions for downstream tasks. While appealing, we show that the success of the random masking strategy used in practice cannot be explained by such cloze-like masks alone. We construct cloze-like masks using task-specific lexicons for three different classification datasets and show that the majority of pretrained performance gains come from generic masks that are not associated with the lexicon. To explain the empirical success of these generic masks, we demonstrate a correspondence between the Masked Language Model (MLM) objective and existing methods for learning statistical dependencies in graphical models. Using this, we derive a method for extracting these learned statistical dependencies in MLMs and show that these dependencies encode useful inductive biases in the form of syntactic structures. In an unsupervised parsing evaluation, simply forming a minimum spanning tree on the implied statistical dependence structure outperforms a classic method for unsupervised parsing (58.74 vs. 55.91 UUAS).
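The parsing step described in the abstract, reading a tree off MLM-derived pairwise dependencies, can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's exact estimator: it assumes HuggingFace's bert-base-uncased, and scores the dependence of position i on position j as the KL shift in the MLM's predictive distribution at a masked position i when token j is additionally masked. The released code at tatsu-lab/mlm_inductive_bias implements the actual method; this is only a minimal stand-in for the idea of building a spanning tree over the implied dependence structure.

```python
# Sketch: pairwise statistical-dependence scores from an MLM, then a
# spanning tree over them. Assumptions (not the paper's exact method):
# bert-base-uncased, KL shift as the dependence proxy, subword positions
# treated as nodes (the paper evaluates at the word level).
import numpy as np
import torch
from scipy.sparse.csgraph import minimum_spanning_tree
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def dependence_matrix(sentence: str) -> np.ndarray:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    n, mask_id = len(ids), tokenizer.mask_token_id
    scores = np.zeros((n, n))
    with torch.no_grad():
        for i in range(1, n - 1):              # skip [CLS] / [SEP]
            base = ids.clone()
            base[i] = mask_id                  # mask position i
            p_i = model(base.unsqueeze(0)).logits[0, i].log_softmax(-1)
            for j in range(1, n - 1):
                if i == j:
                    continue
                both = base.clone()
                both[j] = mask_id              # additionally mask j
                q_i = model(both.unsqueeze(0)).logits[0, i].log_softmax(-1)
                # KL(p_i || q_i): how much hiding j changes the prediction at i
                scores[i, j] = torch.sum(p_i.exp() * (p_i - q_i)).item()
    return scores

def dependence_tree(sentence: str) -> np.ndarray:
    sym = dependence_matrix(sentence)
    sym = (sym + sym.T) / 2                    # symmetrize pairwise scores
    # minimum_spanning_tree minimizes, so negate to keep the strongest links
    tree = minimum_spanning_tree(-sym)
    return np.transpose(np.nonzero(tree.toarray()))  # edge list (i, j)

print(dependence_tree("the cat sat on the mat"))
```

Negating the symmetrized scores turns SciPy's minimum spanning tree into a maximum spanning tree over dependence strength, so the recovered edges connect the most strongly dependent token pairs, the structure the paper compares against gold syntactic dependencies.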
Anthology ID:
2021.naacl-main.404
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5131–5146
URL:
https://aclanthology.org/2021.naacl-main.404
DOI:
10.18653/v1/2021.naacl-main.404
Cite (ACL):
Tianyi Zhang and Tatsunori B. Hashimoto. 2021. On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5131–5146, Online. Association for Computational Linguistics.
Cite (Informal):
On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies (Zhang & Hashimoto, NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.404.pdf
Video:
https://aclanthology.org/2021.naacl-main.404.mp4
Code
tatsu-lab/mlm_inductive_bias
Data
SST, SST-2