The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining

Ting-Rui Chiang and Dani Yogatama. 2023. The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10305-10321, Singapore, December 2023. Edited by Houda Bouamor, Juan Pino, and Kalika Bali. Association for Computational Linguistics.
Citation key: chiang-yogatama-2023-distributional
DOI: 10.18653/v1/2023.emnlp-main.637
URL: https://aclanthology.org/2023.emnlp-main.637/