Contextual Representation Learning beyond Masked Language Modeling

Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li


Abstract
Currently, masked language modeling (e.g., BERT) is the prime choice for learning contextualized representations. Given its pervasiveness, a natural question arises: how do masked language models (MLMs) learn contextual representations? In this work, we analyze the learning dynamics of MLMs and find that they adopt sampled embeddings as anchors to estimate and inject contextual semantics into representations, which limits the efficiency and effectiveness of MLMs. To address these problems, we propose TACO, a simple yet effective representation learning approach that directly models global semantics. Specifically, TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend to global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to a 5x speedup and up to 1.2 points of average improvement over MLM.
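The abstract describes the objective only at a high level, so the following PyTorch sketch is purely illustrative and is not the paper's implementation (see the fuzhiyi/taco repository for that). It assumes one plausible reading: the "contextual" part of a token's representation is taken as its hidden state minus its static input embedding, and that part is aligned across two differently masked views of the same sentence with an InfoNCE-style contrastive loss. All function names, the subtraction-based extraction, and the temperature value are assumptions.

# Illustrative sketch only -- not the authors' implementation.
# Assumption: "contextual semantics" = hidden state minus the token's
# static embedding, aligned across two masked views of the same input.
import torch
import torch.nn.functional as F

def contextual_alignment_loss(h_view1, h_view2, static_emb, temperature=0.1):
    """InfoNCE over contextual components of matching tokens.

    h_view1, h_view2: (N, d) hidden states of the same N tokens under
        two different maskings of the input.
    static_emb: (N, d) static (input) embeddings of those tokens.
    """
    # Strip the token-identity signal, keeping the context-dependent part.
    c1 = F.normalize(h_view1 - static_emb, dim=-1)
    c2 = F.normalize(h_view2 - static_emb, dim=-1)

    # Similarity of each token's contextual vector in view 1 against all
    # tokens' contextual vectors in view 2; the diagonal entries are the
    # positive pairs (same token, different masking).
    logits = c1 @ c2.t() / temperature
    targets = torch.arange(c1.size(0), device=c1.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for encoder outputs.
N, d = 8, 16
emb = torch.randn(N, d)
h1, h2 = emb + torch.randn(N, d), emb + torch.randn(N, d)
loss = contextual_alignment_loss(h1, h2, emb)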
Anthology ID:
2022.acl-long.193
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2701–2714
URL:
https://aclanthology.org/2022.acl-long.193
DOI:
10.18653/v1/2022.acl-long.193
Cite (ACL):
Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, and Lei Li. 2022. Contextual Representation Learning beyond Masked Language Modeling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2701–2714, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Contextual Representation Learning beyond Masked Language Modeling (Fu et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.193.pdf
Code:
fuzhiyi/taco
Data:
CoLA, GLUE, MRPC, MultiNLI, QNLI, SST, SST-2