Probe-Less Probing of BERT’s Layer-Wise Linguistic Knowledge with Masked Word Prediction

Tatsuya Aoyama, Nathan Schneider


Abstract
This study quantitatively (and, for illustrative purposes, qualitatively) analyzes BERT’s layer-wise masked word prediction on an English corpus, and finds (1) that the layer-wise localization of linguistic knowledge shown primarily in probing studies is replicated in a behavior-based design, and (2) that syntactic and semantic information are encoded at different layers for words of different syntactic categories. Hypothesizing that these results correlate with the number of likely candidates for the masked word, we also investigate how the results differ for tokens within multiword expressions.
Anthology ID:
2022.naacl-srw.25
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Month:
July
Year:
2022
Address:
Hybrid: Seattle, Washington + Online
Editors:
Daphne Ippolito, Liunian Harold Li, Maria Leonor Pacheco, Danqi Chen, Nianwen Xue
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
195–201
URL:
https://aclanthology.org/2022.naacl-srw.25
DOI:
10.18653/v1/2022.naacl-srw.25
Cite (ACL):
Tatsuya Aoyama and Nathan Schneider. 2022. Probe-Less Probing of BERT’s Layer-Wise Linguistic Knowledge with Masked Word Prediction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pages 195–201, Hybrid: Seattle, Washington + Online. Association for Computational Linguistics.
Cite (Informal):
Probe-Less Probing of BERT’s Layer-Wise Linguistic Knowledge with Masked Word Prediction (Aoyama & Schneider, NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-srw.25.pdf
Video:
https://aclanthology.org/2022.naacl-srw.25.mp4
Data
STREUSLE