Does BERT Rediscover a Classical NLP Pipeline?

Jingcheng Niu, Wenjie Lu, Gerald Penn

Abstract
Does BERT store surface knowledge in its bottom layers, syntactic knowledge in its middle layers, and semantic knowledge in its upper layers? In re-examining Jawahar et al.'s (2019) and Tenney et al.'s (2019a) probes into the structure of BERT, we find that the pipeline-like separation they asserted lacks conclusive empirical support. BERT's structure is, however, linguistically founded, although perhaps in a way that is more nuanced than can be explained by layers alone. We introduce a novel probe, called GridLoc, through which we can also take into account token positions, training rounds, and random seeds. Using GridLoc, we are able to detect other, stronger regularities that suggest that pseudo-cognitive appeals to layer depth may not be the preferable mode of explanation for BERT's inner workings.
Anthology ID: 2022.coling-1.278
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3143–3153
URL: https://aclanthology.org/2022.coling-1.278
Cite (ACL): Jingcheng Niu, Wenjie Lu, and Gerald Penn. 2022. Does BERT Rediscover a Classical NLP Pipeline?. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3143–3153, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Does BERT Rediscover a Classical NLP Pipeline? (Niu et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.278.pdf
Code: frankniujc/gridloc_probe
Data: SentEval