Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads

Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller


Abstract
Transformer-based pre-trained language models (PLMs) have dramatically improved the state of the art in NLP across many tasks. This has led to substantial interest in analyzing the syntactic knowledge PLMs learn. Previous approaches to this question have been limited, mostly using test suites or probes. Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads. We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree. Our method is adaptable to low-resource languages, as it does not rely on development sets, which can be expensive to annotate. Our experiments show that the proposed method often outperforms existing approaches when no development set is available. Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly. For this, we use the parse trees induced by our method to train a neural PCFG and compare it to a grammar derived from a human-annotated treebank.
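To illustrate the general idea of extracting a constituency tree from a single self-attention head, here is a minimal sketch. It is not the authors' exact algorithm: it simply treats each token's attention distribution as a feature vector and recursively splits the sentence at the largest Jensen-Shannon divergence between adjacent tokens. The toy attention matrix and the helper names (`split_tree`, `induce_tree`) are hypothetical.

```python
# Hypothetical sketch (not the paper's exact method): induce a binary
# constituency tree from one attention head by splitting the sentence
# where adjacent tokens' attention distributions diverge the most.
import numpy as np
from scipy.spatial.distance import jensenshannon

def split_tree(tokens, gaps, lo, hi):
    """Recursively build a binary tree over tokens[lo:hi)."""
    if hi - lo == 1:
        return tokens[lo]
    # gaps[i] is the divergence between token i and token i + 1;
    # split just after the largest gap inside the span.
    k = lo + int(np.argmax(gaps[lo:hi - 1])) + 1
    return (split_tree(tokens, gaps, lo, k), split_tree(tokens, gaps, k, hi))

def induce_tree(tokens, attention):
    """attention: (n, n) row-stochastic matrix from one head for one sentence."""
    n = len(tokens)
    gaps = np.array([jensenshannon(attention[i], attention[i + 1])
                     for i in range(n - 1)])
    return split_tree(tokens, gaps, 0, n)

# Toy example with a hand-made attention matrix for a 4-token sentence.
tokens = ["the", "cat", "sat", "down"]
attn = np.array([[0.60, 0.30, 0.05, 0.05],
                 [0.30, 0.50, 0.10, 0.10],
                 [0.05, 0.10, 0.55, 0.30],
                 [0.05, 0.10, 0.30, 0.55]])
print(induce_tree(tokens, attn))  # (('the', 'cat'), ('sat', 'down'))
```

In the paper, trees like these would be produced per head, the heads ranked by their inherent properties, and the top-ranked heads combined in an ensemble to yield the final parse; this sketch only shows the per-head tree induction step under the stated assumptions.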
Anthology ID:
2020.aacl-main.43
Volume:
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
Kam-Fai Wong, Kevin Knight, Hua Wu
Venue:
AACL
Publisher:
Association for Computational Linguistics
Pages:
409–424
URL:
https://aclanthology.org/2020.aacl-main.43
Cite (ACL):
Bowen Li, Taeuk Kim, Reinald Kim Amplayo, and Frank Keller. 2020. Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 409–424, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads (Li et al., AACL 2020)
PDF:
https://aclanthology.org/2020.aacl-main.43.pdf