Phrase-aware Unsupervised Constituency Parsing

Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han


Abstract
Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task. Despite their high accuracy in identifying low-level structures, prior methods tend to struggle to capture high-level structures like clauses, since the MLM task usually requires information only from the local context. In this work, we revisit LM-based constituency parsing from a phrase-centered perspective. Inspired by the natural reading process of humans, we propose to regularize the parser with phrases extracted by an unsupervised phrase tagger to help the LM quickly capture low-level structures. For a better understanding of high-level structures, we propose a phrase-guided masking strategy that places more emphasis on reconstructing non-phrase words. We show that the initial phrase regularization serves as an effective bootstrap, and that phrase-guided masking improves the identification of high-level structures. Experiments on the public benchmark with two different backbone models demonstrate the effectiveness and generality of our method.
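To make the phrase-guided masking idea concrete, the sketch below shows one way an MLM mask sampler could be biased toward non-phrase tokens. This is an illustrative assumption, not the authors' implementation: the function name, the weighting scheme (`non_phrase_boost`), and the mask rate are hypothetical choices introduced here for exposition.

```python
import random

def phrase_guided_mask(tokens, phrase_spans, mask_rate=0.15, non_phrase_boost=2.0):
    """Sketch of phrase-guided masking: tokens outside extracted phrase spans
    are sampled for masking more often, so the LM must reconstruct them
    from longer-range (higher-level) context. Weighting is a hypothetical choice."""
    in_phrase = [False] * len(tokens)
    for start, end in phrase_spans:          # spans as [start, end) token indices
        for i in range(start, end):
            in_phrase[i] = True

    # Non-phrase tokens get a higher sampling weight than phrase-internal ones.
    weights = [1.0 if flag else non_phrase_boost for flag in in_phrase]
    n_mask = max(1, round(mask_rate * len(tokens)))

    # Sample distinct positions proportionally to their weights.
    positions = list(range(len(tokens)))
    masked = set()
    while len(masked) < n_mask and positions:
        pick = random.choices(positions,
                              weights=[weights[p] for p in positions], k=1)[0]
        masked.add(pick)
        positions.remove(pick)

    return ["[MASK]" if i in masked else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    sent = "the quick brown fox jumps over the lazy dog".split()
    # Suppose an unsupervised tagger marked "quick brown fox" and "the lazy dog" as phrases.
    print(phrase_guided_mask(sent, phrase_spans=[(1, 4), (6, 9)]))
```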
Anthology ID:
2022.acl-long.444
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6406–6415
URL:
https://aclanthology.org/2022.acl-long.444
DOI:
10.18653/v1/2022.acl-long.444
Cite (ACL):
Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, and Jiawei Han. 2022. Phrase-aware Unsupervised Constituency Parsing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6406–6415, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Phrase-aware Unsupervised Constituency Parsing (Gu et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.444.pdf