Co-training an Unsupervised Constituency Parser with Weak Supervision

Nickil Maveli; Shay B. Cohen

doi:10.18653/v1/2022.findings-acl.101

Abstract

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, lets the model parse effectively. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach, in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics, injects strong inductive bias into the parser, achieving 63.1 F₁ on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB), achieving new state-of-the-art results.
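
The inside/outside split and the co-training interplay described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the half-open span convention, the confidence threshold, and the idea of passing classifiers as plain scoring callables are all assumptions made for illustration. It only shows how one span yields two views, and how confident predictions from one view could become pseudo-labels for the classifier that sees the other view.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]                      # half-open token offsets [start, end) (assumed convention)
Outside = Tuple[List[str], List[str]]       # (left context, right context)


def span_views(tokens: Sequence[str], span: Span) -> Tuple[List[str], Outside]:
    """Split a sentence into the inside view (the span itself) and the
    outside view (everything to the left and right of the span)."""
    start, end = span
    inside = list(tokens[start:end])
    outside = (list(tokens[:start]), list(tokens[end:]))
    return inside, outside


def all_spans(n: int) -> List[Span]:
    """Enumerate every candidate span of a sentence with n tokens."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]


def co_training_round(
    unlabeled: Sequence[Tuple[Sequence[str], Span]],
    inside_score: Callable[[List[str]], float],    # hypothetical: P(span is a constituent)
    outside_score: Callable[[Outside], float],     # hypothetical: same probability, from context
    threshold: float = 0.9,                        # assumed confidence cut-off
):
    """One co-training round: each classifier pseudo-labels the examples it is
    confident about, and those labels become training data for the other view."""
    for_inside, for_outside = [], []
    for tokens, span in unlabeled:
        inside, outside = span_views(tokens, span)
        p_in, p_out = inside_score(inside), outside_score(outside)
        # Confident inside predictions label the outside view.
        if p_in >= threshold or p_in <= 1 - threshold:
            for_outside.append((outside, p_in >= threshold))
        # Confident outside predictions label the inside view.
        if p_out >= threshold or p_out <= 1 - threshold:
            for_inside.append((inside, p_out >= threshold))
    return for_inside, for_outside
```

In a full pipeline each classifier would be retrained on its new pseudo-labelled examples after every round; the sketch stops at producing those examples.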

Anthology ID: 2022.findings-acl.101
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1274–1291
URL: https://aclanthology.org/2022.findings-acl.101
DOI: 10.18653/v1/2022.findings-acl.101
Cite (ACL): Nickil Maveli and Shay Cohen. 2022. Co-training an Unsupervised Constituency Parser with Weak Supervision. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1274–1291, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Co-training an Unsupervised Constituency Parser with Weak Supervision (Maveli & Cohen, Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.101.pdf
Video: https://aclanthology.org/2022.findings-acl.101.mp4
Code: Nickil21/weakly-supervised-parsing
Data: Chinese Treebank, Penn Treebank