Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction

Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz


Abstract
There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model. Modern depth-bounded grammar inducers have been shown to be more accurate than early unbounded PCFG inducers, but this technique has never been compared against unbounded induction within the same system, in part because most previous depth-bounding models are built around sequence models, the complexity of which grows exponentially with the maximum allowed depth. The present work instead applies depth bounds within a chart-based Bayesian PCFG inducer, where bounding can be switched on and off, and then samples trees with or without bounding. Results show that depth-bounding is indeed significantly effective in limiting the search space of the inducer and thereby increasing the accuracy of the resulting parsing model, independent of the contribution of modern Bayesian induction techniques. Moreover, parsing results on English, Chinese and German show that this bounded model is able to produce parse trees more accurately than, or competitively with, state-of-the-art constituency grammar induction models.
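To make the idea of "limiting the search space" concrete, the sketch below enumerates every binary bracketing of a short sentence and keeps only those whose center-embedding depth stays within a bound. It is an illustration under stated assumptions, not the authors' method. It assumes a simplified depth definition: depth starts at 1 and increases by one whenever a non-leaf left child hangs off a node that is itself a right child, so purely left- or right-branching trees have depth 1. It also filters enumerated trees directly instead of building the bound into a chart-based Bayesian PCFG inducer as the paper does. The function names (`embedding_depth`, `all_binary_trees`, `candidate_trees`) are hypothetical.

```python
from typing import List, Optional, Tuple, Union

# A binary tree is either a leaf token (str) or a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def embedding_depth(tree: Tree) -> int:
    """Maximum center-embedding depth of a binary tree.

    Simplified definition assumed for this sketch: depth starts at 1 and grows
    by one whenever a non-leaf left child hangs off a node that is itself a
    right child. Purely left- or right-branching trees therefore have depth 1.
    """

    def walk(node: Tree, depth: int, is_right_child: bool) -> int:
        if isinstance(node, str):
            return depth  # leaves inherit their parent's depth
        left, right = node
        left_depth = depth
        if is_right_child and not isinstance(left, str):
            left_depth += 1  # a new center-embedded level opens here
        return max(walk(left, left_depth, False), walk(right, depth, True))

    return walk(tree, 1, False)


def all_binary_trees(words: List[str]) -> List[Tree]:
    """Enumerate every binary bracketing of `words` (Catalan-many trees)."""
    if len(words) == 1:
        return [words[0]]
    trees: List[Tree] = []
    for split in range(1, len(words)):
        for left in all_binary_trees(words[:split]):
            for right in all_binary_trees(words[split:]):
                trees.append((left, right))
    return trees


def candidate_trees(words: List[str], max_depth: Optional[int] = None) -> List[Tree]:
    """Trees left in the search space; max_depth=None switches bounding off."""
    trees = all_binary_trees(words)
    if max_depth is None:
        return trees
    return [t for t in trees if embedding_depth(t) <= max_depth]


if __name__ == "__main__":
    words = ["a", "b", "c", "d"]
    print(len(candidate_trees(words)))               # 5 bracketings, unbounded
    print(len(candidate_trees(words, max_depth=1)))  # 4: only (a ((b c) d)) is pruned
```

Passing `max_depth=None` mirrors switching bounding off. With four words the depth-1 bound prunes one of five bracketings. The unbounded count follows the Catalan numbers (14 bracketings for five words), so exhaustive enumeration like this is only practical for very short inputs. That is one reason a real inducer applies the bound inside a chart instead.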
Anthology ID:
D18-1292
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2721–2731
URL:
https://aclanthology.org/D18-1292
DOI:
10.18653/v1/D18-1292
Cite (ACL):
Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, and Lane Schwartz. 2018. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2721–2731, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction (Jin et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1292.pdf
Code:
lifengjin/dimi_emnlp18
Data:
Penn Treebank