An Empirical Study of Compound PCFGs

Yanpeng Zhao, Ivan Titov


Abstract
Compound probabilistic context-free grammars (C-PCFGs) have recently established a new state of the art for phrase-structure grammar induction. However, due to the high time complexity of chart-based representation and inference, it is difficult to investigate them comprehensively. In this work, we rely on a fast implementation of C-PCFGs to conduct an evaluation complementary to that of (CITATION). We highlight three key findings: (1) C-PCFGs are data-efficient, (2) C-PCFGs make the best use of global sentence-level information in preterminal rule probabilities, and (3) the best configurations of C-PCFGs on English do not always generalize to morphology-rich languages.
Anthology ID: 2021.adaptnlp-1.17
Volume: Proceedings of the Second Workshop on Domain Adaptation for NLP
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Eyal Ben-David, Shay Cohen, Ryan McDonald, Barbara Plank, Roi Reichart, Guy Rotman, Yftah Ziser
Venue: AdaptNLP
Publisher: Association for Computational Linguistics
Pages: 166–171
URL: https://aclanthology.org/2021.adaptnlp-1.17
Cite (ACL): Yanpeng Zhao and Ivan Titov. 2021. An Empirical Study of Compound PCFGs. In Proceedings of the Second Workshop on Domain Adaptation for NLP, pages 166–171, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): An Empirical Study of Compound PCFGs (Zhao & Titov, AdaptNLP 2021)
PDF: https://aclanthology.org/2021.adaptnlp-1.17.pdf
Code: zhaoyanpeng/cpcfg
Data: English Web Treebank