N-ary Constituent Tree Parsing with Recursive Semi-Markov Model

Xin Xin, Jinlong Li, Zeqi Tan


Abstract
In this paper, we study graph-based constituent parsing in the setting where binarization is not conducted as a pre-processing step, so a constituent tree may contain nodes with more than two children. Previous graph-based methods in this setting typically generate hidden nodes with a dummy label inside the n-ary nodes in order to transform the tree into a binary tree for prediction. The limitation is that the hidden nodes break the sibling relations among the n-ary node’s children. Consequently, the dependencies between such sibling constituents may be modeled inaccurately or ignored altogether. To address this limitation, we propose a novel graph-based framework called the “recursive semi-Markov model”. The main idea is to use a first-order semi-Markov model to predict the sequence of immediate children of a constituent candidate, which then recursively serves as a child candidate of its parent. In this manner, the dependencies between sibling constituents can be described by first-order transition features, which addresses the above limitation. In experiments, the proposed framework obtains F1 scores of 95.92% and 92.50% on PTB and CTB 5.1, respectively. In particular, the recursive semi-Markov model shows advantages in modeling nodes with more than two children, whose average F1 is improved by 0.3-1.1 points on PTB and 2.3-6.8 points on CTB 5.1.
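The recursive idea in the abstract can be illustrated with a small dynamic program. The following is a minimal, unlabeled sketch and not the authors' implementation: the names `parse`, `toy_span_score`, and `toy_trans_score` are hypothetical, the scores are hand-set toy values rather than learned features, and the naive recurrence makes no attempt at efficiency. For each span, a first-order semi-Markov Viterbi pass chooses the best sequence of two or more immediate children, scoring each adjacent pair of siblings with a transition term, and each child is itself the best subtree over its own span.

```python
from functools import lru_cache


def parse(n, span_score, trans_score):
    """Return (score, tree) for the best n-ary tree over tokens [0, n).

    span_score(i, j): score of making [i, j) a constituent (assumed given).
    trans_score(p, a, b): score of sibling [p, a) being followed by [a, b).
    A tree is a tuple (start, end, children).
    """

    @lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:
            return span_score(i, j), (i, j, ())  # single token: a leaf
        # alpha[(a, b)]: best child sequence covering [i, b) whose last
        # child is [a, b), stored as (score, tuple_of_child_trees).
        alpha = {}
        for b in range(i + 1, j + 1):
            for a in range(i, b):
                if a == i and b == j:
                    continue  # one child covering the whole span is excluded
                child_score, child_tree = best(a, b)
                if a == i:
                    alpha[(a, b)] = (child_score, (child_tree,))
                    continue
                # First-order step: only the previous sibling [p, a) matters.
                candidates = [
                    (alpha[(p, a)][0] + trans_score(p, a, b) + child_score,
                     alpha[(p, a)][1] + (child_tree,))
                    for p in range(i, a)
                ]
                alpha[(a, b)] = max(candidates, key=lambda c: c[0])
        seq_score, children = max(
            (alpha[(a, j)] for a in range(i + 1, j)), key=lambda c: c[0]
        )
        return span_score(i, j) + seq_score, (i, j, children)

    return best(0, n)


def toy_span_score(i, j):
    # Toy assumption: reward tokens and the full span, penalize the rest.
    return 1.0 if (j - i == 1 or (i, j) == (0, 3)) else -1.0


def toy_trans_score(p, a, b):
    # Toy assumption: every pair of adjacent siblings gets the same bonus.
    return 0.5


score, tree = parse(3, toy_span_score, toy_trans_score)
print(score, tree)
```

With these toy scores the flat analysis, in which the root has three leaf children, outscores either binary bracketing, because the two sibling transitions are rewarded directly instead of being split by an intermediate node.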
Anthology ID:
2021.acl-long.205
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
2631–2642
URL:
https://aclanthology.org/2021.acl-long.205
DOI:
10.18653/v1/2021.acl-long.205
Cite (ACL):
Xin Xin, Jinlong Li, and Zeqi Tan. 2021. N-ary Constituent Tree Parsing with Recursive Semi-Markov Model. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2631–2642, Online. Association for Computational Linguistics.
Cite (Informal):
N-ary Constituent Tree Parsing with Recursive Semi-Markov Model (Xin et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.205.pdf
Optional supplementary material:
2021.acl-long.205.OptionalSupplementaryMaterial.zip
Video:
https://aclanthology.org/2021.acl-long.205.mp4
Code:
NP-NET-research/Recursive-Semi-Markov-Model
Data:
Penn Treebank