Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank

Kai Zhao, Liang Huang


Abstract
Discourse parsing has long been treated as a stand-alone problem, independent of constituency or dependency parsing. Most approaches to this problem rely on annotated text segmentations (Elementary Discourse Units, EDUs) and on sophisticated sparse or continuous features to extract syntactic information. In this paper we propose the first end-to-end discourse parser that parses jointly at both the syntax and discourse levels, as well as the first syntacto-discourse treebank, created by integrating the Penn Treebank and the RST Treebank. Built upon our recent span-based constituency parser, this joint syntacto-discourse parser requires no preprocessing such as segmentation or feature extraction, making discourse parsing more convenient. Empirically, our parser achieves state-of-the-art end-to-end discourse parsing accuracy.
Anthology ID:
D17-1225
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2117–2123
URL:
https://aclanthology.org/D17-1225/
DOI:
10.18653/v1/D17-1225
Cite (ACL):
Kai Zhao and Liang Huang. 2017. Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2117–2123, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank (Zhao & Huang, EMNLP 2017)
PDF:
https://aclanthology.org/D17-1225.pdf
Code:
kaayy/josydipa
Data:
Penn Treebank