EDTC: A Corpus for Discourse-Level Topic Chain Parsing

Longyin Zhang, Xin Tan, Fang Kong, Guodong Zhou


Abstract
Discourse analysis has long been known to be fundamental in natural language processing. In this research, we study discourse-level topic chain (DTC) parsing, which aims at discovering new topics and investigating how these topics evolve over time within an article. To address the lack of data, we contribute a new discourse corpus with DTC-style dependency graphs annotated on news articles. In particular, we ensure the high reliability of the corpus by using a two-step annotation strategy to build the data and by filtering out annotations with low confidence scores. Based on the annotated corpus, we introduce a simple yet robust system for automatic discourse-level topic chain parsing.
Anthology ID:
2021.findings-emnlp.113
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1304–1312
URL:
https://aclanthology.org/2021.findings-emnlp.113
DOI:
10.18653/v1/2021.findings-emnlp.113
Cite (ACL):
Longyin Zhang, Xin Tan, Fang Kong, and Guodong Zhou. 2021. EDTC: A Corpus for Discourse-Level Topic Chain Parsing. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1304–1312, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
EDTC: A Corpus for Discourse-Level Topic Chain Parsing (Zhang et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.113.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.113.mp4
Code:
nlp-discourse-soochowu/dtcp