From News to Medical: Cross-domain Discourse Segmentation

Elisa Ferracane, Titan Page, Junyi Jessy Li, Katrin Erk


Abstract
The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. As expected, we find a drop in performance, but the nature of the segmentation errors suggests that some problems can be addressed earlier in the pipeline, while others would require expanding the corpus to a trainable size to learn the nuances of the medical domain.
Anthology ID: W19-2704
Volume: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month: June
Year: 2019
Address: Minneapolis, MN
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 22–29
URL: https://aclanthology.org/W19-2704
DOI: 10.18653/v1/W19-2704
PDF: https://aclanthology.org/W19-2704.pdf
Presentation: W19-2704.Presentation.pdf