From News to Medical: Cross-domain Discourse Segmentation

Elisa Ferracane, Titan Page, Junyi Jessy Li, Katrin Erk


Abstract
The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. As expected, performance drops, but the nature of the segmentation errors suggests that some problems can be addressed earlier in the pipeline, while others would require expanding the corpus to a trainable size to learn the nuances of the medical domain.
Anthology ID:
W19-2704
Volume:
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month:
June
Year:
2019
Address:
Minneapolis, MN
Editors:
Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
22–29
URL:
https://aclanthology.org/W19-2704
DOI:
10.18653/v1/W19-2704
Cite (ACL):
Elisa Ferracane, Titan Page, Junyi Jessy Li, and Katrin Erk. 2019. From News to Medical: Cross-domain Discourse Segmentation. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 22–29, Minneapolis, MN. Association for Computational Linguistics.
Cite (Informal):
From News to Medical: Cross-domain Discourse Segmentation (Ferracane et al., NAACL 2019)
PDF:
https://aclanthology.org/W19-2704.pdf
Presentation:
W19-2704.Presentation.pdf
Code:
elisaF/news-med-segmentation