Robin Schäfer
2019
Multi-lingual and Cross-genre Discourse Unit Segmentation
Peter Bourgonje, Robin Schäfer
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
We describe a series of experiments applied to data sets from different languages and genres annotated for coherence relations according to different theoretical frameworks. Specifically, we investigate the feasibility of a unified (theory-neutral) approach to discourse segmentation, a process that divides a text into the minimal discourse units involved in a coherence relation. We apply a Random Forest and an LSTM-based approach to all data sets, and we improve over a simple baseline that assumes sentence or clause-like segmentation. Performance, however, varies considerably with language and, more importantly, with genre, with F-scores ranging from 73.00 to 94.47.
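To make the baseline idea concrete, the following is a minimal sketch of a sentence-based segmenter scored with a boundary F-score. It is not the authors' implementation: the function names, the punctuation set, the toy sentence, and the gold labels are all assumptions made for illustration, and the paper's actual evaluation setup may differ.

```python
from typing import List

# Assumption: sentence-final punctuation marks used by this toy baseline.
SENT_FINAL = {".", "!", "?"}


def baseline_boundaries(tokens: List[str]) -> List[int]:
    """Label each token 1 if it starts a new segment, else 0.

    A segment starts at the first token and after every
    sentence-final punctuation mark.
    """
    labels = []
    starts_segment = True
    for tok in tokens:
        labels.append(1 if starts_segment else 0)
        starts_segment = tok in SENT_FINAL
    return labels


def boundary_f1(gold: List[int], pred: List[int]) -> float:
    """F1 over segment-initial tokens (label 1 is the positive class)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the gold annotation also splits before "because",
# a clause-level boundary that the sentence baseline cannot find.
tokens = "We left early , because it rained . They stayed .".split()
gold = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
pred = baseline_boundaries(tokens)
score = boundary_f1(gold, pred)
```

In this toy case the baseline finds both sentence starts but misses the clause boundary, so precision is perfect while recall is not. This is the kind of gap a learned segmenter, such as a Random Forest or LSTM over token features, would aim to close.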