ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents

Philippe Muller; Chloé Braud; Mathieu Morey

doi:10.18653/v1/W19-2715

Abstract

Segmentation is the first step in building practical discourse parsers, and is often neglected in discourse parsing studies. The goal is to identify the minimal spans of text to be linked by discourse relations, or to isolate explicit marking of discourse relations. Existing systems on English report F1 scores as high as 95%, but they generally assume gold sentence boundaries and are restricted to English newswire texts annotated within the RST framework. This article presents a generic approach and a system, ToNy, a discourse segmenter developed for the DisRPT shared task where multiple discourse representation schemes, languages and domains are represented. In our experiments, we found that a straightforward sequence prediction architecture with pretrained contextual embeddings is sufficient to reach performance levels comparable to existing systems, when separately trained on each corpus. We report performance between 81% and 96% in F1 score. We also observed that discourse segmentation models only display a moderate generalization capability, even within the same language and discourse representation scheme.

Anthology ID: W19-2715
Volume: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month: June
Year: 2019
Address: Minneapolis, MN
Editors: Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 115–124
URL: https://aclanthology.org/W19-2715/
DOI: 10.18653/v1/W19-2715
Cite (ACL): Philippe Muller, Chloé Braud, and Mathieu Morey. 2019. ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 115–124, Minneapolis, MN. Association for Computational Linguistics.
Cite (Informal): ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents (Muller et al., NAACL 2019)
PDF: https://aclanthology.org/W19-2715.pdf
Poster: W19-2715.Poster.pdf
