Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection

Henny Sluyter-Gäthje, Peter Bourgonje, Manfred Stede


Abstract
Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exist only for English; in this respect, every other language is under-resourced. For languages where machine translation (MT) from English is available at reasonable quality, MT combined with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank (PDTB) into German and experiment with various methods of annotation projection to arrive at a German counterpart of the PDTB. We describe the key characteristics of the resulting corpus as well as some typical sources of error encountered during its creation. We then evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and comparing their performance with that of the same components trained on the original, gold PDTB corpus.
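To illustrate the general idea of annotation projection mentioned in the abstract, the following is a minimal sketch, not the method used in the paper. It assumes that token-level word alignments between an English sentence and its German translation are already available (here they are written by hand; in practice they would come from a word aligner), and it copies each annotated span to the aligned target tokens. The function names `project_span` and `project_annotation`, the example sentence pair, and the alignment are all hypothetical; the labels follow the PDTB convention of a connective with two arguments.

```python
from typing import Dict, Iterable, List, Tuple

# A word alignment is a list of (source_index, target_index) pairs.
Alignment = List[Tuple[int, int]]


def project_span(src_indices: Iterable[int], alignment: Alignment) -> List[int]:
    """Return the sorted target token indices aligned to any source token in the span."""
    src = set(src_indices)
    return sorted({t for s, t in alignment if s in src})


def project_annotation(
    annotation: Dict[str, List[int]], alignment: Alignment
) -> Dict[str, List[int]]:
    """Project every labelled span (e.g. connective, arg1, arg2) to the target side."""
    return {label: project_span(idx, alignment) for label, idx in annotation.items()}


# Hypothetical sentence pair (illustration only, not taken from the corpus).
english = ["He", "stayed", "home", "because", "it", "rained", "."]
german = ["Er", "blieb", "zu", "Hause", ",", "weil", "es", "regnete", "."]

# Hand-written alignment; "home" aligns to both "zu" and "Hause".
alignment = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 5), (4, 6), (5, 7), (6, 8)]

# PDTB-style annotation on the English side, as token indices.
annotation = {"connective": [3], "arg1": [0, 1, 2], "arg2": [4, 5]}

projected = project_annotation(annotation, alignment)
for label, indices in projected.items():
    print(label, [german[i] for i in indices])
```

A source token with no alignment simply contributes nothing to the projected span, and a one-to-many alignment widens it. Both cases are plausible sources of projection error, which is one reason the resulting annotations count as silver rather than gold data.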
Anthology ID:
2020.lrec-1.131
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1044–1050
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.131
Cite (ACL):
Henny Sluyter-Gäthje, Peter Bourgonje, and Manfred Stede. 2020. Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1044–1050, Marseille, France. European Language Resources Association.
Cite (Informal):
Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection (Sluyter-Gäthje et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.131.pdf