A Corpus Study of Creating Rule-Based Enhanced Universal Dependencies for German

Teresa Bürkle; Stefan Grünewald; Annemarie Friedrich

doi:10.18653/v1/2021.law-1.9

Abstract

In this paper, we present a first attempt at enriching German Universal Dependencies (UD) treebanks with enhanced dependencies. Similarly to the converter for English (Schuster and Manning, 2016), we develop a rule-based system for deriving enhanced dependencies from the basic layer, covering three linguistic phenomena: relative clauses, coordination, and raising/control. For quality control, we manually correct or validate a set of 196 sentences, finding that around 90% of added relations are correct. Our data analysis reveals that difficulties arise mainly due to inconsistencies in the basic layer annotations. We show that the English system is in general applicable to German data, but that adapting to the particularities of the German treebanks and language increases precision and recall by up to 10%. Comparing the application of our converter on gold standard dependencies vs. automatic parses, we find that F1 drops by around 10% in the latter setting due to error propagation. Finally, an enhanced UD parser trained on a converted treebank performs poorly when evaluated against our annotations, indicating that more work remains to be done to create gold standard enhanced German treebanks.
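As an illustration of what deriving enhanced dependencies from the basic layer can look like, the following minimal sketch applies one simplified coordination rule: a token attached via `conj` also receives the incoming relation of the first conjunct. This is not the authors' converter. The `Token` type, the function name, and the example sentence are hypothetical, and the actual system covers more cases (relative clauses, raising/control, shared dependents).

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based token index
    form: str    # surface form
    head: int    # governor in the basic layer (0 = root)
    deprel: str  # basic dependency relation


def propagate_conjunct_heads(tokens):
    """Return enhanced edges as a set of (head, dependent, relation).

    Starts from all basic edges and, for each token attached via `conj`,
    adds a copy of the first conjunct's incoming edge (simplified rule).
    """
    by_idx = {t.idx: t for t in tokens}
    edges = {(t.head, t.idx, t.deprel) for t in tokens}
    for t in tokens:
        if t.deprel == "conj":
            first = by_idx[t.head]  # the first conjunct governs later ones
            edges.add((first.head, t.idx, first.deprel))
    return edges


# Hypothetical example: "Anna liest Bücher und Zeitungen"
# ("Anna reads books and newspapers").
sentence = [
    Token(1, "Anna", 2, "nsubj"),
    Token(2, "liest", 0, "root"),
    Token(3, "Bücher", 2, "obj"),
    Token(4, "und", 5, "cc"),
    Token(5, "Zeitungen", 3, "conj"),
]

edges = propagate_conjunct_heads(sentence)
for edge in sorted(edges):
    print(edge)
```

In this example the rule adds one edge on top of the five basic ones: "Zeitungen" becomes an additional `obj` of "liest".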

Anthology ID: 2021.law-1.9
Volume: Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Claire Bonial, Nianwen Xue
Venues: LAW | DMR
Publisher: Association for Computational Linguistics
Pages: 85–95
URL: https://aclanthology.org/2021.law-1.9/
DOI: 10.18653/v1/2021.law-1.9
Cite (ACL): Teresa Bürkle, Stefan Grünewald, and Annemarie Friedrich. 2021. A Corpus Study of Creating Rule-Based Enhanced Universal Dependencies for German. In Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop, pages 85–95, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): A Corpus Study of Creating Rule-Based Enhanced Universal Dependencies for German (Bürkle et al., LAW-DMR 2021)
PDF: https://aclanthology.org/2021.law-1.9.pdf
