Identifying and Handling Cross-Treebank Inconsistencies in UD: A Pilot Study

Tillmann Dönicke, Xiang Yu, Jonas Kuhn


Abstract
The Universal Dependencies treebanks are a still-growing collection of treebanks for a wide range of languages, all annotated with a common inventory of dependency relations. Yet the relations can be used in categorically different ways, even across treebanks of the same language. We present a pilot study on identifying such inconsistencies in a language-independent way, and we conduct an experiment illustrating that handling them properly can improve parsing performance by several percentage points.
Anthology ID: 2020.udw-1.8
Volume: Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Marie-Catherine de Marneffe, Miryam de Lhoneux, Joakim Nivre, Sebastian Schuster
Venue: UDW
Publisher: Association for Computational Linguistics
Pages: 67–75
URL: https://aclanthology.org/2020.udw-1.8
Cite (ACL): Tillmann Dönicke, Xiang Yu, and Jonas Kuhn. 2020. Identifying and Handling Cross-Treebank Inconsistencies in UD: A Pilot Study. In Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), pages 67–75, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal): Identifying and Handling Cross-Treebank Inconsistencies in UD: A Pilot Study (Dönicke et al., UDW 2020)
PDF: https://aclanthology.org/2020.udw-1.8.pdf
Code: tidoe/typology-coling