The Hebrew Universal Dependency Treebank: Past Present and Future

Shoval Sade, Amit Seker, Reut Tsarfaty


Abstract
The Hebrew treebank (HTB), consisting of 6221 morpho-syntactically annotated newspaper sentences, has been the only resource for training and validating statistical parsers and taggers for Hebrew for almost two decades. During these decades, the HTB has gone through a series of automatic and semi-automatic conversions before arriving at its UDv2 form. In this work we manually validate the UDv2 version of the HTB and, based on our findings, apply scheme changes that bring the UD HTB onto the same theoretical grounds as the rest of UD. Our experimental parsing results with UDv2New confirm that improving the coherence and internal consistency of the UD HTB indeed leads to improved parsing performance. At the same time, our analysis demonstrates that there is more to be done at the point of intersection of UD with other linguistic processing layers, in particular at the points where UD interfaces with external morphological and lexical resources.
Anthology ID:
W18-6016
Volume:
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Month:
November
Year:
2018
Address:
Brussels, Belgium
Editors:
Marie-Catherine de Marneffe, Teresa Lynn, Sebastian Schuster
Venue:
UDW
Publisher:
Association for Computational Linguistics
Pages:
133–143
URL:
https://aclanthology.org/W18-6016
DOI:
10.18653/v1/W18-6016
Cite (ACL):
Shoval Sade, Amit Seker, and Reut Tsarfaty. 2018. The Hebrew Universal Dependency Treebank: Past Present and Future. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 133–143, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
The Hebrew Universal Dependency Treebank: Past Present and Future (Sade et al., UDW 2018)
PDF:
https://aclanthology.org/W18-6016.pdf