Turkish Treebanking: Unifying and Constructing Efforts

Utku Türk; Furkan Atmaca; Şaziye Betül Özateş; Abdullatif Köksal; Balkız Öztürk Başaran; Tunga Gungor; Arzucan Özgür

doi:10.18653/v1/W19-4019

Abstract

In this paper, we present the current versions of two treebanks: the re-annotated Turkish PUD Treebank and the first annotation of the Turkish National Corpus Universal Dependency treebank (henceforth TNC-UD). Both treebanks were annotated following the decisions on linguistic adequacy made during the re-annotation of the Turkish IMST-UD Treebank (Türk et al., forthcoming). They went through the same annotation process and received the same morphological and syntactic analyses. The TNC-UD is planned to contain 10,000 sentences. In this paper, we present its first 500 sentences along with the re-annotated PUD Treebank. We also report the parsing results of a graph-based neural parser on the previous and re-annotated versions of PUD, as well as on the TNC-UD. These comparisons show a slight decrease in attachment scores on the Turkish PUD Treebank, but they also demonstrate that the annotation of the TNC-UD improves parsing accuracy for Turkish. In addition to the treebanks, we have built custom annotation software with advanced filtering and morphological editing options. The treebanks, including their full edit history and the annotation guidelines, and the custom software are publicly available online under an open license.
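The abstract compares parser outputs by their attachment scores. As a rough illustration of what those scores measure, the following is a minimal sketch that computes the unlabeled (UAS) and labeled (LAS) attachment scores from a gold and a predicted CoNLL-U file. It is not the authors' evaluation code. It assumes both files share identical tokenization (gold-tokenized evaluation, so no token alignment is performed). The function names `read_conllu` and `attachment_scores` and the `strip_subtypes` option are hypothetical.

```python
def read_conllu(text):
    """Return a list of sentences; each sentence is a list of (HEAD, DEPREL) per word."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():              # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comments (sent_id, text, ...)
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 CoNLL-U columns, got {len(cols)}")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1"):
        # only syntactic words carry a HEAD and DEPREL to score.
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[6], cols[7]))  # HEAD is column 7, DEPREL is column 8
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold_text, pred_text, strip_subtypes=True):
    """Return (UAS, LAS) as fractions of all scored words."""
    gold, pred = read_conllu(gold_text), read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("token count mismatch; this sketch does not align tokens")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            if strip_subtypes:            # compare "nmod:poss" as plain "nmod"
                g_rel, p_rel = g_rel.split(":")[0], p_rel.split(":")[0]
            total += 1
            if g_head == p_head:          # UAS: correct head
                uas_hits += 1
                if g_rel == p_rel:        # LAS: correct head and relation
                    las_hits += 1
    if total == 0:
        raise ValueError("no words to score")
    return uas_hits / total, las_hits / total
```

For example, if a parser attaches every word to the correct head but mislabels one of two relations, UAS is 1.0 and LAS is 0.5.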

Anthology ID:: W19-4019
Volume:: Proceedings of the 13th Linguistic Annotation Workshop
Month:: August
Year:: 2019
Address:: Florence, Italy
Editors:: Annemarie Friedrich, Deniz Zeyrek, Jet Hoek
Venue:: LAW
SIG:: SIGANN
Publisher:: Association for Computational Linguistics
Pages:: 166–177
URL:: https://aclanthology.org/W19-4019/
DOI:: 10.18653/v1/W19-4019
Cite (ACL):: Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Abdullatif Köksal, Balkiz Ozturk Basaran, Tunga Gungor, and Arzucan Özgür. 2019. Turkish Treebanking: Unifying and Constructing Efforts. In Proceedings of the 13th Linguistic Annotation Workshop, pages 166–177, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):: Turkish Treebanking: Unifying and Constructing Efforts (Türk et al., LAW 2019)
PDF:: https://aclanthology.org/W19-4019.pdf
