Turkish Treebanking: Unifying and Constructing Efforts
Utku Türk | Furkan Atmaca | Şaziye Betül Özateş | Abdullatif Köksal | Balkız Öztürk Başaran | Tunga Güngör | Arzucan Özgür
Proceedings of the 13th Linguistic Annotation Workshop
In this paper, we present the current version of two treebanks: the re-annotation of the Turkish PUD Treebank and the first annotation of the Turkish National Corpus Universal Dependency (henceforth TNC-UD). Both treebanks were annotated following the decisions on linguistic adequacy made during the re-annotation of the Turkish IMST-UD Treebank (Türk et al., forthcoming), using the same annotation process and the same morphological and syntactic analyses. The TNC-UD is planned to contain 10,000 sentences; here we present its first 500 sentences along with the re-annotated PUD Treebank. We also report the parsing results of a graph-based neural parser on the previous and re-annotated PUD, as well as on the TNC-UD. Although we observe a slight decrease in the attachment scores on the Turkish PUD Treebank, we demonstrate that the annotation of the TNC-UD improves the parsing accuracy of Turkish. In addition to the treebanks, we have constructed custom annotation software with advanced filtering and morphological editing options. The treebanks, including a full edit history and the annotation guidelines, and the custom software are publicly available online under an open license.
Improving the Annotations in the Turkish Universal Dependency Treebank
Utku Türk | Furkan Atmaca | Şaziye Betül Özateş | Balkız Öztürk Başaran | Tunga Güngör | Arzucan Özgür
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)