First Steps towards Universal Dependencies for Laz
Utku Türk | Kaan Bayar | Ayşegül Dilara Özercan | Görkem Yiğit Öztürk | Şaziye Betül Özateş
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
This paper presents the first treebank for the Laz language, which is also the first Universal Dependencies Treebank for a South Caucasian language. This treebank aims to create a syntactically and morphologically annotated resource for further research. We also aim to document an endangered language in a systematic fashion within an inherently cross-linguistic framework: the Universal Dependencies Project (UD). As of now, our treebank consists of 576 sentences and 2,306 tokens annotated in light with the UD guidelines. We evaluated the treebank on the dependency parsing task using a pretrained multilingual parsing model, and the results are comparable with other low-resourced treebanks with no training set. We aim to expand our treebank in the near future to include 1,500 sentences. The bigger goal for our project is to create a set of treebanks for minority languages in Anatolia.
Turkish Treebanking: Unifying and Constructing Efforts
Utku Türk | Furkan Atmaca | Şaziye Betül Özateş | Abdullatif Köksal | Balkiz Ozturk Basaran | Tunga Gungor | Arzucan Özgür
Proceedings of the 13th Linguistic Annotation Workshop
In this paper, we present the current version of two different treebanks, the re-annotation of the Turkish PUD Treebank and the first annotation of the Turkish National Corpus Universal Dependency (henceforth TNC-UD). The annotation of both treebanks, the Turkish PUD Treebank and TNC-UD, was carried out based on the decisions concerning linguistic adequacy of re-annotation of the Turkish IMST-UD Treebank (Türk et. al., forthcoming). Both of the treebanks were annotated with the same annotation process and morphological and syntactic analyses. The TNC-UD is planned to have 10,000 sentences. In this paper, we will present the first 500 sentences along with the annotation PUD Treebank. Moreover, this paper also offers the parsing results of a graph-based neural parser on the previous and re-annotated PUD, as well as the TNC-UD. In light of the comparisons, even though we observe a slight decrease in the attachment scores of the Turkish PUD treebank, we demonstrate that the annotation of the TNC-UD improves the parsing accuracy of Turkish. In addition to the treebanks, we have also constructed a custom annotation software with advanced filtering and morphological editing options. Both the treebanks, including a full edit-history and the annotation guidelines, and the custom software are publicly available under an open license online.
Improving the Annotations in the Turkish Universal Dependency Treebank
Utku Türk | Furkan Atmaca | Şaziye Betül Özateş | Balkız Öztürk Başaran | Tunga Güngör | Arzucan Özgür
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)
- Şaziye Betül Özateş 3
- Furkan Atmaca 2
- Balkız Öztürk Başaran 2
- Tunga Güngör 2
- Arzucan Özgür 2
- show all...