First Steps towards Universal Dependencies for Laz

Utku Türk, Kaan Bayar, Ayşegül Dilara Özercan, Görkem Yiğit Öztürk, Şaziye Betül Özateş


Abstract
This paper presents the first treebank for the Laz language, which is also the first Universal Dependencies treebank for a South Caucasian language. The treebank is intended as a syntactically and morphologically annotated resource for further research. We also aim to document an endangered language systematically within an inherently cross-linguistic framework: the Universal Dependencies (UD) project. The treebank currently consists of 576 sentences and 2,306 tokens annotated in accordance with the UD guidelines. We evaluated the treebank on the dependency parsing task using a pretrained multilingual parsing model, and the results are comparable to those of other low-resource treebanks that have no training set. We plan to expand the treebank to 1,500 sentences in the near future. The broader goal of our project is to create a set of treebanks for minority languages in Anatolia.
Anthology ID:
2020.udw-1.21
Volume:
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | UDW
Publisher:
Association for Computational Linguistics
Pages:
189–194
URL:
https://aclanthology.org/2020.udw-1.21
PDF:
https://aclanthology.org/2020.udw-1.21.pdf
Data:
Universal Dependencies
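
The abstract reports the treebank size in sentences and tokens and evaluates it on dependency parsing. As a rough illustration of how such figures are typically derived from UD data, the sketch below reads CoNLL-U text, counts sentences and syntactic words, and computes unlabeled and labeled attachment scores (UAS/LAS). It is not the authors' evaluation script: the function names `read_conllu` and `attachment_scores` are hypothetical, and the three-word English sentence is an invented toy example, not data from the Laz treebank.

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences of (form, head, deprel) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comments
            continue
        cols = line.split("\t")
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        current.append((cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold, pred):
    """Return (UAS, LAS) over gold and predicted parses with identical tokenization."""
    total = uas = las = 0
    for g_sent, p_sent in zip(gold, pred):
        for (_, g_head, g_rel), (_, p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas += 1
                if g_rel == p_rel:
                    las += 1
    return uas / total, las / total


def row(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line (unused columns set to '_')."""
    return "\t".join([str(idx), form, lemma, upos, "_", "_",
                      str(head), deprel, "_", "_"])


# Invented toy sentence (assumption, for illustration only).
GOLD_TEXT = "\n".join([
    "# text = The dog barks",
    row(1, "The", "the", "DET", 2, "det"),
    row(2, "dog", "dog", "NOUN", 3, "nsubj"),
    row(3, "barks", "bark", "VERB", 0, "root"),
    "",
])
# Hypothetical parser output: one wrong head, one wrong label.
PRED_TEXT = "\n".join([
    row(1, "The", "the", "DET", 3, "det"),
    row(2, "dog", "dog", "NOUN", 3, "obj"),
    row(3, "barks", "bark", "VERB", 0, "root"),
    "",
])

gold = read_conllu(GOLD_TEXT)
pred = read_conllu(PRED_TEXT)
n_sentences = len(gold)
n_tokens = sum(len(sent) for sent in gold)
uas, las = attachment_scores(gold, pred)
print(n_sentences, n_tokens, round(uas, 3), round(las, 3))
```

The sketch assumes the gold and predicted files share the same tokenization. The official CoNLL 2018 shared task evaluation script additionally aligns differing tokenizations, so it is the better choice for reporting results.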