A Gold Standard Dependency Treebank for Turkish

Tolga Kayadelen, Adnan Ozturel, Bernd Bohnet


Abstract
We introduce TWT, a new treebank for Turkish consisting of web and Wikipedia sentences annotated for segmentation, morphology, part-of-speech, and dependency relations. To date, it is the largest publicly available human-annotated morpho-syntactic Turkish treebank in terms of annotated word count. It is also the first large Turkish dependency treebank with a dedicated Wikipedia section. We present the tagsets and methodology used in annotating the treebank, along with the results of baseline experiments on Turkish dependency parsing with this treebank.
Anthology ID:
2020.lrec-1.634
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5156–5163
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.634
Cite (ACL):
Tolga Kayadelen, Adnan Ozturel, and Bernd Bohnet. 2020. A Gold Standard Dependency Treebank for Turkish. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5156–5163, Marseille, France. European Language Resources Association.
Cite (Informal):
A Gold Standard Dependency Treebank for Turkish (Kayadelen et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.634.pdf