A coreference corpus of Turkish situated dialogs

Faruk Büyüktekin, Umut Özge


Abstract
This paper introduces a publicly available corpus of Turkish situated dialogs annotated for coreference. We developed a coreference annotation scheme for Turkish, a pro-drop language with rich agglutinating morphology. The scheme is tailored to these properties of the language, making it potentially applicable to similar languages. The corpus comprises 60 dialogs containing a total of 3900 sentences, 18360 words, and 6120 mentions.
Anthology ID:
2024.sigturk-1.4
Volume:
Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand and Online
Editors:
Duygu Ataman, Mehmet Oguz Derin, Sardana Ivanova, Abdullatif Köksal, Jonne Sälevä, Deniz Zeyrek
Venues:
SIGTURK | WS
Publisher:
Association for Computational Linguistics
Pages:
42–52
URL:
https://aclanthology.org/2024.sigturk-1.4
Cite (ACL):
Faruk Büyüktekin and Umut Özge. 2024. A coreference corpus of Turkish situated dialogs. In Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024), pages 42–52, Bangkok, Thailand and Online. Association for Computational Linguistics.
Cite (Informal):
A coreference corpus of Turkish situated dialogs (Büyüktekin & Özge, SIGTURK-WS 2024)
PDF:
https://aclanthology.org/2024.sigturk-1.4.pdf