Faruk Büyüktekin
2024
A coreference corpus of Turkish situated dialogs
Faruk Büyüktekin, Umut Özge
Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)
The paper introduces a publicly available corpus of Turkish situated dialogs annotated for coreference. We developed a coreference annotation scheme for Turkish, a pro-drop language with rich agglutinative morphology. The scheme is tailored to these properties of the language, which makes it potentially applicable to typologically similar languages. The corpus comprises 60 dialogs containing a total of 3,900 sentences, 18,360 words, and 6,120 mentions.
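The abstract does not specify the corpus's file format or how dropped arguments are encoded. The following is a minimal sketch that assumes a token-span representation. A dropped (pro-drop) argument is recorded as a "zero" mention anchored to the verb token whose argument was omitted, and mentions sharing an entity identifier form a coreference chain. The `Mention` record, its fields, the `build_chains` and `render` helpers, and the two-sentence example dialog are all hypothetical illustrations, not the published annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    # Hypothetical record layout for illustration; not the corpus's released format.
    sentence: int       # sentence index within the dialog
    start: int          # first token index (inclusive)
    end: int            # token index just after the mention (exclusive)
    entity: str         # identifier of the coreference chain
    zero: bool = False  # assumption: True for a dropped (pro-drop) argument, anchored to its verb token


def build_chains(mentions):
    """Group mentions into coreference chains, each listed in textual order."""
    chains = {}
    for m in sorted(mentions, key=lambda m: (m.sentence, m.start)):
        chains.setdefault(m.entity, []).append(m)
    return chains


def render(mention, sentences):
    """Return a mention's surface text; zero mentions are shown as Ø[verb]."""
    text = " ".join(sentences[mention.sentence][mention.start:mention.end])
    return f"Ø[{text}]" if mention.zero else text


# "Kırmızı kutuyu al. Sonra onu masaya koy."
# ("Pick up the red box. Then put it on the table.")
sentences = [
    ["Kırmızı", "kutuyu", "al", "."],
    ["Sonra", "onu", "masaya", "koy", "."],
]
mentions = [
    Mention(0, 0, 2, "box"),
    Mention(0, 2, 3, "addressee", zero=True),  # dropped subject of "al"
    Mention(1, 1, 2, "box"),
    Mention(1, 3, 4, "addressee", zero=True),  # dropped subject of "koy"
]

for entity, chain in build_chains(mentions).items():
    print(entity, "->", [render(m, sentences) for m in chain])
```

Running the sketch prints one chain for the box (`Kırmızı kutuyu`, `onu`) and one for the unexpressed addressee (`Ø[al]`, `Ø[koy]`). Anchoring zero mentions to verb tokens keeps every mention a contiguous token span, so overt and dropped arguments can be stored and grouped in the same way.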