Oğul Sümer
2026
SarcasTürk: Turkish Context-Aware Sarcasm Detection Dataset
Niyazi Ahmet Metin | Sevde Yılmaz | Osman Enes Erdoğdu | Elif Sude Meydan | Oğul Sümer | Dilara Keküllüoğlu
Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
Sarcasm is a colloquial form of language that conveys messages non-literally, which affects performance on many NLP tasks. Sarcasm detection is not trivial, and existing work focuses mainly on English. We present SarcasTürk, a context-aware Turkish sarcasm detection dataset built from entries on Ekşi Sözlük, a large-scale Turkish online discussion platform where people frequently use sarcasm. SarcasTürk contains 1,515 entries from 98 titles, each with a binary sarcasm label and a title-level context field that supports comparisons between entry-only and context-aware models. We generate these contexts by selecting representative sentences from all entries under a title using summarization techniques. We report baseline results for a fine-tuned BERTurk classifier and zero-shot LLMs under both no-context and context-aware conditions. We find that the BERTurk model with title-level context performs best, with 0.76 accuracy and balanced class-wise F1 scores (0.77 for sarcasm, 0.75 for no sarcasm). Because the dataset contains potentially sensitive and offensive language, SarcasTürk is available upon request from the authors.
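The title-level context idea can be illustrated with a minimal sketch. The abstract does not name the summarization technique used, so the word-frequency sentence scoring below is only a stand-in, and the names `split_sentences`, `build_title_context`, and `make_input`, as well as the `[SEP]` joiner, are hypothetical choices made for illustration rather than the authors' pipeline.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive split after ., ! or ?; real Turkish text would need a proper
    # sentence tokenizer (assumption: this is good enough for a sketch).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_title_context(entries, max_sentences=2):
    """Select the highest-scoring sentences across all entries under a title."""
    sentences = [s for entry in entries for s in split_sentences(entry)]
    # Note: str.lower() does not apply Turkish-specific casing (I -> ı).
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)

    def score(toks):
        # Average frequency of the sentence's words over the whole title.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


def make_input(entry, context=None):
    # Entry-only input when context is None, context-aware input otherwise.
    # The "[SEP]" joiner is an assumed convention, not taken from the paper.
    return entry if context is None else f"{context} [SEP] {entry}"


entries = ["Hava çok güzel. Yağmur yağıyor.", "Hava çok soğuk."]
context = build_title_context(entries, max_sentences=2)
print(make_input(entries[1]))           # entry-only condition
print(make_input(entries[1], context))  # context-aware condition
```

The same entry is formatted twice, once alone and once with the shared title context prepended, which mirrors the entry-only versus context-aware comparison described in the abstract.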