Dilara Keküllüoğlu
2026
TUNE: A Task For Turkish Machine Unlearning For Data Privacy
Doruk Benli | Ada Canoğlu | Nehir İlkim Gönençer | Dilara Keküllüoğlu
Proceedings of the Second Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2026)
Most large language models (LLMs) are trained on massive datasets that include private information, which may be disclosed to third-party users in output generation. Developers put defences in place to prevent the generation of harmful and private information, but jailbreaking methods can be used to bypass them. Machine unlearning aims to remove information that may be private or harmful from the model's generation without retraining the model from scratch. While machine unlearning has gained some popularity for the removal of private information, especially in English, little to no attention has been given to Turkish unlearning paradigms or benchmarks. In this study, we introduce TUNE (Turkish Unlearning Evaluation), the first benchmark dataset for the Turkish unlearning task for personal information. TUNE consists of 9,842 input-target text pairs about 50 fictitious personalities with two training task types: (1) Question Answering and (2) Information Request. We fine-tuned the mT5-base model to evaluate various unlearning methods, including our proposed approach. We find that while current methods can help unlearn unwanted private information in Turkish, they also unlearn other information we want to retain in the model.
SarcasTürk: Turkish Context-Aware Sarcasm Detection Dataset
Niyazi Ahmet Metin | Sevde Yılmaz | Osman Enes Erdoğdu | Elif Sude Meydan | Oğul Sümer | Dilara Keküllüoğlu
Proceedings of the Second Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2026)
Sarcasm is a colloquial form of language used to convey messages in a non-literal way, which affects the performance of many NLP tasks. Sarcasm detection is not trivial, and existing work focuses mainly on English. We present SarcasTürk, a context-aware Turkish sarcasm detection dataset built from entries on Ekşi Sözlük, a large-scale Turkish online discussion platform where people frequently use sarcasm. SarcasTürk contains 1,515 entries from 98 titles with binary sarcasm labels and a title-level context field created to support comparisons between entry-only and context-aware models. We generate these contexts by selecting representative sentences from all entries under a title using summarization techniques. We report baseline results for a fine-tuned BERTurk classifier and zero-shot LLMs under both no-context and context-aware conditions. We find that the BERTurk model with title-level context performs best, with 0.76 accuracy and balanced class-wise F1 scores (0.77 for sarcasm, 0.75 for no sarcasm). SarcasTürk can be shared upon contacting the authors, since the dataset contains potentially sensitive and offensive language.