Gökçe Uludoğan

2025

HATECAT-TR: A Hate Speech Span Detection and Categorization Dataset for Turkish
Hasan Kerem Şeker | Gökçe Uludoğan | Pelin Önal | Arzucan Özgür
Findings of the Association for Computational Linguistics: EMNLP 2025

Hate speech on social media in Turkey remains a critical issue, frequently targeting minority groups. Effective moderation requires not only detecting hateful posts but also identifying the specific hateful expressions within them. To address this, we introduce HATECAT-TR, a span-annotated dataset of Turkish tweets containing 4465 hateful spans across 2981 posts, each directed at one of eight minority groups. Annotations were created using a semi-automated approach, combining GPT-4o-generated spans with human expert review to ensure accuracy. Each hateful span is categorized into one of five discourse types, enabling a fine-grained analysis of the nature and intent behind hateful content. We frame span detection as binary and multi-class token classification tasks and use state-of-the-art language models to establish baseline performance on the new dataset. Our findings highlight the challenges of detecting and categorizing implicit hate speech, particularly when spans are subtle and highly contextual. The source code is available at github.com/boun-tabi/hatecat-tr, and HATECAT-TR can be shared upon request to the authors, in compliance with the terms of X.
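The binary and multi-class token-classification framing can be illustrated with a minimal sketch. It assumes spans are given as character offsets with an exclusive end, uses naive whitespace tokenization, and uses hypothetical category names (`TYPE_1`, `TYPE_2`) in place of the dataset's five discourse types; the actual HATECAT-TR file format and tokenizer may differ. Each token receives a binary label (inside any hateful span or not) and a multi-class label (the span's category, or `O` outside every span).

```python
from typing import List, Tuple

# Assumed span format: (start_char, end_char, category), end_char exclusive.
Span = Tuple[int, int, str]


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Whitespace tokenization that keeps character offsets (a simplifying assumption)."""
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((word, start, end))
        pos = end
    return tokens


def token_labels(text: str, spans: List[Span]) -> Tuple[List[str], List[int], List[str]]:
    """Return tokens, binary labels (1 = inside a span), and multi-class labels."""
    tokens = tokenize_with_offsets(text)
    binary, multi = [], []
    for _, t_start, t_end in tokens:
        label = "O"
        for s_start, s_end, category in spans:
            # A token belongs to a span if their character ranges overlap.
            if t_start < s_end and s_start < t_end:
                label = category
                break
        multi.append(label)
        binary.append(0 if label == "O" else 1)
    return [tok for tok, _, _ in tokens], binary, multi
```

For example, `token_labels("w1 w2 w3 w4 w5", [(3, 8, "TYPE_1")])` marks `w2` and `w3` as hateful. With a subword model, the tokenizer's own offset mapping would replace the whitespace split.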

2024

Recent advances in natural language processing have predominantly favored well-resourced, English-centric models, leaving a significant gap for low-resource languages. In this work, we introduce TURNA, a language model developed for the low-resource language Turkish that is capable of both natural language understanding and generation tasks. TURNA is pretrained with an encoder-decoder architecture based on the unified framework UL2, using a diverse corpus that we curated specifically for this purpose. We evaluated TURNA on three generation tasks and five understanding tasks for Turkish. The results show that TURNA outperforms several multilingual models in both understanding and generation tasks and competes with monolingual Turkish models in understanding tasks.

A Concise Report of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Ali Hürriyetoğlu | Surendrabikram Thapa | Gökçe Uludoğan | Somaiyeh Dehghan | Hristo Tanev
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

In this paper, we provide a brief overview of the 7th workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE), co-located with EACL 2024. The workshop program consisted of regular papers, system description papers submitted by shared task participants, and overview papers of the shared tasks. The workshop series brings together experts and enthusiasts from technical and social science fields, providing a platform for better understanding event information. The workshop not only advances text-based event extraction but also facilitates research on event extraction in multimodal settings.

Overview of the Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at CASE 2024
Gökçe Uludoğan | Somaiyeh Dehghan | Inanc Arin | Elif Erol | Berrin Yanikoglu | Arzucan Özgür
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

This paper offers an overview of the Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at the CASE workshop, held jointly with EACL 2024. The task was divided into two subtasks: Subtask A, targeting hate speech detection in various Turkish contexts, and Subtask B, addressing hate speech detection in Arabic with limited data. The shared task attracted significant attention, with 33 registered teams, 10 of which participated in at least one subtask. In this paper, we describe the tasks and the approaches adopted by the participants, along with an analysis of the results obtained in the shared task.

Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
Ali Hürriyetoğlu | Hristo Tanev | Surendrabikram Thapa | Gökçe Uludoğan
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Detecting Hate Speech in Turkish Print Media: A Corpus and A Hybrid Approach with Target-oriented Linguistic Knowledge
Gökçe Uludoğan | Atıf Emre Yüksel | Ümit Tunçer | Burak Işık | Yasemin Korkmaz | Didar Akar | Arzucan Özgür
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

The use of hate speech targeting ethnicity, nationalities, religious identities, and specific groups has been on the rise in the news media. However, most existing automatic hate speech detection models focus on identifying hate speech, often neglecting the target group-specific language that is common in news articles. To address this problem, we first compile a hate speech dataset, TurkishHatePrintCorpus, derived from Turkish news articles and annotate it specifically for the language related to the targeted group. We then introduce the HateTargetBERT model, which integrates the target-centric linguistic features extracted in this study into the BERT model, and demonstrate its effectiveness in detecting hate speech while allowing the model’s classification decision to be explained. We have made the dataset and source code publicly available at https://github.com/boun-tabi/HateTargetBERT-TR.

2022

BOUN-TABI@SMM4H’22: Text-to-Text Adverse Drug Event Extraction with Data Balancing and Prompting
Gökçe Uludoğan | Zeynep Yirmibeşoğlu
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes models developed for the Social Media Mining for Health 2022 Shared Task. We participated in two subtasks: classification of English tweets reporting adverse drug events (ADE) (Task 1a) and extraction of ADE spans in such tweets (Task 1b). We developed two separate systems based on the T5 model, viewing these tasks as sequence-to-sequence problems. To address the class imbalance, we made use of data balancing via over- and undersampling on both tasks. For the ADE extraction task, we explored prompting to further benefit from the T5 model and its formulation. Additionally, we built an ensemble model, utilizing both balanced and prompted models. The proposed models outperformed the current state-of-the-art, with an F1 score of 0.655 on ADE classification and a Partial F1 score of 0.527 on ADE extraction.
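Random over- and undersampling toward a common per-class size can be sketched as follows. The `balance` function and its `per_class` target are hypothetical; the sampling ratios actually used in the paper are not stated here, and this is one common way to combine the two techniques rather than the authors' exact procedure. Classes larger than the target are undersampled without replacement, while smaller classes keep every example and are topped up with duplicates drawn with replacement.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")


def balance(examples: List[T], labels: List[int], per_class: int, seed: int = 0) -> List[Tuple[T, int]]:
    """Resample every class to exactly `per_class` examples (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    by_label: Dict[int, List[T]] = defaultdict(list)
    for ex, y in zip(examples, labels):
        by_label[y].append(ex)

    balanced: List[Tuple[T, int]] = []
    for y, items in sorted(by_label.items()):
        if len(items) >= per_class:
            # Undersample the larger class: draw without replacement.
            chosen = rng.sample(items, per_class)
        else:
            # Oversample the smaller class: keep all examples, then duplicate random ones.
            chosen = items + rng.choices(items, k=per_class - len(items))
        balanced.extend((ex, y) for ex in chosen)

    rng.shuffle(balanced)  # avoid class-ordered training batches
    return balanced
```

With 90 negative and 10 positive tweets and `per_class=40`, the result holds 40 distinct negatives and all 10 positives plus 30 duplicated positives.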
