İnanç Arın
Also published as: Inanc Arin
2024
Overview of the Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at CASE 2024
Gökçe Uludoğan
|
Somaiyeh Dehghan
|
Inanc Arin
|
Elif Erol
|
Berrin Yanikoglu
|
Arzucan Özgür
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)
This paper offers an overview of Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at CASE workshop to be held jointly with EACL 2024. The task was divided into two subtasks: Subtask A, targeting hate speech detection in various Turkish contexts, and Subtask B, addressing hate speech detection in Arabic with limited data. The shared task attracted significant attention with 33 teams that registered and 10 teams that participated in at least one task. In this paper, we provide the details of the tasks and the approaches adopted by the participant along with an analysis of the results obtained from this shared task.
2022
A Turkish Hate Speech Dataset and Detection System
Fatih Beyhan
|
Buse Çarık
|
İnanç Arın
|
Ayşecan Terzioğlu
|
Berrin Yanikoglu
|
Reyyan Yeniterzi
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Social media posts containing hate speech are reproduced and redistributed at an accelerated pace, reaching greater audiences at a higher speed. We present a machine learning system for automatic detection of hate speech in Turkish, along with a hate speech dataset consisting of tweets collected in two separate domains. We first adopted a definition for hate speech that is in line with our goals and amenable to easy annotation; then designed the annotation schema for annotating the collected tweets. The Istanbul Convention dataset consists of tweets posted following the withdrawal of Turkey from the Istanbul Convention. The Refugees dataset was created by collecting tweets about immigrants by filtering based on commonly used keywords related to immigrants. Finally, we have developed a hate speech detection system using the transformer architecture (BERTurk), to be used as a baseline for the collected dataset. The binary classification accuracy is 77% when the system is evaluated using 5-fold cross-validation on the Istanbul Convention dataset and 71% for the Refugee dataset. We also tested a regression model with 0.66 and 0.83 RMSE on a scale of [0-4], for the Istanbul Convention and Refugees datasets.
Search
Co-authors
- Berrin Yanıkoğlu 2
- Fatih Beyhan 1
- Buse Çarık 1
- Ayşecan Terzioğlu 1
- Reyyan Yeniterzi 1
- show all...