Atıf Emre Yüksel


2024

Detecting Hate Speech in Turkish Print Media: A Corpus and A Hybrid Approach with Target-oriented Linguistic Knowledge
Gökçe Uludoğan | Atıf Emre Yüksel | Ümit Tunçer | Burak Işık | Yasemin Korkmaz | Didar Akar | Arzucan Özgür
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

The use of hate speech targeting ethnicity, nationalities, religious identities, and specific groups has been on the rise in the news media. However, most existing automatic hate speech detection models focus on identifying hate speech, often neglecting the target group-specific language that is common in news articles. To address this problem, we first compile a hate speech dataset, TurkishHatePrintCorpus, derived from Turkish news articles and annotate it specifically for the language related to the targeted group. We then introduce the HateTargetBERT model, which integrates the target-centric linguistic features extracted in this study into the BERT model, and demonstrate its effectiveness in detecting hate speech while allowing the model’s classification decision to be explained. We have made the dataset and source code publicly available at https://github.com/boun-tabi/HateTargetBERT-TR.

2019

Turkish Tweet Classification with Transformer Encoder
Atıf Emre Yüksel | Yaşar Alim Türkmen | Arzucan Özgür | Berna Altınel
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Short-text classification is a challenging task due to the sparsity and high dimensionality of the feature space. In this study, we aim to analyze and classify Turkish tweets based on their topics. Social media jargon and the agglutinative structure of the Turkish language make this classification task even harder. As far as we know, this is the first study that uses a Transformer Encoder for short-text classification in Turkish. The model is trained in a weakly supervised manner, where the training data set was labeled automatically. Our results on the manually labeled test set show that performing morphological analysis improves the classification performance of the traditional machine learning algorithms Random Forest, Naive Bayes, and Support Vector Machines. Nevertheless, the proposed approach achieves an F-score of 89.3%, outperforming those algorithms by at least 5 points.