A Comparison of Machine Learning Techniques for Turkish Profanity Detection

Levent Soykan, Cihan Karsak, Ilknur Durgar Elkahlout, Burak Aytan


Abstract
Profanity detection has become an important task as social media usage has grown. Most users prefer a clean, profanity-free environment in which to communicate with others. To provide such an environment, service providers use various profanity detection tools. In this paper, we study Turkish profanity detection in our search engine. We collected a dataset of search engine queries and labeled each query as one of two classes: profane or not-profane. We experimented with several classical machine learning and deep learning methods and compared them in terms of speed and accuracy. We obtained our best score, an F1 score of 0.93, with the transformer-based Electra model. We also compared our models with the state-of-the-art Turkish profanity detection tool and observed that they outperform it in all respects.
Anthology ID:
2022.restup-1.3
Volume:
Proceedings of the Second International Workshop on Resources and Techniques for User Information in Abusive Language Analysis
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Johanna Monti, Valerio Basile, Maria Pia Di Buono, Raffaele Manna, Antonio Pascucci, Sara Tonelli
Venue:
ResTUP
Publisher:
European Language Resources Association
Pages:
16–24
URL:
https://aclanthology.org/2022.restup-1.3
Cite (ACL):
Levent Soykan, Cihan Karsak, Ilknur Durgar Elkahlout, and Burak Aytan. 2022. A Comparison of Machine Learning Techniques for Turkish Profanity Detection. In Proceedings of the Second International Workshop on Resources and Techniques for User Information in Abusive Language Analysis, pages 16–24, Marseille, France. European Language Resources Association.
Cite (Informal):
A Comparison of Machine Learning Techniques for Turkish Profanity Detection (Soykan et al., ResTUP 2022)
PDF:
https://aclanthology.org/2022.restup-1.3.pdf