M. Shahiki Tash

2024

Zavira@DravidianLangTech 2024: Telugu hate speech detection using LSTM
Z. Ahani | M. Shahiki Tash | M. T. Zamir | I. Gelbukh | A. Gelbukh
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Hate speech is communication, often oral or written, that stigmatizes individuals or groups, or incites violence or prejudice against them, based on characteristics such as race, religion, ethnicity, gender, sexual orientation, or other protected characteristics. It usually involves expressions of hostility, contempt, or prejudice and can have harmful social consequences. Within the broader social landscape, an important challenge facing the medical community is the impact of people’s verbal expression. Such words have a significant and immediate effect on human behavior and psyche. Repeating such phrases can even lead to depression and social isolation. To identify and classify hate speech in Telugu text samples from social media, our research uses an LSTM model. This paper summarizes the findings of that experiment, in which we placed 8th out of 27 participants with an F1 score of 0.68.


Lidoma@LT-EDI 2024: Tamil Hate Speech Detection in Migration Discourse
M. Shahiki Tash | Z. Ahani | M. T. Zamir | O. Kolesnikova | G. Sidorov
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

The exponential rise in social media users has revolutionized information accessibility and exchange. While these platforms serve various purposes, they also harbor negative elements, including hate speech and offensive behavior. Detecting hate speech in diverse languages has garnered significant attention in Natural Language Processing (NLP). This paper delves into hate speech detection in Tamil, particularly related to migration and refuge, contributing to the Caste/migration hate speech detection shared task. Our model, a Convolutional Neural Network (CNN), achieved an F1 score of 0.76 in identifying hate speech and showed significant potential in the domain despite encountering complexities. We provide an overview of related research, methodology, and insights into the competition’s diverse performances, showcasing the nuances of hate speech detection in the Tamil language.

2022


Word Level Language Identification in Code-mixed Kannada-English Texts using traditional machine learning algorithms
M. Shahiki Tash | Z. Ahani | A. L. Tonja | M. Gemeda | N. Hussain | O. Kolesnikova
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts

Language Identification at the Word Level in Kannada-English Texts. This paper describes the system paper of CoLI-Kanglish 2022 shared task. The goal of this task is to identify the different languages used in CoLI-Kanglish 2022. This dataset is distributed into different categories including Kannada, English, Mixed-Language, Location, Name, and Others. This Code-Mix was compiled by CoLI-Kanglish 2022 organizers from posts on social media. We use two classification techniques, KNN and SVM, and achieve an F1-score of 0.58 and place third out of nine competitors.
