Moein Tash
2024
Lidoma@DravidianLangTech 2024: Identifying Hate Speech in Telugu Code-Mixed: A BERT Multilingual
Muhammad Zamir
|
Moein Tash
|
Zahra Ahani
|
Alexander Gelbukh
|
Grigori Sidorov
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Over the past few years, research on hate speech and offensive content identification on social media has been ongoing. Since most people in the world are not native English speakers, unapproved messages are typically sent in code-mixed language. We accomplished collaborative work to identify the language of code-mixed text on social media in order to address the difficulties associated with it in the Telugu language scenario. Specifically, we participated in the shared task on the provided dataset by the Dravidian- LangTech Organizer for the purpose of identifying hate and non-hate content. The assignment is to classify each sentence in the provided text into two predetermined groups: hate or non-hate. We developed a model in Python and selected a BERT multilingual to do the given task. Using a train-development data set, we developed a model, which we then tested on test data sets. An average macro F1 score metric was used to measure the model’s performance. For the task, the model reported an average macro F1 of 0.6151.
2023
LIDOMA@DravidianLangTech: Convolutional Neural Networks for Studying Correlation Between Lexical Features and Sentiment Polarity in Tamil and Tulu Languages
Moein Tash
|
Jesus Armenta-Segura
|
Zahra Ahani
|
Olga Kolesnikova
|
Grigori Sidorov
|
Alexander Gelbukh
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
With the prevalence of code-mixing among speakers of Dravidian languages, DravidianLangTech proposed the shared task on Sentiment Analysis in Tamil and Tulu at RANLP 2023. This paper presents the submission of LIDOMA, which proposes a methodology that combines lexical features and Convolutional Neural Networks (CNNs) to address the challenge. A fine-tuned 6-layered CNN model is employed, achieving macro F1 scores of 0.542 and 0.199 for Tulu and Tamil, respectively
Search
Co-authors
- Zahra Ahani 2
- Grigori Sidorov 2
- Alexander Gelbukh 2
- Jesus Armenta-Segura 1
- Olga Kolesnikova 1
- show all...