Md. Rahman

2024

Binary_Beasts@DravidianLangTech-EACL 2024: Multimodal Abusive Language Detection in Tamil based on Integrated Approach of Machine Learning and Deep Learning Techniques
Md. Rahman | Abu Raihan | Tanzim Rahman | Shawly Ahsan | Jawad Hossain | Avishek Das | Mohammed Moshiul Hoque
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Detecting abusive language on social media is a challenging task that needs to be solved effectively. This research addresses the formidable challenge of detecting abusive language in Tamil through a comprehensive multimodal approach, incorporating textual, acoustic, and visual inputs. This study utilized ConvLSTM, 3D-CNN, and a hybrid 3D-CNN with BiLSTM to extract video features. Several models, such as BiLSTM, LR, and CNN, were explored for processing audio data, whereas for textual content, MNB, LR, and LSTM methods were explored. To further enhance overall performance, this work introduced a weighted late fusion model amalgamating predictions from all modalities. The fusion model was then applied to make predictions on the test dataset. The ConvLSTM+BiLSTM+MNB model yielded the highest macro F1 score of 71.43%. Our methodology allowed us to achieve 1st rank for multimodal abusive language detection in the shared task.
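The weighted late fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights were chosen or what the per-modality outputs looked like, so the function names, the weights, and the probability values below are all hypothetical and serve only to show the general technique: each modality's classifier produces class probabilities, and the fused score is their weighted average.

```python
from typing import Dict, List

Probs = List[List[float]]  # one row of class probabilities per sample


def weighted_late_fusion(probs_by_modality: Dict[str, Probs],
                         weights: Dict[str, float]) -> Probs:
    """Fuse per-modality class probabilities with a weighted average.

    Assumption: every modality scores the same samples in the same order
    and uses the same class ordering.
    """
    total = sum(weights[m] for m in probs_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(next(iter(probs_by_modality.values())))
    fused = []
    for i in range(n_samples):
        n_classes = len(next(iter(probs_by_modality.values()))[i])
        row = [
            sum(weights[m] * probs_by_modality[m][i][c]
                for m in probs_by_modality) / total
            for c in range(n_classes)
        ]
        fused.append(row)
    return fused


def predict(fused: Probs) -> List[int]:
    """Pick the class index with the highest fused score for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Hypothetical outputs for two samples and two classes
# (index 0 = non-abusive, index 1 = abusive); not taken from the paper.
example_probs = {
    "video": [[0.6, 0.4], [0.2, 0.8]],
    "audio": [[0.3, 0.7], [0.4, 0.6]],
    "text":  [[0.2, 0.8], [0.9, 0.1]],
}
example_weights = {"video": 0.2, "audio": 0.3, "text": 0.5}  # illustrative only

fused_scores = weighted_late_fusion(example_probs, example_weights)
print(predict(fused_scores))  # prints [1, 0]
```

In this sketch the text modality carries the largest weight, so it overrides the video and audio models on the second sample. In practice, such weights would typically be tuned on a validation set.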


Distinguishing between fake and original news on social media demands vigilant procedures. This paper introduces the shared task on ‘Fake News Detection in Dravidian Languages - DravidianLangTech@EACL 2024’. With a focus on the Malayalam language, the task centers on classifying social media posts as either fake or original news. The participating teams contributed a variety of strategies, ranging from conventional machine-learning techniques to advanced transformer-based models. Notably, the findings of this work highlight the effectiveness of the Malayalam-BERT model, which achieved a macro F1 score of 0.88 in distinguishing between fake and original news in Malayalam social media content and ranked 1st among the participants.


The pervasive impact of stress on individuals necessitates proactive identification and intervention measures, especially in social media interaction. This study focuses on the shared task, “Stress Identification in Dravidian Languages,” specifically emphasizing Tamil and Telugu code-mixed languages. The primary objective of the task is to classify social media messages into two categories: stressed and non-stressed. We employed various methodologies, from traditional machine-learning techniques to state-of-the-art transformer-based models. Notably, the Tamil-BERT and Telugu-BERT models performed best, achieving macro F1-scores of 0.71 and 0.72, respectively, and securing the 15th position in the Tamil code-mixed language and the 9th position in the Telugu code-mixed language. These findings underscore the effectiveness of these models in recognizing stress signals within social media content composed in Tamil and Telugu.

Co-authors

Abu Raihan 3
