Malliga Subramanian


2024

Overview of the Second Shared Task on Fake News Detection in Dravidian Languages: DravidianLangTech@EACL 2024
Malliga Subramanian | Bharathi Raja Chakravarthi | Kogilavani Shanmugavadivel | Santhiya Pandiyan | Prasanna Kumar Kumaresan | Balasubramanian Palani | Premjith B | Vanaja K | Mithunja S | Devika K | Hariprasath S.b | Haripriya B | Vigneshwar E
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The rise of online social media has revolutionized communication, offering users a convenient way to share information and stay updated on current events. However, this surge in connectivity has also led to the proliferation of misinformation, commonly known as fake news. This misleading content, often disguised as legitimate news, poses a significant challenge because it can distort public perception and erode trust in reliable sources. This shared task consists of two subtasks. Task 1 aims to classify a given social media text as original or fake. Task 2, FakeDetect-Malayalam, encourages participants to develop effective models that accurately detect fake news articles in Malayalam and classify them into categories such as False, Half True, Mostly False, Partly False, and Mostly True. For this shared task, 33 participants submitted their results.

Beyond Tech@DravidianLangTech2024 : Fake News Detection in Dravidian Languages Using Machine Learning
Kogilavani Shanmugavadivel | Malliga Subramanian | Sanjai R | Mohammed Sameer B | Motheeswaran K
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

In the digital age, identifying fake news is essential because false information travels quickly via social media platforms. This project employs machine learning techniques, including Random Forest, Logistic Regression, and Decision Tree, to distinguish between real and fake news. With the rise of news consumption on social media, it has become essential to authenticate information shared there, such as YouTube comments. The research emphasizes the need to stop the spread of harmful rumors and focuses on authenticating news articles. The proposed model uses machine learning and natural language processing, specifically Support Vector Machines, to aggregate news and determine its authenticity. To address the challenges of detecting fake news, this paper describes the machine learning (ML) models submitted to the “Fake News Detection in Dravidian Languages” shared task at DravidianLangTech@EACL 2024. Four different models are used: Naive Bayes, Support Vector Machine (SVM), Random Forest, and Decision Tree.

Code_Makers@DravidianLangTech-EACL 2024 : Sentiment Analysis in Code-Mixed Tamil using Machine Learning Techniques
Kogilavani Shanmugavadivel | Sowbharanika J S | Navbila K | Malliga Subramanian
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Our project addresses the rising importance of sentiment analysis in online community research, focusing on the surge of code-mixed writing in multilingual social media. To classify sentiment in texts that combine Tamil and English, we use a supervised learning approach in which the Decision Tree algorithm proves especially effective. Decision Tree (accuracy: 0.99, average F1 score: 0.39) and Random Forest (accuracy: 0.99, macro average F1 score: 0.35) exhibit high accuracy, while SVM (accuracy: 0.78, macro average F1 score: 0.68), Logistic Regression (accuracy: 0.75, macro average F1 score: 0.62), and KNN (accuracy: 0.73, macro average F1 score: 0.26) also demonstrate commendable results. These findings showcase the project’s efficacy and offer promise for linguistic research and technological advancement. Our submission secured 8th rank in the shared task.
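The Decision Tree and Random Forest figures above pair near-perfect accuracy with a low macro F1 score. One common explanation for such a gap is a skewed class distribution: accuracy is dominated by the majority class, while macro F1 averages the per-class F1 scores with equal weight. The minimal Python sketch below illustrates this with made-up labels (not the shared-task data) and a hypothetical classifier that always predicts the majority class; the label names and counts are assumptions for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in gold or predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 97 "Positive", 2 "Negative", 1 "Mixed_feelings".
y_true = ["Positive"] * 97 + ["Negative"] * 2 + ["Mixed_feelings"]
# A hypothetical classifier that always predicts the majority class.
y_pred = ["Positive"] * 100

print(accuracy(y_true, y_pred))            # 0.97
print(round(macro_f1(y_true, y_pred), 3))  # 0.328
```

Here accuracy is 0.97 while macro F1 is about 0.33, because the two minority classes each contribute an F1 of zero to the average.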

MIT-KEC-NLP@DravidianLangTech-EACL 2024: Offensive Content Detection in Kannada and Kannada-English Mixed Text Using Deep Learning Techniques
Kogilavani Shanmugavadivel | Sowbarnigaa K S | Mehal Sakthi M S | Subhadevi K | Malliga Subramanian
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This study presents a robust methodology for detecting offensive content in multilingual text, focusing on Kannada and Kannada-English mixed comments. Data preprocessing starts from a dataset of Kannada comments, supported by Google Translate for Kannada-English translation. Following tokenization and sequence labeling, BIO tags are assigned to indicate the presence and boundaries of offensive spans within the text. A Bidirectional LSTM (BiLSTM) neural network is trained on the annotated data and achieves a macro F1 score of 61.0 in recognizing offensive content. The training process comprises data preparation, model architecture definition, and iterative training with Kannada and Kannada-English text. On a new dataset, the trained model predicts offensive spans in comments in these languages. The recorded predictions, including offensive span indices, are organized into a database.
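The BIO tagging step mentioned above can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes gold spans are given as a set of character offsets into the comment and that tokens are whitespace-separated, and it uses the hypothetical tag names B-OFF and I-OFF with an English placeholder sentence in place of Kannada text.

```python
import re


def bio_tags(text, offensive_offsets):
    """Assign one BIO tag per whitespace-separated token.

    offensive_offsets: set of character indices marked offensive (assumed format).
    A token counts as offensive if any of its characters is in the set; the first
    token of each run of adjacent offensive tokens gets B-OFF, later ones get I-OFF.
    """
    pairs = []
    prev_offensive = False
    for match in re.finditer(r"\S+", text):
        is_offensive = any(i in offensive_offsets for i in range(match.start(), match.end()))
        if is_offensive:
            tag = "I-OFF" if prev_offensive else "B-OFF"
        else:
            tag = "O"
        pairs.append((match.group(), tag))
        prev_offensive = is_offensive
    return pairs


# Hypothetical example: the span "total garbage" is marked offensive.
text = "this movie is total garbage honestly"
span = set(range(text.index("total"), text.index("garbage") + len("garbage")))
print(bio_tags(text, span))
# [('this', 'O'), ('movie', 'O'), ('is', 'O'), ('total', 'B-OFF'), ('garbage', 'I-OFF'), ('honestly', 'O')]
```

Tagging at the token level lets a sequence model such as a BiLSTM predict one label per token; predicted B-/I- runs can then be mapped back to character offsets to produce span indices.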

InnovationEngineers@DravidianLangTech-EACL 2024: Sentimental Analysis of YouTube Comments in Tamil by using Machine Learning
Kogilavani Shanmugavadivel | Malliga Subramanian | Palanimurugan V | Pavul chinnappan D
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The growing volume of textual data creates opportunities for machine learning and natural language processing research. Although little research has been done on trend extraction from YouTube comments, sentiment analysis of these comments is an intriguing problem because of the poor consistency and quality of the material found there. The purpose of this work is to use machine learning techniques and algorithms to perform sentiment analysis on YouTube comments related to popular themes. The findings demonstrate that sentiment analysis can give a clear picture of how actual events affect public opinion. This study aims to make it easier for academics to find high-quality sentiment analysis research publications. Data normalisation methods are used to clean an annotated corpus of 1500 citation sentences for the study. For classification, a system is built using the following machine learning algorithms: K-Nearest Neighbour (KNN), Naïve Bayes, Support Vector Classifier (SVC), and Random Forest. Metrics such as the F1-score and accuracy are used to assess the correctness of the system.

KEC_HAWKS@DravidianLangTech 2024 : Detecting Malayalam Fake News using Machine Learning Models
Malliga Subramanian | Jayanthjr J R | Muthu Karuppan P | Keerthibala T | Kogilavani Shanmugavadivel
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The proliferation of fake news in the Malayalam language across digital platforms has emerged as a pressing issue. Employing Recurrent Neural Networks (RNNs), a type of machine learning model, we aim to distinguish between original and fake news in Malayalam; our system achieved 9th rank in Task 1. RNNs are chosen for their ability to capture the sequence of words in a sentence, which is important in languages like Malayalam. Our main goal is to develop better models that can spot fake news effectively. We analyze various features to understand which contribute most to classification accuracy. By doing so, we hope to provide a reliable method for identifying and combating fake news in the Malayalam language.

KEC-AI-NLP@LT-EDI-2024:Homophobia and Transphobia Detection in Social Media Comments using Machine Learning
Kogilavani Shanmugavadivel | Malliga Subramanian | Shri R | Srigha S | Samyuktha K | Nithika K
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

Our work addresses the growing concern of abusive comments on online platforms, focusing in particular on the identification of homophobia and transphobia in social media comments. The goal is to categorize comments into three classes: Homophobia, Transphobia, and non-anti-LGBT+ comments. Using machine learning techniques and a deep learning model, our work involves training on an English dataset, using a designated training set and testing on a validation set. This approach aims to contribute to the understanding and detection of homophobia and transphobia in social media interactions. Our team participated in the shared task organized by LT-EDI@EACL 2024 and secured seventh rank in Homophobia/Transphobia Detection in social media comments in Tamil, with a macro-F1 score of 0.315. Our run for English secured eighth rank with a macro-F1 score of 0.369, and our run for Malayalam, using the Random Forest model, secured fourth rank with a macro-F1 score of 0.883.

KEC AI DSNLP@LT-EDI-2024:Caste and Migration Hate Speech Detection using Machine Learning Techniques
Kogilavani Shanmugavadivel | Malliga Subramanian | Aiswarya M | Aruna T | Jeevaananth S
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

In common usage, “hate speech” refers to objectionable statements that may jeopardize societal harmony by singling out a group or a person based on fundamental traits such as gender, caste, or religion. Our research focuses on identifying hate speech in social media comments. Using a variety of machine learning methods, we built models to detect hate speech, which attained a macro F1 score of approximately 0.60.

KEC_AI_MIRACLE_MAKERS@LT-EDI-2024: Stress Identification in Dravidian Languages using Machine Learning Techniques
Kogilavani Shanmugavadivel | Malliga Subramanian | Monika J | Monishaa S | Rishibalan B
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

Our shared task topic is identifying whether an individual is stressed or not stressed. We used several machine learning models to identify stress. This paper presents our system submission for Tasks 1 and 2 on both the Tamil and Telugu datasets, focusing on supervised approaches. For the Tamil dataset, the Support Vector Machine model performed best, with an F1-score of 0.98; for the Telugu dataset, the Random Forest algorithm performed best, with an F1-score of 0.99. Such a stress identification system could help individuals improve their mental health.

2023

KEC_AI_NLP@DravidianLangTech: Abusive Comment Detection in Tamil Language
Kogilavani Shanmugavadivel | Malliga Subramanian | Shri Durga R | Srigha S | Sree Harene J S | Yasvanth Bala P
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Our work aims to identify negative comments associated with the categories Counter-speech, Xenophobia, Homophobia, Transphobia, Misandry, Misogyny, and None-of-the-above. To identify these categories in the given dataset, we propose three different models: traditional machine learning techniques, a deep learning model, and a transfer learning model, BERT. For the Tamil dataset, we train the models on the training set and test them on the validation set. Our team participated in the shared task organised by DravidianLangTech and secured 4th rank in abusive comment detection in Tamil with a macro-F1 score of 0.35. We also submitted a run for abusive comment detection in code-mixed Tamil-English text, which secured 6th rank with a macro-F1 score of 0.42.

KEC_AI_NLP_DEP @ LT-EDI : Detecting Signs of Depression From Social Media Texts
Kogilavani Shanmugavadivel | Malliga Subramanian | Vasantharan K | Prethish Ga | Sankar S | Sabari S
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

The goal of this study is to use machine learning approaches to detect signs of depression in social media texts. The research covers data gathering, pre-processing, feature extraction, model training, and performance evaluation. The dataset consists of social media messages classified into three categories: not depressed, somewhat depressed, and severely depressed. The study contributes to the growing field of data-driven mental health analysis from social media by stressing the use of feature extraction algorithms to obtain relevant information from text data. Using social media communications to detect depression has the potential to enable earlier intervention and support for people at risk. Several feature extraction approaches, such as TF-IDF, Count Vectorizer, and Hashing Vectorizer, are used to represent textual data quantitatively. These features are used to train and evaluate a wide range of machine learning models, including Logistic Regression, Random Forest, Decision Tree, Gaussian Naive Bayes, and Multinomial Naive Bayes. Model performance is assessed with accuracy, precision, recall, F1 score, and the confusion matrix. The Random Forest model with Count Vectorizer achieved the highest accuracy on the development dataset, at 92.99 percent. In the shared task, we ranked 19th with a macro F1-score of 0.362. The findings show that machine learning is effective in detecting depression markers in social media texts.
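The three feature representations named above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the scikit-learn classes the abstract refers to: tokenization is lowercase whitespace splitting, the TF-IDF weights use the smoothed IDF ln((1 + N) / (1 + df)) + 1 without the row normalization scikit-learn applies by default, and the hashing variant uses CRC32 with unsigned counts in 16 buckets. The example sentences are invented.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no punctuation handling.
    return text.lower().split()


def count_vectors(docs):
    """Bag-of-words counts over a sorted vocabulary built from the corpus."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(doc)).items():
            row[index[tok]] = n
        rows.append(row)
    return vocab, rows


def tfidf_vectors(docs):
    """Raw counts weighted by smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


def hashing_vectors(docs, n_features=16):
    """Counts in a fixed number of buckets picked by a stable hash (no vocabulary)."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


docs = ["i feel very low today", "today was a good day", "i feel good"]
vocab, counts = count_vectors(docs)
print(vocab)      # ['a', 'day', 'feel', 'good', 'i', 'low', 'today', 'very', 'was']
print(counts[0])  # [0, 0, 1, 0, 1, 1, 1, 1, 0]
```

The count and TF-IDF representations need a vocabulary built from the training data, while the hashing representation maps any token into a fixed number of columns, trading possible collisions for constant memory. Any of these matrices could then be passed to the classifiers listed above.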

2022

Transformers at SemEval-2022 Task 5: A Feature Extraction based Approach for Misogynous Meme Detection
Shankar Mahadevan | Sean Benhur | Roshan Nayak | Malliga Subramanian | Kogilavani Shanmugavadivel | Kanchana Sivanraju | Bharathi Raja Chakravarthi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Social media is an idea created to make the world smaller and more connected. Recently, it has become a hub of fake news and sexist memes that target women. Social media should ensure proper women’s safety and equality. Filtering such information from social media is of paramount importance to achieving this goal. In this paper, we describe the system developed by our team for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification. We propose a multimodal training methodology that achieves good performance on both the subtasks, ranking 4th for Subtask A (0.718 macro F1-score) and 9th for Subtask B (0.695 macro F1-score) while exceeding the baseline results by good margins.

Findings of the Shared Task on Multimodal Sentiment Analysis and Troll Meme Classification in Dravidian Languages
Premjith B | Bharathi Raja Chakravarthi | Malliga Subramanian | Bharathi B | Soman Kp | Dhanalakshmi V | Sreelakshmi K | Arunaggiri Pandian | Prasanna Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents the findings of the shared task on Multimodal Sentiment Analysis and Troll Meme Classification in Dravidian languages held at ACL 2022. Multimodal sentiment analysis deals with the identification of sentiment from video. In addition to the video data, the task requires the analysis of the corresponding text and audio features to classify movie reviews into five classes. We created a dataset for this task in Malayalam and Tamil. The Troll meme classification task aims to classify multimodal troll memes into two categories, which requires the analysis of both text and image features to make better predictions. The performance of the participating teams was analysed using the F1-score. Only one team submitted results for the Multimodal Sentiment Analysis task, whereas we received six submissions for the Troll meme classification task. The only team that participated in the Multimodal Sentiment Analysis task obtained an F1-score of 0.24. In the Troll meme classification task, the winning team achieved an F1-score of 0.596.

Findings of the Shared Task on Multi-task Learning in Dravidian Languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Subalalitha Cn | Sangeetha S | Malliga Subramanian | Kogilavani Shanmugavadivel | Parameswari Krishnamurthy | Adeep Hande | Siddhanth U Hegde | Roshan Nayak | Swetha Valli
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian languages must be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian languages: Kannada, Malayalam, and Tamil. It is one of the first shared tasks to focus on Multi-task Learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, but due to the intricate nature of the task, especially in its first iteration, only 3 submissions were received.

Overview of Abusive Comment Detection in Tamil-ACL 2022
Ruba Priyadharshini | Bharathi Raja Chakravarthi | Subalalitha Cn | Thenmozhi Durairaj | Malliga Subramanian | Kogilavani Shanmugavadivel | Siddhanth U Hegde | Prasanna Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Social media is a significant digital platform that has a huge impact on people at all levels. Comments posted on social media are powerful enough to change even political and business scenarios within a few hours. They also tend to attack a particular individual or a group of individuals. This shared task aims at detecting abusive comments involving Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia, and Transphobia. Hope speech is also identified. A dataset collected from social media and tagged with the above categories in Tamil and Tamil-English code-mixed languages is given to the participants. The participants used different machine learning and deep learning algorithms. This paper presents an overview of this task, comprising the dataset details and the results of the participants.