2024
Overview of the Second Shared Task on Fake News Detection in Dravidian Languages: DravidianLangTech@EACL 2024
Malliga Subramanian
|
Bharathi Raja Chakravarthi
|
Kogilavani Shanmugavadivel
|
Santhiya Pandiyan
|
Prasanna Kumar Kumaresan
|
Balasubramanian Palani
|
Premjith B
|
Vanaja K
|
Mithunja S
|
Devika K
|
Hariprasath S.b
|
Haripriya B
|
Vigneshwar E
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The rise of online social media has revolutionized communication, offering users a convenient way to share information and stay updated on current events. However, this surge in connectivity has also led to the proliferation of misinformation, commonly known as fake news. This misleading content, often disguised as legitimate news, poses a significant challenge as it can distort public perception and erode trust in reliable sources. This shared task consists of two subtasks, Task 1 and Task 2. Task 1 aims to classify a given social media text as original or fake. The goal of Task 2, FakeDetect-Malayalam, is to encourage participants to develop effective models capable of accurately detecting and classifying fake news articles in the Malayalam language into different categories such as False, Half True, Mostly False, Partly False, and Mostly True. For this shared task, 33 participants submitted their results.
Beyond Tech@DravidianLangTech2024 : Fake News Detection in Dravidian Languages Using Machine Learning
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Sanjai R
|
Mohammed Sameer B
|
Motheeswaran K
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
In the digital age, identifying fake news is essential because false information travels quickly via social media platforms. This project employs machine learning techniques, including Random Forest, Logistic Regression, and Decision Tree, to distinguish between real and fake news. With the rise of news consumption on social media, it becomes essential to authenticate information shared on platforms, for example in YouTube comments. The research emphasizes the need to stop the spread of harmful rumors and focuses on authenticating news articles. The proposed model utilizes machine learning and natural language processing, specifically Support Vector Machines, to aggregate and determine the authenticity of news. To address the challenges of detecting fake news, this paper describes the Machine Learning (ML) models submitted to the “Fake News Detection in Dravidian Languages” shared task at DravidianLangTech@EACL 2024. Four different models were used, namely Naive Bayes, Support Vector Machine (SVM), Random Forest, and Decision Tree.
Code_Makers@DravidianLangTech-EACL 2024 : Sentiment Analysis in Code-Mixed Tamil using Machine Learning Techniques
Kogilavani Shanmugavadivel
|
Sowbharanika J S
|
Navbila K
|
Malliga Subramanian
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The rising importance of sentiment analysis in online community research is addressed in our project, which focuses on the surge of code-mixed writing in multilingual social media. Targeting sentiments in texts combining Tamil and English, our supervised learning approach, particularly the Decision Tree algorithm, proves essential for effective sentiment classification. Notably, Decision Tree (accuracy: 0.99, average F1 score: 0.39) and Random Forest (accuracy: 0.99, macro average F1 score: 0.35) exhibit high accuracy, while SVM (accuracy: 0.78, macro average F1 score: 0.68), Logistic Regression (accuracy: 0.75, macro average F1 score: 0.62), and KNN (accuracy: 0.73, macro average F1 score: 0.26) also demonstrate commendable results. These findings showcase the project’s efficacy, offering promise for linguistic research and technological advancements. Securing the 8th rank emphasizes its recognition in the field.
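A wide gap between accuracy and macro-averaged F1, as in the figures quoted above, can arise when the label distribution is heavily skewed; the abstract does not say whether that is the cause here. The sketch below only illustrates how the two metrics can diverge. The label names and the toy data are hypothetical and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class that is never hit scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, heavily imbalanced toy data (not the shared-task dataset).
y_true = ["Positive"] * 96 + ["Negative"] * 2 + ["Mixed"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["Positive"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

Because macro averaging gives every class the same weight, the two minority classes that are never predicted pull the mean down even though almost all individual predictions are correct.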
MIT-KEC-NLP@DravidianLangTech-EACL 2024: Offensive Content Detection in Kannada and Kannada-English Mixed Text Using Deep Learning Techniques
Kogilavani Shanmugavadivel
|
Sowbarnigaa K S
|
Mehal Sakthi M S
|
Subhadevi K
|
Malliga Subramanian
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This study presents a strong methodology for detecting offensive content in multilingual text, with a focus on Kannada and Kannada-English mixed comments. The first step in data preprocessing is to work with a dataset containing Kannada comments, which is backed by Google Translate for Kannada-English translation. Following tokenization and sequence labeling, BIO tags are assigned to indicate the existence and bounds of objectionable spans within the text. A Bidirectional LSTM (BiLSTM) neural network model is trained on the annotated data and achieves a macro F1 score of 61.0 in recognizing objectionable content. Data preparation, model architecture definition, and iterative training with Kannada and Kannada-English text are all part of the training process. On a fresh dataset, the trained model accurately predicts offensive spans, emphasizing comments in the aforementioned languages. The recorded predictions, which include offensive span indices, are organized into a database.
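The BIO labeling step mentioned above can be sketched as follows. This is a minimal illustration that assumes spans are given as token-index pairs with an exclusive end; the function names and placeholder tokens are hypothetical, and the paper's own span format may differ.

```python
def spans_to_bio(tokens, offensive_spans):
    """Tag each token B (begins a span), I (inside a span) or O (outside).

    offensive_spans holds (start, end) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end in offensive_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bio_to_spans(tags):
    """Recover (start, end) token spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Placeholder tokens stand in for a tokenized comment.
tokens = ["w0", "w1", "w2", "w3", "w4", "w5"]
tags = spans_to_bio(tokens, [(1, 3), (4, 5)])
print(tags)
print(bio_to_spans(tags))
```

A sequence model such as a BiLSTM would predict one tag per token, and a decoding step like `bio_to_spans` would turn those predictions back into span indices.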
InnovationEngineers@DravidianLangTech-EACL 2024: Sentimental Analysis of YouTube Comments in Tamil by using Machine Learning
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Palanimurugan V
|
Pavul chinnappan D
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
There is opportunity for machine learning and natural language processing research because of the growing volume of textual data. Although there has been little research done on trend extraction from YouTube comments, sentiment analysis is an intriguing issue because of the poor consistency and quality of the material found there. The purpose of this work is to use machine learning techniques and algorithms to do sentiment analysis on YouTube comments pertaining to popular themes. The findings demonstrate that sentiment analysis is capable of giving a clear picture of how actual events affect public opinion. This study aims to make it easier for academics to find high-quality sentiment analysis research publications. Data normalisation methods are used to clean an annotated corpus of 1500 citation sentences for the study. For classification, a system utilising machine learning algorithms (K-Nearest Neighbour (KNN), Naïve Bayes, SVC (Support Vector Machine), and Random Forest) is built. Metrics like the F1-score and correctness score are used to assess the correctness of the system.
KEC_HAWKS@DravidianLangTech 2024 : Detecting Malayalam Fake News using Machine Learning Models
Malliga Subramanian
|
Jayanthjr J R
|
Muthu Karuppan P
|
Keerthibala T
|
Kogilavani Shanmugavadivel
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The proliferation of fake news in the Malayalam language across digital platforms has emerged as a pressing issue. By employing Recurrent Neural Networks (RNNs), a type of machine learning model, we aim to distinguish between Original and Fake News in Malayalam, and we achieved 9th rank in Task 1. RNNs are chosen for their ability to understand the sequence of words in a sentence, which is important in languages like Malayalam. Our main goal is to develop better models that can spot fake news effectively. We analyze various features to understand what contributes most to this accuracy. By doing so, we hope to provide a reliable method for identifying and combating fake news in the Malayalam language.
KEC-AI-NLP@LT-EDI-2024:Homophobia and Transphobia Detection in Social Media Comments using Machine Learning
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Shri R
|
Srigha S
|
Samyuktha K
|
Nithika K
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Our work addresses the growing concern of abusive comments on online platforms, particularly focusing on the identification of Homophobia and Transphobia in social media comments. The goal is to categorize comments into three classes: Homophobia, Transphobia, and non-anti LGBT+ comments. Utilizing machine learning techniques and a deep learning model, our work involves training on an English dataset with a designated training set and testing on a validation set. This approach aims to contribute to the understanding and detection of Homophobia and Transphobia within the realm of social media interactions. Our team participated in the shared task organized by LTEDI@EACL 2024 and secured seventh rank in the task of Homophobia/Transphobia Detection in social media comments in Tamil with a macro-F1 score of 0.315. Also, our run was submitted for the English language and secured eighth rank with a macro-F1 score of 0.369. The run submitted for the Malayalam language secured fourth rank with a macro-F1 score of 0.883 using the Random Forest model.
KEC AI DSNLP@LT-EDI-2024:Caste and Migration Hate Speech Detection using Machine Learning Techniques
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Aiswarya M
|
Aruna T
|
Jeevaananth S
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Commonly used language defines “hate speech” as objectionable statements that may jeopardize societal harmony by singling out a group or a person based on fundamental traits (including gender, caste, or religion). Using machine learning techniques, our research focuses on identifying hate speech in social media comments. Using a variety of machine learning methods, we created machine learning models to detect hate speech. An approximate Macro F1 of 0.60 was attained by the created models.
KEC_AI_MIRACLE_MAKERS@LT-EDI-2024: Stress Identification in Dravidian Languages using Machine Learning Techniques
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Monika J
|
Monishaa S
|
Rishibalan B
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Identifying whether an individual is stressed or not stressed is our shared task topic. We have used several machine learning models for identifying stress. This paper presents our system submission for Tasks 1 and 2 for both the Tamil and Telugu datasets, focusing on using supervised approaches. For the Tamil dataset, we got the highest accuracy with the Support Vector Machine model, with an F1-score of 0.98, and for the Telugu dataset, we got the highest accuracy with the Random Forest algorithm, with an F1-score of 0.99. Using this model, a Stress Identification System will help individuals improve their mental health in an optimistic manner.
2023
KEC_AI_NLP@DravidianLangTech: Abusive Comment Detection in Tamil Language
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Shri Durga R
|
Srigha S
|
Sree Harene J S
|
Yasvanth Bala P
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Our work aims to identify negative comments associated with the Counter-speech, Xenophobia, Homophobia, Transphobia, Misandry, Misogyny, and None-of-the-above categories. In order to identify these categories in the given dataset, we propose three different kinds of models: traditional machine learning techniques, a deep learning model, and a transfer learning model, BERT, to analyze the texts. For the Tamil dataset, we train the models on the training data and test them on the validation data. Our team participated in the shared task organised by DravidianLangTech and secured 4th rank in the task of abusive comment detection in Tamil with a macro-F1 score of 0.35. Also, our run was submitted for abusive comment detection in code-mixed languages (Tamil-English) and secured 6th rank with a macro-F1 score of 0.42.
KEC_AI_NLP@DravidianLangTech: Sentiment Analysis in Code Mixture Language
Kogilavani Shanmugavadivel
|
Malliga Subaramanian
|
VetriVendhan S
|
Pramoth Kumar M
|
Karthickeyan S
|
Kavin Vishnu N
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Sentiment Analysis is a process that involves analyzing digital text to determine the emotional tone, such as positive, negative, neutral, or unknown. Sentiment Analysis of code-mixed languages presents challenges in natural language processing due to the complexity of code-mixed data, which combines vocabulary and grammar from multiple languages and creates unique structures. The scarcity of annotated data and the unstructured nature of code-mixed data are major challenges. To address these challenges, we explored various techniques, including Machine Learning models such as Decision Trees, Random Forests, Logistic Regression, and Gaussian Naïve Bayes; a Deep Learning model, Long Short-Term Memory (LSTM); and a Transfer Learning model, BERT. In this work, we obtained the dataset from the DravidianLangTech shared task by participating in a competition and accessing train, development and test data for the Tamil language. The results demonstrated promising performance in sentiment analysis of code-mixed text. Among all the models, the deep learning model LSTM provides the best accuracy of 0.61 for the Tamil language.
Team-KEC@LT-EDI: Detecting Signs of Depression from Social Media Text
Malliga S
|
Kogilavani Shanmugavadivel
|
Arunaa S
|
Gokulkrishna R
|
Chandramukhii A
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
The rise of social media has led to a drastic surge in the dissemination of hostile and toxic content, fostering an alarming proliferation of hate speech, inflammatory remarks, and abusive language. The exponential growth of social media has facilitated the widespread circulation of hostile and toxic content, giving rise to an unprecedented influx of hate speech, incendiary language, and abusive rhetoric. The study utilized different techniques to represent the text data in a numerical format. Word embedding techniques aim to capture the semantic and syntactic information of the text data, which is essential in text classification tasks. The study utilized various techniques such as CNN, BERT, and N-gram to classify social media posts into depression and non-depression categories. Text classification tasks often rely on deep learning techniques such as Convolutional Neural Networks (CNN), while the BERT model, which is pre-trained, has shown exceptional performance in a range of natural language processing tasks. To assess the effectiveness of the suggested approaches, the research employed multiple metrics, including accuracy, precision, recall, and F1-score. The outcomes of the investigation indicate that the suggested techniques can identify symptoms of depression with an average accuracy rate of 56%.
KEC_AI_NLP_DEP @ LT-EDI : Detecting Signs of Depression From Social Media Texts
Kogilavani Shanmugavadivel
|
Malliga Subramanian
|
Vasantharan K
|
Prethish Ga
|
Sankar S
|
Sabari S
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
The goal of this study is to use machine learning approaches to detect depression indications in social media articles. Data gathering, pre-processing, feature extraction, model training, and performance evaluation are all aspects of the research. The collection consists of social media messages classified into three categories: not depressed, somewhat depressed, and severely depressed. The study contributes to the growing field of social media data-driven mental health analysis by stressing the use of feature extraction algorithms for obtaining relevant information from text data. The use of social media communications to detect depression has the potential to increase early intervention and help for people at risk. Several feature extraction approaches, such as TF-IDF, Count Vectorizer, and Hashing Vectorizer, are used to quantitatively represent textual data. These features are used to train and evaluate a wide range of machine learning models, including Logistic Regression, Random Forest, Decision Tree, Gaussian Naive Bayes, and Multinomial Naive Bayes. To assess the performance of the models, metrics such as accuracy, precision, recall, F1 score, and the confusion matrix are utilized. The Random Forest model with Count Vectorizer had the greatest accuracy on the development dataset, coming in at 92.99 percent. And with a macro F1-score of 0.362, we came in 19th position in the shared task. The findings show that machine learning is effective in detecting depression markers in social media articles.
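The feature extraction step named above, turning text into count vectors and then into TF-IDF weights, can be sketched in plain Python. This is an illustration only: the whitespace tokenizer, the toy sentences, and the idf variant `ln(N / df) + 1` are assumptions, and library implementations typically add smoothing and normalization that are omitted here.

```python
import math
from collections import Counter


def count_vectorize(docs):
    """Map each document to raw term counts over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf(rows):
    """Re-weight counts by idf = ln(N / df) + 1 (an assumed, simplified variant)."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) + 1.0 for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in rows]


# Hypothetical toy posts, not drawn from the shared-task data.
docs = ["i feel fine today", "i feel so tired", "tired tired and alone"]
vocab, counts = count_vectorize(docs)
weights = tfidf(counts)

for term, c, w in zip(vocab, counts[2], weights[2]):
    print(f"{term:>6}  count={c}  tfidf={w:.3f}")
```

Rows of `counts` or `weights` are the numeric feature vectors that a classifier such as Logistic Regression or Random Forest would then be trained on.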
2022
Findings of the Shared Task on Emotion Analysis in Tamil
Anbukkarasi Sampath
|
Thenmozhi Durairaj
|
Bharathi Raja Chakravarthi
|
Ruba Priyadharshini
|
Subalalitha Cn
|
Kogilavani Shanmugavadivel
|
Sajeetha Thavareesan
|
Sathiyaraj Thangasamy
|
Parameswari Krishnamurthy
|
Adeep Hande
|
Sean Benhur
|
Kishore Ponnusamy
|
Santhiya Pandiyan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
This paper presents the overview of the shared task on emotional analysis in Tamil. The result of the shared task is presented at the workshop. This paper presents the dataset used in the shared task, the task description, the methodology used by the participants, and the evaluation results of the submissions. This task is organized as two tasks. Task A is carried out with data annotated with 11 emotions for social media comments in Tamil, and Task B is organized with data annotated with 31 fine-grained emotions for social media comments in Tamil. For conducting experiments, training and development datasets were provided to the participants, and results are evaluated on unseen data. In total, we received around 24 submissions from 13 teams. For evaluating the models, precision, recall, and micro-average metrics are used.
Findings of the Shared Task on Multi-task Learning in Dravidian Languages
Bharathi Raja Chakravarthi
|
Ruba Priyadharshini
|
Subalalitha Cn
|
Sangeetha S
|
Malliga Subramanian
|
Kogilavani Shanmugavadivel
|
Parameswari Krishnamurthy
|
Adeep Hande
|
Siddhanth U Hegde
|
Roshan Nayak
|
Swetha Valli
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian Languages is required to be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian Languages, namely Kannada, Malayalam, and Tamil. It is one of the first shared tasks that focuses on Multi-task Learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, and due to the intricate nature of the task, especially in its first iteration, 3 submissions were received.
Overview of Abusive Comment Detection in Tamil-ACL 2022
Ruba Priyadharshini
|
Bharathi Raja Chakravarthi
|
Subalalitha Cn
|
Thenmozhi Durairaj
|
Malliga Subramanian
|
Kogilavani Shanmugavadivel
|
Siddhanth U Hegde
|
Prasanna Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Social media is one of the significant digital platforms that has a huge impact on people at all levels. The comments posted on social media are powerful enough to change political and business scenarios within a few hours. They also tend to attack a particular individual or a group of individuals. This shared task aims at detecting abusive comments involving Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia, and Transphobia. Hope speech is also identified. A dataset collected from social media and tagged with the above-mentioned categories in Tamil and Tamil-English code-mixed languages is given to the participants. The participants used different machine learning and deep learning algorithms. This paper presents the overview of this task, comprising the dataset details and the results of the participants.
Transformers at SemEval-2022 Task 5: A Feature Extraction based Approach for Misogynous Meme Detection
Shankar Mahadevan
|
Sean Benhur
|
Roshan Nayak
|
Malliga Subramanian
|
Kogilavani Shanmugavadivel
|
Kanchana Sivanraju
|
Bharathi Raja Chakravarthi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Social media is an idea created to make the world smaller and more connected. Recently, it has become a hub of fake news and sexist memes that target women. Social Media should ensure proper women’s safety and equality. Filtering such information from social media is of paramount importance to achieving this goal. In this paper, we describe the system developed by our team for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification. We propose a multimodal training methodology that achieves good performance on both the subtasks, ranking 4th for Subtask A (0.718 macro F1-score) and 9th for Subtask B (0.695 macro F1-score) while exceeding the baseline results by good margins.