2025
Overview of the Shared Task on Sentiment Analysis in Tamil and Tulu
Durairaj Thenmozhi | Bharathi Raja Chakravarthi | Asha Hegde | Hosahalli Lakshmaiah Shashirekha | Rajeswari Natarajan | Sajeetha Thavareesan | Ratnasingam Sakuntharaj | Krishnakumari Kalyanasundaram | Charmathi Rajkumar | Poorvi Shetty | Harshitha S Kumar
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Sentiment analysis is an essential task for interpreting subjective opinions and emotions in textual data, with significant implications across commercial and societal applications. This paper provides an overview of the shared task on Sentiment Analysis in Tamil and Tulu, organized as part of DravidianLangTech@NAACL 2025. The task comprises two components, one addressing Tamil and the other Tulu, both designed as multi-class classification challenges in which the sentiment of a given text must be categorized as positive, negative, neutral, or unknown. The dataset was compiled by aggregating user-generated content from social media platforms such as YouTube and Twitter, ensuring linguistic diversity and real-world applicability. Participants applied a variety of computational approaches, ranging from traditional machine learning models to deep learning models and pre-trained language models, combined with various feature representation techniques, to tackle the challenges posed by linguistic code-mixing, orthographic variations, and resource scarcity in these low-resource languages.
2024
WordWizards@DravidianLangTech 2024: Fake News Detection in Dravidian Languages using Cross-lingual Sentence Embeddings
Akshatha Anbalagan | Priyadharshini T | Niranjana A | Shreedevi Balaji | Durairaj Thenmozhi
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The proliferation of fake news in digital media has become a significant societal concern, impacting public opinion, trust, and decision-making. This project focuses on the development of machine learning models for the detection of fake news. Leveraging a dataset containing both genuine and deceptive news articles, the proposed models employ natural language processing techniques, feature extraction and classification algorithms. This paper provides a solution to Fake News Detection in Dravidian Languages - DravidianLangTech 2024. There are two subtasks. The goal of Task 1 is to classify a given social media text as original or fake. We propose an approach based on a supervised machine learning model, the Support Vector Machine (SVM). The SVM classifier achieved a macro F1 score of 0.78 on the test data and rank 11. Task 2 is to classify fake news articles in Malayalam into the categories False, Half True, Mostly False, Partly False and Mostly True. We used Naive Bayes, which achieved a macro F1 score of 0.3517 on the test data and rank 6.
WordWizards@DravidianLangTech 2024: Sentiment Analysis in Tamil and Tulu using Sentence Embedding
Shreedevi Balaji | Akshatha Anbalagan | Priyadharshini T | Niranjana A | Durairaj Thenmozhi
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Sentiment Analysis of Dravidian Languages has begun to garner attention recently, as there is a growing need to analyze the emotional responses and subjective opinions present in social media text. As this data is code-mixed and few solutions exist for code-mixed text, we present our solution to the DravidianLangTech 2024 Sentiment Analysis in Tamil and Tulu task. To understand the sentiment of social media text, we used pre-trained transformer models and feature extraction vectorizers to classify the data. Our results placed us 11th in the rankings for the Tamil task and 8th for the Tulu task, with F1 scores of 0.12 and 0.30, respectively, which shows the efficiency of our approach.
Quartet@LT-EDI 2024: A Support Vector Machine Approach For Caste and Migration Hate Speech Detection
Shaun H | Samyuktaa Sivakumar | Rohan R | Nikilesh Jayaguptha | Durairaj Thenmozhi
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Hate speech refers to offensive remarks against a community or individual based on inherent characteristics. Hate speech against a community based on caste and native place is unfortunately prevalent in society. With social media platforms being a very popular tool for communication and sharing ideas, people post hate speech against castes or migrants on social media. The Shared Task LT-EDI 2024: Caste and Migration Hate Speech Detection was created with the objective of building an automatic classification system that detects and classifies hate speech posted on social media targeting communities of a particular caste and migrants. Datasets in the Tamil language were provided with the shared task. We experimented with several traditional models such as Naive Bayes, Support Vector Machine (SVM), Logistic Regression, Random Forest Classifier and Decision Tree Classifier, of which the Support Vector Machine yielded the best results, placing us 8th in the rank list released by the organizers.
Quartet@LT-EDI 2024: A SVM-ResNet50 Approach For Multitask Meme Classification - Unraveling Misogynistic and Trolls in Online Memes
Shaun H | Samyuktaa Sivakumar | Rohan R | Nikilesh Jayaguptha | Durairaj Thenmozhi
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Meme is a very popular term prevailing among almost all social media platforms in recent days. A meme can be a combination of text and image whose sole purpose is to be funny and entertain people. Memes can sometimes promote misogynistic content expressing hatred, contempt, or prejudice against women. The Shared Task LT-EDI 2024: Multitask Meme Classification: Unraveling Misogynistic and Trolls in Online Memes Task 1 was created with the purpose of classifying social media memes as “Misogynistic” or “Non-Misogynistic”. The task encompassed Tamil and Malayalam datasets. We separately classified the textual data using Multinomial Naive Bayes and the pictorial data using the ResNet50 model. The results from both kinds of data were combined to yield an overall result. We were ranked 2nd for both languages in this task.
Quartet@LT-EDI 2024: Support Vector Machine Based Approach For Homophobia/Transphobia Detection In Social Media Comments
Shaun H | Samyuktaa Sivakumar | Rohan R | Nikilesh Jayaguptha | Durairaj Thenmozhi
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Homophobia and transphobia are terms used to describe the fear or hatred of people who are attracted to the same sex or people whose psychological gender differs from their biological sex. People use social media to exert this behaviour. The increased amount of abusive content negatively affects people in many ways. It makes the environment toxic and unpleasant for LGBTQ+ people. This paper describes a classification model that classifies content into three categories: homophobic, transphobic and non-homophobic/transphobic. We used several traditional models, such as Support Vector Machine, Random Classifier, Logistic Regression and K-Nearest Neighbour, to achieve this. The macro average F1 scores for Malayalam, Telugu, English, Marathi, Kannada, Tamil, Gujarati and Hindi are 0.88, 0.94, 0.96, 0.78, 0.93, 0.77, 0.94 and 0.47, and the ranks for these languages are 5, 6, 9, 6, 8, 6, 6 and 4, respectively.
2023
Findings of the Shared Task on Sentiment Analysis in Tamil and Tulu Code-Mixed Text
Asha Hegde | Bharathi Raja Chakravarthi | Hosahalli Lakshmaiah Shashirekha | Rahul Ponnusamy | Subalalitha Chinnaudayar Navaneethakrishnan | Lavanya Sambath Kumar | Durairaj Thenmozhi | Martha Karunakar | Shreya Sriram | Sarah Aymen
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
In recent years, there has been a growing focus on Sentiment Analysis (SA) of code-mixed Dravidian languages. However, the majority of social media text in these languages is code-mixed, presenting a unique challenge. Despite this, there is currently a lack of research on SA specifically tailored for code-mixed Dravidian languages, highlighting the need for further exploration and development in this domain. With this in view, the “Sentiment Analysis in Tamil and Tulu - DravidianLangTech” shared task was organized at Recent Advances in Natural Language Processing (RANLP) 2023. This shared task consists of two language tracks, code-mixed Tamil and code-mixed Tulu; Tulu text is explored for SA in the public domain for the first time. We describe the task, its organization, and the submitted systems, followed by the results. 57 research teams registered for the shared task, and we received 27 systems each for code-mixed Tamil and Tulu texts. The performance of the systems developed by participants was evaluated in terms of macro average F1 score. The top systems for code-mixed Tamil and Tulu texts achieved macro average F1 scores of 0.32 and 0.542, respectively. The high quality and substantial quantity of submissions demonstrate significant interest in the analysis of code-mixed Dravidian languages. However, the current state of the art in this domain indicates the need for further advancements to effectively address the challenges posed by code-mixed Dravidian language SA.
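Several of the shared tasks listed here rank systems by macro average F1 score. As a rough illustration of what that metric computes, the following is a minimal sketch in plain Python; the function name `macro_f1` and the example labels are assumptions made for illustration and are not taken from any of the papers or from the organizers' evaluation scripts.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1   # correct prediction for this class
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[gold] += 1   # gold class gains a false negative

    labels = set(y_true) | set(y_pred)
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels, for illustration only.
gold = ["Positive", "Positive", "Positive", "Negative", "Neutral"]
pred = ["Positive", "Positive", "Negative", "Negative", "Positive"]
print(round(macro_f1(gold, pred), 4))
```

In this toy example the accuracy is 0.6, but the macro average is lower because the class that is never predicted correctly contributes an F1 of 0 with the same weight as the frequent classes. This is one reason the metric suits imbalanced sentiment datasets.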
Overview of the shared task on Detecting Signs of Depression from Social Media Text
Kayalvizhi Sampath | Durairaj Thenmozhi | Bharathi Raja Chakravarthi | Jerin Mahibha C | Kogilavani Shanmugavadivel | Pratik Anil Rahood
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Social media has become a vital platform for personal communication. Its widespread use as a primary means of public communication offers an exciting opportunity for early detection and management of mental health issues. People often share their emotions on social media, but understanding the true depth of their feelings can be challenging. Depression, a prevalent problem among young people, is of particular concern due to its link with rising suicide rates. Identifying depression levels in social media texts is crucial for timely support and prevention of negative outcomes. However, it is a complex task because human emotions are dynamic and can change significantly over time. The DepSign-LT-EDI@RANLP 2023 shared task aims to classify social media text into three depression levels: “Not Depressed,” “Moderately Depressed,” and “Severely Depressed.” This overview covers the task details, dataset, methodologies used, and analysis of results. RoBERTa-based models emerged as top performers, with the best result achieving a macro F1-score of 0.584 among 31 participating teams.
Brainstormers_msec at SemEval-2023 Task 10: Detection of sexism related comments in social media using deep learning
C. Jerin Mahibha | C. M Swaathi | R. Jeevitha | R. Princy Martina | Durairaj Thenmozhi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Social media is the medium through which people share their thoughts and opinions. This has both pros and cons, which depend on the type of information being conveyed. If any information conveyed over social media hurts or affects a person, such information can be removed, as it may disturb their mental health and decrease their self-confidence. During the last decade, hateful and sexist content towards women has been increasingly spread on social networks. Exposure to sexist speech has serious consequences for women’s lives and limits their freedom of speech. Sexism is expressed in very different forms: it includes subtle stereotypes and attitudes that, although frequently unnoticed, are extremely harmful for both women and society. Sexist comments have a major impact on the women subjected to them. We participated as a team in the shared task Explainable Detection of Online Sexism (EDOS) at SemEval 2023 and have proposed a model which identifies sexist comments and their type from English social media posts using the data set shared for the task. Different transformer models such as BERT, DistilBERT and RoBERTa are used by the proposed model for implementing all three tasks shared by EDOS. Using the BERT model, macro F1 scores of 0.8073, 0.5876 and 0.3729 are achieved for Task A, Task B and Task C, respectively.
2022
Overview of The Shared Task on Homophobia and Transphobia Detection in Social Media Comments
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Durairaj Thenmozhi | John Philip McCrae | Paul Buitelaar | Rahul Ponnusamy | Prasanna Kumar Kumaresan
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Homophobia and Transphobia Detection is the task of identifying homophobia, transphobia, and non-anti-LGBT+ content from the given corpus. Homophobia and transphobia are both forms of toxic language directed at LGBTQ+ individuals that are described as hate speech. This paper summarizes our findings on the “Homophobia and Transphobia Detection in social media comments” shared task held at LT-EDI 2022 - ACL 2022. This shared task focused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages. It received 10 systems for Tamil, 13 systems for English, and 11 systems for Tamil-English. The best systems for Tamil, English, and Tamil-English scored 0.570, 0.870, and 0.610, respectively, on macro average F1-score.