Subalalitha Chinnaudayar Navaneethakrishnan

Also published as: Subalalitha Chinnaudayar Navaneethakrishnan

2025

Overview of the Shared Task on Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data
Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Shanu Dhawale | Saranya Rajiakodi | Sajeetha Thavareesan | Subalalitha Chinnaudayar Navaneethakrishnan | Thenmozhi Durairaj
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The widespread use of social media has made it easier for false information to proliferate, particularly racially motivated hoaxes that can encourage violence and hatred. Such content is frequently shared in code-mixed languages in multilingual nations like India, which presents special difficulties for automated detection systems because of the casual language, erratic grammar, and rich cultural background. The shared task on detecting racial hoaxes in code mixed social media data aims to identify the racial hoaxes in Hindi-English data. It is a binary classification task with more than 5,000 labeled instances. A total of 11 teams participated in the task, and the results are evaluated using the macro-F1 score. The team that employed XLM-RoBERTa secured the first position in the task.

2023

pdf bib abs

In recent years, there has been a growing focus on Sentiment Analysis (SA) of code-mixed Dravidian languages. However, the majority of social media text in these languages is code-mixed, presenting a unique challenge. Despite this, there is currently lack of research on SA specifically tailored for code-mixed Dravidian languages, highlighting the need for further exploration and development in this domain. In this view, “Sentiment Analysis in Tamil and Tulu- DravidianLangTech” shared task at Recent Advances in Natural Language Processing (RANLP)- 2023 is organized. This shred consists two language tracks: code-mixed Tamil and Tulu and Tulu text is first ever explored in public domain for SA. We describe the task, its organization, and the submitted systems followed by the results. 57 research teams registered for the shared task and We received 27 systems each for code-mixed Tamil and Tulu texts. The performance of the systems (developed by participants) has been evaluated in terms of macro average F1 score. The top system for code-mixed Tamil and Tulu texts scored macro average F1 score of 0.32, and 0.542 respectively. The high quality and substantial quantity of submissions demonstrate a significant interest and attention in the analysis of code-mixed Dravidian languages. However, the current state of the art in this domain indicates the need for further advancements and improvements to effectively address the challenges posed by code-mixed Dravidian language SA.

pdf bib abs

This paper discusses the submissions to the shared task on abusive comment detection in Tamil and Telugu codemixed social media text conducted as part of the third Workshop on Speech and Language Technologies for Dravidian Languages at RANLP 20239. The task encourages researchers to develop models to detect the contents containing abusive information in Tamil and Telugu codemixed social media text. The task has three subtasks - abusive comment detection in Tamil, Tamil-English and Telugu-English. The dataset for all the tasks was developed by collecting comments from YouTube. The submitted models were evaluated using macro F1-score, and prepared the rank list accordingly.

pdf bib abs

VEL@LT-EDI: Detecting Homophobia and Transphobia in Code-Mixed Spanish Social Media Comments
Prasanna Kumar Kumaresan | Kishore Kumar Ponnusamy | Kogilavani Shanmugavadivel | Subalalitha Chinnaudayar Navaneethakrishnan | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Our research aims to address the task of detecting homophobia and transphobia in social media code-mixed comments written in Spanish. Code-mixed text in social media often violates strict grammar rules and incorporates non-native scripts, posing challenges for identification. To tackle this problem, we perform pre-processing by removing unnecessary content and establishing a baseline for detecting homophobia and transphobia. Furthermore, we explore the effectiveness of various traditional machine-learning models with feature extraction and pre-trained transformer model techniques. Our best configurations achieve macro F1 scores of 0.84 on the test set and 0.82 on the development set for Spanish, demonstrating promising results in detecting instances of homophobia and transphobia in code-mixed comments.

2022

pdf bib abs

Hope Speech detection is the task of classifying a sentence as hope speech or non-hope speech given a corpus of sentences. Hope speech is any message or content that is positive, encouraging, reassuring, inclusive and supportive that inspires and engenders optimism in the minds of people. In contrast to identifying and censoring negative speech patterns, hope speech detection is focussed on recognising and promoting positive speech patterns online. In this paper, we report an overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion (LT-EDI-2022) organised as a part of ACL 2022. The participants were provided with annotated training & development datasets and unlabelled test datasets in all the five languages. The goal of the shared task is to classify the given sentences into one of the two hope speech classes. The performances of the systems submitted by the participants were evaluated in terms of micro-F1 score and weighted-F1 score. The datasets for this challenge are openly available

pdf bib abs

Overview of Abusive Comment Detection in Tamil-ACL 2022
Ruba Priyadharshini | Bharathi Raja Chakravarthi | Subalalitha Chinnaudayar Navaneethakrishnan | Thenmozhi Durairaj | Malliga Subramanian | Kogilavani Shanmugavadivel | Siddhanth U Hegde | Prasanna Kumar Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

The social media is one of the significantdigital platforms that create a huge im-pact in peoples of all levels. The commentsposted on social media is powerful enoughto even change the political and businessscenarios in very few hours. They alsotend to attack a particular individual ora group of individuals. This shared taskaims at detecting the abusive comments in-volving, Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia, Transpho-bic. The hope speech is also identified. Adataset collected from social media taggedwith the above said categories in Tamiland Tamil-English code-mixed languagesare given to the participants. The par-ticipants used different machine learningand deep learning algorithms. This paperpresents the overview of this task compris-ing the dataset details and results of theparticipants.

2021

pdf bib abs

Hypers at ComMA@ICON: Modelling Aggressive, Gender Bias and Communal Bias Identification
Sean Benhur | Roshan Nayak | Kanchana Sivanraju | Adeep Hande | Subalalitha Chinnaudayar Navaneethakrishnan | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification

Due to the exponential increasing reach of social media, it is essential to focus on its negative aspects as it can potentially divide society and incite people into violence. In this paper, we present our system description of work on the shared task ComMA@ICON, where we have to classify how aggressive the sentence is and if the sentence is gender-biased or communal biased. These three could be the primary reasons to cause significant problems in society. Our approach utilizes different pretrained models with Attention and mean pooling methods. We were able to get Rank 1 with 0.253 Instance F1 score on Bengali, Rank 2 with 0.323 Instance F1 score on multilingual set, Rank 4 with 0.129 Instance F1 score on meitei and Rank 5 with 0.336 Instance F1 score on Hindi. The source code and the pretrained models of this work can be found here.