Saranya Rajiakodi - ACL Anthology

Saranya Rajiakodi

2025

Findings of the Shared Task Multilingual Bias and Propaganda Annotation in Political Discourse
Shunmuga Priya Muthusamy Chinnan | Bharathi Raja Chakravarthi | Meghann Drury-Grogan | Senthil Kumar B | Saranya Rajiakodi | Angel Deborah S
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The Multilingual Bias and Propaganda Annotation task focuses on annotating biased and propagandist content in political discourse across English and Tamil. This paper presents the findings of the shared task on bias and propaganda annotation. The task involves two subtasks, one in English and another in Tamil, both of which are annotation tasks in which a text comment is to be labeled. With a particular emphasis on polarizing policy debates such as the US Gender Policy and India’s Three Language Policy, this shared task invites participants to build annotation systems capable of labeling textual bias and propaganda. The dataset was curated by collecting comments from YouTube videos. Our curated dataset consists of 13,010 English sentences on the US Gender Policy and the Russia-Ukraine War, and 5,880 Tamil sentences on the Three Language Policy. Participants were instructed to follow the guidelines and annotate at the sentence level with fine-grained, domain-specific bias labels and 4 propaganda labels. Participants were encouraged to leverage existing tools or develop novel approaches to perform fine-grained annotations that capture the complex socio-political nuances present in the data.

CUTN_Bio at BioLaySumm: Multi-Task Prompt Tuning with External Knowledge and Readability adaptation for Layman Summarization
Bhuvaneswari Sivagnanam | Rivo Krishnu C H | Princi Chauhan | Saranya Rajiakodi
Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)

Findings of the Shared Task on Misogyny Meme Detection: DravidianLangTech@NAACL 2025
Bharathi Raja Chakravarthi | Rahul Ponnusamy | Saranya Rajiakodi | Shunmuga Priya Muthusamy Chinnan | Paul Buitelaar | Bhuvaneswari Sivagnanam | Anshid Kizhakkeparambil
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The rapid expansion of social media has facilitated communication but also enabled the spread of misogynistic memes, reinforcing gender stereotypes and toxic online environments. Detecting such content is challenging due to the multimodal nature of memes, where meaning emerges from the interplay of text and images. The Misogyny Meme Detection shared task at DravidianLangTech@NAACL 2025 focused on Tamil and Malayalam, encouraging the development of multimodal approaches. With 114 teams registered and 23 submitting predictions, participants leveraged various pretrained language models and vision models through fusion techniques. The best models achieved high macro F1 scores (0.83682 for Tamil, 0.87631 for Malayalam), highlighting the effectiveness of multimodal learning. Despite these advances, challenges such as bias in the data set, class imbalance, and cultural variations persist. Future research should refine multimodal detection methods to improve accuracy and adaptability, fostering safer and more inclusive online spaces.

Findings of the Shared Task on Abusive Tamil and Malayalam Text Targeting Women on Social Media: DravidianLangTech@NAACL 2025
Saranya Rajiakodi | Bharathi Raja Chakravarthi | Shunmuga Priya Muthusamy Chinnan | Ruba Priyadharshini | Rajameenakshi J | Kathiravan P | Rahul Ponnusamy | Bhuvaneswari Sivagnanam | Paul Buitelaar | Bhavanimeena K | Jananayagam V | Kishore Kumar Ponnusamy
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This overview paper presents the findings of the Shared Task on Abusive Tamil and Malayalam Text Targeting Women on Social Media, organized as part of DravidianLangTech@NAACL 2025. The task aimed to encourage the development of robust systems to detect abusive content targeting women in Tamil and Malayalam, two low-resource Dravidian languages. Participants were provided with annotated datasets containing abusive and non-abusive text curated from YouTube comments. We present an overview of the approaches and analyse the results of the shared task submissions. We believe the findings presented in this paper will be useful to researchers working in Dravidian language technology.

Overview of the Shared Task on Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data
Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Shanu Dhawale | Saranya Rajiakodi | Sajeetha Thavareesan | Subalalitha Chinnaudayar Navaneethakrishnan | Thenmozhi Durairaj
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The widespread use of social media has made it easier for false information to proliferate, particularly racially motivated hoaxes that can encourage violence and hatred. Such content is frequently shared in code-mixed languages in multilingual nations like India, which presents special difficulties for automated detection systems because of the casual language, erratic grammar, and rich cultural background. The shared task on detecting racial hoaxes in code-mixed social media data aims to identify racial hoaxes in Hindi-English data. It is a binary classification task with more than 5,000 labeled instances. A total of 11 teams participated in the task, and the results were evaluated using the macro-F1 score. The team that employed XLM-RoBERTa secured the first position in the task.
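Several of the shared tasks listed on this page rank systems by macro F1. As a rough illustration of what that metric computes, the sketch below averages per-class F1 with equal weight for every class, so a rare class counts as much as a frequent one. The function name `macro_f1` and the toy labels are hypothetical and are not taken from any of the papers; the organisers' actual evaluation scripts may differ in detail.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative

    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy, made-up labels for a binary hoax / not-hoax task.
gold = ["hoax", "hoax", "not", "not", "not", "not"]
pred = ["hoax", "not", "not", "not", "not", "hoax"]
print(macro_f1(gold, pred))
```

In this toy case the minority class scores 0.5 and the majority class 0.75, so the macro average sits below the plain accuracy of 4/6, which is the usual reason the metric is preferred for imbalanced data.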

Overview of the Shared Task on Multimodal Hate Speech Detection in Dravidian languages: DravidianLangTech@NAACL 2025
Jyothish Lal G | Premjith B | Bharathi Raja Chakravarthi | Saranya Rajiakodi | Bharathi B | Rajeswari Natarajan | Ratnavel Rajalakshmi
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Detecting hate speech on social media platforms is crucial because of its adverse impact on mental health, social harmony, and online safety. This paper presents the overview of the shared task on Multimodal Hate Speech Detection in Dravidian Languages organized as part of DravidianLangTech@NAACL 2025. The task emphasizes detecting hate speech in social media content that combines speech and text. Here, we focus on three low-resource Dravidian languages: Malayalam, Tamil, and Telugu. Participants were required to classify hate speech in three sub-tasks, each corresponding to one of these languages. The dataset was curated by collecting speech and corresponding text from YouTube videos. Various machine learning and deep learning-based models, including transformer-based architectures and multimodal frameworks, were employed by the participants. The submissions were evaluated using the macro F1 score. Experimental results underline the potential of multimodal approaches in advancing hate speech detection for low-resource languages. Team SSNTrio achieved the highest F1 scores in Malayalam and Tamil, 0.7511 and 0.7332, respectively. Team lowes achieved the best F1 score, 0.3817, in the Telugu sub-task.

An Overview of the Misogyny Meme Detection Shared Task for Chinese Social Media
Bharathi Raja Chakravarthi | Rahul Ponnusamy | Ping Du | Xiaojian Zhuang | Saranya Rajiakodi | Paul Buitelaar | Premjith B | Bhuvaneswari Sivagnanam | Anshid Kizhakkeparambil | Lavanya S.K.
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The increasing prevalence of misogynistic content in online memes has raised concerns about their impact on digital discourse. Culture-specific images and informal use of text in memes present considerable challenges for automatic detection systems, especially in low-resource languages. While previous shared tasks have addressed misogyny detection in English and several European languages, misogynistic meme detection in Chinese has remained largely unexplored. To address this gap, we introduced a shared task focused on binary classification of Chinese-language memes as misogynistic or non-misogynistic. The task featured memes collected from Chinese social media and annotated by native speakers. A total of 45 teams registered, with 8 teams submitting predictions from their multimodal models integrating textual and visual features through diverse fusion strategies. The best-performing system achieved a macro F1-score of 0.93035, highlighting the effectiveness of lightweight pretrained encoder fusion. This system used Chinese BERT and DenseNet-121 for text and image feature extraction, respectively. A feedforward network was trained as a classifier using the features obtained by concatenating text and image features.
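The concatenation-based fusion described in the abstract above can be sketched in a few lines. This is only an illustration of the general idea, not the winning team's implementation: the feature vectors are tiny made-up stand-ins for encoder outputs, the weights are random and untrained, and the names `concat_features`, `init_params`, and `score` are hypothetical.

```python
import math
import random


def concat_features(text_feat, image_feat):
    """Fusion by concatenation: the fused vector is text features followed by image features."""
    return list(text_feat) + list(image_feat)


def dense(x, weights, bias):
    """One fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(in_dim, hidden_dim, seed=0):
    """Random, untrained weights for a one-hidden-layer binary classifier (illustration only)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]],
        "b2": [0.0],
    }


def score(text_feat, image_feat, params):
    """Forward pass: concatenate, hidden layer with ReLU, then a sigmoid output in (0, 1)."""
    fused = concat_features(text_feat, image_feat)
    hidden = [max(0.0, h) for h in dense(fused, params["w1"], params["b1"])]
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up stand-ins for a text-encoder vector and an image-encoder vector.
text_vec = [0.2, -0.5, 0.9, 0.1]
image_vec = [0.7, 0.0, -0.3]
params = init_params(in_dim=len(text_vec) + len(image_vec), hidden_dim=5)
probability = score(text_vec, image_vec, params)
print(probability)
```

In a real system the two vectors would come from the pretrained encoders and the classifier weights would be learned from the labeled memes; only the concatenate-then-classify structure is shown here.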

Findings of the Shared Task Caste and Migration Hate Speech Detection
Saranya Rajiakodi | Bharathi Raja Chakravarthi | Rahul Ponnusamy | Shunmuga Priya Muthusamy Chinnan | Prasanna Kumar Kumaresan | Sathiyaraj Thangasamy | Bhuvaneswari Sivagnanam | Balasubramanian Palani | Kogilavani Shanmugavadivel | Abirami Murugappan | Charmathi Rajkumar
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Hate speech targeting caste and migration communities is a growing concern on online platforms, particularly in linguistically diverse regions. By focusing on Tamil-language text content, this task provides a unique opportunity to tackle caste- or migration-related hate speech detection in Tamil, a low-resource language, contributing to a safer digital space. We present the results and main findings of the shared task on caste and migration hate speech detection. The task is a binary classification determining whether a text is caste/migration-related hate speech or not. The task attracted 17 participating teams, experimenting with a wide range of methodologies from traditional machine learning to advanced multilingual transformers. The top-performing system achieved a macro F1-score of 0.88105 with an ensemble of fine-tuned transformer models including XLM-R and MuRIL. Our analysis highlights the effectiveness of multilingual transformers in low-resource settings, ensemble learning, and culturally informed techniques based on socio-political context.

Overview on Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments: DravidianLangTech@NAACL 2025
Bharathi Raja Chakravarthi | Saranya Rajiakodi | Thenmozhi Durairaj | Sathiyaraj Thangasamy | Ratnasingam Sakuntharaj | Prasanna Kumar Kumaresan | Kishore Kumar Ponnusamy | Arunaggiri Pandian Karunanidhi | Rohan R
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Political multiclass detection is the task of identifying seven predefined political classes. In this paper, we report an overview of the findings of the “Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments” shared task conducted at the DravidianLangTech@NAACL 2025 workshop. The participants were provided with annotated Twitter comments, which were split into training, development, and unlabelled test datasets. A total of 139 participants registered for this shared task, and 25 teams finally submitted their results. The performance of the submitted systems was evaluated and ranked in terms of the macro-F1 score.

Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Anand Kumar Madasamy | Sajeetha Thavareesan | Elizabeth Sherly | Saranya Rajiakodi | Balasubramanian Palani | Malliga Subramanian | Subalalitha Cn | Dhivya Chinnappa
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

2024

From Laughter to Inequality: Annotated Dataset for Misogyny Detection in Tamil and Malayalam Memes
Rahul Ponnusamy | Kathiravan Pannerselvam | Saranya Rajiakodi | Prasanna Kumar Kumaresan | Sajeetha Thavareesan | Bhuvaneswari Sivagnanam | Anshid K.A | Susminu S Kumar | Paul Buitelaar | Bharathi Raja Chakravarthi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this digital era, memes have become a prevalent form of online expression, humor, sarcasm, and social commentary. However, beneath their surface lie concerning issues such as the propagation of misogyny, gender-based bias, and harmful stereotypes. To address these issues, we introduce MDMD (Misogyny Detection Meme Dataset) in this paper. This article focuses on creating an annotated dataset with detailed annotation guidelines to delve into online misogyny within the Tamil and Malayalam-speaking communities. Through analyzing memes, we uncover the intricate world of gender bias and stereotypes in these communities, shedding light on their manifestations and impact. This dataset, along with its comprehensive annotation guidelines, is a valuable resource for understanding the prevalence, origins, and manifestations of misogyny in various contexts, aiding researchers, policymakers, and organizations in developing effective strategies to combat gender-based discrimination and promote equality and inclusivity. It enables a deeper understanding of the issue and provides insights that can inform strategies for cultivating a more equitable and secure online environment. This work represents a crucial step in raising awareness and addressing gender-based discrimination in the digital space.

Findings of the Shared Task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu)@DravidianLangTech 2024
Premjith B | Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Saranya Rajiakodi | Sai Prashanth Karnati | Sai Rishith Reddy Mangamuru | Chandu Janakiram
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper examines the submissions of various participating teams to the task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu) organized as part of DravidianLangTech 2024. The shared task encourages researchers and academicians to build models that identify content containing harmful information in Telugu code-mixed social media text. The dataset for the task was created by gathering YouTube comments and annotating them manually. A total of 23 teams participated and submitted their results to the shared task. The rank list was created by assessing the submitted results using the macro F1-score.

Overview of Shared Task on Multitask Meme Classification - Unraveling Misogynistic and Trolls in Online Memes
Bharathi Raja Chakravarthi | Saranya Rajiakodi | Rahul Ponnusamy | Kathiravan Pannerselvam | Anand Kumar Madasamy | Ramachandran Rajalakshmi | Hariharan LekshmiAmmal | Anshid Kizhakkeparambil | Susminu S Kumar | Bhuvaneswari Sivagnanam | Charmathi Rajkumar
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

This paper offers a detailed overview of the first shared task on “Multitask Meme Classification - Unraveling Misogynistic and Trolls in Online Memes,” organized as part of the LT-EDI@EACL 2024 conference. The task was set to classify misogynistic content and troll memes within online platforms, focusing specifically on memes in Tamil and Malayalam languages. A total of 52 teams registered for the competition, with four submitting systems for the Tamil meme classification task and three for the Malayalam task. The outcomes of this shared task are significant, providing insights into the current state of misogynistic content in digital memes and highlighting the effectiveness of various computational approaches in identifying such detrimental content. The top-performing model got a macro F1 score of 0.73 in Tamil and 0.87 in Malayalam.

Overview of Shared Task on Caste and Migration Hate Speech Detection
Saranya Rajiakodi | Bharathi Raja Chakravarthi | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Sathiyaraj Thangasamy | Bhuvaneswari Sivagnanam | Charmathi Rajkumar
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

We present an overview of the first shared task on “Caste and Migration Hate Speech Detection.” The shared task is organized as part of LT-EDI@EACL 2024. The system must make a binary decision, ascertaining whether the text is categorized as caste/migration hate speech or not. The dataset presented in this shared task is in Tamil, which is one of the under-resourced languages. A total of 51 teams participated in this task. Among them, 15 teams submitted their research results for the task. To the best of our knowledge, this is the first time a shared task has been conducted on textual hate speech detection concerning caste and migration. In this study, we present a systematic analysis of all the contributions of the participants as well as the statistics of the dataset, which consists of social media comments in the Tamil language for hate speech detection. The paper also provides a comprehensive analysis of the participants’ methodologies and their findings.

SetFit: A Robust Approach for Offensive Content Detection in Tamil-English Code-Mixed Conversations Using Sentence Transfer Fine-tuning
Kathiravan Pannerselvam | Saranya Rajiakodi | Sajeetha Thavareesan | Sathiyaraj Thangasamy | Kishore Ponnusamy
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Code-mixed languages are increasingly prevalent on social media and online platforms, presenting significant challenges in offensive content detection for natural language processing (NLP) systems. Our study explores how effectively the Sentence Transfer Fine-tuning (SetFit) method, combined with logistic regression, detects offensive content in a Tamil-English code-mixed dataset. We compare our model’s performance with five other NLP models: Multilingual BERT (mBERT), LSTM, BERT, IndicBERT, and Language-agnostic BERT Sentence Embeddings (LaBSE). Our model, SetFit, outperforms these models, achieving an accuracy of 89.72%. These results suggest the sentence transformer model’s substantial potential for detecting offensive content in code-mixed languages. Our study provides valuable insights into the sentence transformer model’s ability to identify various types of offensive material in Tamil-English online conversations, paving the way for more advanced NLP systems tailored to code-mixed languages.

Findings of the Shared Task on Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL)@DravidianLangTech 2024
Premjith B | Jyothish G | Sowmya V | Bharathi Raja Chakravarthi | K Nandhini | Rajeswari Natarajan | Abirami Murugappan | Bharathi B | Saranya Rajiakodi | Rahul Ponnusamy | Jayanth Mohan | Mekapati Reddy
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper presents the findings of the shared task on multimodal sentiment analysis, abusive language detection and hate speech detection in Dravidian languages. Through this shared task, researchers worldwide can submit models for three crucial social media data analysis challenges in Dravidian languages: sentiment analysis, abusive language detection, and hate speech detection. The aim is to build models for deriving fine-grained sentiment analysis from multimodal data in Tamil and Malayalam, and for identifying abusive and hate content from multimodal data in Tamil. Three modalities make up the multimodal data: text, audio, and video. YouTube videos were gathered to create the datasets for the tasks. Thirty-nine teams took part in the competition. However, only two teams turned in their findings. The macro F1-score was used to assess the submissions.

This paper provides a comprehensive summary of the “Homophobia and Transphobia Detection in Social Media Comments” shared task, which was held at LT-EDI@EACL 2024. The objective of this task was to develop systems capable of identifying instances of homophobia and transphobia within social media comments. This challenge was extended across ten languages: English, Tamil, Malayalam, Telugu, Kannada, Gujarati, Hindi, Marathi, Spanish, and Tulu. Each comment in the dataset was annotated into one of three categories. The shared task attracted significant interest, with over 60 teams participating through the CodaLab platform. The predictions submitted by the participants were evaluated using the macro F1 score.

2023

CSSCUTN@DravidianLangTech:Abusive comments Detection in Tamil and Telugu
Kathiravan Pannerselvam | Saranya Rajiakodi | Rahul Ponnusamy | Sajeetha Thavareesan
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Code-mixing is the word- or phrase-level interchange of two or more languages within a sentence, in conversation or in written text. This phenomenon is widespread on social media platforms, and understanding the underlying abusive comments in a code-mixed sentence is a complex challenge. We present the system we submitted to the DravidianLangTech Shared Task on Abusive Comment Detection in Tamil and Telugu. Our approach involves building a multiclass abusive detection model that recognizes 8 different labels. The provided samples are code-mixed Tamil-English text, where Tamil is represented in romanised form. We focused on the multiclass classification subtask, and we leveraged Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR). Our method earned the ninth rank among all competing systems for the classification of abusive comments in code-mixed text. Our proposed classifier achieves an accuracy of 0.99 and an F1-score of 0.99 on a balanced dataset using TF-IDF with SVM. It can be used effectively to detect abusive comments in Tamil-English code-mixed text.
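To make the TF-IDF feature step mentioned in the abstract above concrete, here is a minimal sketch using the plain textbook weighting (term frequency times the log of inverse document frequency). It is an assumption that this variant resembles what the authors used; library implementations often apply smoothing and normalisation on top. The function name `tfidf` and the toy comments are hypothetical placeholders, and the classifier that would consume these vectors is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the count normalised by document length.
        vectors.append({term: (c / total) * idf[term] for term, c in counts.items()})
    return vectors


# Made-up placeholder comments, not taken from the shared-task data.
comments = ["nalla video bro", "mokka video", "nalla video song"]
vectors = tfidf(comments)
print(vectors[0])
```

A term that occurs in every comment ("video" here) gets weight zero, while a term unique to one comment ("bro") gets the largest weight, which is what lets a linear classifier focus on distinctive words.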

Co-authors

Kathiravan Pannerselvam 5

Sajeetha Thavareesan 5

Shunmuga Priya Muthusamy Chinnan 4

Anshid Kizhakkeparambil 4

Sathiyaraj Thangasamy 4

Kishore Kumar Ponnusamy 3

Ruba Priyadharshini 3

Charmathi Rajkumar 3

Thenmozhi Durairaj 2

Susminu S Kumar 2

Anand Kumar M 2

Abirami Murugappan 2

Rajeswari Natarajan 2

Balasubramanian Palani 2

Senthil Kumar B 1

Rivo Krishnu C H 1

Princi Chauhan 1

Dhivya Chinnappa 1

Subalalitha Cn 1

Shanu Dhawale 1

Meghann Drury-Grogan 1

Jyothish Lal G 1

Daniel García-Baena 1

Miguel Ángel García-Cumbreras 1

José Antonio García-Díaz 1

Rajameenakshi J 1

Chandu Janakiram 1

Salud María Jiménez-Zafra 1

Bhavanimeena K 1

Sai Prashanth Karnati 1

Arunaggiri Pandian Karunanidhi 1

Nandhini Kumaresh 1

Hariharan LekshmiAmmal 1

Sai Rishith Reddy Mangamuru 1

Jayanth Mohan 1

Subalalitha Chinnaudayar Navaneethakrishnan 1

Kishore Ponnusamy 1

Ratnavel Rajalakshmi 1

Ramachandran Rajalakshmi 1

Mekapati Reddy 1

Angel Deborah S 1

Ratnasingam Sakuntharaj 1

Kogilavani Shanmugavadivel 1

Hosahalli Lakshmaiah Shashirekha 1

Elizabeth Sherly 1

Poorvi Shetty 1

Malliga Subramanian 1

Jananayagam V 1

Rafael Valencia-García 1

Xiaojian Zhuang 1
