Prasanna Kumar Kumaresan - ACL Anthology

Prasanna Kumar Kumaresan

2025

Overview on Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments: DravidianLangTech@NAACL 2025
Bharathi Raja Chakravarthi | Saranya Rajiakodi | Thenmozhi Durairaj | Sathiyaraj Thangasamy | Ratnasingam Sakuntharaj | Prasanna Kumar Kumaresan | Kishore Kumar Ponnusamy | Arunaggiri Pandian Karunanidhi | Rohan R
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Political multiclass detection is the task of identifying the predefined seven political classes. In this paper, we report an overview of the findings on the “Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments” shared task conducted at the workshop on DravidianLangTech@NAACL 2025. The participants were provided with annotated Twitter comments, which are split into training, development, and unlabelled test datasets. A total of 139 participants registered for this shared task, and 25 teams finally submitted their results. The performance of the submitted systems was evaluated and ranked in terms of the macro-F1 score.

Overview of the Shared Task on Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data
Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Shanu Dhawale | Saranya Rajiakodi | Sajeetha Thavareesan | Subalalitha Chinnaudayar Navaneethakrishnan | Thenmozhi Durairaj
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The widespread use of social media has made it easier for false information to proliferate, particularly racially motivated hoaxes that can encourage violence and hatred. Such content is frequently shared in code-mixed languages in multilingual nations like India, which presents special difficulties for automated detection systems because of the casual language, erratic grammar, and rich cultural background. The shared task on detecting racial hoaxes in code mixed social media data aims to identify the racial hoaxes in Hindi-English data. It is a binary classification task with more than 5,000 labeled instances. A total of 11 teams participated in the task, and the results are evaluated using the macro-F1 score. The team that employed XLM-RoBERTa secured the first position in the task.

Findings of the Shared Task Caste and Migration Hate Speech Detection
Saranya Rajiakodi | Bharathi Raja Chakravarthi | Rahul Ponnusamy | Shunmuga Priya Muthusamy Chinnan | Prasanna Kumar Kumaresan | Sathiyaraj Thangasamy | Bhuvaneswari Sivagnanam | Balasubramanian Palani | Kogilavani Shanmugavadivel | Abirami Murugappan | Charmathi Rajkumar
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Hate speech targeting caste and migration communities is a growing concern in online platforms, particularly in linguistically diverse regions. By focusing on Tamil language text content, this task provides a unique opportunity to tackle caste or migration related hate speech detection in a low resource language Tamil, contributing to a safer digital space. We present the results and main findings of the shared task caste and migration hate speech detection. The task is a binary classification determining whether a text is caste/migration related hate speech or not. The task attracted 17 participating teams, experimenting with a wide range of methodologies from traditional machine learning to advanced multilingual transformers. The top performing system achieved a macro F1-score of 0.88105, enhancing an ensemble of fine-tuned transformer models including XLM-R and MuRIL. Our analysis highlights the effectiveness of multilingual transformers in low resource, ensemble learning, and culturally informed socio political context based techniques.

Overview of the Shared Task on Detecting AI Generated Product Reviews in Dravidian Languages: DravidianLangTech@NAACL 2025
Premjith B | Nandhini Kumaresh | Bharathi Raja Chakravarthi | Thenmozhi Durairaj | Balasubramanian Palani | Sajeetha Thavareesan | Prasanna Kumar Kumaresan
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The detection of AI-generated product reviews is critical due to the increased use of large language models (LLMs) and their capability to generate convincing sentences. AI-generated reviews can affect consumers and businesses, as they influence trust and decision-making. This paper presents the overview of the shared task on “Detecting AI-generated product reviews in Dravidian Languages” organized as part of DravidianLangTech@NAACL 2025. This task involves two subtasks, one in Malayalam and another in Tamil, both of which are binary classifications where a review is to be classified as human-generated or AI-generated. The dataset was curated by collecting comments from YouTube videos. Various machine learning and deep learning-based models ranging from SVM to transformer-based architectures were employed by the participants.

Overview of Homophobia and Transphobia Span Detection in Social Media Comments
Prasanna Kumar Kumaresan | Bharathi Raja Chakravarthi | Ruba Priyadharshini | Paul Buitelaar | Malliga Subramanian | Kishore Kumar Ponnusamy
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The rise and the intensity of harassment and hate speech in social media platforms against LGBTQ+ communities is a growing concern. This work is an initiative to address this problem by conducting a shared task focused on the detection of homophobic and transphobic content in multilingual settings. The task comprises two subtasks: (1) multi-class classification of content into Homophobia, Transphobia, or Non-anti-LGBT+ categories across eight languages and (2) span-level detection to identify specific toxic segments within comments in English, Tamil, and Marathi. This initiative helps the development of explainable and socially responsible AI tools for combating identity-based harm in digital spaces. Multiple teams registered for the task; however, only two teams submitted their results, and the results were evaluated using the macro F1 score.

2024

Dataset for Identification of Homophobia and Transphobia for Telugu, Kannada, and Gujarati
Prasanna Kumar Kumaresan | Rahul Ponnusamy | Dhruv Sharma | Paul Buitelaar | Bharathi Raja Chakravarthi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Users of social media platforms are negatively affected by the proliferation of hate or abusive content. There has been a rise in homophobic and transphobic content in recent years targeting LGBT+ individuals. The increasing levels of homophobia and transphobia online can make online platforms harmful and threatening for LGBT+ persons, potentially inhibiting equality, diversity, and inclusion. We are introducing a new dataset for three languages, namely Telugu, Kannada, and Gujarati. Additionally, we have created an expert-labeled dataset to automatically identify homophobic and transphobic content within comments collected from YouTube. We provided comprehensive annotation rules to educate annotators in this process. We collected approximately 10,000 comments from YouTube for all three languages. Marking the first dataset of these languages for this task, we also developed a baseline model with pre-trained transformers.

Overview of Second Shared Task on Sentiment Analysis in Code-mixed Tamil and Tulu
Lavanya Sambath Kumar | Asha Hegde | Bharathi Raja Chakravarthi | Hosahalli Shashirekha | Rajeswari Natarajan | Sajeetha Thavareesan | Ratnasingam Sakuntharaj | Thenmozhi Durairaj | Prasanna Kumar Kumaresan | Charmathi Rajkumar
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Sentiment Analysis (SA) in Dravidian codemixed text is a hot research area right now. In this regard, the “Second Shared Task on SA in Code-mixed Tamil and Tulu” at DravidianLangTech (EACL-2024) is organized. Two tasks, namely SA in Tamil-English and Tulu-English code-mixed data, make up this shared task. In total, 64 teams registered for the shared task, out of which 19 and 17 systems were received for Tamil and Tulu, respectively. The performance of the systems submitted by the participants was evaluated based on the macro F1-score. The best method obtained macro F1-scores of 0.260 and 0.584 for code-mixed Tamil and Tulu texts, respectively.

Findings of the Shared Task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu)@DravidianLangTech 2024
Premjith B | Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Saranya Rajiakodi | Sai Prashanth Karnati | Sai Rishith Reddy Mangamuru | Chandu Janakiram
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper examines the submissions of various participating teams to the task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu) organized as part of DravidianLangTech 2024. In order to identify the contents containing harmful information in Telugu codemixed social media text, the shared task pushes researchers and academicians to build models. The dataset for the task was created by gathering YouTube comments and annotated manually. A total of 23 teams participated and submitted their results to the shared task. The rank list was created by assessing the submitted results using the macro F1-score.

From Laughter to Inequality: Annotated Dataset for Misogyny Detection in Tamil and Malayalam Memes
Rahul Ponnusamy | Kathiravan Pannerselvam | Saranya Rajiakodi | Prasanna Kumar Kumaresan | Sajeetha Thavareesan | Bhuvaneswari Sivagnanam | Anshid K.A | Susminu S Kumar | Paul Buitelaar | Bharathi Raja Chakravarthi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this digital era, memes have become a prevalent online expression, humor, sarcasm, and social commentary. However, beneath their surface lies concerning issues such as the propagation of misogyny, gender-based bias, and harmful stereotypes. To overcome these issues, we introduced MDMD (Misogyny Detection Meme Dataset) in this paper. This article focuses on creating an annotated dataset with detailed annotation guidelines to delve into online misogyny within the Tamil and Malayalam-speaking communities. Through analyzing memes, we uncover the intricate world of gender bias and stereotypes in these communities, shedding light on their manifestations and impact. This dataset, along with its comprehensive annotation guidelines, is a valuable resource for understanding the prevalence, origins, and manifestations of misogyny in various contexts, aiding researchers, policymakers, and organizations in developing effective strategies to combat gender-based discrimination and promote equality and inclusivity. It enables a deeper understanding of the issue and provides insights that can inform strategies for cultivating a more equitable and secure online environment. This work represents a crucial step in raising awareness and addressing gender-based discrimination in the digital space.

Overview of Shared Task on Caste and Migration Hate Speech Detection
Saranya Rajiakodi | Bharathi Raja Chakravarthi | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Sathiyaraj Thangasamy | Bhuvaneswari Sivagnanam | Charmathi Rajkumar
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

We present an overview of the first shared task on “Caste and Migration Hate Speech Detection.” The shared task is organized as part of LTEDI@EACL 2024. The system must decide between binary outcomes, ascertaining whether the text is caste/migration hate speech or not. The dataset presented in this shared task is in Tamil, which is one of the under-resourced languages. A total of 51 teams participated in this task. Among them, 15 teams submitted their research results for the task. To the best of our knowledge, this is the first time a shared task has been conducted on textual hate speech detection concerning caste and migration. In this study, we have conducted a systematic analysis and detailed presentation of all the contributions of the participants as well as the statistics of the dataset, which consists of social media comments in the Tamil language for hate speech detection. It also goes into the details of a comprehensive analysis of the participants’ methodology and their findings.

Overview of the Second Shared Task on Fake News Detection in Dravidian Languages: DravidianLangTech@EACL 2024
Malliga Subramanian | Bharathi Raja Chakravarthi | Kogilavani Shanmugavadivel | Santhiya Pandiyan | Prasanna Kumar Kumaresan | Balasubramanian Palani | Premjith B | Vanaja K | Mithunja S | Devika K | Hariprasath S.b | Haripriya B | Vigneshwar E
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The rise of online social media has revolutionized communication, offering users a convenient way to share information and stay updated on current events. However, this surge in connectivity has also led to the proliferation of misinformation, commonly known as fake news. This misleading content, often disguised as legitimate news, poses a significant challenge as it can distort public perception and erode trust in reliable sources. This shared task consists of two subtasks, task 1 and task 2. Task 1 aims to classify a given social media text as original or fake. The goal of the FakeDetect-Malayalam task 2 is to encourage participants to develop effective models capable of accurately detecting and classifying fake news articles in the Malayalam language into different categories like False, Half True, Mostly False, Partly False, and Mostly True. For this shared task, 33 participants submitted their results.

This paper provides a comprehensive summary of the “Homophobia and Transphobia Detection in Social Media Comments” shared task, which was held at the LT-EDI@EACL 2024. The objective of this task was to develop systems capable of identifying instances of homophobia and transphobia within social media comments. This challenge was extended across ten languages: English, Tamil, Malayalam, Telugu, Kannada, Gujarati, Hindi, Marathi, Spanish, and Tulu. Each comment in the dataset was annotated into three categories. The shared task attracted significant interest, with over 60 teams participating through the CodaLab platform. The submission of prediction from the participants was evaluated with the macro F1 score.

2023

Exploring Techniques to Detect and Mitigate Non-Inclusive Language Bias in Marketing Communications Using a Dictionary-Based Approach
Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Rahul Ponnusamy | John P McCrae | Michaela Comerford | Jay Megaro | Deniz Keles | Last Feremenga
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

We propose a new dataset for detecting non-inclusive language in sentences in English. These sentences were gathered from public sites, explaining what is inclusive and what is non-inclusive. We also extracted potentially non-inclusive keywords/phrases from the guidelines from business websites. A phrase dictionary was created by using an automatic extension with a word embedding trained on a massive corpus of general English text. In the end, a phrase dictionary was constructed by hand-editing the previous one to exclude inappropriate expansions and add the keywords from the guidelines. In a business context, the words individuals use can significantly impact the culture of inclusion and the quality of interactions with clients and prospects. Knowing the right words to avoid helps customers of different backgrounds and historically excluded groups feel included. They can make it easier to have productive, engaging, and positive communications. You can find the dictionaries, the code, and the method for making requests for the corpus at (we will release the link for data and code once the paper is accepted).

VEL@LT-EDI: Detecting Homophobia and Transphobia in Code-Mixed Spanish Social Media Comments
Prasanna Kumar Kumaresan | Kishore Kumar Ponnusamy | Kogilavani Shanmugavadivel | Subalalitha Chinnaudayar Navaneethakrishnan | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Our research aims to address the task of detecting homophobia and transphobia in social media code-mixed comments written in Spanish. Code-mixed text in social media often violates strict grammar rules and incorporates non-native scripts, posing challenges for identification. To tackle this problem, we perform pre-processing by removing unnecessary content and establishing a baseline for detecting homophobia and transphobia. Furthermore, we explore the effectiveness of various traditional machine-learning models with feature extraction and pre-trained transformer model techniques. Our best configurations achieve macro F1 scores of 0.84 on the test set and 0.82 on the development set for Spanish, demonstrating promising results in detecting instances of homophobia and transphobia in code-mixed comments.

KaustubhSharedTask@LT-EDI 2023: Homophobia-Transphobia Detection in Social Media Comments with NLPAUG-driven Data Augmentation
Kaustubh Lande | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Our research in Natural Language Processing (NLP) aims to detect hate speech comments specifically targeted at the LGBTQ+ community within the YouTube platform, as part of the shared task conducted by the LTEDI workshop. The dataset provided by the organizers exhibited a high degree of class imbalance, and to mitigate this, we employed NLPAUG, a data augmentation library. We employed several classification methods and reported the results using recall, precision, and F1-score metrics. The classification models discussed in this paper include a Bidirectional Long Short-Term Memory (BiLSTM) model trained with Word2Vec embeddings, a BiLSTM model trained with Twitter GloVe embeddings, and transformer models such as BERT, DistilBERT, RoBERTa, and XLM-RoBERTa, all of which were trained and fine-tuned. We achieved a weighted F1-score of 0.699 on the test data and secured fifth place in task B with 7 classes for the English language.

Overview of the shared task on Fake News Detection from Social Media Text
Malliga Subramanian | Bharathi Raja Chakravarthi | Kogilavani Shanmugavadivel | Santhiya Pandiyan | Prasanna Kumar Kumaresan | Balasubramanian Palani | Muskaan Singh | Sandhiya Raja | Vanaja | Mithunajha S
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Overview of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
Prasanna Kumar Kumaresan | Bharathi Raja Chakravarthi | Subalalitha Cn | Miguel Ángel García-Cumbreras | Salud María Jiménez Zafra | José Antonio García-Díaz | Rafael Valencia-García | Momchil Hardalov | Ivan Koychev | Preslav Nakov | Daniel García-Baena | Kishore Kumar Ponnusamy
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Hope serves as a powerful driving force that encourages individuals to persevere in the face of the unpredictable nature of human existence. It instills motivation within us to remain steadfast in our pursuit of important goals, regardless of the uncertainties that lie ahead. In today’s digital age, platforms such as Facebook, Twitter, Instagram, and YouTube have emerged as prominent social media outlets where people freely express their views and opinions. These platforms have also become crucial for marginalized individuals seeking online assistance and support[1][2][3]. The outbreak of the pandemic has exacerbated people’s fears around the world, as they grapple with the possibility of losing loved ones and the lack of access to essential services such as schools, hospitals, and mental health facilities.

VEL@DravidianLangTech: Sentiment Analysis of Tamil and Tulu
Kishore Kumar Ponnusamy | Charmathi Rajkumar | Prasanna Kumar Kumaresan | Elizabeth Sherly | Ruba Priyadharshini
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

We participated in the Sentiment Analysis in Tamil and Tulu - DravidianLangTech 2023-RANLP 2023 task under the team name VEL. This research focuses on addressing the challenge of detecting sentiment in social media code-mixed comments written in Tamil and Tulu languages. Code-mixed text in social media often deviates from strict grammar rules and incorporates non-native scripts, making sentiment identification a complex task. To tackle this issue, we employ pre-processing techniques to remove unnecessary content and develop a model specifically designed for sentiment detection. Additionally, we explore the effectiveness of traditional machine-learning models combined with feature extraction techniques. Our best logistic regression configurations achieve macro F1 scores of 0.43 on the Tamil test set and 0.51 on the Tulu test set, indicating promising results in detecting instances of sentiment in code-mixed comments.

Overview of Shared-task on Abusive Comment Detection in Tamil and Telugu
Ruba Priyadharshini | Bharathi Raja Chakravarthi | Malliga Subramanian | Subalalitha Chinnaudayar Navaneethakrishnan | Kogilavani Shanmugavadivel | Premjith B | Abirami Murugappan | Prasanna Kumar Kumaresan | Karnati Sai Prashanth | Mangamuru Sai Rishith Reddy | Janakiram Chandu
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

This paper discusses the submissions to the shared task on abusive comment detection in Tamil and Telugu codemixed social media text conducted as part of the third Workshop on Speech and Language Technologies for Dravidian Languages at RANLP 2023. The task encourages researchers to develop models to detect the contents containing abusive information in Tamil and Telugu codemixed social media text. The task has three subtasks - abusive comment detection in Tamil, Tamil-English and Telugu-English. The dataset for all the tasks was developed by collecting comments from YouTube. The submitted models were evaluated using macro F1-score, and the rank list was prepared accordingly.

2022

Overview of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
Bharathi Raja Chakravarthi | Vigneshwaran Muralidaran | Ruba Priyadharshini | Subalalitha Chinnaudayar Navaneethakrishnan | John Philip McCrae | Miguel Ángel García-Cumbreras | Salud María Jiménez-Zafra | Rafael Valencia-García | Prasanna Kumar Kumaresan | Rahul Ponnusamy | Daniel García-Baena | José Antonio García-Díaz
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Hope Speech detection is the task of classifying a sentence as hope speech or non-hope speech given a corpus of sentences. Hope speech is any message or content that is positive, encouraging, reassuring, inclusive and supportive that inspires and engenders optimism in the minds of people. In contrast to identifying and censoring negative speech patterns, hope speech detection is focussed on recognising and promoting positive speech patterns online. In this paper, we report an overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion (LT-EDI-2022) organised as a part of ACL 2022. The participants were provided with annotated training & development datasets and unlabelled test datasets in all the five languages. The goal of the shared task is to classify the given sentences into one of the two hope speech classes. The performances of the systems submitted by the participants were evaluated in terms of micro-F1 score and weighted-F1 score. The datasets for this challenge are openly available

Findings of the Shared Task on Multimodal Sentiment Analysis and Troll Meme Classification in Dravidian Languages
Premjith B | Bharathi Raja Chakravarthi | Malliga Subramanian | Bharathi B | Soman KP | Dhanalakshmi Vadivel | Sreelakshmi K | Arunaggiri Pandian | Prasanna Kumar Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents the findings of the shared task on Multimodal Sentiment Analysis and Troll meme classification in Dravidian languages held at ACL 2022. Multimodal sentiment analysis deals with the identification of sentiment from video. In addition to video data, the task requires the analysis of corresponding text and audio features for the classification of movie reviews into five classes. We created a dataset for this task in Malayalam and Tamil. The Troll meme classification task aims to classify multimodal Troll memes into two categories. This task assumes the analysis of both text and image features for making better predictions. The performance of the participating teams was analysed using the F1-score. Only one team submitted their results in the Multimodal Sentiment Analysis task, whereas we received six submissions in the Troll meme classification task. The only team that participated in the Multimodal Sentiment Analysis shared task obtained an F1-score of 0.24. In the Troll meme classification task, the winning team achieved an F1-score of 0.596.

Overview of The Shared Task on Homophobia and Transphobia Detection in Social Media Comments
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Durairaj Thenmozhi | John Philip McCrae | Paul Buitelaar | Rahul Ponnusamy | Prasanna Kumar Kumaresan
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Homophobia and Transphobia Detection is the task of identifying homophobia, transphobia, and non-anti-LGBT+ content from the given corpus. Homophobia and transphobia are both toxic languages directed at LGBTQ+ individuals that are described as hate speech. This paper summarizes our findings on the “Homophobia and Transphobia Detection in social media comments” shared task held at LT-EDI 2022 - ACL 2022. This shared task focused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages. It received 10 systems for Tamil, 13 systems for English, and 11 systems for Tamil-English. The best systems for Tamil, English, and Tamil-English scored 0.570, 0.870, and 0.610, respectively, on average macro F1-score.

Overview of Abusive Comment Detection in Tamil-ACL 2022
Ruba Priyadharshini | Bharathi Raja Chakravarthi | Subalalitha Chinnaudayar Navaneethakrishnan | Thenmozhi Durairaj | Malliga Subramanian | Kogilavani Shanmugavadivel | Siddhanth U Hegde | Prasanna Kumar Kumaresan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Social media is one of the significant digital platforms that create a huge impact on people at all levels. The comments posted on social media are powerful enough to change political and business scenarios within a few hours. They also tend to attack a particular individual or a group of individuals. This shared task aims at detecting abusive comments involving homophobia, misandry, counter-speech, misogyny, xenophobia, and transphobia. Hope speech is also identified. A dataset collected from social media, tagged with the above categories in Tamil and Tamil-English code-mixed languages, was given to the participants. The participants used different machine learning and deep learning algorithms. This paper presents the overview of this task, comprising the dataset details and results of the participants.

Thirumurai: A Large Dataset of Tamil Shaivite Poems and Classification of Tamil Pann
Shankar Mahadevan | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Prabakaran Chandran | Ruba Priyadharshini | Sangeetha Sivanesan | Bharathi Raja Chakravarthi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Thirumurai, also known as Panniru Thirumurai, is a collection of Tamil Shaivite poems dating back to the Hindu revival period between the 6th and the 10th century. These poems are par excellence, in both literary and musical terms. They have been composed based on the ancient, now non-existent Tamil Pann system and can be set to music. We present a large dataset containing all the Thirumurai poems and also attempt to classify the Pann and author of each poem using transformer based architectures. Our work is the first of its kind in dealing with ancient Tamil text datasets, which are severely under-resourced. We explore several Deep Learning-based techniques for solving this challenge effectively and provide essential insights into the problem and how to address it.

2021

Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Navya Jose | Anand Kumar M | Thomas Mandl | Prasanna Kumar Kumaresan | Rahul Ponnusamy | Hariharan R L | John Philip McCrae | Elizabeth Sherly
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Detecting offensive language in social media in local languages is critical for moderating user-generated content. Thus, the field of offensive language identification in under-resourced Tamil, Malayalam and Kannada languages is essential. As user-generated content is often code-mixed and not well studied for under-resourced languages, it is imperative to create resources and conduct benchmarking studies to encourage research in under-resourced Dravidian languages. We created a shared task on offensive language detection in Dravidian languages. We summarize here the dataset for this challenge, which is openly available at https://competitions.codalab.org/competitions/27654, and present an overview of the methods and the results of the competing systems.

IIITK@LT-EDI-EACL2021: Hope Speech Detection for Equality, Diversity, and Inclusion in Tamil, Malayalam and English
Nikhil Ghanghor | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

This paper describes the IIITK’s team submissions to the hope speech detection for equality, diversity and inclusion in Dravidian languages shared task organized by LT-EDI 2021 workshop@EACL 2021. Our best configurations for the shared tasks achieve weighted F1 scores of 0.60 for Tamil, 0.83 for Malayalam, and 0.93 for English. We have secured ranks of 4, 3, 2 in Tamil, Malayalam and English respectively.

Co-authors

Kishore Kumar Ponnusamy 6

Kogilavani Shanmugavadivel 6

Malliga Subramanian 6

Paul Buitelaar 5

Subalalitha Chinnaudayar Navaneethakrishnan 5

Sajeetha Thavareesan 5

John Philip McCrae 4

Balasubramanian Palani 4

Charmathi Rajkumar 4

Daniel García-Baena 3

Miguel Ángel García-Cumbreras 3

José Antonio García-Díaz 3

Salud María Jiménez-Zafra 3

Bhuvaneswari Sivagnanam 3

Sathiyaraj Thangasamy 3

Rafael Valencia-García 3

Abirami Murugappan 2

Santhiya Pandiyan 2

Ratnasingam Sakuntharaj 2

Elizabeth Sherly 2

Hariprasath .s.b 1

Prabakaran Chandran 1

Janakiram Chandu 1

Shunmuga Priya Muthusamy Chinnan 1

Subalalitha Cn 1

Michaela Comerford 1

Shanu Dhawale 1

Last Feremenga 1

Nikhil Ghanghor 1

Momchil Hardalov 1

Chandu Janakiram 1

Sreelakshmi K 1

Sai Prashanth Karnati 1

Arunaggiri Pandian Karunanidhi 1

Anshid Kizhakkeparambil 1

Susminu S Kumar 1

Anand Kumar M 1

Nandhini Kumaresh 1

Hariharan R. L 1

Kaustubh Lande 1

Shankar Mahadevan 1

Sai Rishith Reddy Mangamuru 1

Vigneshwaran Muralidaran 1

Preslav Nakov 1

Rajeswari Natarajan 1

Arunaggiri Pandian 1

Kathiravan Pannerselvam 1

Sandhiya Raja 1

Karnati Sai Prashanth 1

Mangamuru Sai Rishith Reddy 1

Lavanya Sambath Kumar 1

Hosahalli Shashirekha 1

Hosahalli Lakshmaiah Shashirekha 1

Poorvi Shetty 1

Muskaan Singh 1

Sangeetha Sivanesan 1

Siddhanth U Hegde 1

Dhanalakshmi Vadivel 1

Venues