Sajeetha Thavareesan


2023

Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Bharathi R. Chakravarthi | Ruba Priyadharshini | Anand Kumar M | Sajeetha Thavareesan | Elizabeth Sherly
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

CSSCUTN@DravidianLangTech: Abusive comments Detection in Tamil and Telugu
Kathiravan Pannerselvam | Saranya Rajiakodi | Rahul Ponnusamy | Sajeetha Thavareesan
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

Code-mixing is the word- or phrase-level interchange of two or more languages within a sentence, whether in conversation or in written text. This phenomenon is widespread on social media platforms, and detecting abusive comments in code-mixed sentences is a complex challenge. We present the system we submitted to the DravidianLangTech Shared Task on Abusive Comment Detection in Tamil and Telugu. Our approach builds a multiclass abusive comment detection model that recognizes 8 different labels. The provided samples are code-mixed Tamil-English text in which Tamil is written in romanised form. We focused on the multiclass classification subtask and used Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classifiers. Our method ranked ninth among all competing systems for the classification of abusive comments in code-mixed text. Our proposed classifier, using TF-IDF features with SVM, achieves an accuracy of 0.99 and an F1-score of 0.99 on a balanced dataset. It can be used effectively to detect abusive comments in Tamil-English code-mixed text.
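The abstract reports its best result with TF-IDF features and an SVM. As an illustration only, the sketch below computes TF-IDF vectors with the Python standard library. Whitespace tokenisation, the hypothetical helper names `ngrams` and `tfidf`, the n-gram range, and the toy romanised sentences are assumptions, not the authors' preprocessing; the resulting vectors would then be passed to a classifier such as a linear SVM.

```python
import math
from collections import Counter

# Illustrative sketch: helper names and toy data are hypothetical, not from the paper.


def ngrams(tokens, n_range=(1, 1)):
    """Return all word n-grams with lo <= n <= hi."""
    lo, hi = n_range
    grams = []
    for n in range(lo, hi + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(docs, n_range=(1, 1)):
    """Return one L2-normalised TF-IDF vector (a dict) per document.

    Uses raw term counts and the smoothed IDF
    idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    tokenised = [ngrams(doc.lower().split(), n_range) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenised for term in set(toks))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


# Toy romanised Tamil-English sentences (illustrative only).
docs = ["padam semma mass", "padam mokka", "semma padam"]
vectors = tfidf(docs)
print(vectors[0])
```

Terms that occur in every document (here `padam`) receive the lowest IDF weight, so rarer terms dominate each vector.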

VEL@LT-EDI-2023: Automatic Detection of Hope Speech in Bulgarian Language using Embedding Techniques
Rahul Ponnusamy | Malliga S | Sajeetha Thavareesan | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Many people may find motivation in their lives through encouraging or hopeful content shared on social media. Building a model that accurately predicts the target class is a challenging task. In this work, we address hope speech identification using machine learning and deep learning methods. This paper describes the system submitted by our team (VEL) to the Hope Speech Detection for Equality, Diversity, and Inclusion (HSD-EDI) shared task at LT-EDI-RANLP 2023 for the Bulgarian language. The main goal of this shared task is to classify a given text as Hope speech or Non-Hope speech. Our method uses the H2O deep learning model with MPNet embeddings and ranked second for Bulgarian with a macro F1 score of 0.69.
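Macro F1, the metric quoted above, averages the per-class F1 scores without weighting by class frequency, so a minority class counts as much as the majority class. A minimal sketch, assuming single-label predictions given as Python lists; the function name `macro_f1` and the example labels are hypothetical, not the shared task's official scorer.

```python
# Illustrative sketch: function name and labels are hypothetical.


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            total += 2 * precision * recall / (precision + recall)
    return total / len(labels)


# Hypothetical gold and predicted labels for four comments.
gold = ["Hope", "Hope", "Non-Hope", "Non-Hope"]
pred = ["Hope", "Non-Hope", "Non-Hope", "Non-Hope"]
print(round(macro_f1(gold, pred), 3))  # 0.733
```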

2022

Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments
Manikandan Ravikiran | Bharathi Raja Chakravarthi | Anand Kumar Madasamy | Sangeetha S | Ratnavel Rajalakshmi | Sajeetha Thavareesan | Rahul Ponnusamy | Shankar Mahadevan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Offensive content moderation is vital on social media platforms to support healthy online discussions. However, existing work on code-mixed Dravidian languages is limited to classifying whole comments, without identifying which parts of a comment contribute to its offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social media comments annotated with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.
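The abstract does not state how span predictions are scored. One common choice for span tasks, assumed here rather than taken from the paper, is character-level F1: each comment's offensive span is a set of character offsets, F1 is computed per comment, and the scores are averaged over comments. The function names and the empty-span convention below are hypothetical.

```python
# Illustrative sketch: the metric definition and names are assumptions, not the task's scorer.


def span_f1(pred_offsets, gold_offsets):
    """Character-level F1 between predicted and gold offensive character offsets."""
    pred, gold = set(pred_offsets), set(gold_offsets)
    if not gold:
        # Assumed convention: an empty gold span is matched only by an empty prediction.
        return 1.0 if not pred else 0.0
    if not pred:
        return 0.0
    overlap = len(pred & gold)
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(all_pred, all_gold):
    """Average the per-comment span F1 scores over the whole test set."""
    scores = [span_f1(p, g) for p, g in zip(all_pred, all_gold)]
    return sum(scores) / len(scores)


# Gold span covers characters 5-9; the prediction covers 7-11 (3 characters overlap).
print(round(span_f1(range(7, 12), range(5, 10)), 3))  # 0.6
```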

Findings of the Shared Task on Emotion Analysis in Tamil
Anbukkarasi Sampath | Thenmozhi Durairaj | Bharathi Raja Chakravarthi | Ruba Priyadharshini | Subalalitha Cn | Kogilavani Shanmugavadivel | Sajeetha Thavareesan | Sathiyaraj Thangasamy | Parameswari Krishnamurthy | Adeep Hande | Sean Benhur | Kishore Ponnusamy | Santhiya Pandiyan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents an overview of the shared task on emotion analysis in Tamil, whose results are presented at the workshop. It describes the dataset used in the shared task, the task description, the methodology used by the participants, and the evaluation results of the submissions. The shared task is organized as two tasks: Task A uses social media comments in Tamil annotated with 11 emotions, and Task B uses social media comments in Tamil annotated with 31 fine-grained emotions. Training and development datasets were provided to the participants, and results were evaluated on unseen data. In total, we received around 24 submissions from 13 teams. The models were evaluated using precision, recall, and micro-averaged metrics.
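For single-label multiclass data such as these emotion labels, micro-averaging pools true positives, false positives, and false negatives over all classes before computing the ratios. A minimal sketch, assuming one gold and one predicted label per comment; the function name `micro_prf` and the example labels are hypothetical, not the task's official scorer.

```python
# Illustrative sketch: function name and emotion labels are hypothetical.


def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall and F1 for single-label multiclass data.

    True positives, false positives and false negatives are pooled over
    all classes before the ratios are computed.
    """
    pairs = list(zip(y_true, y_pred))
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        tp += sum(t == label and p == label for t, p in pairs)
        fp += sum(t != label and p == label for t, p in pairs)
        fn += sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical emotion labels for four comments; two predictions are correct.
gold = ["joy", "anger", "sadness", "joy"]
pred = ["joy", "joy", "sadness", "neutral"]
print(micro_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Because every wrong prediction adds exactly one false positive and one false negative, micro precision, recall, and F1 all equal accuracy in this single-label setting.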

2021

IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always hope in Transformers
Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

In a world with serious challenges like climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech, abusive and offensive content is the last thing we desire. In this paper, we work to identify and promote positive and supportive content on social media platforms. We use several transformer-based models to classify social media comments as hope speech or not hope speech in English, Malayalam, and Tamil. This paper describes our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021-EACL 2021. The code for our best submission is publicly available.

IIITK@LT-EDI-EACL2021: Hope Speech Detection for Equality, Diversity, and Inclusion in Tamil, Malayalam and English
Nikhil Ghanghor | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

This paper describes the IIITK team's submissions to the shared task on hope speech detection for equality, diversity and inclusion in Dravidian languages, organized by the LT-EDI 2021 workshop at EACL 2021. Our best configurations achieve weighted F1 scores of 0.60 for Tamil, 0.83 for Malayalam, and 0.93 for English, securing ranks of 4, 3, and 2 in Tamil, Malayalam, and English, respectively.
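Weighted F1, the metric reported above, averages the per-class F1 scores weighted by each class's support (its number of gold instances), so frequent classes count for more. A minimal sketch under that definition; the function name `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter

# Illustrative sketch: function name and labels are hypothetical.


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's gold support."""
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / count  # count = tp + fn, always > 0 for labels in y_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * count / len(y_true)
    return score


# Hypothetical labels: the classifier always predicts the majority class.
gold = ["Hope", "Non-Hope", "Non-Hope", "Non-Hope"]
pred = ["Non-Hope"] * 4
print(round(weighted_f1(gold, pred), 3))  # 0.643
```

In the example the model never predicts the minority class, yet the weighted F1 stays at about 0.64 because the majority class dominates the average.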

UVCE-IIITT@DravidianLangTech-EACL2021: Tamil Troll Meme Classification: You need to Pay more Attention
Siddhanth U Hegde | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Tamil is a Dravidian language widely used and spoken in the southern part of Asia. In the 21st century, in the era of social media, memes have become a source of fun in people's day-to-day lives. Here, we try to analyze the true meaning of Tamil memes by classifying them as troll or non-troll. We present a model with a transformer-transformer architecture that uses attention as its main component in an attempt to reach state-of-the-art performance. The dataset consists of troll and non-troll images with their captions as text, and the task is a binary classification task. The objective of the model is to pay more attention to the extracted features and to ignore the noise in both the images and the text.
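The abstract gives no architectural details beyond attention being the main component. For reference only, the sketch below implements standard scaled dot-product attention, softmax(QKᵀ/√d_k)V, as used in transformer layers, with plain Python lists; it is not the authors' model, and the function names are hypothetical.

```python
import math

# Illustrative sketch of generic scaled dot-product attention; not the paper's architecture.


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    queries: list of d_k-dimensional vectors
    keys:    list of d_k-dimensional vectors (one per value)
    values:  list of d_v-dimensional vectors
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Attention output: weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs


# Toy example: the query matches the first key far better than the second.
q = [[4.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_dot_product_attention(q, k, v))
```

Most of the output comes from the first value vector, since the query attends mostly to the first key.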

IIITT@DravidianLangTech-EACL2021: Transfer Learning for Offensive Language Detection in Dravidian Languages
Konthala Yasaswini | Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper describes our work for the shared task on Offensive Language Identification in Dravidian Languages-EACL 2021. Offensive language detection on social media platforms has been studied previously, but with the increasing diversity of users, there is a need to identify offensive language in multilingual posts that are largely code-mixed or written in a non-native script. We approach this challenge with various transfer learning-based models that classify a given post or comment in Dravidian languages (Malayalam, Tamil, and Kannada) into 6 categories. The source code for our systems is publicly available.

IIITK@DravidianLangTech-EACL2021: Offensive Language Identification and Meme Classification in Tamil, Malayalam and Kannada
Nikhil Ghanghor | Parameswari Krishnamurthy | Sajeetha Thavareesan | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper describes the IIITK team's submissions to the offensive language identification and troll meme classification shared tasks for Dravidian languages at the DravidianLangTech 2021 workshop at EACL 2021. Our best configuration for Tamil troll meme classification achieved a weighted average F1 score of 0.55, and for offensive language identification, our system achieved weighted F1 scores of 0.75 for Tamil, 0.95 for Malayalam, and 0.71 for Kannada. We ranked 2nd in Tamil troll meme classification, and 3rd, 3rd, and 4th in offensive language identification for Tamil, Malayalam, and Kannada, respectively.

OffTamil@DravidianLangTech-EACL2021: Offensive Language Identification in Tamil Text
Disne Sivalingam | Sajeetha Thavareesan
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Over the last few decades, code-mixed offensive text has become pervasive in social media posts. Social media platforms and online communities have shown much interest in offensive text identification in recent years. Consequently, the research community is also interested in identifying such content and has contributed to the development of corpora. Many publicly available corpora exist for research on identifying offensive text written in English, but they are rare for low-resourced languages like Tamil. This study uses the first code-mixed offensive text corpus for Dravidian languages, developed by the shared task organizers. It focuses on offensive language identification in Tamil, a code-mixed, low-resourced Dravidian language, using four classifiers (Support Vector Machine, Random Forest, k-Nearest Neighbour, and Naive Bayes) with the χ² feature selection technique, along with BoW and TF-IDF feature representations over different combinations of n-grams. The proposed model achieved an accuracy of 76.96% using a linear SVM with TF-IDF feature representation.
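χ² feature selection scores each n-gram by how strongly its presence depends on the class label. A minimal sketch using the common 2×2 document contingency-table formulation, with unigram features only; the function names, whitespace tokenisation, and toy corpus are assumptions, and library implementations (e.g. scikit-learn's `chi2`) may compute the statistic from feature counts instead.

```python
# Illustrative sketch: function names, labels, and corpus are hypothetical.


def chi_square(docs, labels, term, target):
    """Chi-squared statistic for the term/class association (2x2 document table).

    a: documents of class `target` containing `term`
    b: documents of other classes containing `term`
    c: documents of class `target` without `term`
    d: documents of other classes without `term`
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc.lower().split()
        in_class = label == target
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_top_k(docs, labels, target, k):
    """Keep the k unigrams with the highest chi-squared score for `target`."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    scores = {term: chi_square(docs, labels, term, target) for term in vocab}
    # sorted() is stable, so tied terms keep alphabetical order.
    return sorted(vocab, key=scores.get, reverse=True)[:k]


# Toy corpus with hypothetical labels (illustrative only).
docs = ["bad word here", "bad movie", "good movie", "good song here"]
labels = ["Offensive", "Offensive", "Not_offensive", "Not_offensive"]
print(select_top_k(docs, labels, "Offensive", 2))  # ['bad', 'good']
```

χ² is symmetric: `good`, which never appears in offensive documents, scores as high as `bad`, which appears only in them, because both are equally informative about the label.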