Adeep Hande


2022

DE-ABUSE@TamilNLP-ACL 2022: Transliteration as Data Augmentation for Abuse Detection in Tamil
Vasanth Palanikumar | Sean Benhur | Adeep Hande | Bharathi Raja Chakravarthi
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

With the rise of social media and the internet, there is a need to provide an inclusive space and to prevent abuse directed at any gender, race, or community. This paper describes the system submitted to the ACL-2022 shared task on fine-grained abuse detection in Tamil. In our approach, we transliterated the code-mixed dataset as an augmentation technique to increase the size of the data. Using this method, we ranked 3rd in the task with a macro-averaged F1 score of 0.290 and a weighted F1 score of 0.590.

Findings of the Shared Task on Emotion Analysis in Tamil
Anbukkarasi Sampath | Thenmozhi Durairaj | Bharathi Raja Chakravarthi | Ruba Priyadharshini | Subalalitha Cn | Kogilavani Shanmugavadivel | Sajeetha Thavareesan | Sathiyaraj Thangasamy | Parameswari Krishnamurthy | Adeep Hande | Sean Benhur | Kishore Ponnusamy | Santhiya Pandiyan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents an overview of the shared task on emotion analysis in Tamil. The results of the shared task were presented at the workshop. The paper describes the dataset used in the shared task, the task description, the methodologies used by the participants, and the evaluation results of the submissions. The shared task is organized into two subtasks: Task A uses social media comments in Tamil annotated with 11 emotions, and Task B uses social media comments in Tamil annotated with 31 fine-grained emotions. Training and development datasets were provided to the participants, and the results were evaluated on unseen data. In total, we received around 24 submissions from 13 teams. The models were evaluated using precision, recall, and micro-averaged metrics.

Findings of the Shared Task on Multi-task Learning in Dravidian Languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Subalalitha Cn | Sangeetha S | Malliga Subramanian | Kogilavani Shanmugavadivel | Parameswari Krishnamurthy | Adeep Hande | Siddhanth U Hegde | Roshan Nayak | Swetha Valli
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian languages must be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian languages: Kannada, Malayalam, and Tamil. It is one of the first shared tasks to focus on multi-task learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, and owing to the intricate nature of the task, especially in its first iteration, three submissions were received.

The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced Kannada
Adeep Hande | Siddhanth U Hegde | Sangeetha S | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

In recent years, various methods have been developed to control the spread of negativity by removing profane, aggressive, and offensive comments from social media platforms. There is, however, a scarcity of research focusing on embracing positivity and reinforcing supportive and reassuring content in online forums. We therefore concentrate our research on developing systems to detect hope speech in code-mixed Kannada. We present DC-LM, a dual-channel language model that detects hope speech by using English translations of the code-mixed dataset for additional training. The approach is jointly modelled on both English and code-mixed Kannada to enable effective cross-lingual transfer between the languages. With a weighted F1-score of 0.756, the method outperforms other models. We aim to initiate research in Kannada while encouraging researchers to take a pragmatic approach to inspiring positive and supportive online content.

2021

IIITT at CASE 2021 Task 1: Leveraging Pretrained Language Models for Multilingual Protest Detection
Pawan Kalyan | Duddukunta Reddy | Adeep Hande | Ruba Priyadharshini | Ratnasingam Sakuntharaj | Bharathi Raja Chakravarthi
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

In a world abounding in constant protests resulting from events like a global pandemic, climate change, and religious or political conflicts, there has always been a need to detect events and protests before they are amplified by news media or social media. This paper describes our work on the sentence classification subtask of multilingual protest detection at CASE@ACL-IJCNLP 2021. We approached this task by employing various multilingual pre-trained transformer models to classify whether a sentence contains information about an event that has transpired. Soft voting over these models gave our best results, with macro F1-scores of 0.8291, 0.7578, and 0.7951 in English, Spanish, and Portuguese, respectively.
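Soft voting is commonly understood as averaging each model's predicted class probabilities and choosing the class with the highest mean, rather than counting each model's single predicted label. The sketch below illustrates that general technique in plain Python; it is not the authors' code, and the function name `soft_vote`, the input layout (one probability list per model, per sentence), and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Return one predicted class index per sample by soft voting.

    model_probs[m][i][c] is model m's probability that sample i belongs to class c
    (hypothetical layout chosen for this sketch).
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    predictions: List[int] = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        # Average each class's probability across all models.
        avg = [sum(m[i][c] for m in model_probs) / n_models for c in range(n_classes)]
        # Pick the class with the highest mean probability.
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


# Hypothetical probabilities from three binary classifiers on two sentences
# (class 0 = no event, class 1 = event; labels are illustrative only).
model_a = [[0.60, 0.40], [0.20, 0.80]]
model_b = [[0.30, 0.70], [0.45, 0.55]]
model_c = [[0.55, 0.45], [0.60, 0.40]]
print(soft_vote([model_a, model_b, model_c]))  # [1, 1]
```

For the first sentence, a majority (hard) vote would pick class 0, since two of the three models favour it, whereas soft voting picks class 1 because model B's confidence outweighs the narrow margins of the other two.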

Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021
Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Thenmozi Durairaj | Anbukkarasi Sampath | Kingston Pal Thamburaj | Bharathi Raja Chakravarthi
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English→Marathi and English⇔Irish language pairs in the LoResMT 2021 shared task. The task focuses on producing high-quality translations for low-resourced languages such as Irish and Marathi. For English→Marathi, we fine-tune IndicTrans, a pretrained multilingual NMT model, using an external parallel corpus for additional training. For English⇔Irish, we used a pretrained Helsinki-NLP Opus MT model. Our approaches yield relatively promising results on the BLEU metric. Under the team name IIITT, our systems ranked 1st, 1st, and 2nd in English→Marathi, Irish→English, and English→Irish, respectively. The code for our systems is publicly available.

UVCE-IIITT@DravidianLangTech-EACL2021: Tamil Troll Meme Classification: You need to Pay more Attention
Siddhanth U Hegde | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Tamil is a Dravidian language widely spoken in South Asia. In the era of social media, memes have become a source of amusement in people's day-to-day lives. Here, we try to analyze the true meaning of Tamil memes by classifying them as troll or non-troll. We present a model with a transformer-transformer architecture that uses attention as its main component in an attempt to achieve state-of-the-art performance. The dataset consists of troll and non-troll images with their captions as text, and the task is binary classification. The objective of the model is to attend closely to the extracted features and to ignore the noise in both the images and the text.

IIITT@DravidianLangTech-EACL2021: Transfer Learning for Offensive Language Detection in Dravidian Languages
Konthala Yasaswini | Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper describes our work for the shared task on Offensive Language Identification in Dravidian Languages at EACL 2021. Offensive language detection on social media platforms has been studied previously. However, with the increasing diversity of users, there is a need to identify offensive language in multilingual posts that are largely code-mixed or written in a non-native script. We approach this challenge with various transfer learning-based models to classify a given post or comment in Dravidian languages (Malayalam, Tamil, and Kannada) into six categories. The source code for our systems is publicly available.

IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always hope in Transformers
Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

In a world with serious challenges like climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech and abusive and offensive content is the last thing we desire. In this paper, we work to identify and promote positive and supportive content on these platforms. We work with several transformer-based models to classify social media comments as hope speech or not hope speech in English, Malayalam, and Tamil. This paper describes our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021 at EACL 2021. The code for our best submission is publicly available.

2020

KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection
Adeep Hande | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media

We introduce the Kannada CodeMixed Dataset (KanCMD), a multi-task learning dataset for sentiment analysis and offensive language identification. The KanCMD dataset highlights two real-world issues with social media text. First, it contains real code-mixed comments posted by users on YouTube, rather than monolingual text from textbooks. Second, it has been annotated for two tasks, namely sentiment analysis and offensive language detection, for the under-resourced Kannada language. Hence, KanCMD is meant to stimulate research on real-world code-mixed social media text and multi-task learning in the under-resourced Kannada language. KanCMD was obtained by crawling YouTube, and each comment was annotated by at least three annotators. We release KanCMD, comprising 7,671 comments, for multi-task learning research.