Ruba Priyadharshini


2021

pdf bib
IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always hope in Transformers
Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

In a world with serious challenges like climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech, abusive and offensive content is the last thing we desire for. In this paper, we work to identify and promote positive and supportive content on these platforms. We work with several transformer-based models to classify social media comments as hope speech or not hope speech in English, Malayalam, and Tamil languages. This paper portrays our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021- EACL 2021. The codes for our best submission can be viewed.

pdf bib
IIITK@LT-EDI-EACL2021: Hope Speech Detection for Equality, Diversity, and Inclusion in Tamil , Malayalam and English
Nikhil Ghanghor | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

This paper describes the IIITK’s team submissions to the hope speech detection for equality, diversity and inclusion in Dravidian languages shared task organized by LT-EDI 2021 workshop@EACL 2021. Our best configurations for the shared tasks achieve weighted F1 scores of 0.60 for Tamil, 0.83 for Malayalam, and 0.93 for English. We have secured ranks of 4, 3, 2 in Tamil, Malayalam and English respectively.

pdf bib
IIITT at CASE 2021 Task 1: Leveraging Pretrained Language Models for Multilingual Protest Detection
Pawan Kalyan | Duddukunta Reddy | Adeep Hande | Ruba Priyadharshini | Ratnasingam Sakuntharaj | Bharathi Raja Chakravarthi
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

In a world abounding in constant protests resulting from events like a global pandemic, climate change, religious or political conflicts, there has always been a need to detect events/protests before getting amplified by news media or social media. This paper demonstrates our work on the sentence classification subtask of multilingual protest detection in CASE@ACL-IJCNLP 2021. We approached this task by employing various multilingual pre-trained transformer models to classify if any sentence contains information about an event that has transpired or not. We performed soft voting over the models, achieving the best results among the models, accomplishing a macro F1-Score of 0.8291, 0.7578, and 0.7951 in English, Spanish, and Portuguese, respectively.

pdf bib
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Anand Kumar M | Parameswari Krishnamurthy | Elizabeth Sherly
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

pdf bib
Findings of the Shared Task on Machine Translation in Dravidian languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Shubhanker Banerjee | Richard Saldanha | John P. McCrae | Anand Kumar M | Parameswari Krishnamurthy | Melvin Johnson
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents an overview of the shared task on machine translation of Dravidian languages. We presented the shared task results at the EACL 2021 workshop on Speech and Language Technologies for Dravidian Languages. This paper describes the datasets used, the methodology used for the evaluation of participants, and the experiments’ overall results. As a part of this shared task, we organized four sub-tasks corresponding to machine translation of the following language pairs: English to Tamil, English to Malayalam, English to Telugu and Tamil to Telugu which are available at https://competitions.codalab.org/competitions/27650. We provided the participants with training and development datasets to perform experiments, and the results were evaluated on unseen test data. In total, 46 research groups participated in the shared task and 7 experimental runs were submitted for evaluation. We used BLEU scores for assessment of the translations.

pdf bib
Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Navya Jose | Anand Kumar M | Thomas Mandl | Prasanna Kumar Kumaresan | Rahul Ponnusamy | Hariharan R L | John P. McCrae | Elizabeth Sherly
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Detecting offensive language in social media in local languages is critical for moderating user-generated content. Thus, the field of offensive language identification in under-resourced Tamil, Malayalam and Kannada languages are essential. As the user-generated content is more code-mixed and not well studied for under-resourced languages, it is imperative to create resources and conduct benchmarking studies to encourage research in under-resourced Dravidian languages. We created a shared task on offensive language detection in Dravidian languages. We summarize here the dataset for this challenge which are openly available at https://competitions.codalab.org/competitions/27654, and present an overview of the methods and the results of the competing systems.

pdf bib
UVCE-IIITT@DravidianLangTech-EACL2021: Tamil Troll Meme Classification: You need to Pay more Attention
Siddhanth U Hegde | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Tamil is a Dravidian language that is commonly used and spoken in the southern part of Asia. During the 21st century and in the era of social media, memes have been a fun moment during the day to day life of people. Here, we try to analyze the true meaning of Tamil memes by classifying them as troll or non-troll. We present an ingenious model consisting of transformer-transformer architecture that tries to attain state of the art by using attention as its main component. The dataset consists of troll and non-troll images with their captions as texts. The task is a binary classification task. The objective of the model was to pay more and more attention to the extracted features and to ignore the noise in both images and text.

pdf bib
IIITT@DravidianLangTech-EACL2021: Transfer Learning for Offensive Language Detection in Dravidian Languages
Konthala Yasaswini | Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper demonstrates our work for the shared task on Offensive Language Identification in Dravidian Languages-EACL 2021. Offensive language detection in the various social media platforms was identified previously. But with the increase in diversity of users, there is a need to identify the offensive language in multilingual posts that are largely code-mixed or written in a non-native script. We approach this challenge with various transfer learning-based models to classify a given post or comment in Dravidian languages (Malayalam, Tamil, and Kannada) into 6 categories. The source codes for our systems are published.

pdf bib
IIITK@DravidianLangTech-EACL2021: Offensive Language Identification and Meme Classification in Tamil, Malayalam and Kannada
Nikhil Ghanghor | Parameswari Krishnamurthy | Sajeetha Thavareesan | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper describes the IIITK team’s submissions to the offensive language identification, and troll memes classification shared tasks for Dravidian languages at DravidianLangTech 2021 workshop@EACL 2021. Our best configuration for Tamil troll meme classification achieved 0.55 weighted average F1 score, and for offensive language identification, our system achieved weighted F1 scores of 0.75 for Tamil, 0.95 for Malayalam, and 0.71 for Kannada. Our rank on Tamil troll meme classification is 2, and offensive language identification in Tamil, Malayalam and Kannada are 3, 3 and 4 respectively.

pdf bib
Findings of the VarDial Evaluation Campaign 2021
Bharathi Raja Chakravarthi | Gaman Mihaela | Radu Tudor Ionescu | Heidi Jauhiainen | Tommi Jauhiainen | Krister Lindén | Nikola Ljubešić | Niko Partanen | Ruba Priyadharshini | Christoph Purschke | Eswari Rajagopal | Yves Scherrer | Marcos Zampieri
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects

This paper describes the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2021. The campaign was part of the eighth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with EACL 2021. Four separate shared tasks were included this year: Dravidian Language Identification (DLI), Romanian Dialect Identification (RDI), Social Media Variety Geolocation (SMG), and Uralic Language Identification (ULI). DLI was organized for the first time and the other three continued a series of tasks from previous evaluation campaigns.

pdf bib
Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021
Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Thenmozi Durairaj | Anbukkarasi Sampath | Kingston Pal Thamburaj | Bharathi Raja Chakravarthi
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

This paper reports the Machine Translation (MT) systems submitted by the IIITT team for the English→Marathi and English⇔Irish language pairs LoResMT 2021 shared task. The task focuses on getting exceptional translations for rather low-resourced languages like Irish and Marathi. We fine-tune IndicTrans, a pretrained multilingual NMT model for English→Marathi, using external parallel corpus as input for additional training. We have used a pretrained Helsinki-NLP Opus MT English⇔Irish model for the latter language pair. Our approaches yield relatively promising results on the BLEU metrics. Under the team name IIITT, our systems ranked 1, 1, and 2 in English→Marathi, Irish→English, and English→Irish respectively. The codes for our systems are published1 .

2020

pdf bib
Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text
Bharathi Raja Chakravarthi | Vigneshwaran Muralidaran | Ruba Priyadharshini | John Philip McCrae
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code-mixed data for a low-resourced language like Tamil also adds difficulty to this problem. To overcome this, we created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube. In this paper, we describe the process of creating the corpus and assigning polarities. We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.

pdf bib
KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection
Adeep Hande | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media

We introduce Kannada CodeMixed Dataset (KanCMD), a multi-task learning dataset for sentiment analysis and offensive language identification. The KanCMD dataset highlights two real-world issues from the social media text. First, it contains actual comments in code mixed text posted by users on YouTube social media, rather than in monolingual text from the textbook. Second, it has been annotated for two tasks, namely sentiment analysis and offensive language detection for under-resourced Kannada language. Hence, KanCMD is meant to stimulate research in under-resourced Kannada language on real-world code-mixed social media text and multi-task learning. KanCMD was obtained by crawling the YouTube, and a minimum of three annotators annotates each comment. We release KanCMD 7,671 comments for multitask learning research purpose.

2019

pdf bib
Multilingual Multimodal Machine Translation for Dravidian Languages utilizing Phonetic Transcription
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Bernardo Stearns | Arun Jayapal | Sridevy S | Mihael Arcan | Manel Zarrouk | John P McCrae
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages