2024
Findings of the First Shared Task on Offensive Span Identification from Code-Mixed Kannada-English Comments
Manikandan Ravikiran
|
Ratnavel Rajalakshmi
|
Bharathi Raja Chakravarthi
|
Anand Kumar Madasamy
|
Sajeetha Thavareesan
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Effectively managing offensive content is crucial on social media platforms to encourage positive online interactions. However, addressing offensive content in code-mixed Dravidian languages is challenging, as current moderation methods focus on flagging entire comments rather than pinpointing specific offensive segments. This limitation stems from a lack of annotated data and accessible systems designed to identify offensive language sections. To address this, our shared task presents a dataset of Kannada-English code-mixed social comments, including offensive comments. This paper outlines the dataset, the algorithms used, and the results obtained by systems participating in this shared task.
DLRG-DravidianLangTech@EACL2024 : Combating Hate Speech in Telugu Code-mixed Text on Social Media
Ratnavel Rajalakshmi
|
Saptharishee M
|
Hareesh S
|
Gabriel R
|
Varsini Sr
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Detecting hate speech in code-mixed language is vital for a secure online space, curbing harmful content, promoting inclusive communication, and safeguarding users from discrimination. Despite the linguistic complexities of code-mixed languages, this study explores diverse pre-processing methods. It finds that the Transliteration method excels in handling linguistic variations. The research comprehensively investigates machine learning and deep learning approaches, namely Logistic Regression and Bi-directional Gated Recurrent Unit (Bi-GRU) models. These models achieved F1 scores of 0.68 and 0.70, respectively, contributing to ongoing efforts to combat hate speech in code-mixed languages and offering valuable insights for future research in this critical domain.
2022
DLRG@LT-EDI-ACL2022: Detecting signs of Depression from Social Media using XGBoost Method
Herbert Sharen
|
Ratnavel Rajalakshmi
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Depression is linked to the development of dementia. Cognitive functions such as thinking and remembering generally deteriorate in dementia patients. Social media usage has increased among people in recent days. Technological advancements help the community to express their views publicly. Analysing the signs of depression from texts has become an important area of research, as it helps to identify this kind of mental disorder among people from their social media posts. As part of the shared task on detecting signs of depression from social media text, a dataset has been provided by the organizers (Sampath et al.). We applied different machine learning techniques such as Support Vector Machine, Random Forest and XGBoost classifier to classify the signs of depression. Experimental results revealed that the XGBoost model outperformed other models with the highest classification accuracy of 0.61% and a macro F1 score of 0.54.
DLRG@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil using Multilingual Transformer Models
Ratnavel Rajalakshmi
|
Ankita Duraphe
|
Antonette Shibani
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Online social networks have let people connect and interact with each other. They do, however, also provide a platform for online abusers to propagate abusive content. The vast majority of abusive remarks are written in a multilingual style, which allows them to easily slip past internet inspection. This paper presents a system developed for the Shared Task on Abusive Comment Detection (Misogyny, Misandry, Homophobia, Transphobia, Xenophobia, CounterSpeech, Hope Speech) in Tamil at DravidianLangTech@ACL 2022 to detect the abusive category of each comment. We approach the task with three methodologies (Machine Learning, Deep Learning and Transformer-based modeling) for two sets of data: the Tamil dataset and the Tamil+English dataset. The dataset used in our system can be accessed from the competition on CodaLab. For Machine Learning, eight algorithms were implemented, among which Random Forest gave the best result with the Tamil+English dataset, with a weighted average F1-score of 0.78. For Deep Learning, Bi-Directional LSTM gave the best result with pre-trained word embeddings. In Transformer-based modeling, we used IndicBERT and mBERT with fine-tuning, among which mBERT gave the best result for the Tamil dataset with a weighted average F1-score of 0.7.
DLRG@TamilNLP-ACL2022: Offensive Span Identification in Tamil using BiLSTM-CRF approach
Ratnavel Rajalakshmi
|
Mohit More
|
Bhamatipati Shrikriti
|
Gitansh Saharan
|
Hanchate Samyuktha
|
Sayantan Nandy
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Identifying offensive speech is an exciting and essential area of research, with ample traction in recent times. This paper presents our system submission to subtask 1, focusing on using supervised approaches for extracting offensive spans from code-mixed Tamil-English comments. To identify offensive spans, we developed the Bidirectional Long Short-Term Memory (BiLSTM) model with GloVe embedding. To this end, the developed system achieved an overall F1 of 0.1728. Additionally, for comments with fewer than 30 characters, the developed system shows an F1 of 0.3890, competitive with other submissions.
Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments
Manikandan Ravikiran
|
Bharathi Raja Chakravarthi
|
Anand Kumar Madasamy
|
Sangeetha S
|
Ratnavel Rajalakshmi
|
Sajeetha Thavareesan
|
Rahul Ponnusamy
|
Shankar Mahadevan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Offensive content moderation is vital in social media platforms to support healthy online discussions. However, existing moderation approaches for code-mixed Dravidian languages are limited to classifying whole comments, without identifying the parts that contribute to offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social comments with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.
Multimodal Code-Mixed Tamil Troll Meme Classification using Feature Fusion
Ramesh Kannan
|
Ratnavel Rajalakshmi
Proceedings of the First Workshop on Multimodal Machine Learning in Low-resource Languages
Memes have become an important way of expressing ideas through social media platforms and forums. At the same time, these memes are used for trolling by people who try to get noticed by other internet users, such as users of social media, chat rooms and blogs. Memes contain both textual and visual information, and they are trolled in the online community based on their content. There is no restriction on language usage in online media. The present work focuses on classifying whether memes are trolled or not trolled. The proposed multimodal approach achieved a considerably better weighted average F1 score of 0.5437 compared to unimodal approaches. Other performance metrics such as precision, recall, accuracy and macro average have also been studied to evaluate the proposed system.
Understanding the role of Emojis for emotion detection in Tamil
Ratnavel Rajalakshmi
|
Faerie Mattins R
|
Srivarshan Selvaraj
|
Antonette Shibani
|
Anand Kumar M
|
Bharathi Raja Chakravarthi
Proceedings of the First Workshop on Multimodal Machine Learning in Low-resource Languages
2021
DLRG@DravidianLangTech-EACL2021: Transformer based approach for Offensive Language Identification on Code-Mixed Tamil
Ratnavel Rajalakshmi
|
Yashwant Reddy
|
Lokesh Kumar
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Internet advancements have made a huge impact on the communication patterns of people and their lifestyle. People express their opinions on products, politics, movies etc. in social media. Even though English is predominantly used, many people nowadays prefer to tweet in their native language, sometimes combining it with English. Sentiment analysis on such code-mixed tweets is challenging due to the large vocabulary, grammar and colloquial usage of many words. In this paper, a transformer-based language model is applied to analyse the sentiment of Tanglish tweets, which combine Tamil and English. This work has been submitted to the shared task at DravidianLangTech-EACL2021. The experimental results show that an F1 score of 64% was achieved in detecting hate speech in code-mixed Tamil-English tweets using a bidirectional transformer model.