2022
Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments
Manikandan Ravikiran | Bharathi Raja Chakravarthi | Anand Kumar Madasamy | Sangeetha S | Ratnavel Rajalakshmi | Sajeetha Thavareesan | Rahul Ponnusamy | Shankar Mahadevan
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Offensive content moderation is vital in social media platforms to support healthy online discussions. However, existing approaches for code-mixed Dravidian languages are limited to classifying whole comments, without identifying the parts that contribute to offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social comments with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.
Findings of the Shared Task on Multi-task Learning in Dravidian Languages
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Subalalitha Cn | Sangeetha S | Malliga Subramanian | Kogilavani Shanmugavadivel | Parameswari Krishnamurthy | Adeep Hande | Siddhanth U Hegde | Roshan Nayak | Swetha Valli
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
We present our findings from the first shared task on Multi-task Learning in Dravidian Languages at the Second Workshop on Speech and Language Technologies for Dravidian Languages. In this task, a sentence in any of three Dravidian languages must be classified for two closely related tasks, namely Sentiment Analysis (SA) and Offensive Language Identification (OLI). The task spans three Dravidian languages: Kannada, Malayalam, and Tamil. It is one of the first shared tasks to focus on multi-task learning for closely related tasks, especially for a very low-resourced language family such as the Dravidian language family. In total, 55 people signed up to participate in the task, and due to the intricate nature of the task, especially in its first iteration, 3 submissions were received.
Thirumurai: A Large Dataset of Tamil Shaivite Poems and Classification of Tamil Pann
Shankar Mahadevan | Rahul Ponnusamy | Prasanna Kumar Kumaresan | Prabakaran Chandran | Ruba Priyadharshini | Sangeetha S | Bharathi Raja Chakravarthi
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Thirumurai, also known as Panniru Thirumurai, is a collection of Tamil Shaivite poems dating back to the Hindu revival period between the 6th and the 10th century. These poems are par excellence in both literary and musical terms. They were composed based on the ancient, now non-existent Tamil Pann system and can be set to music. We present a large dataset containing all the Thirumurai poems and also attempt to classify the Pann and author of each poem using transformer-based architectures. Our work is the first of its kind in dealing with ancient Tamil text datasets, which are severely under-resourced. We explore several deep learning-based techniques for solving this challenge effectively and provide essential insights into the problem and how to address it.
The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced Kannada
Adeep Hande | Siddhanth U Hegde | Sangeetha S | Ruba Priyadharshini | Bharathi Raja Chakravarthi
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
In recent years, various methods have been developed to control the spread of negativity by removing profane, aggressive, and offensive comments from social media platforms. There is, however, a scarcity of research focusing on embracing positivity and reinforcing supportive and reassuring content in online forums. We therefore concentrate our research on developing systems to detect hope speech in code-mixed Kannada. We present DC-LM, a dual-channel language model that detects hope speech by using the English translations of the code-mixed dataset for additional training. The approach is jointly modelled on both English and code-mixed Kannada to enable effective cross-lingual transfer between the languages. With a weighted F1-score of 0.756, the method outperforms other models. We aim to initiate research in Kannada while encouraging researchers to take a pragmatic approach to inspire positive and supportive online content.