2025
Overview of the Shared Task on Multimodal Hate Speech Detection in Dravidian languages: DravidianLangTech@NAACL 2025
Jyothish Lal G | Premjith B | Bharathi Raja Chakravarthi | Saranya Rajiakodi | Bharathi B | Rajeswari Natarajan | Ratnavel Rajalakshmi
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Detecting hate speech on social media platforms is crucial because of its adverse impact on mental health, social harmony, and online safety. This paper presents an overview of the shared task on Multimodal Hate Speech Detection in Dravidian Languages, organized as part of DravidianLangTech@NAACL 2025. The task emphasizes detecting hate speech in social media content that combines speech and text. We focus on three low-resource Dravidian languages: Malayalam, Tamil, and Telugu. Participants were required to classify hate speech in three sub-tasks, each corresponding to one of these languages. The dataset was curated by collecting speech and corresponding text from YouTube videos. Participants employed various machine learning and deep learning models, including transformer-based architectures and multimodal frameworks. The submissions were evaluated using the macro F1 score. Experimental results underline the potential of multimodal approaches in advancing hate speech detection for low-resource languages. Team SSNTrio achieved the highest F1 scores in Malayalam and Tamil, 0.7511 and 0.7332, respectively. Team lowes achieved the best F1 score, 0.3817, in the Telugu sub-task.
Overview of the Shared Task on Sentiment Analysis in Tamil and Tulu
Thenmozhi Durairaj | Bharathi Raja Chakravarthi | Asha Hegde | Hosahalli Lakshmaiah Shashirekha | Rajeswari Natarajan | Sajeetha Thavareesan | Ratnasingam Sakuntharaj | Krishnakumari K | Charmathi Rajkumar | Poorvi Shetty | Harshitha S Kumar
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Sentiment analysis is an essential task for interpreting subjective opinions and emotions in textual data, with significant implications across commercial and societal applications. This paper provides an overview of the shared task on Sentiment Analysis in Tamil and Tulu, organized as part of DravidianLangTech@NAACL 2025. The task comprises two components, one addressing Tamil and the other Tulu, both designed as multi-class classification challenges in which the sentiment of a given text must be categorized as positive, negative, neutral, or unknown. The dataset was assembled by aggregating user-generated content from social media platforms such as YouTube and Twitter, ensuring linguistic diversity and real-world applicability. Participants applied a variety of computational approaches, including traditional machine learning models, deep learning models, pre-trained language models, and various feature representation techniques, to tackle the challenges posed by linguistic code-mixing, orthographic variations, and resource scarcity in these low-resource languages.
2024
Findings of the Shared Task on Multimodal Social Media Data Analysis in Dravidian Languages (MSMDA-DL)@DravidianLangTech 2024
Premjith B | Jyothish G | Sowmya V | Bharathi Raja Chakravarthi | K Nandhini | Rajeswari Natarajan | Abirami Murugappan | Bharathi B | Saranya Rajiakodi | Rahul Ponnusamy | Jayanth Mohan | Mekapati Reddy
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents the findings of the shared task on multimodal sentiment analysis, abusive language detection, and hate speech detection in Dravidian languages. Through this shared task, researchers worldwide can submit models for three crucial social media data analysis challenges in Dravidian languages: sentiment analysis, abusive language detection, and hate speech detection. The aim is to build models for deriving fine-grained sentiment from multimodal data in Tamil and Malayalam, and for identifying abusive and hate content in multimodal data in Tamil. Three modalities make up the multimodal data: text, audio, and video. YouTube videos were gathered to create the datasets for the tasks. Thirty-nine teams took part in the competition; however, only two teams submitted their results. The macro F1-score was used to assess the submissions.
Overview of Second Shared Task on Sentiment Analysis in Code-mixed Tamil and Tulu
Lavanya Sambath Kumar | Asha Hegde | Bharathi Raja Chakravarthi | Hosahalli Shashirekha | Rajeswari Natarajan | Sajeetha Thavareesan | Ratnasingam Sakuntharaj | Thenmozhi Durairaj | Prasanna Kumar Kumaresan | Charmathi Rajkumar
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Sentiment Analysis (SA) in Dravidian code-mixed text is an active research area. In this regard, the “Second Shared Task on SA in Code-mixed Tamil and Tulu” was organized at DravidianLangTech (EACL-2024). The shared task comprises two tasks: SA in Tamil-English and in Tulu-English code-mixed data. In total, 64 teams registered for the shared task, and 19 and 17 systems were received for Tamil and Tulu, respectively. The performance of the systems submitted by the participants was evaluated based on the macro F1-score. The best method obtained macro F1-scores of 0.260 and 0.584 for code-mixed Tamil and Tulu texts, respectively.
Overview of the Third Shared Task on Speech Recognition for Vulnerable Individuals in Tamil
Bharathi B | Bharathi Raja Chakravarthi | Sripriya N | Rajeswari Natarajan | Suhasini S
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
This paper presents an overview of the shared task on speech recognition for vulnerable individuals in Tamil (LT-EDI-2024). The task comes with a Tamil dataset gathered from elderly individuals who identify as male, female, or transgender. The audio samples were recorded in public places such as marketplaces, vegetable shops, and hospitals. The dataset was released in two phases: training and testing. Participants were asked to process the audio signals using various models and techniques and to submit their results as transcriptions of the provided test samples. The participants' results were assessed using WER (Word Error Rate). The participants employed transformer-based approaches for automatic speech recognition. This overview paper discusses the findings and the various pre-trained transformer-based models that the participants employed.
2023
Findings of the Shared Task on Multimodal Abusive Language Detection and Sentiment Analysis in Tamil and Malayalam
Premjith B | Jyothish Lal G | Sowmya V | Bharathi Raja Chakravarthi | Rajeswari Natarajan | Nandhini K | Abirami Murugappan | Bharathi B | Kaushik M | Prasanth Sn | Aswin Raj R | Vijai Simmon S
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
This paper summarizes the shared task on multimodal abusive language detection and sentiment analysis in Dravidian languages as part of the third Workshop on Speech and Language Technologies for Dravidian Languages at RANLP 2023. This shared task provides a platform for researchers worldwide to submit their models on two crucial social media data analysis problems in Dravidian languages - abusive language detection and sentiment analysis. Abusive language detection identifies social media content with abusive information, whereas sentiment analysis refers to the problem of determining the sentiments expressed in a text. This task aims to build models for detecting abusive content and analyzing fine-grained sentiment from multimodal data in Tamil and Malayalam. The multimodal data consists of three modalities - video, audio and text. The datasets for both tasks were prepared by collecting videos from YouTube. Sixty teams participated in both tasks. However, only two teams submitted their results. The submissions were evaluated using macro F1-score.
Overview of the Second Shared Task on Speech Recognition for Vulnerable Individuals in Tamil
Bharathi B | Bharathi Raja Chakravarthi | Subalalitha Cn | Sripriya Natarajan | Rajeswari Natarajan | S Suhasini | Swetha Valli
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
This paper presents an overview of the shared task on Speech Recognition for Vulnerable Individuals in Tamil (LT-EDI-ACL2023). The task provides a Tamil dataset collected from elderly people of three genders: male, female, and transgender. The audio samples were recorded in public locations such as hospitals, markets, and vegetable shops. The dataset was released in two phases, training and testing. The participants were asked to use different models and methods to handle audio signals and to submit their results as transcriptions of the given test samples. The submitted results were evaluated using WER (Word Error Rate). The participants used transformer-based models for automatic speech recognition. The results and the different pre-trained transformer-based models used by the participants are discussed in this overview paper.