2024
pdf
bib
abs
SSN-Nova@LT-EDI 2024: Leveraging Vectorisation Techniques in an Ensemble Approach for Stress Identification in Low-Resource Languages
A Reddy
|
Ann Thomas
|
Pranav Moorthi
|
Bharathi B
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
This paper presents our submission for Shared task on Stress Identification in Dravidian Languages: StressIdent LT-EDI@EACL2024. The objective of this task is to identify stress levels in individuals based on their social media content. The system is tasked with analysing posts written in a code-mixed language of Tamil and Telugu and categorising them into two labels: “stressed” or “not stressed.” Our approach aimed to leverage feature extraction and juxtapose the performance of widely used traditional, deep learning and transformer models. Our research highlighted that building a pipeline with traditional classifiers proved to significantly improve their performance (0.98 and 0.93 F1-scores in Telugu and Tamil respectively), surpassing the baseline as well as deep learning and transformer models.
pdf
bib
abs
SSN-Nova@LT-EDI 2024: POS Tagging, Boosting Techniques and Voting Classifiers for Caste And Migration Hate Speech Detection
A Reddy
|
Ann Thomas
|
Pranav Moorthi
|
Bharathi B
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
This paper presents our submission for the shared task on Caste and Migration Hate Speech Detection: LT-EDI@EACL 20241 . This text classification task aims to foster the creation of models capable of identifying hate speech related to caste and migration. The dataset comprises social media comments, and the goal is to categorize them into negative and positive sentiments. Our approach explores back-translation for data augmentation to address sparse datasets in low-resource Dravidian languages. While Part-of-Speech (POS) tagging is valuable in natural language processing, our work highlights its ineffectiveness in Dravidian languages, with model performance drastically reducing from 0.73 to 0.67 on application. In analyzing boosting and ensemble methods, the voting classifier with traditional models outperforms others and the boosting techniques, underscoring the efficacy of simper models on low-resource data despite augmentation.
2023
pdf
bib
abs
Supernova@DravidianLangTech 2023@Abusive Comment Detection in Tamil and Telugu - (Tamil, Tamil-English, Telugu-English)
Ankitha Reddy
|
Pranav Moorthi
|
Ann Maria Thomas
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
This paper focuses on using Support Vector Machines (SVM) classifiers with TF-IDF feature extraction to classify whether a comment is abusive or not.The paper tries to identify abusive content in regional languages.The dataset analysis presents the distribution of target variables in the Tamil-English, Telugu-English, and Tamil datasets.The methodology section describes the preprocessing steps, including consistency, removal of special characters and emojis, removal of stop words, and stemming of data. Overall, the study contributes to the field of abusive comment detection in Tamil and Telugu languages.