CUET_Binary_Hackers@DravidianLangTech-EACL 2024: Sentiment Analysis using Transformer-Based Models in Code-Mixed and Transliterated Tamil and Tulu

Asrarul Eusha, Salman Farsi, Ariful Islam, Jawad Hossain, Shawly Ahsan, Mohammed Moshiul Hoque


Abstract
Textual Sentiment Analysis (TSA) examines people’s opinions, intuitions, and emotions regarding an entity. Natural Language Processing (NLP) serves as a technique to extract this subjective knowledge, determining whether an idea or comment toward an entity is positive, negative, neutral, or mixed. In recent years, TSA has garnered substantial attention from NLP researchers due to the vast availability of online comments and opinions. Despite extensive studies in this domain, sentiment analysis in low-resourced languages such as Tamil and Tulu remains challenging, particularly for code-mixed and transliterated content. To address these challenges, this work focuses on sentiment analysis of code-mixed and transliterated Tamil and Tulu social media comments. It explores four machine learning (ML) approaches (LR, SVM, XGBoost, Ensemble), four deep learning (DL) methods (BiLSTM and CNN, each with FastText and Word2Vec), and four transformer-based models (m-BERT, MuRIL, L3Cube-IndicSBERT, and Distilm-BERT) for both languages. For Tamil, L3Cube-IndicSBERT and the ensemble approach outperformed the other models, while m-BERT performed best for Tulu. The presented models ranked 3rd for Tamil and 1st for Tulu, with macro F1-scores of 0.227 and 0.584, respectively.
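The rankings above are reported in macro F1, which averages per-class F1 without weighting by class frequency, so a rare sentiment class counts as much as a common one. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or the shared-task data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, only to show how the metric behaves
y_true = ["positive", "positive", "positive", "negative", "neutral", "mixed"]
y_pred = ["positive", "positive", "positive", "positive", "neutral", "neutral"]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case two of the four classes are never predicted correctly, so macro F1 falls well below accuracy, which is the kind of gap that imbalanced sentiment labels can produce.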
Anthology ID:
2024.dravidianlangtech-1.34
Volume:
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Rajeswari Nadarajan, Manikandan Ravikiran
Venues:
DravidianLangTech | WS
Publisher:
Association for Computational Linguistics
Pages:
205–211
URL:
https://aclanthology.org/2024.dravidianlangtech-1.34
Cite (ACL):
Asrarul Eusha, Salman Farsi, Ariful Islam, Jawad Hossain, Shawly Ahsan, and Mohammed Moshiul Hoque. 2024. CUET_Binary_Hackers@DravidianLangTech-EACL 2024: Sentiment Analysis using Transformer-Based Models in Code-Mixed and Transliterated Tamil and Tulu. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 205–211, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
CUET_Binary_Hackers@DravidianLangTech-EACL 2024: Sentiment Analysis using Transformer-Based Models in Code-Mixed and Transliterated Tamil and Tulu (Eusha et al., DravidianLangTech-WS 2024)
PDF:
https://aclanthology.org/2024.dravidianlangtech-1.34.pdf