Poorvi@DravidianLangTech: Sentiment Analysis on Code-Mixed Tulu and Tamil Corpus

Poorvi Shetty

Abstract

Sentiment analysis in code-mixed languages poses significant challenges, particularly for highly under-resourced languages such as Tulu and Tamil. Existing corpora, primarily sourced from YouTube comments, suffer from class imbalance across sentiment categories. Moreover, the limited number of samples in these corpora hampers effective sentiment classification. This study introduces a new corpus tailored for sentiment analysis in Tulu code-mixed texts. The research applies standard pre-processing techniques to ensure data quality and consistency and to handle class imbalance. Multiple classifiers are then employed to analyze the sentiment of the code-mixed texts, yielding promising results. By leveraging the new corpus, the study contributes to advancing sentiment analysis techniques in under-resourced code-mixed languages. This work serves as a stepping stone towards better understanding and addressing the challenges posed by sentiment analysis in highly under-resourced languages.

Anthology ID: 2023.dravidianlangtech-1.16
Volume: Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
Venues: DravidianLangTech | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 124–132
URL: https://aclanthology.org/2023.dravidianlangtech-1.16/
Cite (ACL): Poorvi Shetty. 2023. Poorvi@DravidianLangTech: Sentiment Analysis on Code-Mixed Tulu and Tamil Corpus. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 124–132, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Poorvi@DravidianLangTech: Sentiment Analysis on Code-Mixed Tulu and Tamil Corpus (Shetty, DravidianLangTech 2023)
PDF: https://aclanthology.org/2023.dravidianlangtech-1.16.pdf
