ML&AI_IIITRanchi@DravidianLangTech: Fine-Tuning IndicBERT for Exploring Language-specific Features for Sentiment Classification in Code-Mixed Dravidian Languages

Kirti Kumari, Shirish Shekhar Jha, Zarikunte Kunal Dayanand, Praneesh Sharma


Abstract
Code-mixing poses challenges for sentiment analysis because annotated data is scarce for low-resource languages such as Tulu. To address this issue, a gold-standard labeled corpus incorporating both languages was created to support accurate sentiment analysis. The work involved data collection, cleaning, preprocessing, and annotation, followed by fine-tuning IndicBERT and running experiments with TF-IDF and bag-of-words features. The outcome is a resource for developing models tailored to sentiment analysis of code-mixed Tamil and Tulu text, offering focused insight into how sentiment is expressed in such text. The hybrid models yielded promising results, ranking 10th for Tulu and 14th for Tamil, with macro F1 scores of 0.471 and 0.124, respectively.
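The rankings above are reported in macro F1, which averages per-class F1 scores with equal weight, so a class that is never predicted correctly pulls the score down regardless of how rare it is. The following is a minimal illustrative sketch of that metric in plain Python; the function name `macro_f1` and the example labels are assumptions made here for illustration and are not taken from the paper or the shared task's evaluation script.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one example")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong for this example
            fn[truth] += 1  # true label was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, chosen only to show the effect of a missed minority class.
y_true = ["pos", "pos", "neg", "neg", "mixed"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
score = macro_f1(y_true, y_pred)
```

In this toy example, three of five predictions are correct (accuracy 0.6), but the per-class F1 values are 0.5 for `pos`, 0.8 for `neg`, and 0 for `mixed`, so the macro average is about 0.43.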
Anthology ID:
2023.dravidianlangtech-1.27
Volume:
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
Venues:
DravidianLangTech | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
192–197
URL:
https://aclanthology.org/2023.dravidianlangtech-1.27
Cite (ACL):
Kirti Kumari, Shirish Shekhar Jha, Zarikunte Kunal Dayanand, and Praneesh Sharma. 2023. ML&AI_IIITRanchi@DravidianLangTech: Fine-Tuning IndicBERT for Exploring Language-specific Features for Sentiment Classification in Code-Mixed Dravidian Languages. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 192–197, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
ML&AI_IIITRanchi@DravidianLangTech: Fine-Tuning IndicBERT for Exploring Language-specific Features for Sentiment Classification in Code-Mixed Dravidian Languages (Kumari et al., DravidianLangTech-WS 2023)
PDF:
https://aclanthology.org/2023.dravidianlangtech-1.27.pdf