Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black


Abstract
Sentiment analysis is an important task in understanding social media content such as customer reviews and Twitter and Facebook feeds. In multilingual communities around the world, a large amount of social media text is characterized by the presence of code-switching. It has therefore become important to build models that can handle code-switched data. However, annotated code-switched data is scarce, and there is a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and demonstrate it on the specific use case of sentiment analysis of code-switched data. We use pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, using only pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of its learning dynamics, with the aim of answering the question: 'Does our unsupervised model understand the code-switched languages, or does it just learn their representations?'. Our unsupervised models compete well with their supervised counterparts, coming within 1-7% (weighted F1 score) of supervised models trained for a two-class problem.
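The self-training loop the abstract describes can be sketched roughly as follows: a zero-shot model labels the unlabeled pool, the most confident predictions are kept as pseudo labels, the model is fine-tuned on them, and the process repeats. This is a minimal, dependency-free sketch of that generic loop; the toy lexicon "model", the function names (`self_train`, `fine_tune`), and all parameters are illustrative assumptions, not the paper's actual implementation, which fine-tunes BERT.

```python
def self_train(model_predict, fine_tune, unlabeled, rounds=2, top_k=1):
    """Iterative pseudo-label self-training (illustrative sketch).

    model_predict(x) -> (label, confidence); fine_tune(pairs) updates
    the model on (text, pseudo_label) pairs and returns the new
    prediction function.
    """
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Score every remaining unlabeled example, most confident first.
        scored = sorted(((model_predict(x), x) for x in pool),
                        key=lambda t: -t[0][1])
        # Keep only the top_k most confident predictions as pseudo labels.
        pseudo = [(x, label) for (label, _conf), x in scored[:top_k]]
        model_predict = fine_tune(pseudo)
        chosen = {x for x, _ in pseudo}
        pool = [x for x in pool if x not in chosen]
    return model_predict


# Toy stand-in for a zero-shot classifier: confidence is the count of
# known sentiment words (purely illustrative, not the paper's model).
pos_words, neg_words = {"great"}, {"awful"}

def predict(text):
    p = sum(w in pos_words for w in text.split())
    n = sum(w in neg_words for w in text.split())
    return ("pos", p) if p >= n else ("neg", n)

def fine_tune(pairs):
    # "Training" here just grows the lexicon with pseudo-labeled words.
    for text, label in pairs:
        (pos_words if label == "pos" else neg_words).update(text.split())
    return predict


texts = ["great movie", "awful plot", "movie was fine"]
model = self_train(predict, fine_tune, texts, rounds=2, top_k=1)
```

After two rounds, words from confidently pseudo-labeled examples have entered the lexicon, so previously unscorable texts like "movie was fine" now receive a label; the confidence-based selection is what lets the model bootstrap from zero labeled examples.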
Anthology ID:
2021.calcs-1.13
Volume:
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:
June
Year:
2021
Address:
Online
Venues:
CALCS | NAACL
Publisher:
Association for Computational Linguistics
Pages:
103–112
URL:
https://aclanthology.org/2021.calcs-1.13
DOI:
10.18653/v1/2021.calcs-1.13
PDF:
https://aclanthology.org/2021.calcs-1.13.pdf
Data
TweetEval