Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black


Abstract
Sentiment analysis is an important task in understanding social media content such as customer reviews and Twitter and Facebook feeds. In multilingual communities around the world, a large amount of social media text is characterized by the presence of code-switching. It has therefore become important to build models that can handle code-switched data. However, annotated code-switched data is scarce, and there is a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and demonstrate it on the specific use case of sentiment analysis of code-switched data. We use pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, using only pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of its learning dynamics, with the aim of answering the question: 'Does our unsupervised model understand the code-switched languages, or does it just learn their representations?'. Our unsupervised models compete well with their supervised counterparts, coming within 1-7% (weighted F1 score) of supervised models trained for a two-class problem.
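The self-training loop the abstract describes can be sketched roughly as follows: a zero-shot model labels the unlabeled pool, the most confident predictions are kept as pseudo labels, the model is fine-tuned on them, and the process repeats. This is a minimal, dependency-free sketch of that generic loop; the toy lexicon "model", the function names (`self_train`, `fine_tune`), and all parameters are illustrative assumptions, not the paper's actual implementation, which fine-tunes BERT.

```python
def self_train(model_predict, fine_tune, unlabeled, rounds=2, top_k=1):
    """Iterative pseudo-label self-training (illustrative sketch).

    model_predict(x) -> (label, confidence); fine_tune(pairs) updates
    the model on (text, pseudo_label) pairs and returns the new
    prediction function.
    """
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Score every remaining unlabeled example, most confident first.
        scored = sorted(((model_predict(x), x) for x in pool),
                        key=lambda t: -t[0][1])
        # Keep only the top_k most confident predictions as pseudo labels.
        pseudo = [(x, label) for (label, _conf), x in scored[:top_k]]
        model_predict = fine_tune(pseudo)
        chosen = {x for x, _ in pseudo}
        pool = [x for x in pool if x not in chosen]
    return model_predict


# Toy stand-in for a zero-shot classifier: confidence is the count of
# known sentiment words (purely illustrative, not the paper's model).
pos_words, neg_words = {"great"}, {"awful"}

def predict(text):
    p = sum(w in pos_words for w in text.split())
    n = sum(w in neg_words for w in text.split())
    return ("pos", p) if p >= n else ("neg", n)

def fine_tune(pairs):
    # "Training" here just grows the lexicon with pseudo-labeled words.
    for text, label in pairs:
        (pos_words if label == "pos" else neg_words).update(text.split())
    return predict


texts = ["great movie", "awful plot", "movie was fine"]
model = self_train(predict, fine_tune, texts, rounds=2, top_k=1)
```

After two rounds, words from confidently pseudo-labeled examples have entered the lexicon, so previously unscorable texts like "movie was fine" now receive a label; the confidence-based selection is what lets the model bootstrap from zero labeled examples.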
Anthology ID:
2021.calcs-1.13
Volume:
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:
June
Year:
2021
Address:
Online
Venues:
CALCS | NAACL
Publisher:
Association for Computational Linguistics
Pages:
103–112
URL:
https://aclanthology.org/2021.calcs-1.13
DOI:
10.18653/v1/2021.calcs-1.13
PDF:
https://aclanthology.org/2021.calcs-1.13.pdf
Data
TweetEval