Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black


Abstract
Sentiment analysis is an important task in understanding social media content such as customer reviews and Twitter and Facebook feeds. In multilingual communities around the world, a large amount of social media text is characterized by the presence of code-switching. It has therefore become important to build models that can handle code-switched data; however, annotated code-switched data is scarce, and there is a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and show its application to the specific use case of sentiment analysis of code-switched data. We leverage the power of pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, using only pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of the learning dynamics of the algorithm, with the aim of answering the question: ‘Does our unsupervised model understand the code-switched languages, or does it just learn their representations?’ Our unsupervised models compete well with their supervised counterparts, coming within 1-7% (weighted F1) of supervised models trained on a two-class problem.
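The abstract describes a self-training loop: a pre-trained BERT sentiment model assigns zero-shot pseudo labels to unlabeled code-switched text, and the model is then fine-tuned on its own most confident predictions. The following is a minimal sketch of that idea, assuming the Hugging Face transformers API; the checkpoint name, confidence threshold, batch sizes, and round count are illustrative assumptions, not the paper's reported configuration.

# Hedged sketch of unsupervised self-training for two-class sentiment,
# as described in the abstract. All hyperparameters below are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; the paper initializes from a BERT model with
# English sentiment supervision before any code-switched training.
MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def pseudo_label(texts, batch_size=32):
    """Zero-shot pass: label the unlabeled pool with the current model."""
    model.eval()
    labels, confidences = [], []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, return_tensors="pt")
            probs = model(**batch).logits.softmax(dim=-1)
            conf, pred = probs.max(dim=-1)
            labels.extend(pred.tolist())
            confidences.extend(conf.tolist())
    return labels, confidences

def fine_tune(texts, labels, epochs=1, lr=2e-5, batch_size=16):
    """Fine-tune on pseudo-labeled examples (no gold labels are used)."""
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, return_tensors="pt")
            target = torch.tensor(labels[i:i + batch_size])
            loss = model(**batch, labels=target).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Toy stand-in for the unlabeled code-switched pool (Hinglish examples).
unlabeled = ["Movie bahut achhi thi, loved it!",
             "Service was slow, bilkul bekaar experience."]

for _ in range(5):  # number of self-training rounds is an assumption
    labels, conf = pseudo_label(unlabeled)
    # Keep only high-confidence pseudo labels for the next fine-tuning pass.
    keep = [i for i, c in enumerate(conf) if c > 0.9]
    fine_tune([unlabeled[i] for i in keep], [labels[i] for i in keep])

The key design choice this sketch illustrates is that supervision never enters the loop: each round's training signal comes entirely from the previous model's confident predictions, which is what makes the procedure unsupervised with respect to the code-switched data.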
Anthology ID:
2021.calcs-1.13
Volume:
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:
June
Year:
2021
Address:
Online
Editors:
Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
Venue:
CALCS
Publisher:
Association for Computational Linguistics
Pages:
103–112
URL:
https://aclanthology.org/2021.calcs-1.13
DOI:
10.18653/v1/2021.calcs-1.13
Bibkey:
Cite (ACL):
Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, and Alan W Black. 2021. Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 103–112, Online. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data (Gupta et al., CALCS 2021)
PDF:
https://aclanthology.org/2021.calcs-1.13.pdf
Data:
TweetEval