Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings

Pranaydeep Singh; Els Lefever

Abstract

This paper investigates the use of unsupervised cross-lingual embeddings for understanding code-mixed social media text. We specifically investigate these embeddings for a sentiment analysis task on Hinglish tweets, viz. English combined with (transliterated) Hindi. In a first step, baseline models were trained, initialized with monolingual embeddings obtained from large collections of tweets in English and code-mixed Hinglish. In a second step, two systems using cross-lingual embeddings were investigated: (1) a supervised classifier and (2) a transfer learning approach trained on English sentiment data and evaluated on code-mixed data. We demonstrate that incorporating cross-lingual embeddings improves the results (F1-score of 0.635 versus a monolingual baseline of 0.616), without requiring any parallel data to train the cross-lingual embeddings. In addition, the results show that the cross-lingual embeddings not only improve the results in a fully supervised setting, but can also be used as a base for distant supervision, by training a sentiment model in one of the source languages and evaluating on the other language projected into the same space. The transfer learning experiments result in an F1-score of 0.556, which is almost on par with the supervised settings and speaks to the robustness of the cross-lingual embeddings approach.
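
The transfer setting described in the abstract (train on one language, evaluate on the other after projecting it into the same space) can be illustrated with a toy sketch. This is not the authors' pipeline: the paper learns its cross-lingual embeddings without parallel data, whereas the sketch below swaps in a supervised orthogonal Procrustes fit on hypothetical anchor pairs to keep the example short. The nearest-centroid classifier, the synthetic vectors, and every name (`procrustes_map`, `fit_centroids`, `predict`, `demo`) are assumptions made for illustration.

```python
import numpy as np


def procrustes_map(src, tgt):
    """Return the orthogonal W minimising ||src @ W - tgt||_F (rows are paired)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def fit_centroids(x, y):
    """Stand-in classifier (assumption): one mean vector per sentiment label."""
    labels = np.unique(y)
    return labels, np.stack([x[y == c].mean(axis=0) for c in labels])


def predict(x, labels, centroids):
    """Assign each row of x to the label of its nearest centroid."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]


def demo(seed=0, dim=5, n=40):
    rng = np.random.default_rng(seed)
    # Hypothetical rotation relating the two toy embedding spaces.
    rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

    def sample(count):
        y = rng.integers(0, 2, size=count)  # 0 = negative, 1 = positive
        x = rng.normal(size=(count, dim)) + 3.0 * (2 * y[:, None] - 1)
        return x, y

    x_en, y_en = sample(n)       # labelled vectors in the toy "English" space
    x_shared, y_hi = sample(n)   # evaluation items, same distribution
    x_hi = x_shared @ rotation   # ...as seen in the toy "Hinglish" space

    # Swapped-in step: fit the Hinglish -> English map from synthetic anchor pairs.
    anchors_en = rng.normal(size=(20, dim))
    anchors_hi = anchors_en @ rotation
    w = procrustes_map(anchors_hi, anchors_en)

    # Train on English only, then evaluate on projected Hinglish vectors.
    labels, centroids = fit_centroids(x_en, y_en)
    pred = predict(x_hi @ w, labels, centroids)
    return float((pred == y_hi).mean()), w, rotation


if __name__ == "__main__":
    accuracy, _, _ = demo()
    print(f"toy transfer accuracy: {accuracy:.2f}")
```

Because the toy spaces differ only by a rotation, the fitted map recovers it exactly and the English-trained classifier transfers cleanly. Real embedding spaces are only approximately isomorphic, which is consistent with the gap the abstract reports between the transfer and supervised settings.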

Anthology ID: 2020.calcs-1.6
Volume: Proceedings of the 4th Workshop on Computational Approaches to Code Switching
Month: May
Year: 2020
Address: Marseille, France
Editors: Thamar Solorio, Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Amitava Das, Mona Diab
Venue: CALCS
Publisher: European Language Resources Association
Pages: 45–51
Language: English
URL: https://aclanthology.org/2020.calcs-1.6/
Cite (ACL): Pranaydeep Singh and Els Lefever. 2020. Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings. In Proceedings of the 4th Workshop on Computational Approaches to Code Switching, pages 45–51, Marseille, France. European Language Resources Association.
Cite (Informal): Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings (Singh & Lefever, CALCS 2020)
PDF: https://aclanthology.org/2020.calcs-1.6.pdf
