Identifying Emotions in Code Mixed Hindi-English Tweets

Sanket Sonu; Rejwanul Haque; Mohammed Hasanuzzaman; Paul Stynes; Pramod Pathak

Abstract

Emotion detection (ED) in tweets is a text classification problem that is of interest to Natural Language Processing (NLP) researchers. Code-mixing (CM) is the process of mixing linguistic units, such as words, from two different languages. CM languages are characteristically different from the languages whose linguistic units are mixed. Whilst NLP has been shown to be successful for low-resource languages, performing NLP tasks on CM languages is challenging. ED has rarely been investigated on CM languages such as Hindi–English because of the lack of training data required by today’s data-driven classification algorithms. This research proposes a gold-standard dataset for detecting emotions in CM Hindi–English tweets. This paper also investigates the usefulness of this gold-standard dataset by testing a number of state-of-the-art classification algorithms on it. We found that the ED classifier built using SVM achieved the highest accuracy (75.17%) on the hold-out test set. This research would benefit the NLP community in detecting emotions from social media platforms in multilingual societies.

Anthology ID: 2022.wildre-1.7
Volume: Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Girish Nath Jha, Sobha L., Kalika Bali, Atul Kr. Ojha
Venue: WILDRE
Publisher: European Language Resources Association
Pages: 35–41
URL: https://aclanthology.org/2022.wildre-1.7/
Cite (ACL): Sanket Sonu, Rejwanul Haque, Mohammed Hasanuzzaman, Paul Stynes, and Pramod Pathak. 2022. Identifying Emotions in Code Mixed Hindi-English Tweets. In Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference, pages 35–41, Marseille, France. European Language Resources Association.
Cite (Informal): Identifying Emotions in Code Mixed Hindi-English Tweets (Sonu et al., WILDRE 2022)
PDF: https://aclanthology.org/2022.wildre-1.7.pdf
