Sentiment Analysis of English-Punjabi Code-Mixed Social Media Content

Mukhtiar Singh, Vishal Goyal


Abstract
Sentiment analysis is the study of people's emotions, expressed in words such as Nice, Happy, ਦੁਖੀ (sad), and changa (good), towards the entities and attributes described in written text. On microblogging websites (Facebook, YouTube, Twitter), most people use more than one language to express their emotions. Switching from one language to another within the same written text is called code-mixing. In this research, we gathered an English-Punjabi code-mixed corpus from microblogging websites. We performed language identification on the code-mixed text, which includes phonetic typing, abbreviations, wordplay, intentionally misspelled words, and slang. We then tokenized the English and Punjabi words, which appear in varying spellings, and performed lexicon-based sentiment analysis on the resulting text. The dictionary created for English-Punjabi code-mixed text consists of opinionated words, which are categorized into three lists: positive words, negative words, and neutral words. The remaining words are stored in an unsorted word list. Using an N-gram approach, a statistical technique is applied to determine sentence-level sentiment polarity on the English-Punjabi code-mixed dataset. Our results show an accuracy of 83% with an F1 measure of 77%.
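The lexicon and N-gram steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the word lists are a hypothetical toy lexicon (only Nice, Happy, ਦੁਖੀ, and changa come from the abstract), and the function names, the greedy longest-match strategy, and the simple count-based score are all assumptions, since the abstract does not specify the statistical technique used.

```python
import string

# Hypothetical toy lexicon. The real dictionary is not given in the abstract;
# multi-word entries such as "bahut changa" are illustrative assumptions.
POSITIVE = {"nice", "happy", "changa", "good", "bahut changa"}
NEGATIVE = {"sad", "ਦੁਖੀ", "bad", "not good"}
NEUTRAL = {"ok"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip ASCII punctuation.

    Whitespace splitting is used instead of a regex word pattern so that
    Gurmukhi vowel signs stay attached to their base consonants.
    """
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t]


def classify(text, max_n=2):
    """Return (label, unsorted_words) for one sentence.

    Scans left to right, trying the longest N-gram first (assumed strategy).
    Positive matches add 1, negative matches subtract 1, neutral matches
    add 0; unmatched tokens go to the unsorted word list.
    """
    tokens = tokenize(text)
    score = 0
    unsorted_words = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in POSITIVE:
                score += 1
            elif gram in NEGATIVE:
                score -= 1
            elif gram not in NEUTRAL:
                continue  # no lexicon entry for this N-gram; try a shorter one
            i += n
            matched = True
            break
        if not matched:
            unsorted_words.append(tokens[i])
            i += 1
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, unsorted_words


if __name__ == "__main__":
    print(classify("Movie bahut changa si!"))
    print(classify("This is not good"))
```

Trying bigrams before unigrams lets a phrase like "not good" be scored as a single negative unit instead of as an unmatched "not" followed by a positive "good", which is one plausible reason to use N-grams over single words at the sentence level.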
Anthology ID:
2020.icon-demos.9
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Month:
December
Year:
2020
Address:
Patna, India
Editors:
Vishal Goyal, Asif Ekbal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
24–25
URL:
https://aclanthology.org/2020.icon-demos.9
Cite (ACL):
Mukhtiar Singh and Vishal Goyal. 2020. Sentiment Analysis of English-Punjabi Code-Mixed Social Media Content. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations, pages 24–25, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
Sentiment Analysis of English-Punjabi Code-Mixed Social Media Content (Singh & Goyal, ICON 2020)
PDF:
https://aclanthology.org/2020.icon-demos.9.pdf