Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags

Sainik Mahata, Dipankar Das, Sivaji Bandyopadhyay


Abstract
Sentiment analysis tools and models have been developed extensively over the years for European languages. In contrast, similar tools for Indian languages are scarce. This is because state-of-the-art pre-processing tools, such as POS taggers and shallow parsers, are not readily available for Indian languages. Although such tools exist for Indian languages like Hindi and Bengali, which are spoken by the majority of the population, finding them for less widely spoken languages like Tamil, Telugu, and Malayalam is difficult. Moreover, with the advent of social media, the multilingual population of India, who are comfortable with both English and their regional language, prefer to communicate by mixing the two. This gives rise to massive amounts of code-mixed content, and automatically annotating it with sentiment labels becomes a challenging task. In this work, we take up the challenge of developing a sentiment analysis model that can work with English-Tamil code-mixed data. The proposed work addresses this by using bi-directional LSTMs along with language tagging. Traditional methods based on classical machine learning algorithms, which have also been discussed in the literature, serve as the baseline systems against which we compare our neural network based model. The developed neural network based model achieved precision, recall, and F1 scores of 0.59, 0.66, and 0.58, respectively.
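The reported F1 score (0.58) is lower than both the precision (0.59) and the recall (0.66), which cannot happen if F1 is the harmonic mean of those two numbers. One possible explanation, which the abstract does not confirm, is that each score is averaged over the sentiment classes separately. The sketch below illustrates that effect with support-weighted averaging; the function names, the label names, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_prf(gold, pred, label):
    """Precision, recall and F1 for a single class label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_prf(gold, pred):
    """Average per-class scores, weighting each class by its gold support.

    Assumption: this averaging scheme is only an illustration; the
    abstract does not say how its scores were aggregated.
    """
    support = Counter(gold)
    total = len(gold)
    avg_p = avg_r = avg_f = 0.0
    for label, count in support.items():
        p, r, f = per_class_prf(gold, pred, label)
        weight = count / total
        avg_p += weight * p
        avg_r += weight * r
        avg_f += weight * f
    return avg_p, avg_r, avg_f


# Toy data (hypothetical labels, not from the paper's dataset).
gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
precision, recall, f1 = weighted_prf(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

On this toy data the averaged F1 (about 0.733) falls below both the averaged precision (about 0.833) and the averaged recall (0.75), the same pattern seen in the reported scores.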
Anthology ID:
2021.dravidianlangtech-1.4
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
28–35
URL:
https://aclanthology.org/2021.dravidianlangtech-1.4
Cite (ACL):
Sainik Mahata, Dipankar Das, and Sivaji Bandyopadhyay. 2021. Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 28–35, Kyiv. Association for Computational Linguistics.
Cite (Informal):
Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags (Mahata et al., DravidianLangTech 2021)
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.4.pdf
Software:
 2021.dravidianlangtech-1.4.Software.zip