Sentiment Analysis of Tweets in Three Indian Languages
Shanta Phani | Shibamouli Lahiri | Arindam Biswas
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
In this paper, we describe the results of sentiment analysis on tweets in three Indian languages: Bengali, Hindi, and Tamil. We used the recently released SAIL dataset (Patra et al., 2015) and obtained state-of-the-art results in all three languages. Our features are simple, robust, scalable, and language-independent. Further, we show that these simple features provide better results than more complex and language-specific features in two separate classification tasks. We also report detailed feature analysis and error analysis, along with learning curves for Hindi and Bengali.