SUKHAN: Corpus of Hindi Shayaris annotated with Sentiment Polarity Information

Salil Aggarwal, Abhigyan Ghosh, Radhika Mamidi


Abstract
Shayari is a form of poetry popular mainly in the Indian subcontinent, in which the poet expresses emotions and feelings in verse. Because shayaris are a rich vehicle for thoughts and opinions, an annotated corpus of Hindi shayaris is valuable for sentiment analysis. In this paper, we introduce SUKHAN, a dataset of Hindi shayaris with sentiment polarity labels. To the best of our knowledge, this is the first corpus of Hindi shayaris annotated with sentiment polarity information. The corpus contains 733 Hindi shayaris of various genres. All annotation was done manually by five annotators, which makes the dataset well suited for training. We also use the annotated corpus to build baseline sentiment classification models using machine learning techniques.
Anthology ID:
2020.icon-main.29
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Editors:
Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
228–233
URL:
https://aclanthology.org/2020.icon-main.29
Cite (ACL):
Salil Aggarwal, Abhigyan Ghosh, and Radhika Mamidi. 2020. SUKHAN: Corpus of Hindi Shayaris annotated with Sentiment Polarity Information. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 228–233, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
SUKHAN: Corpus of Hindi Shayaris annotated with Sentiment Polarity Information (Aggarwal et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-main.29.pdf