An Indian Language Social Media Collection for Hate and Offensive Speech

Anita Saroj, Sukomal Pal


Abstract
On social media, people express themselves every day on issues that affect their lives. During parliamentary elections, people’s interactions with candidates in social media posts reflect many social trends in a charged atmosphere. People’s likes and dislikes of leaders, political parties and their stands often become the subject of hate and offensive posts. We collected social media posts in Hindi and English from Facebook and Twitter during the run-up to the 2019 parliamentary election of India (PEI data-2019). We created a dataset for sentiment analysis with three categories: hate speech, offensive and not hate, or not offensive. We report here the initial results of sentiment classification on the dataset using different classifiers.
Anthology ID:
2020.restup-1.2
Volume:
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LREC | ResTUP | WS
Publisher:
European Language Resources Association (ELRA)
Pages:
2–8
Language:
English
URL:
https://aclanthology.org/2020.restup-1.2
Cite (ACL):
Anita Saroj and Sukomal Pal. 2020. An Indian Language Social Media Collection for Hate and Offensive Speech. In Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language, pages 2–8, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
An Indian Language Social Media Collection for Hate and Offensive Speech (Saroj & Pal, ResTUP 2020)
PDF:
https://aclanthology.org/2020.restup-1.2.pdf