Vio-Lens: A Novel Dataset of Annotated Social Network Posts Leading to Different Forms of Communal Violence and its Evaluation

Sourav Saha, Jahedul Alam Junaed, Maryam Saleki, Arnab Sen Sharma, Mohammad Rashidujjaman Rifat, Mohamed Rahouti, Syed Ishtiaque Ahmed, Nabeel Mohammed, Mohammad Ruhul Amin


Abstract
This paper presents a computational approach for creating a dataset on communal violence in the context of Bangladesh and West Bengal of India and benchmark evaluation. In recent years, social media has been used as a weapon by factions of different religions and backgrounds to incite hatred, resulting in physical communal violence and causing death and destruction. To prevent such abusive use of online platforms, we propose a framework for classifying online posts using an adaptive question-based approach. We collected more than 168,000 YouTube comments from a set of manually selected videos known for inciting violence in Bangladesh and West Bengal. Using both unsupervised and later semi-supervised topic modeling methods on those unstructured data, we discovered the major word clusters to interpret the related topics of peace and violence. Topic words were later used to select 20,142 posts related to peace and violence of which we annotated a total of 6,046 posts. Finally, we applied different modeling techniques based on linguistic features, and sentence transformers to benchmark the labeled dataset with the best-performing model reaching ~71% macro F1 score.
Anthology ID:
2023.banglalp-1.9
Volume:
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Farig Sadeque, Ruhul Amin
Venue:
BanglaLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
72–84
Language:
URL:
https://aclanthology.org/2023.banglalp-1.9
DOI:
10.18653/v1/2023.banglalp-1.9
Bibkey:
Cite (ACL):
Sourav Saha, Jahedul Alam Junaed, Maryam Saleki, Arnab Sen Sharma, Mohammad Rashidujjaman Rifat, Mohamed Rahouti, Syed Ishtiaque Ahmed, Nabeel Mohammed, and Mohammad Ruhul Amin. 2023. Vio-Lens: A Novel Dataset of Annotated Social Network Posts Leading to Different Forms of Communal Violence and its Evaluation. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 72–84, Singapore. Association for Computational Linguistics.
Cite (Informal):
Vio-Lens: A Novel Dataset of Annotated Social Network Posts Leading to Different Forms of Communal Violence and its Evaluation (Saha et al., BanglaLP 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.banglalp-1.9.pdf