ARGUABLY at ComMA@ICON: Detection of Multilingual Aggressive, Gender Biased, and Communally Charged Tweets Using Ensemble and Fine-Tuned IndicBERT

Guneet Kohli, Prabsimran Kaur, Jatin Bedi


Abstract
The proliferation of social networking has increased offensive language, aggression, and hate speech online, and detecting such content has drawn the attention of the NLP community. However, differences in people's perception make it difficult to distinguish acceptable content from aggressive or hateful content, which in turn makes an automated system harder to build. In this paper, we propose multi-class classification techniques to identify aggressive and offensive language used online. Two main approaches have been developed to classify data as aggressive, gender-biased, and communally charged. The first approach is an ensemble-based model comprising XGBoost, LightGBM, and Naive Bayes applied to vectorized English data. This data was obtained by applying Indic transliteration to the original data, which comprises Meitei, Bangla, Hindi, and English text. The second approach is a BERT-based architecture used to detect misogyny and aggression. The proposed model employs IndicBERT embeddings to capture contextual understanding. The models are validated on the ComMA v0.2 dataset.
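The abstract does not say how the ensemble combines its three members, so the following is only a minimal sketch of one common option, hard (majority) voting. The classifier functions, label names, and keyword rules are hypothetical stand-ins for trained XGBoost, LightGBM, and Naive Bayes models, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a text to a label.
# In the paper's setting these would be trained models; the toy rules
# below are hypothetical placeholders used only to make the sketch run.
Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by most classifiers (hard voting).

    Ties go to the label that was voted for first, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    votes = [clf(text) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for the three ensemble members.
def stand_in_a(text: str) -> str:
    return "aggressive" if "hate" in text.lower() else "non_aggressive"


def stand_in_b(text: str) -> str:
    return "aggressive" if "!" in text else "non_aggressive"


def stand_in_c(text: str) -> str:
    return "non_aggressive"


members = [stand_in_a, stand_in_b, stand_in_c]
label_1 = majority_vote(members, "I hate this!")
label_2 = majority_vote(members, "Have a nice day")
```

Soft voting, which averages predicted class probabilities instead of counting labels, is another plausible reading of "ensemble-based model"; the abstract alone does not settle which was used.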
Anthology ID:
2021.icon-multigen.7
Volume:
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification
Month:
December
Year:
2021
Address:
NIT Silchar
Editors:
Ritesh Kumar, Siddharth Singh, Enakshi Nandi, Shyam Ratan, Laishram Niranjana Devi, Bornini Lahiri, Akanksha Bansal, Akash Bhagat, Yogesh Dawer
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
46–52
URL:
https://aclanthology.org/2021.icon-multigen.7
Cite (ACL):
Guneet Kohli, Prabsimran Kaur, and Jatin Bedi. 2021. ARGUABLY at ComMA@ICON: Detection of Multilingual Aggressive, Gender Biased, and Communally Charged Tweets Using Ensemble and Fine-Tuned IndicBERT. In Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification, pages 46–52, NIT Silchar. NLP Association of India (NLPAI).
Cite (Informal):
ARGUABLY at ComMA@ICON: Detection of Multilingual Aggressive, Gender Biased, and Communally Charged Tweets Using Ensemble and Fine-Tuned IndicBERT (Kohli et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-multigen.7.pdf