fBERT: A Neural Transformer for Identifying Offensive Content

Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, Alexander Ororbia


Abstract
Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT's performance in identifying offensive content on multiple English datasets and test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
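Since the abstract states that fBERT is released for community use, the following is a minimal sketch of loading it with the HuggingFace transformers library. The model identifier "diptanu/fBERT" is an assumption about where the checkpoint is hosted, and because fBERT is a retrained BERT encoder rather than a fine-tuned classifier, the two-label classification head below (OFF vs. NOT, following OLID's level-A scheme) is freshly initialized and would need fine-tuning on a labeled dataset before use.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed HuggingFace Hub identifier for the released fBERT checkpoint.
MODEL_ID = "diptanu/fBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels=2 assumes OLID-style binary labels (OFF vs. NOT); the head is
# randomly initialized on top of the retrained encoder and requires fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

inputs = tokenizer(
    "example social media post",
    return_tensors="pt",
    truncation=True,
    padding=True,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (near-uniform until fine-tuned)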
Anthology ID: 2021.findings-emnlp.154
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1792–1798
URL: https://aclanthology.org/2021.findings-emnlp.154
DOI: 10.18653/v1/2021.findings-emnlp.154
Cite (ACL): Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, and Alexander Ororbia. 2021. fBERT: A Neural Transformer for Identifying Offensive Content. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1792–1798, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): fBERT: A Neural Transformer for Identifying Offensive Content (Sarkar et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.154.pdf
Data: HatEval, OLID