Exploiting Auxiliary Data for Offensive Language Detection with Bidirectional Transformers

Sumer Singh, Sheng Li


Abstract
Offensive language detection (OLD) has received increasing attention due to its societal impact. Recent work shows that bidirectional transformer-based methods obtain impressive performance on OLD. However, such methods usually rely on large-scale, well-labeled OLD datasets for model training. To address data and label scarcity in OLD, in this paper, we propose a simple yet effective domain adaptation approach for training bidirectional transformers. Our approach introduces domain adaptation (DA) training procedures to ALBERT, so that it can effectively exploit auxiliary data from source domains to improve OLD performance in a target domain. Experimental results on benchmark datasets show that our approach, ALBERT (DA), obtains state-of-the-art performance in most cases. In particular, it substantially benefits underrepresented and under-performing classes, with a marked improvement over vanilla ALBERT.
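The abstract describes the core idea at a high level: first exploit plentiful auxiliary data from a source domain, then fine-tune on the scarce target-domain OLD data. The paper's actual method uses ALBERT; the sketch below is a hypothetical illustration of that two-stage training schedule only, with a tiny bag-of-words logistic classifier standing in for the transformer so the example stays self-contained. All data and helper names here are invented for illustration.

```python
import math

def featurize(text, vocab):
    # Bag-of-words count vector over a fixed vocabulary.
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train(model, data, vocab, epochs=200, lr=0.5):
    # Plain logistic-regression SGD; `model` is (weights, bias).
    w, b = model
    for _ in range(epochs):
        for text, label in data:
            x = featurize(text, vocab)
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label          # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, text, vocab):
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, featurize(model and text, vocab))) + b
    return 1 if z > 0 else 0

# Auxiliary source-domain data (plentiful) vs. target-domain data (scarce).
source = [("you are awful trash", 1), ("have a nice day", 0),
          ("awful hateful trash post", 1), ("nice friendly day", 0)]
target = [("trash take honestly", 1), ("friendly reply thanks", 0)]

vocab = {tok: i for i, tok in enumerate(
    sorted({t for s, _ in source + target for t in s.lower().split()}))}

model = ([0.0] * len(vocab), 0.0)
model = train(model, source, vocab)   # stage 1: adapt on auxiliary source domain
model = train(model, target, vocab)   # stage 2: fine-tune on target domain

print(predict(model, "awful trash reply", vocab))
```

The design point the sketch illustrates is that stage-1 training lets rare offensive cues (here, tokens seen only in source data) inform the final target-domain classifier, which is the abstract's claimed benefit for underrepresented classes.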
Anthology ID:
2021.woah-1.1
Volume:
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)
Month:
August
Year:
2021
Address:
Online
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
1–5
URL:
https://aclanthology.org/2021.woah-1.1
DOI:
10.18653/v1/2021.woah-1.1
Cite (ACL):
Sumer Singh and Sheng Li. 2021. Exploiting Auxiliary Data for Offensive Language Detection with Bidirectional Transformers. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pages 1–5, Online. Association for Computational Linguistics.
Cite (Informal):
Exploiting Auxiliary Data for Offensive Language Detection with Bidirectional Transformers (Singh & Li, WOAH 2021)
PDF:
https://aclanthology.org/2021.woah-1.1.pdf
Video:
https://aclanthology.org/2021.woah-1.1.mp4
Data
OLID