XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders

Xiangjue Dong, Jinho D. Choi


Abstract
This paper presents six document classification models built on the latest transformer encoders, together with a high-performing ensemble model, for the task of offensive language identification in social media. In the individual models, deep transformer layers apply multi-head attention. In the ensemble model, the utterance representations taken from the individual models are concatenated and fed into a linear decoder to make the final decisions. The ensemble model outperforms the individual models by up to 8.6% on the development set. On the test set, it achieves a macro-F1 of 90.9% and ranks among the high-performing systems out of 85 participants in sub-task A of this shared task. Our analysis shows that although the ensemble model significantly improves accuracy on the development set, the improvement is less evident on the test set.
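The ensemble step described in the abstract (concatenate the per-model utterance representations, then apply a linear decoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the vector size, the random untrained weights, and the function names `concat_representations`, `linear_decode`, and `predict` are all assumptions made for illustration. Only the number of models (six) and the binary decision of sub-task A come from the abstract.

```python
import random

def concat_representations(reps):
    """Concatenate per-model utterance vectors into one flat feature vector."""
    return [x for rep in reps for x in rep]

def linear_decode(features, weights, bias):
    """Apply a linear layer: one score per class (weights has one row per class)."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def predict(reps, weights, bias):
    """Return the index of the highest-scoring class for one utterance."""
    scores = linear_decode(concat_representations(reps), weights, bias)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical sizes: six models (from the abstract), 4-dimensional vectors
# (an assumption; real encoders produce far larger vectors), two classes.
NUM_MODELS, DIM, NUM_CLASSES = 6, 4, 2
rng = random.Random(0)

# Stand-ins for the utterance representations produced by each model.
reps = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_MODELS)]

# Untrained, randomly initialised decoder parameters (assumption: in practice
# these would be learned on the training data).
weights = [[rng.uniform(-1, 1) for _ in range(NUM_MODELS * DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

label = predict(reps, weights, bias)
```

Because the decoder sees all six representations at once, it can learn to weight each model's features differently, which is one plausible reason such an ensemble can outperform its individual members.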
Anthology ID:
2020.semeval-1.299
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Venues:
COLING | SemEval
SIGs:
SIGLEX | SIGSEM
Publisher:
International Committee for Computational Linguistics
Pages:
2244–2250
URL:
https://aclanthology.org/2020.semeval-1.299
DOI:
10.18653/v1/2020.semeval-1.299
Cite (ACL):
Xiangjue Dong and Jinho D. Choi. 2020. XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2244–2250, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders (Dong & Choi, SemEval 2020)
PDF:
https://aclanthology.org/2020.semeval-1.299.pdf
Data
OLID