Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models

Shuohuan Wang; Jiaxiang Liu; Xuan Ouyang; Yu Sun

doi:10.18653/v1/2020.semeval-1.189

Abstract

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For offensive language identification, we proposed a multi-lingual method using the pre-trained language models ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A (Offensive Language Identification), we ranked first in terms of average F1 score across all languages, and we were the only team that ranked among the top three in every language. We also took first place in Sub-task B (Automatic Categorization of Offense Types) and Sub-task C (Offense Target Identification).
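As a rough illustration of training on soft labels, the sketch below averages class probabilities from several teacher models and scores a student's logits against that averaged distribution with cross-entropy. The abstract does not say how the soft labels are combined or which loss is used, so the averaging step, the loss, the function names, and the numbers are all assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_soft_labels(teacher_probs):
    """Average per-class probabilities from several teacher models (assumed combination rule)."""
    n_teachers = len(teacher_probs)
    n_classes = len(teacher_probs[0])
    return [sum(p[c] for p in teacher_probs) / n_teachers for c in range(n_classes)]

def soft_cross_entropy(student_logits, soft_target):
    """Cross-entropy between a soft target distribution and the student's prediction."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(soft_target, student_probs))

# Hypothetical teacher outputs for one example over three classes.
teachers = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
target = average_soft_labels(teachers)

# Hypothetical student logits; the loss shrinks as the student matches the teachers.
loss = soft_cross_entropy([2.0, 0.5, -1.0], target)
```

In this setup the student is penalized less when its predicted distribution is close to the teachers' averaged one, which is the general idea behind distilling from soft labels rather than hard class labels.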

Anthology ID: 2020.semeval-1.189
Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month: December
Year: 2020
Address: Barcelona (online)
Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue: SemEval
SIG: SIGLEX
Publisher: International Committee for Computational Linguistics
Pages: 1448–1455
URL: https://aclanthology.org/2020.semeval-1.189/
DOI: 10.18653/v1/2020.semeval-1.189
Cite (ACL): Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, and Yu Sun. 2020. Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1448–1455, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal): Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models (Wang et al., SemEval 2020)
PDF: https://aclanthology.org/2020.semeval-1.189.pdf
