PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection

Gretel Liz De la Peña Sarracén, Paolo Rosso


Abstract
The present paper describes the system submitted by the PRHLT-UPV team for Task 12 of SemEval-2020: OffensEval 2020. The task, officially titled Multilingual Offensive Language Identification in Social Media, aims to identify offensive language in texts written in English, Arabic, Danish, Greek and Turkish. For English, we propose a model based on the BERT architecture that leverages the knowledge contained in a pre-trained model and fine-tunes it for this particular task. For the other languages we use Multilingual BERT, which has been pre-trained on a large number of languages. In the experiments, the proposed method for English is compared with other approaches to assess the relevance of the chosen architecture, and simpler models for the other languages are evaluated against the proposed one. The experimental results show that the BERT-based model outperforms the other approaches. The main contribution of this work lies in this comparative study, even though the system did not reach the top positions of the competition ranking in most cases.
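The abstract describes fine-tuning a pre-trained BERT model for offensive language classification. The following is a minimal sketch of what such fine-tuning can look like with the Hugging Face transformers library and PyTorch; it is not the authors' code. Several assumptions are made for illustration: a binary label scheme (1 = offensive, 0 = not offensive), a tiny randomly initialised configuration and random token ids so that the example runs without downloading weights, and AdamW with an illustrative learning rate and step count. In a real system the model would instead be loaded with `from_pretrained` from a checkpoint such as `bert-base-uncased`, or `bert-base-multilingual-cased` for Arabic, Danish, Greek and Turkish, together with the matching tokenizer.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


def build_tiny_model(num_labels: int = 2) -> BertForSequenceClassification:
    # Hypothetical tiny configuration so the sketch runs offline. A real setup
    # would use pre-trained weights, e.g.
    # BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        hidden_dropout_prob=0.0,  # dropout disabled so the toy run is deterministic
        attention_probs_dropout_prob=0.0,
        num_labels=num_labels,
    )
    return BertForSequenceClassification(config)


def fine_tune(model, input_ids, attention_mask, labels, steps=30, lr=1e-3):
    """Run full fine-tuning steps on one batch; return the loss seen at each step."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        out.loss.backward()  # gradients reach both the encoder and the classifier head
        optimizer.step()
        losses.append(out.loss.item())
    return losses


torch.manual_seed(0)
model = build_tiny_model()
# Toy batch of 4 "tweets": random ids stand in for real WordPiece token ids.
input_ids = torch.randint(1, 100, (4, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 0, 1])  # assumed scheme: 1 = offensive, 0 = not offensive
losses = fine_tune(model, input_ids, attention_mask, labels)

# Inference: the predicted class is the argmax over the two logits.
model.eval()
with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
predictions = logits.argmax(dim=-1)
```

Because every parameter receives gradients, this is full fine-tuning rather than training a classifier on frozen features. Under these assumptions, moving from the English model to Multilingual BERT mainly amounts to loading a different checkpoint and its matching tokenizer.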
Anthology ID: 2020.semeval-1.209
Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month: December
Year: 2020
Address: Barcelona (online)
Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue: SemEval
SIG: SIGLEX
Publisher: International Committee for Computational Linguistics
Pages: 1605–1614
URL: https://aclanthology.org/2020.semeval-1.209
DOI: 10.18653/v1/2020.semeval-1.209
Cite (ACL): Gretel Liz De la Peña Sarracén and Paolo Rosso. 2020. PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1605–1614, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal): PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection (De la Peña Sarracén & Rosso, SemEval 2020)
PDF: https://aclanthology.org/2020.semeval-1.209.pdf