Johannes Bernhard


2020

CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling
Kathryn Chapman | Johannes Bernhard | Dietrich Klakow
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present our submission and results for SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020), in which we participated in the offensive tweet classification tasks for English, Arabic, Greek, Turkish and Danish. Our approach combines classical machine learning models such as support vector machines and logistic regression in an ensemble with a multilingual transformer-based model (XLM-R). The transformer model is trained on all languages combined, yielding a fully multilingual model that can leverage knowledge across languages. The hyperparameters of the machine learning models are tuned, and the statistically best-performing configurations are included in the final ensemble.
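
The abstract does not state how the ensemble combines its members, so the following is only a minimal sketch of one common option, hard majority voting over per-model labels. The function name `majority_vote`, the tie-breaking rule, the `OFF`/`NOT` label strings and the example predictions are all illustrative assumptions, not the authors' implementation or data.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists into one label per tweet by hard voting.

    predictions[m][i] is model m's label for tweet i. Ties are broken in
    favour of the earliest model in the list that voted for a tied label
    (an assumed rule; the abstract does not specify one).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must label the same number of tweets")

    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # The first label (in model order) that reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical outputs of three ensemble members on four tweets.
svm_preds = ["OFF", "NOT", "NOT", "OFF"]
logreg_preds = ["OFF", "NOT", "OFF", "NOT"]
xlmr_preds = ["NOT", "NOT", "OFF", "OFF"]

ensemble_preds = majority_vote([svm_preds, logreg_preds, xlmr_preds])
print(ensemble_preds)  # ['OFF', 'NOT', 'OFF', 'OFF']
```

Hard voting needs only the predicted labels, so classical models and a transformer can be mixed without calibrating their scores against each other; weighted or probability-averaging schemes are equally plausible alternatives.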