DeTox at GermEval 2021: Toxic Comment Classification

Mina Schütz, Christoph Demus, Jonas Pitz, Nadine Probol, Melanie Siegel, Dirk Labudde


Abstract
In this work, we present our approaches to the toxic comment classification task (subtask 1) of the GermEval 2021 Shared Task. For this binary task, we propose three models: a German BERT transformer model; a multilayer perceptron that was first trained in parallel on textual input and 14 additional linguistic features, which were then concatenated in an additional layer; and a multilayer perceptron with both feature types as input. We enhanced our pre-trained transformer model by re-training it with over 1 million tweets and fine-tuning it on two additional German datasets of similar tasks. The embeddings of the final fine-tuned German BERT were taken as the textual input features for our neural networks. Our best models on the validation data were both neural networks; however, our enhanced German BERT achieved the higher score on the test data, with an F1-score of 0.5895.
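The two multilayer perceptron variants named in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class names, hidden size, activations, and random weights are assumptions made for illustration, and the 768-dimensional text embedding is an assumed BERT-base size. Only the 14 linguistic features and the overall layout (two parallel branches joined in an additional layer, versus one network fed both feature types) come from the abstract.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Create small random weights and zero biases (illustrative only, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases, activation=None):
    """Apply one fully connected layer, optionally followed by an activation."""
    outputs = [
        sum(w * x for w, x in zip(row, inputs)) + bias
        for row, bias in zip(weights, biases)
    ]
    if activation == "relu":
        outputs = [max(0.0, v) for v in outputs]
    elif activation == "sigmoid":
        outputs = [1.0 / (1.0 + math.exp(-v)) for v in outputs]
    return outputs


class TwoBranchMLP:
    """Hypothetical sketch: separate branches for text and linguistic features,
    joined by concatenation in an additional layer."""

    def __init__(self, text_dim, feat_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.text_layer = init_layer(rng, text_dim, hidden)
        self.feat_layer = init_layer(rng, feat_dim, hidden)
        self.merge_layer = init_layer(rng, 2 * hidden, hidden)
        self.out_layer = init_layer(rng, hidden, 1)

    def forward(self, text_emb, ling_feats):
        t = dense(text_emb, *self.text_layer, activation="relu")
        f = dense(ling_feats, *self.feat_layer, activation="relu")
        merged = t + f  # list concatenation of the two branch outputs
        h = dense(merged, *self.merge_layer, activation="relu")
        return dense(h, *self.out_layer, activation="sigmoid")[0]


class SingleInputMLP:
    """Hypothetical sketch: one network that receives both feature types at once."""

    def __init__(self, text_dim, feat_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden_layer = init_layer(rng, text_dim + feat_dim, hidden)
        self.out_layer = init_layer(rng, hidden, 1)

    def forward(self, text_emb, ling_feats):
        h = dense(text_emb + ling_feats, *self.hidden_layer, activation="relu")
        return dense(h, *self.out_layer, activation="sigmoid")[0]


# Assumed sizes: 768-dim text embedding, 14 linguistic features (from the abstract).
text_emb = [0.1] * 768
ling_feats = [0.0] * 14
p_two_branch = TwoBranchMLP(768, 14).forward(text_emb, ling_feats)
p_single = SingleInputMLP(768, 14).forward(text_emb, ling_feats)
```

Each model returns a single value between 0 and 1 that could be thresholded for the binary toxic/non-toxic decision. In practice the weights would be learned during training, and the text input would be the embeddings of the fine-tuned German BERT rather than a constant vector.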
Anthology ID:
2021.germeval-1.8
Volume:
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments
Month:
September
Year:
2021
Address:
Duesseldorf, Germany
Editors:
Julian Risch, Anke Stoll, Lena Wilms, Michael Wiegand
Venue:
GermEval
Publisher:
Association for Computational Linguistics
Pages:
54–61
URL:
https://aclanthology.org/2021.germeval-1.8
Cite (ACL):
Mina Schütz, Christoph Demus, Jonas Pitz, Nadine Probol, Melanie Siegel, and Dirk Labudde. 2021. DeTox at GermEval 2021: Toxic Comment Classification. In Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, pages 54–61, Duesseldorf, Germany. Association for Computational Linguistics.
Cite (Informal):
DeTox at GermEval 2021: Toxic Comment Classification (Schütz et al., GermEval 2021)
PDF:
https://aclanthology.org/2021.germeval-1.8.pdf