SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

Michał Satława, Katarzyna Zamłyńska, Jarosław Piersa, Joanna Kolis, Klaudia Firląg, Katarzyna Beksa, Zuzanna Bordzicka, Christian Goltz, Paweł Bujnowski, Piotr Andruszkiewicz


Abstract
This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended with toxic comments that were modified and generated by two language models. For toxic word classification, the prediction threshold was optimized separately for each comment to maximize the expected F1 score.
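The abstract's idea of choosing a separate threshold per comment to maximize expected F1 can be illustrated with a small sketch. This is not the authors' implementation, and every function name here (`ensemble_probs`, `expected_f1`, `best_threshold`) is hypothetical. The sketch rests on several assumptions. Ensemble outputs are combined by simple averaging. Each word's gold label is treated as an independent Bernoulli variable with the predicted probability. F1 is computed over words rather than characters. A comment with an empty prediction and an empty gold span scores F1 = 1. Under these assumptions, any threshold amounts to keeping the top-k most probable words, so the sketch scores every k exactly using Poisson-binomial distributions.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def ensemble_probs(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-word toxicity probabilities across models (assumed aggregation)."""
    return [mean(word_ps) for word_ps in zip(*model_probs)]


def poisson_binomial(probs: Sequence[float]) -> List[float]:
    """Distribution of the number of toxic words among independent Bernoulli trials."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for n, mass in enumerate(dist):
            new[n] += mass * (1.0 - p)  # this word is not toxic
            new[n + 1] += mass * p      # this word is toxic
        dist = new
    return dist


def expected_f1(selected: Sequence[float], rest: Sequence[float]) -> float:
    """Expected word-level F1 when exactly the `selected` words are predicted toxic."""
    k = len(selected)
    tp_dist = poisson_binomial(selected)  # true positives among predicted words
    fn_dist = poisson_binomial(rest)      # toxic words left unpredicted
    total = 0.0
    for tp, p_tp in enumerate(tp_dist):
        for fn, p_fn in enumerate(fn_dist):
            denom = k + tp + fn           # |predicted| + |gold|
            f1 = 1.0 if denom == 0 else 2.0 * tp / denom  # empty vs empty -> 1 (assumed)
            total += p_tp * p_fn * f1
    return total


def best_threshold(probs: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, expected F1) maximizing expected F1 for one comment.

    Words with probability >= threshold are predicted toxic; ties at the
    threshold may select more than the optimal k words.
    """
    order = sorted(probs, reverse=True)
    best_k, best_val = 0, expected_f1([], order)
    for k in range(1, len(order) + 1):
        val = expected_f1(order[:k], order[k:])
        if val > best_val:
            best_k, best_val = k, val
    threshold = order[best_k - 1] if best_k > 0 else float("inf")
    return threshold, best_val


# Usage: two hypothetical models scoring the same four-word comment.
word_probs = ensemble_probs([[0.95, 0.7, 0.1, 0.0], [0.85, 0.9, 0.0, 0.04]])
threshold, score = best_threshold(word_probs)
toxic_mask = [p >= threshold for p in word_probs]
```

Scoring every candidate k is exact but quadratic per candidate. That cost is negligible for comment-length inputs. A fixed global threshold would ignore how confident the models are on each particular comment, which is why a per-comment search can be worthwhile.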
Anthology ID:
2021.semeval-1.133
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
974–983
URL:
https://aclanthology.org/2021.semeval-1.133
DOI:
10.18653/v1/2021.semeval-1.133
Cite (ACL):
Michał Satława, Katarzyna Zamłyńska, Jarosław Piersa, Joanna Kolis, Klaudia Firląg, Katarzyna Beksa, Zuzanna Bordzicka, Christian Goltz, Paweł Bujnowski, and Piotr Andruszkiewicz. 2021. SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 974–983, Online. Association for Computational Linguistics.
Cite (Informal):
SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection (Satława et al., SemEval 2021)
PDF:
https://aclanthology.org/2021.semeval-1.133.pdf