%0 Conference Proceedings
%T SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection
%A Dale, David
%A Markov, Igor
%A Logacheva, Varvara
%A Kozlova, Olga
%A Semenov, Nikita
%A Panchenko, Alexander
%Y Palmer, Alexis
%Y Schneider, Nathan
%Y Schluter, Natalie
%Y Emerson, Guy
%Y Herbelot, Aurelie
%Y Zhu, Xiaodan
%S Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F dale-etal-2021-skoltechnlp
%X This work describes the participation of the Skoltech NLP group team (Sk) in the Toxic Spans Detection task at SemEval-2021. The goal of the task is to identify the most toxic fragments of a given sentence, which is a binary sequence tagging problem. We show that fine-tuning a RoBERTa model for this problem is a strong baseline. This baseline can be further improved by pre-training the RoBERTa model on a large dataset labeled for toxicity at the sentence level. Our solution scored among the top 20% of participating models and is only 2 points below the best result, which suggests the viability of our approach.
%R 10.18653/v1/2021.semeval-1.126
%U https://aclanthology.org/2021.semeval-1.126
%U https://doi.org/10.18653/v1/2021.semeval-1.126
%P 927-934