%0 Conference Proceedings
%T SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection
%A Dale, David
%A Markov, Igor
%A Logacheva, Varvara
%A Kozlova, Olga
%A Semenov, Nikita
%A Panchenko, Alexander
%Y Palmer, Alexis
%Y Schneider, Nathan
%Y Schluter, Natalie
%Y Emerson, Guy
%Y Herbelot, Aurelie
%Y Zhu, Xiaodan
%S Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F dale-etal-2021-skoltechnlp
%X This work describes the participation of the Skoltech NLP group team (Sk) in the Toxic Spans Detection task at SemEval-2021. The goal of the task is to identify the most toxic fragments of a given sentence, which is a binary sequence tagging problem. We show that fine-tuning a RoBERTa model for this problem is a strong baseline. This baseline can be further improved by pre-training the RoBERTa model on a large dataset labeled for toxicity at the sentence level. Our solution scored among the top 20% of participating models and is only 2 points below the best result, which suggests the viability of our approach.
%R 10.18653/v1/2021.semeval-1.126
%U https://aclanthology.org/2021.semeval-1.126
%U https://doi.org/10.18653/v1/2021.semeval-1.126
%P 927-934