LISAC FSDM USMBA at SemEval-2021 Task 5: Tackling Toxic Spans Detection Challenge with Supervised SpanBERT-based Model and Unsupervised LIME-based Model

Toxic spans detection is an emerging challenge that aims to find the toxic spans within a toxic text. In this paper, we describe our two solutions to this task. The first, which follows a supervised approach, is based on the SpanBERT model, which is designed to better represent and predict spans of text. The second, which adopts an unsupervised approach, combines a linear support vector machine with Local Interpretable Model-Agnostic Explanations (LIME), a technique for interpreting the predictions of learning-based models. Our supervised model outperformed the unsupervised one and achieved an F1 score of 67.84% (ranked 22/85) in SemEval-2021 Task 5: Toxic Spans Detection.


Introduction
Owing to the massive production of user-generated content on social media, moderation has become crucial to promoting healthy online discussions by removing toxic posts and content. However, it is nearly impossible for humans to keep track of all user-generated content, so the right tools and technologies to assist in this task have become a necessity. The Toxic Spans Detection task aims to detect the spans that make a text toxic. Several toxicity detection datasets (Wulczyn et al., 2017; Borkan et al., 2019) and models (Pavlopoulos et al., 2017a,b, 2019; Schmidt and Wiegand, 2017; Zampieri et al., 2019; Alami et al., 2020) have been released; however, these works estimate the likelihood of a document being toxic with weak interpretability. Highlighting toxic spans can assist human moderators, who often deal with lengthy comments and who prefer attribution over an unexplained, system-generated toxicity score per post.

In this paper, we propose two solutions to tackle toxic spans detection (Pavlopoulos et al., 2021). The first solution, which follows a supervised approach, is based on the SpanBERT (Joshi et al., 2020) model, which is pre-trained with a span boundary objective and masks contiguous spans; SpanBERT therefore yields better span representations and predictions. The second solution, which adopts an unsupervised approach, combines a linear support vector machine (Fan et al., 2008) with Local Interpretable Model-Agnostic Explanations (LIME) (Ribeiro et al., 2016). LIME is an explanation technique that seeks to faithfully interpret the predictions of any classifier.

This paper is organized as follows: Section 2 describes the proposed methods; Section 3 presents the experimental results; finally, Section 4 concludes and outlines future directions.

Methods
In this section, we describe the two proposed solutions: the SpanBERT-based method, which follows a supervised approach, and the SVM- and LIME-based method, which follows an unsupervised approach.

SpanBERT-based method
We use SpanBERT (Joshi et al., 2020), a pre-trained model built to improve the representation and prediction of text spans. It differs from BERT (Devlin et al., 2019) in that it (1) masks contiguous random spans instead of random tokens, and (2) is trained with a span boundary objective, i.e., the model is optimized to predict a masked span given the tokens at its boundary. We treated toxic span detection as a sequence labeling task; thus, we transformed the dataset accordingly and fine-tuned SpanBERT on this specific task.

Data preparation
The raw dataset consists of a set of toxic texts, where each text is annotated with an array of character indices; these indices constitute the toxic spans of the text. In order to train SpanBERT on this dataset, we applied the pre-trained SpanBERT tokenizer to tokenize the sentences, and we built the target arrays by labeling the words that contain toxic character indices. For instance, given a sentence of n tokens, the target array contains n elements: elements whose tokens contain a toxic character are labeled as positive ("1"), and the others are labeled as negative ("0"). Figure 1 illustrates the dataset preparation pipeline.
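This labeling step can be sketched as follows (a minimal illustration in which a whitespace tokenizer stands in for the SpanBERT tokenizer; the function name is ours):

```python
def build_target_array(text, toxic_char_offsets):
    """Label each token 1 if any of its characters appears in the
    annotated toxic character offsets, else 0."""
    toxic = set(toxic_char_offsets)
    labels, start = [], 0
    for token in text.split():
        start = text.index(token, start)  # locate the token in the raw text
        end = start + len(token)
        labels.append(1 if any(i in toxic for i in range(start, end)) else 0)
        start = end
    return labels

# "stupid" covers character offsets 10-15, so only its token is labeled 1
print(build_target_array("you are a stupid person", list(range(10, 16))))
# → [0, 0, 0, 1, 0]
```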

Toxic spans detection
We treated toxic span detection as a sequence labeling task and therefore fine-tuned the pre-trained SpanBERT model for token classification. First, we tokenize the sentence and map its tokens to indices according to the SpanBERT vocabulary. Next, we feed the token indices to the model, which computes token embeddings with the pre-trained SpanBERT encoder. We then compute the probability that a given token is toxic by applying a linear layer followed by a softmax to the token embeddings. Finally, the model is trained to minimize the cross-entropy loss. Figure 2 shows the flowchart of the SpanBERT-based model. It is worth noting that we filter the predicted spans by removing toxic spans whose size equals one character.
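The final span-filtering step can be sketched as follows (a hypothetical post-processing helper; it assumes the model has already produced the list of predicted toxic character offsets):

```python
def filter_spans(char_offsets):
    """Group consecutive character offsets into spans and drop
    spans whose size equals one character."""
    spans, current = [], []
    for i in sorted(char_offsets):
        if current and i == current[-1] + 1:
            current.append(i)  # extend the current contiguous span
        else:
            if current:
                spans.append(current)
            current = [i]  # start a new span
    if current:
        spans.append(current)
    # keep only the offsets of spans longer than one character
    return [i for span in spans if len(span) > 1 for i in span]

print(filter_spans([4, 10, 11, 12, 13]))  # → [10, 11, 12, 13]
```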

Data preparation
The data preparation for our unsupervised method can be summarized as follows:
2. Word-level uni-grams and bi-grams are extracted, then vectorized using TF-IDF scores.
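The vectorization step can be sketched with Scikit-Learn (a minimal example on two toy comments; the real model is trained on the task comments):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Word-level uni-grams and bi-grams, weighted by TF-IDF scores
vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
comments = [
    "stop using these silly emoticons",
    "what a stupid comment",
]
features = vectorizer.fit_transform(comments)
print("silly emoticons" in vectorizer.vocabulary_)  # the bi-gram is a feature
```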

Toxic spans detection
The toxic spans detection pipeline adopted by our unsupervised method can be summarized as follows:
2. We apply the trained model to the comments of the SemEval 2021 Task 5: Toxic Spans Detection test set to predict their toxicity; then, we use the LIME technique to explain the predictions (Figure 3).
3. We discard the words that contribute less to the toxic category by applying a thresholding technique: words with a high influence score (greater than or equal to the threshold) are considered toxic, and we retrieve their character offsets (toxic spans).

By training the linear support vector machine classifier on the SemEval 2021 Task 5: Toxic Spans Detection test set, we guarantee that the model accurately predicts the toxicity of its comments, with a precision, recall, F1 score, and accuracy of 1 (the model correctly predicts the toxicity of all 2000 comments in the test set). This also ensures that the LIME explanations are reasonably accurate: if the model misclassified the toxicity of the comments, the LIME explanations would be inaccurate, since they would explain wrong predictions.
From Figure 3, we can see that the words "silly" and "stupid" contribute 42% and 23%, respectively, to the toxic category in the toxic comment "Please people, stop using these silly, stupid emoticons". Since we only keep words with a high influence score for the toxic category (greater than or equal to 0.13), we retain the two words "silly" and "stupid" and discard the remaining words. Next, we retrieve their character offsets from the comment, as shown in Table 1.
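The thresholding step can be sketched as follows (a minimal illustration; in practice the word/weight pairs come from the LIME explanation, e.g. via its `as_list()` method, and the weight shown for "emoticons" is purely illustrative):

```python
def toxic_spans_from_weights(comment, word_weights, threshold=0.13):
    """Keep words whose LIME influence score for the toxic class is
    >= threshold, and return their character offsets in the comment."""
    offsets = []
    for word, weight in word_weights:
        if weight >= threshold:
            start = comment.find(word)  # first occurrence of the word
            offsets.extend(range(start, start + len(word)))
    return offsets

comment = "Please people, stop using these silly, stupid emoticons"
weights = [("silly", 0.42), ("stupid", 0.23), ("emoticons", 0.05)]
print(toxic_spans_from_weights(comment, weights))
# → [32, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44]
```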

Experimental results
We evaluated our models on the SemEval 2021 Task 5: Toxic Spans Detection dataset. The training set and the test set contain 7939 and 2000 toxic comments, respectively, labeled with their toxic spans. All our experiments were conducted in the Google Colab environment. The following libraries were used to train our models and assess their performance: Hugging Face, LIME, Scikit-Learn, and PyTorch.

Evaluation Metric
In order to measure the performance of our models, we employ the F1 score proposed in (Da San Martino et al., 2019). Consider a post $t$ and a system $A_i$ that predicts a set $S^t_{A_i}$ of toxic character offsets, and let $G^t$ denote the set of expected (gold) character offsets. The F1 score of the system $A_i$ with respect to $G$ for $t$ is computed in the following manner:

$$P^t(A_i, G) = \frac{|S^t_{A_i} \cap G^t|}{|S^t_{A_i}|}, \qquad R^t(A_i, G) = \frac{|S^t_{A_i} \cap G^t|}{|G^t|},$$

$$F_1^t(A_i, G) = \frac{2 \cdot P^t(A_i, G) \cdot R^t(A_i, G)}{P^t(A_i, G) + R^t(A_i, G)},$$

where $|\cdot|$ denotes set cardinality.
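For reference, this metric can be sketched in a few lines (our own implementation of the formula above; following the task convention, two empty offset sets count as a perfect match, an assumption we adopt here):

```python
def span_f1(predicted, gold):
    """Character-offset F1: harmonic mean of precision and recall
    over the predicted and gold sets of toxic character offsets."""
    S, G = set(predicted), set(gold)
    if not S and not G:
        return 1.0  # both systems agree the post has no toxic span
    if not S or not G:
        return 0.0
    p = len(S & G) / len(S)  # precision
    r = len(S & G) / len(G)  # recall
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

print(span_f1([10, 11, 12, 13], [11, 12, 13, 14]))  # → 0.75
```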

Performance Evaluation
On the one hand, we compared various pre-trained models for computing token embeddings, including BERT-base, BERT-large, DistilBERT (Sanh et al., 2019), and SpanBERT-large. All these models are based on the transformer (Vaswani et al., 2017) architecture. The SpanBERT model achieves the best results, owing to the fact that it is trained with contiguous masked spans and optimizes the span boundary objective. On the other hand, we compared logistic regression with LIME (LR-LIME) to the linear support vector machine with LIME (LSVM-LIME); the latter produces superior scores. For the supervised technique, we tuned standard hyper-parameters such as the weight decay. For the unsupervised technique, several experiments were conducted to find a suitable threshold: thresholds of 0.12 and 0.13 achieved the best performance for LR-LIME and LSVM-LIME, respectively.

Conclusion
In this paper, we described our models for tackling SemEval 2021 Task 5: Toxic Spans Detection. Two approaches were employed. The first is a supervised approach based on the transformer architecture, in which toxic sequences are tokenized and embedded using pre-trained models; we estimate the likelihood of a token being toxic by minimizing the cross-entropy loss. SpanBERT scored the best results, achieving an F1 score of about 0.6783. The second is an unsupervised approach based on shallow machine learning and LIME, an explanation technique that explains the predictions of any classifier in an interpretable and faithful manner. Since the top-ranked system achieved an F1 score of about 0.7083, future work will focus on improving the performance on the toxic spans detection task.