A Text-to-Text Model for Multilingual Offensive Language Identification

The ubiquity of offensive content on social media is a growing cause for concern among companies and government organizations. Recently, transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance in detecting various forms of offensive content (e.g. hate speech, cyberbullying, and cyberaggression). However, the majority of these models are limited in their capabilities due to their encoder-only architecture, which restricts the number and types of labels in downstream tasks. Addressing these limitations, this study presents the first pre-trained model with an encoder-decoder architecture for offensive language identification, based on text-to-text transformers (T5) trained on two large offensive language identification datasets: SOLID and CCTK. We investigate the effectiveness of combining the two datasets and of selecting an optimal threshold for the semi-supervised instances in SOLID during T5 retraining. Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, on multiple English benchmarks. Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish). The results demonstrate that this multilingual model achieves a new state-of-the-art on all the above datasets, showing its usefulness in multilingual scenarios. Our proposed T5-based models will be made freely available to the community.


Introduction
The widespread presence of offensive posts on social media platforms can have detrimental effects on users' mental health, among other undesirable consequences. The relation between offensive language and mental health, along with the associated risks of self-harm and depression, has been widely addressed by previous studies (Bonanno and Hymel, 2013; Bannink et al., 2014; Bucur et al., 2021). To address this important issue, one of the most commonly employed strategies is to train systems to identify offensive content (Pavlopoulos et al., 2021a), mitigating its spread on social media platforms. By proactively identifying potentially harmful content, social media platforms aim to establish a safer and more inclusive environment for all users.
Early approaches to identifying offensive language ranged from classical machine learning models, such as support vector machines (Malmasi and Zampieri, 2017, 2018), to deep learning models based on word embeddings (Hettiarachchi and Ranasinghe, 2019). With the introduction of BERT (Devlin et al., 2019), transformer models have shown excellent results in offensive language identification (Zia et al., 2022). More recently, domain-specific language models for offensive language identification, such as fBERT (Sarkar et al., 2021), HateBERT (Caselli et al., 2021), and ToxicBERT, have provided state-of-the-art results on multiple offensive language identification benchmarks.
The aforementioned models can be grouped into two main categories according to their training strategies. Models such as ToxicBERT have been trained using a classification objective, by adding a classification layer on top of a BERT model and training on a large offensive language dataset. A clear limitation of this approach is that the trained model can only predict the classes that appear in the dataset. On the other hand, models such as fBERT (Sarkar et al., 2021) and HateBERT (Caselli et al., 2021) have been trained with a masked language modelling (MLM) objective. fBERT (Sarkar et al., 2021) has been trained on the offensive tweets in the SOLID dataset (Rosenthal et al., 2021), while HateBERT (Caselli et al., 2021) has been trained on banned posts from the Reddit Abusive Language dataset. The MLM strategy does not depend on the number of classes present in the dataset. However, under this strategy it is not possible to concatenate two datasets annotated with different annotation taxonomies without mapping them to a common label (e.g. a general offensive class). This is a critical issue in offensive language identification, as different datasets use different annotation schemes and problem formulations (e.g. hate speech, offensive, toxic, profanity). As a result, MLM-based models are only trained on one dataset, which can limit their capabilities.
To address this important shortcoming, we introduce FT5, a pre-trained T5 model (Raffel et al., 2020) trained on two large-scale offensive language identification datasets. Since T5 follows a text-to-text approach, it does not rely on a classification layer. Therefore, T5 (Raffel et al., 2020) can be used to train an offensive language identification model on different datasets without depending on the number of classes. We show that the proposed FT5 outperforms the plain T5 implementation as well as HateBERT (Caselli et al., 2021) and fBERT (Sarkar et al., 2021) on various offensive and hate speech detection tasks. To the best of our knowledge, this is the first pre-trained offensive language identification model based on T5.
All the previous models, such as ToxicBERT, HateBERT (Caselli et al., 2021) and fBERT (Sarkar et al., 2021), only support English, and training a large language model using similar approaches for low-resource languages can be difficult due to data scarcity. In this paper, we address this limitation by training a multilingual offensive language model, mFT5, which uses mT5 (Xue et al., 2021) as the base model. The results confirm that the fine-tuned mFT5 produces state-of-the-art results in six languages, outperforming strong transformer-based models. To the best of our knowledge, mFT5 is the first multilingual pre-trained model for offensive language identification, opening exciting avenues for a multitude of languages.
The contributions of this paper are as follows:
1. An empirical evaluation of semi-supervised learning techniques that can be applied to train text-to-text models such as T5 (Raffel et al., 2020) and mT5 (Xue et al., 2021) for offensive language identification.
2. A comprehensive evaluation of the effect of combining different datasets when pre-training text-to-text models.
3. The first-ever cross-lingual evaluation of the mT5 (Xue et al., 2021) model in both high-resource and low-resource language settings.
4. The release of FT5 and mFT5, high-performing, state-of-the-art pre-trained models based on T5 for English and multilingual offensive language identification, made freely available to the research community.
Related Work

Offensive Language Identification
The use of large pre-trained transformer models has become prevalent in NLP. This includes the development of various offensive language identification systems based on transformer architectures such as BERT (Devlin et al., 2019). These systems have demonstrated top performance in well-known competitions such as HASOC (Mandl et al., 2019), HatEval (Basile et al., 2019), OffensEval (Zampieri et al., 2019b, 2020), TRAC (Kumar et al., 2018), and TSD (Pavlopoulos et al., 2021b). Many of these competitions feature multilingual datasets, opening opportunities for the use of cross-lingual models (Ranasinghe and Zampieri, 2020, 2021; Nozza, 2021). The outstanding results achieved by these systems provide concrete evidence that pre-trained transformer models are well-suited for detecting offensive content in both monolingual and multilingual settings. User-generated content and offensive language online possess unique characteristics that are often not adequately captured by models trained on standard texts. Consequently, research has focused on the task of fine-tuning pre-trained models specifically for this challenging domain. Several transformer models, such as HateBERT (Caselli et al., 2021) and fBERT (Sarkar et al., 2021), have been built for this purpose. In this study, we address the aforementioned limitations of fine-tuned transformer-based models. We propose the first multilingual domain-specific pre-trained offensive language identification model.

T5 Models
T5 models, introduced by Raffel et al. (2020), have been widely used in many NLP tasks such as text classification (Bird et al., 2023), semantic similarity (Ni et al., 2022) and named entity recognition (Tavan and Najafi, 2022). As the T5 architecture follows a text-to-text approach, multi-task learning can be used to improve results with T5 (Raffel et al., 2020). Following the initial T5 model, multilingual T5 (mT5) models have also been proposed by Xue et al. (2021), providing excellent results on multilingual benchmarks. Several studies have used T5 in offensive language identification (Sabry et al., 2022; Adewumi et al., 2022). However, these studies only fine-tune general T5 models for offensive language identification. To the best of our knowledge, this paper presents the first pre-trained domain-specific T5 model for offensive language identification.

Methodology
Training Data In this study, we use two large offensive language identification datasets to retrain the T5 models: SOLID (Rosenthal et al., 2021), with over 9 million English tweets, and CCTK, with over 1.8 million posts from the Civil Comments platform. SOLID was the official dataset of SemEval-2020 Task 12 (OffensEval) (Zampieri et al., 2020). SOLID's annotation follows the OLID taxonomy (Zampieri et al., 2019a), of which we use level A (offensive vs. non-offensive). Each instance in SOLID has been annotated semi-automatically with the mean and the standard deviation (STD) of the offensiveness values predicted by four different machine learning models. CCTK was released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle competition. Each instance in CCTK has been annotated with one of two binary labels (toxic vs. not toxic). Finally, we used multiple datasets for testing, presented in Sections 4 and 5.
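To make SOLID's semi-supervised annotations usable for retraining, each instance can be converted to a hard label and filtered by ensemble agreement. The sketch below is illustrative rather than the released preprocessing code: the (text, mean, std) tuple layout is an assumption, while the 0.5 mean cut-off and the STD filtering follow the retraining procedure described in this section.

```python
def select_solid_instances(instances, std_threshold):
    """Filter semi-supervised SOLID instances and derive hard labels.

    An instance is kept only when the four models agree closely
    (standard deviation at or below `std_threshold`); its label is
    "OFF" when the mean offensiveness score exceeds 0.5, else "NOT".
    The (text, mean, std) layout is illustrative, not the SOLID schema.
    """
    selected = []
    for text, mean, std in instances:
        if std <= std_threshold:  # confident ensemble prediction
            selected.append((text, "OFF" if mean > 0.5 else "NOT"))
    return selected
```

Lowering the STD threshold keeps fewer but more confidently labelled tweets, which is exactly the trade-off explored in the experiments below.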
Retraining T5 We select t5-large (Raffel et al., 2020) and train it using the instances from SOLID (Rosenthal et al., 2021) and CCTK. For the instances in SOLID, the input texts to the model were tweets, and the output texts were "OFF" if the mean value in SOLID is above 0.5 and "NOT" otherwise. We experimented with different STD thresholds (0.05, 0.1, ...) for selecting SOLID instances, and for each threshold we consider appending or not appending the CCTK dataset. For the instances in CCTK, the input texts to the model were the posts, and the output texts were "TOX" if the text is toxic and "NOT" otherwise. As shown in Figure 1, we use the "OLID_A" prefix for SOLID instances and the "CCTK" prefix for CCTK instances. To create mFT5, we select mt5-large (Xue et al., 2021) and repeat the same process. For both models, we use the same configuration: a batch size of 16, the Adam optimiser with learning rate 1e-4, and a linear learning rate warm-up over 10% of the training data; the models were trained for ten epochs. We use a cluster of four GeForce RTX 3090 GPUs to train the models.
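The combination of the two datasets into a single text-to-text training set can be sketched as follows. The prefix names ("OLID_A", "CCTK") and target strings come from the procedure above; the "prefix: text" separator is an assumption.

```python
def build_training_pairs(solid, cctk):
    """Merge SOLID and CCTK into one list of (input, target) text pairs.

    SOLID targets are "OFF"/"NOT" and CCTK targets are "TOX"/"NOT";
    each input carries its dataset's task prefix so the model can tell
    the two labelling schemes apart.
    """
    pairs = [("OLID_A: " + text, label) for text, label in solid]
    pairs += [("CCTK: " + text, label) for text, label in cctk]
    return pairs
```

Because the targets are plain strings rather than class indices, datasets with different taxonomies can be concatenated without mapping them to a common label, which is the key advantage over encoder-only models discussed in the introduction.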

English Experiments and Results
To determine the effectiveness and portability of the trained FT5, we conducted a series of experiments using benchmark datasets in English and compared our model with a general-purpose T5 model. We used the same set of configurations for all the datasets evaluated in order to ensure consistency across all the experiments. This also provides a good starting configuration for researchers who intend to use FT5 on a new dataset. For the sentence-level tasks, the input to the model is the text and the output is the related label. For the token-level tasks, the input to the model is the text and the output is the text with "[OFF]" placeholders in front of the offensive tokens, as shown in Figure 2.
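For illustration, the token-level target text can be constructed from a tokenised post and the indices of its offensive tokens. This is a sketch assuming whitespace tokenisation; the "[OFF]" placeholder format is the one described above.

```python
def mark_offensive_tokens(tokens, offensive_indices):
    """Build the token-level target: "[OFF]" precedes each offensive token."""
    out = []
    for i, token in enumerate(tokens):
        if i in offensive_indices:
            out.append("[OFF]")
        out.append(token)
    return " ".join(out)
```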
For each dataset, we used a different task-specific prefix (OLID_A for OLID, TSD for toxic spans detection, etc.). We used a batch size of eight, the Adam optimiser with learning rate 1e-4, and a linear learning rate warm-up over 10% of the training data. During the training process, the parameters of the transformer model were updated. The models were trained using only the training data. Furthermore, they were evaluated during training using an evaluation set containing one-fifth of the rows of the training data. We performed early stopping if the evaluation loss did not improve over ten evaluation steps. All the models were trained for three epochs. These experiments were conducted on a GeForce RTX 3090 GPU. All the experiments were run ten times, and we report the mean and standard deviation for each experiment.

TRAC The TRAC dataset was released for the TRAC shared task 2020 (Kumar et al., 2020). The dataset has 4,200 training and 1,200 test instances with three classes: overtly aggressive, covertly aggressive and non-aggressive. TRAC is the most heterogeneous dataset we used in terms of data sources, containing posts from Facebook, Twitter, and YouTube.
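The early-stopping rule used in the experiments above (halt when the evaluation loss fails to improve for ten consecutive evaluation steps) can be sketched as a small stateful helper; the class and method names are illustrative.

```python
class EarlyStopper:
    """Track evaluation loss and signal when to stop training."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_steps = 0

    def should_stop(self, eval_loss):
        """Return True once `patience` evaluations pass without improvement."""
        if eval_loss < self.best_loss:
            self.best_loss = eval_loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience
```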
TSD For token-level prediction we use TSD, released within the scope of SemEval-2021 Task 5: Toxic Spans Detection for English (Pavlopoulos et al., 2021a). The dataset contains 10,000 posts (comments) from the publicly available Civil Comments dataset (Borkan et al., 2019). If a post is toxic, it has been annotated for its toxic spans.
HateX The HateXplain dataset (Mathew et al., 2021) was also used for offensive language identification at the token level. The dataset contains 11,535 training and 3,844 testing instances from Gab and Twitter. We used only the word-level annotations.

Sentence-level Offensive Language Identification
To evaluate the sentence-level tasks, we used the macro F1 score computed on the predicted sentence labels and the gold sentence labels. We present the results for the SOLID data selection thresholds and data augmentation with CCTK in Table 1 in terms of macro F1. For most of the datasets tested, the 0.1 STD SOLID threshold combined with CCTK provided the best results. Having a large number of training instances improves results up to a certain STD threshold in SOLID; beyond that, adding further training instances does not improve results. Furthermore, the results show that the combination of CCTK and SOLID provides better results than having a single dataset in the training set. This confirms our assumption that T5 can take advantage of multiple datasets via text-to-text transfer learning.
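The macro F1 score used throughout these tables can be sketched as a minimal reference implementation that averages per-class F1 with equal weight (intended to match scikit-learn's f1_score with average="macro" on these binary label sets):

```python
def macro_f1(gold, pred, labels):
    """Macro-averaged F1: per-class F1, then the unweighted mean."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Unlike accuracy, this metric weights the minority offensive class equally with the majority class, which matters on the heavily imbalanced datasets used here.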
We select the T5 model retrained on SOLID filtered at 0.1 STD combined with the CCTK dataset as the FT5 model, as it provided the best results on most of the datasets. We then compare the performance of FT5 with fBERT and HateBERT. As can be seen in Table 2, FT5 outperforms fBERT and HateBERT on all of the datasets. Since the datasets cover offensive language identification, fine-grained offensive language identification and fine-grained aggression identification, we can validate the effectiveness of the proposed FT5 model for offensive and aggressive language sentence-level classification tasks.


Token-level Offensive Language Identification
Multiple studies on token-level offensive language identification have discussed the need for accurate token-level predictions for improved model explainability (Mathew et al., 2021; Zampieri et al., 2023). Motivated by these studies, we investigate our model's performance on token-level offensive language identification. The token-level tasks were evaluated using the macro F1 score computed on predicted character offsets and gold character offsets (Da San Martino et al., 2019). As can be seen in Table 3, FT5 outperforms fBERT and HateBERT on all of the token-level offensive language identification datasets too. There is a clear improvement on the TSD dataset, where the FT5 model outperforms the fBERT model by 0.11 macro F1, a boost of over 20%.
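A sketch of the per-post character-offset F1 underlying this evaluation is shown below; the convention that two empty span sets score 1.0 follows the SemEval-2021 toxic spans task, and the dataset-level score averages this value over posts.

```python
def span_f1(pred_offsets, gold_offsets):
    """F1 over character offsets for a single post.

    Both arguments are iterables of character positions; the score is
    1.0 when both are empty (nothing toxic, nothing predicted) and 0.0
    when exactly one of them is empty.
    """
    pred, gold = set(pred_offsets), set(gold_offsets)
    if not pred and not gold:
        return 1.0
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```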

Multilingual Experiments and Results
To determine the effectiveness and portability of our multilingual model, mFT5, we conducted a series of experiments using benchmark datasets covering high-resource, mid-resource and low-resource languages. These datasets are summarised in Table 4. We used the same set of configurations we used for the English experiments. The models were trained using the training set and evaluated on the test set of each dataset.

Sentence-level Offensive Language Identification
For sentence-level offensive language identification, we mapped the labels of each dataset to the closest annotation scheme, as we did for the English benchmarks: OLID level A (offensive, not offensive) or CCTK (toxic, not toxic). Following this, the Spanish, Hindi, Korean, Sinhala and Marathi labels were mapped to OLID level A, and the German labels were mapped to CCTK, as shown in the sentence-level column of Table 4. We use the "OLID_A" prefix for Spanish, Hindi, Korean, Sinhala and Marathi instances and the "CCTK" prefix for German instances.
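This mapping can be expressed as a simple lookup that attaches the scheme prefix the model saw during pre-training; the language codes and the "prefix: text" separator below are illustrative assumptions, while the scheme assignments follow Table 4.

```python
# Scheme each language's labels were mapped to (following Table 4).
SCHEME = {
    "es": "OLID_A",  # Spanish
    "hi": "OLID_A",  # Hindi
    "ko": "OLID_A",  # Korean
    "si": "OLID_A",  # Sinhala
    "mr": "OLID_A",  # Marathi
    "de": "CCTK",    # German
}

def prefixed_input(lang, text):
    """Prepend the scheme prefix mFT5 saw during pre-training."""
    return SCHEME[lang] + ": " + text
```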
In the training process, we started from different pre-trained models using the configurations described in Section 3. The input to the model is the text preceded by the relevant prefix, and the output is the related label. We performed individual experiments for each language separately.
We present the results for the SOLID data selection thresholds and data augmentation with CCTK in Table 5 in terms of macro F1 for each test set. For most of the datasets tested, the 0.1 STD SOLID threshold combined with CCTK provided the best results. Having a large number of training instances improves results up to a certain STD threshold in SOLID; beyond that, adding further training instances does not improve results. Furthermore, the results show that the combination of CCTK and SOLID provides better results than having a single dataset in the training set. This confirms our assumption that mT5 can take advantage of multiple datasets via text-to-text transfer learning. We select the mT5 model retrained on SOLID filtered at 0.1 STD combined with the CCTK dataset as the mFT5 model, as it provided the best result in five out of six datasets.
We then compare the performance of mFT5 with the mBERT and XLM-R base models. These models are trained on the training set of each dataset by adding a classification layer on top of the transformer model. As shown in Table 6, mFT5 outperforms mBERT and XLM-R on all of the datasets. Since these datasets contain high-resource and low-resource languages as well as data from different social media platforms, we can validate the effectiveness of the proposed mFT5 model for offensive language identification across multiple languages and platforms.

Zero-shot Offensive Language Identification We also experimented with zero-shot cross-lingual offensive language identification with the mFT5 model. In this setting, we did not train mFT5 on the language-specific training data. The results are shown in the mFT5* rows of Table 6. While the zero-shot cross-lingual experiments did not provide the best results, they provided very competitive results compared to the baselines. The results confirm the strong cross-lingual nature of the pre-trained mFT5 model in detecting offensive language. It should also be noted that an MLM approach similar to fBERT and HateBERT needs labelled data for fine-tuning and cannot provide zero-shot offensive language identification. Therefore, mFT5 is useful for low-resource languages where training data is scarce.

Token-level Offensive Language Identification
We also experimented with token-level offensive language identification on the datasets where token-level labels are available: Korean and Sinhala. The input to the model is the text, and the output is the text with "[OFF]" placeholders in front of the offensive tokens. To evaluate the models, we used the macro F1 score computed on predicted offensive tokens and gold offensive tokens. We compared the results with the token classification architecture of the mBERT and XLM-R large models. As shown in Table 7, mFT5 outperforms mBERT and XLM-R on all of the token-level offensive language identification datasets too. It should be noted that pre-trained models with task-specific heads, such as ToxicBERT, are unable to perform token-level tasks. However, our text-to-text approach in mFT5 provided state-of-the-art results at the token level too.
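Recovering per-token labels from the model's text output is a simple parse of the "[OFF]" markers; below is a minimal sketch assuming whitespace tokenisation and well-formed output.

```python
def decode_token_labels(output_text):
    """Turn "[OFF]"-marked output text back into per-token labels."""
    labels, pending = [], False
    for piece in output_text.split():
        if piece == "[OFF]":
            pending = True  # the next token is offensive
        else:
            labels.append("OFF" if pending else "NOT")
            pending = False
    return labels
```

In practice the generated text may not align perfectly with the input tokens, so a robust decoder would also need to handle dropped or repeated tokens.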

Conclusion and Future Work
Neural transformer models have outperformed previous state-of-the-art deep learning models across different NLP tasks, including offensive language identification. Following impressive results in international benchmark competitions, domain-specific pre-trained neural transformers such as ToxicBERT, fBERT and HateBERT have been proposed for offensive language identification. As discussed in this paper, these models have limitations which make it difficult to extend them to different datasets. We address these limitations by proposing FT5, a t5-large model trained using over 2 million instances from the SOLID and CCTK datasets. The FT5 model achieves better results in both sentence-level and token-level tasks across different offensive language identification benchmarks. This paper also introduced mFT5. To the best of our knowledge, mFT5 is the first pre-trained multilingual offensive language detection model. The model uses mt5-large and is trained using the same data used to train FT5. We show that the proposed mFT5 model achieves better results in both sentence-level and token-level tasks compared to mBERT, XLM-R, and the vanilla mT5 model across different offensive language identification benchmark datasets. We show that the model performs consistently well across different languages and platforms. Furthermore, the model showed strong zero-shot cross-lingual results, opening exciting new avenues for multilingual offensive language detection.
In future work, we would like to extend the proposed FT5 model to the identification of offensive spans along with their targets using the recently released TBO dataset (Zampieri et al., 2023). We believe that modelling targets and offensive expressions jointly is an important step towards improving explainability in offensive language identification systems. Another important direction we have been exploring is computational efficiency. We have recently experimented with teacher-student architectures using knowledge distillation (KD) and have shown that the use of KD results in lightweight models that are more computationally efficient and perform on par with larger models (Ranasinghe and Zampieri, 2023). We would like to investigate teacher-student architectures using the proposed FT5 model. Finally, we are interested in the application of multilingual models to low-resource scenarios. Thousands of languages and dialects are spoken in the world, but research on offensive language identification is still (mostly) restricted to English and a few other high-resource languages. We believe that the release of mFT5 will encourage research on offensive language identification models for low-resource languages, dialects, and other challenging linguistic scenarios such as code-mixed texts.

Limitations
Training a T5 model requires a large amount of computing resources. We noted that training a T5 model can take more GPU resources than training a BERT model such as fBERT (Sarkar et al., 2021). Therefore, we did not experiment with larger T5 models such as T5-XL and T5-XXL. While these models might perform better than the T5-Large models we experimented with, they would consume more resources, limiting their potential use cases.

Ethics Statement
FT5 and mFT5 are essentially T5 models for offensive language identification, trained on multiple publicly available datasets. We used multiple datasets referenced in this paper, which were previously collected and annotated, to evaluate the models. No new data collection has been carried out as part of this work. We have not collected or processed writers'/users' information, nor have we carried out any form of user profiling, in order to protect users' privacy and identity.
We believe that content moderation should be a trustworthy and transparent process applied to clearly harmful content so that it does not hinder individual freedom of expression. We encourage research on automatically detecting offensive content on the web through a trustworthy and transparent process. Using our proposed models for this purpose will alleviate the psychological burden on social media moderators who are exposed to large amounts of offensive content, while ensuring a more transparent moderation process.
The computational experiments in this paper were conducted on an Aston EPS Machine Learning Server, funded by the EPSRC Core Equipment Fund, Grant EP/V036106/1.

Table 2 :
The test set macro F1 scores for sentence-level datasets and models. Results are ordered by performance. Best results are shown in bold font.


Table 3 :
The test set macro F1 scores for token-level datasets and models. Results are ordered by performance. Best results are shown in bold font.

Table 4 :
Datasets used to evaluate the mFT5 model. The Source column displays the platform from which the data was extracted; the Train, dev, test size column shows the number of instances in the train, dev and test sets; the Label column shows the original labels; the Sentence-level column shows the output label in the sentence-level experiments discussed in Section 5; and the Token-level column shows the availability of token-level data.

Table 5 :
mFT5 results for different multilingual offensive language detection benchmarks.

Table 6 :
The test set macro F1 scores for coarse-grained offensive language detection. Results are ordered by performance. The best results are shown in bold font.

Table 7 :
The test set macro F1 scores for token-level offensive language detection. Results are ordered by performance. The best results are shown in bold font.