Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in the monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification: given a parallel detoxification corpus for one language, the goal is to transfer the detoxification ability to another language for which no such corpus is available. Moreover, we are the first to explore a new task where text translation and detoxification are performed simultaneously, providing several strong baselines for this task. Finally, we introduce new automatic detoxification evaluation metrics with higher correlations with human judgments than previous benchmarks. We also assess the most promising approaches with manual markup, determining the best strategy for transferring knowledge of text detoxification between languages.


Introduction
The original monolingual task of text detoxification can be considered as text style transfer (TST), where the goal is to build a function that, given a source style s_src, a destination style s_dst, and an input text t_src, produces an output text t_dst such that: (i) the style is indeed changed (in the case of detoxification, from toxic to neutral); (ii) the content is preserved as much as possible; (iii) the newly generated text is fluent.
The task of detoxification has already been addressed with several approaches. Firstly, several unsupervised methods based on masked language modelling (Tran et al., 2020; Dale et al., 2021) and disentangled representations for style and content (John et al., 2019; dos Santos et al., 2018) were explored. More recently, Logacheva et al. (2022b) showed the superiority of supervised seq2seq models for detoxification trained on a parallel corpus of crowdsourced toxic ↔ neutral sentence pairs. Afterwards, there were experiments in multilingual detoxification. However, cross-lingual transfer between languages with multilingual seq2seq models was shown to be a challenging task (Moskovskiy et al., 2022).
In this work, we aim to fill this gap and present an extensive overview of different approaches to cross-lingual text detoxification (tested on English and Russian), showing that promising results can be obtained in contrast to prior findings. Besides, we explore combining two seq2seq tasks/models into a single one to achieve computational gains (i.e., to avoid the need to store and perform inference with several models). Namely, we conduct simultaneous translation and style transfer experiments, comparing them to a step-by-step pipeline. The contributions of this work are as follows:

• We present a comprehensive study of cross-lingual detoxification transfer methods;
• We are the first to explore the task of simultaneous detoxification and translation and test several baseline approaches to solve it;
• We present a set of updated metrics for automatic evaluation of detoxification, improving correlations with human judgements.

Related Work
Text Detoxification Datasets Previously, several datasets for different languages were released for toxic and hate speech detection. For instance, there are several versions of the Jigsaw datasets: a monolingual one (Jigsaw, 2018) for English and a multilingual one (Jigsaw, 2020) covering 6 languages. In addition, there are corpora specifically for the Russian (Semiletov, 2020), Korean (Moon et al., 2020), and French (Vanetik and Mimoun, 2022) languages, inter alia. These are non-parallel classification datasets. In previous work on detoxification methods, such datasets were used to develop and test unsupervised text style transfer approaches (Wu et al., 2019; Tran et al., 2020; Dale et al., 2021; Hallinan et al., 2022). However, lately a parallel dataset, ParaDetox, for training supervised text detoxification models for English was released (Logacheva et al., 2022b), similar to previous parallel TST datasets for formality (Rao and Tetreault, 2018; Briakou et al., 2021). Pairs of toxic-neutral sentences were collected with a pipeline based on three crowdsourcing tasks. The first task is the main paraphrasing task. The next two tasks, a content preservation check and toxicity classification, are used to verify a paraphrase. Using this crowdsourcing methodology, a Russian parallel text detoxification dataset was also collected (Dementieva et al., 2022). We base our cross-lingual text detoxification experiments on these comparably collected data (cf. Table 2).

Data | Train | Dev | Test | Total
English (Logacheva et al., 2022b) | 18 777 | 988 | 671 | 20 436
Russian (Dementieva et al., 2022) | 5 058 | 1 000 | 1 000 | 7 058

Text Detoxification Models Addressing the text detoxification task as a seq2seq task based on a parallel corpus was shown to be more successful than the application of unsupervised methods by Logacheva et al. (2022b). For English, the fine-tuned BART model (Lewis et al., 2020) trained on the English ParaDetox significantly outperformed all the baselines and other seq2seq models in both automatic and manual evaluations. For Russian, Dementieva et al. (2022) released a ruT5 model (Raffel et al., 2020) fine-tuned on the Russian ParaDetox. These SOTA monolingual models for English and Russian are publicly available.
Multilingual Models Together with pre-trained monolingual language models (LMs), there is a trend of releasing multilingual models covering more and more languages. For instance, the NLLB model (Costa-jussà et al., 2022) is pretrained for 200 languages. However, large multilingual models can have many parameters (NLLB has 54.5B parameters), simultaneously requiring a vast amount of GPU memory to work with them.
As the SOTA detoxification models were fine-tuned versions of T5 and BART, we experiment in this work with their multilingual versions: mT5 (Xue et al., 2021) and mBART (Tang et al., 2020). The mT5 model covers 101 languages and has several versions. The mBART model also has several implementations and versions; we use mBART-50, which covers 50 languages. In addition, we use in our experiments the M2M100 model (Fan et al., 2021), which was trained for translation between 100 languages. All these models have less than 1B parameters (in their large versions).

Cross-lingual Knowledge Transfer
A common case is when data for a specific task is available for English but none for the target language. In this situation, techniques for knowledge transfer between languages are applied.
One of the approaches usually used to address the lack of training data is the translation approach. It was already tested for offensive language classification (El-Alami et al., 2022; Wadud et al., 2023). The idea is to translate the training data from the available language into the target language and train the corresponding model on the newly translated dataset.
Methods for zero-shot and few-shot text style transfer have also been explored. In (Krishna et al., 2022), operations between style and language embeddings are used to transfer style knowledge to a new language. The authors of (Lai et al., 2022b) use adapter layers to incorporate knowledge about the target language into a TST model.
For text detoxification, the cross-lingual setup was explored only in (Moskovskiy et al., 2022), through translation of the inputs and outputs of a monolingual system. It was shown that detoxification trained for English with a multilingual Transformer does not work for Russian (and vice versa). In this work, we present several approaches to cross-lingual detoxification that, in contrast, yield promising results.

Simultaneous Text Generation&Translation
Simultaneous translation and text generation was already introduced for text summarization. Several datasets with a wide variety of languages were created (Perez-Beltrachini and Lapata, 2021; Hasan et al., 2021). The main approaches to tackle this task are either to perform step-by-step text generation and translation or to train a supervised model on a parallel corpus. To the best of our knowledge, there were no such experiments in the domain of text detoxification. This work provides the first experiments to address this gap.

Cross-lingual Detoxification Transfer
In this section, we consider the setup when a parallel detoxification corpus is available for a resource-rich language (e.g., English), but we need to perform detoxification for another language for which such a corpus is unavailable. We test several approaches, listed below, that differ in the amount of data and computational resources required.

Backtranslation
One of the baseline approaches is translating input sentences into the language for which a detoxification model is available. For instance, we first train a detoxification model on the available English ParaDetox. Then, given an input sentence in another language, we translate it into English, perform detoxification, and translate the result back into Russian (Figure 1). Thus, this approach requires two models (one for translation and one for detoxification) and three inference steps (translation from the target language into the available language, text detoxification, and translation back into the target language).
In previous work (Moskovskiy et al., 2022), the Google Translate API and FSMT (Ng et al., 2019) models were used for translation. In this work, we extend these experiments with two additional translation systems:
• Helsinki OPUS-MT (Tiedemann and Thottingal, 2020): a Transformer-based model trained specifically for English-Russian translation;
• Yandex Translate API: a commercial system considered to be of top quality for the Russian-English pair.
We test the backtranslation approach with two types of detoxification models: (i) SOTA models for the corresponding monolingual detoxification; (ii) multilingual LMs.
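A minimal sketch of this three-step pipeline with the Hugging Face transformers library is given below. The OPUS-MT checkpoint names are real Hub identifiers used here only as an example, while the detoxification model path is a placeholder for whichever monolingual detoxification model is plugged in.

```python
from transformers import pipeline

# Illustrative checkpoints; substitute the actual translation and detoxification models.
ru_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ru-en")
en_to_ru = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ru")
detox_en = pipeline("text2text-generation", model="path/to/english-detox-model")  # placeholder

def backtranslation_detox(text_ru: str) -> str:
    """Detoxify a Russian sentence via English: translate, detoxify, translate back."""
    text_en = ru_to_en(text_ru)[0]["translation_text"]        # inference 1: RU -> EN
    detoxed_en = detox_en(text_en)[0]["generated_text"]       # inference 2: detoxify in EN
    return en_to_ru(detoxed_en)[0]["translation_text"]        # inference 3: EN -> RU
```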

Training Data Translation
Another way translation can be used is to translate the available training data. If we have training data in one language, we can fully translate it into another language and use it to train a separate detoxification model for that language (Figure 2). For translation, we use the same models described in the previous section.
As a detoxification corpus becomes available for the target language in this setup, we can fine-tune either a multilingual LM that covers this language or a monolingual LM pre-trained for the required language. Compared to the previous approach, this method requires a fine-tuning step, which implies additional computational resources.
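A minimal sketch of this approach, under the assumption that the parallel corpus is given as a list of (toxic, neutral) English pairs; the translation checkpoint is one of the OPUS-MT models named above, and the subsequent fine-tuning would follow a standard seq2seq training loop.

```python
from transformers import pipeline

en_to_ru = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ru")

def translate_paradetox(en_pairs):
    """Translate English (toxic, neutral) pairs into Russian to obtain a
    synthetic Russian training corpus for a separate detoxification model."""
    ru_pairs = []
    for toxic_en, neutral_en in en_pairs:
        toxic_ru = en_to_ru(toxic_en)[0]["translation_text"]
        neutral_ru = en_to_ru(neutral_en)[0]["translation_text"]
        ru_pairs.append((toxic_ru, neutral_ru))
    return ru_pairs

# The resulting pairs are then used to fine-tune a multilingual LM (e.g., mT5 or mBART)
# or a Russian monolingual LM with a standard seq2seq objective.
```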

Multitask Learning
Extending the idea of using translated ParaDetox, we can add additional datasets that might help improve model performance.
We suggest multitask training for cross-lingual detoxification transfer. We take a multilingual LM that covers both the resource-rich and the target languages. Then, we perform a multitask training procedure based on the following tasks: (i) translation between the resource-rich language and the target language; (ii) paraphrasing for the target language; (iii) detoxification for the resource-rich language for which the original ParaDetox is available; (iv) detoxification for the target language based on the translated data.
Even if the LM is already multilingual, we suggest that the translation task data help strengthen the bond between the languages. As detoxification can be seen as a paraphrasing task, the paraphrasing data for the target language can add knowledge to the model about how paraphrasing works in this language. Finally, the model is trained for the detoxification task on the available data.
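One simple way to realize this multitask setup is to mark each training pair with a task prefix and mix all four datasets into one fine-tuning corpus; the prefixes and the uniform mixing below are illustrative assumptions rather than the exact configuration used in our experiments.

```python
import random

def build_multitask_corpus(translation_pairs, paraphrase_pairs,
                           detox_src_pairs, detox_tgt_translated_pairs,
                           seed: int = 42):
    """Mix the four task datasets into one list of (input, target) pairs.

    Each argument is a list of (input_text, output_text) tuples; the task
    prefixes are hypothetical control tokens prepended to the inputs."""
    corpus = []
    corpus += [("translate: " + src, tgt) for src, tgt in translation_pairs]
    corpus += [("paraphrase: " + src, tgt) for src, tgt in paraphrase_pairs]
    corpus += [("detoxify: " + src, tgt) for src, tgt in detox_src_pairs]
    corpus += [("detoxify: " + src, tgt) for src, tgt in detox_tgt_translated_pairs]
    random.Random(seed).shuffle(corpus)
    return corpus
```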
To eliminate the translation step, we present a new approach based on the Adapter layer idea (Houlsby et al., 2019). The usual pipeline of the seq2seq generation process is:

output = Decoder(Encoder(input)).

We add an additional Adapter layer to the model:

output = Decoder(Adapter(Encoder(input))),

where Adapter(x) = Linear(ReLU(Linear(x))) and the Adapter takes as input the output embeddings of the encoder.
Any multilingual pre-trained model can be taken as a base seq2seq model. Then, we integrate the Adapter layer between the encoder and decoder blocks. For the training procedure, we train the model on the available monolingual ParaDetox corpus. However, we do not update the weights of all model blocks, only the Adapter. As a result, we force the Adapter layer to learn information about detoxification while the rest of the blocks preserve the knowledge about multiple languages. During inference, we can input text in the target language and obtain the corresponding detoxified output (Figure 3). Compared to previous approaches, Adapter training requires only one model fine-tuning procedure and one inference step. While in (Lai et al., 2022b) several Adapter layers were pre-trained specifically for each language, we propose to use only one layer between the encoder and decoder of a multilingual LM that will incorporate the knowledge about the task.
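A minimal PyTorch sketch of this idea, assuming a Hugging Face encoder-decoder model such as mBART-50 or M2M100 that exposes get_encoder() and accepts precomputed encoder_outputs; the hidden and bottleneck sizes are placeholders, not the exact values used in our experiments.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: Linear -> ReLU -> Linear, applied to encoder outputs."""
    def __init__(self, hidden_size: int = 1024, bottleneck_size: int = 256):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(hidden_states)))


def adapter_forward(model, adapter, input_ids, attention_mask, labels):
    """Forward pass with the adapter inserted between encoder and decoder.

    Only the adapter parameters receive gradients; the base model stays frozen."""
    for p in model.parameters():
        p.requires_grad = False
    encoder_outputs = model.get_encoder()(input_ids=input_ids,
                                          attention_mask=attention_mask)
    # The only trainable transformation: rewrite the encoder states.
    encoder_outputs.last_hidden_state = adapter(encoder_outputs.last_hidden_state)
    return model(encoder_outputs=encoder_outputs,
                 attention_mask=attention_mask,
                 labels=labels)
```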
For this approach, we experiment with the M2M100 and mBART-50 models. While the M2M100 model is already trained for the translation task, this version of mBART is pre-trained only on the denoising task. Thus, we additionally pre-train it on the paraphrasing and translation corpora used for the Multitask approach. During training and inference with the mBART model, we explicitly indicate the languages of the input and the expected output with special tokens.

Detox&Translation
The setup of simultaneous detoxification and translation occurs when the toxic and non-toxic parts of the training parallel dataset are in different languages.For instance, a toxic sentence in a pair is in English, while its non-toxic paraphrase is in Russian.
The baseline approach to address text detoxification from one language into another is to perform detoxification and translation step by step. However, that requires two inference procedures, each with a potentially computationally heavy seq2seq model. To save resources with a single inference, in this section, we explore models that can perform detoxification and translation in one step. While parallel datasets were obtained for cross-lingual text summarization, there are no such data for text detoxification. The proposed approach is to create a synthetic cross-lingual detoxification dataset (Figure 4). Then, we train a model simultaneously for detoxification and translation. The models described in the section above were also used for the translation step of the parallel corpora.
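A sketch of constructing such a synthetic corpus from English (toxic, neutral) pairs, translating only the neutral side so that each training pair maps an English toxic sentence directly to a Russian neutral paraphrase; the translation checkpoint is again an illustrative choice.

```python
from transformers import pipeline

en_to_ru = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ru")

def build_detox_translate_corpus(en_pairs):
    """Turn English (toxic, neutral) pairs into cross-lingual (toxic EN, neutral RU)
    pairs for training one model that detoxifies and translates in a single step."""
    return [
        (toxic_en, en_to_ru(neutral_en)[0]["translation_text"])
        for toxic_en, neutral_en in en_pairs
    ]
```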

Evaluation Setups
There is plenty of work on developing systems for text detoxification. Yet, in each work, the comparison between models is made with automatic metrics that are not unified, and their choice may be arbitrary (Ostheimer et al., 2023). Several recent works studied the correlation between automatic and manual evaluation for text style transfer tasks: formality (Lai et al., 2022a) and toxicity (Logacheva et al., 2022a). Our work presents a new set of metrics for automatic evaluation for the English and Russian languages, confirming our choice through correlations with manual metrics.
For all languages, the automatic evaluation consists of three main parameters:
• Style transfer accuracy (STA_a): the percentage of non-toxic outputs identified by a style classifier. In our case, we train a corresponding toxicity classifier for each language.
• Content preservation (SIM_a): a measurement of the extent to which the content of the original text is preserved.
• Fluency (FL_a): the percentage of fluent sentences in the output.
The aforementioned metrics must be properly combined to get one Joint metric to rank models. We calculate J as follows:

J = (1/n) Σ_{i=1}^{n} STA(x_i) · SIM(x_i) · FL(x_i),

where the scores STA(x_i), SIM(x_i), FL(x_i) ∈ {0, 1} denote whether the i-th output belongs to the corresponding class.
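Following the binary per-sentence definition above, the Joint score can be computed as a simple mean of products; the example values below are hypothetical.

```python
def joint_score(sta, sim, fl):
    """J metric: mean over sentences of the product of binary STA, SIM and FL scores."""
    assert len(sta) == len(sim) == len(fl)
    return sum(s * c * f for s, c, f in zip(sta, sim, fl)) / len(sta)

# Hypothetical example: three outputs, only the first passes all three checks.
print(joint_score([1, 1, 0], [1, 0, 1], [1, 1, 1]))  # 0.333...
```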

Automatic Evaluation for English
Our setup is mostly based on the metrics previously used by Logacheva et al. (2022b): only the content similarity metric is updated, as the other metrics already obtain high correlations with human judgments.
Style accuracy The STA_a metric is calculated with a RoBERTa-based (Liu et al., 2019) style classifier trained on the union of three Jigsaw datasets (Jigsaw, 2018).
Content similarity Previously, SIM_a^old was estimated as the cosine similarity between the embeddings of the original text and the output, computed with the model of Wieting et al. (2019). This model is trained on paraphrase pairs extracted from the ParaNMT corpus (Wieting and Gimpel, 2018).
We propose to estimate SIM_a with the BLEURT score (Sellam et al., 2020). In (Babakov et al., 2022), a large investigation of similarity metrics for paraphrasing and style transfer tasks was conducted. The results showed that the BLEURT metric has the highest correlation with human assessments for text style transfer tasks in English.
Fluency FL_a is the percentage of fluent sentences identified by a RoBERTa-based classifier of linguistic acceptability trained on the CoLA dataset (Warstadt et al., 2019).
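A hedged sketch of the English evaluation, assuming a trained RoBERTa toxicity classifier, a CoLA acceptability classifier, and the BLEURT metric from the evaluate library; the checkpoint paths and label strings are placeholders that depend on how the classifiers were trained.

```python
from transformers import pipeline
import evaluate

# Placeholder checkpoints and label names; substitute the actual trained classifiers.
toxicity_clf = pipeline("text-classification", model="path/to/roberta-toxicity-classifier")
cola_clf = pipeline("text-classification", model="path/to/roberta-cola-classifier")
bleurt = evaluate.load("bleurt")

def evaluate_english(inputs, outputs):
    """Corpus-level STA, SIM and FL for detoxified outputs given toxic inputs."""
    sta = [1 if toxicity_clf(o)[0]["label"] == "neutral" else 0 for o in outputs]
    fl = [1 if cola_clf(o)[0]["label"] == "acceptable" else 0 for o in outputs]
    sim = bleurt.compute(predictions=outputs, references=inputs)["scores"]
    return {
        "STA": sum(sta) / len(sta),
        "SIM": sum(sim) / len(sim),
        "FL": sum(fl) / len(fl),
    }
```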

Automatic Evaluation for Russian
The previous and newly proposed metrics are listed below (the setup we compare with is based on (Dementieva et al., 2022)).
Style accuracy In (Dementieva et al., 2022), STA_a^old is computed with a RuBERT Conversational classifier (Kuratov and Arkhipov, 2019) fine-tuned on the Russian Language Toxic Comments dataset collected from 2ch.hk and the Toxic Russian Comments dataset collected from ok.ru.
In our updated metric STA_a, we replace the toxicity classifier with the version presented in (Gusev, 2022), which is more robust to adversarial attacks.
Content similarity The previous implementation, SIM_a^old, is computed as the cosine similarity of LaBSE (Feng et al., 2022) sentence embeddings.
We still calculate SIM_a as cosine similarity, but for the embeddings we use RuBERT Conversational fine-tuned on three additional datasets: the Russian Paraphrase Corpus (Gudkov et al., 2020), RuPAWS (Martynov et al., 2022), and the content evaluation part of the Russian parallel detoxification corpus (Dementieva et al., 2022).
Fluency The previous metric FL_a^old is measured with a BERT-based classifier (Devlin et al., 2019) trained to distinguish real texts from corrupted ones. The model was trained on Russian texts and their corrupted versions (random word replacement, word deletion, insertion, word shuffling, etc.).
In our updated metric FL_a, to make the setup symmetric with the English one, fluency for Russian is also evaluated with a RoBERTa-based classifier fine-tuned on the Russian linguistic acceptability dataset RuCoLA (Mikhailov et al., 2022).
To validate the new metrics, we use the manual assessments available from (Dementieva et al., 2022): 850 toxic samples in the test set were evaluated manually via crowdsourcing on three parameters (toxicity, content, and fluency). As Table 3 shows, the correlations between human assessments and the new metrics are higher than for the previous evaluation setup (see details in Appendix C).
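System-level correlations like those in Table 3 can be computed with SciPy; the score lists below are hypothetical per-system averages of an automatic metric and the corresponding manual judgment.

```python
from scipy.stats import spearmanr

# Hypothetical per-system scores: one automatic and one manual value per evaluated system.
automatic_scores = [0.71, 0.64, 0.58, 0.80, 0.45]
manual_scores = [0.68, 0.60, 0.61, 0.77, 0.40]

rho, p_value = spearmanr(automatic_scores, manual_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```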
To calculate the SIM metric for the Detox&Translation task, we use the monolingual version of SIM for the target language, comparing the output with the input translated into the target language. For instance, if Detox&Translation is done from English to Russian, we translate the English toxic input into Russian and compare it with the output using the Russian SIM_a.

Manual Evaluation
As the correlation between automatic and manual scores still has room for improvement, we also evaluate selected models manually. We invited three annotators fluent in both languages to mark up the outputs for the corresponding three parameters: toxicity, content preservation, and fluency.

Results
The automatic evaluation results are presented in Table 5. Together with the metric-based evaluation, we also assess the proposed methods based on the required resources (Table 4). For evaluation, we take the test sets provided for both the English and Russian datasets (as presented in Table 2). First, we report scores for the human references and for trivial duplication of the input toxic text.

Cross-lingual Detoxification Transfer
From Table 5, we see that the Backtranslation approach performed with SOTA monolingual detoxification models yields the best TST scores. This is the only approach that does not require additional model fine-tuning. However, as we can see from Table 4, it depends on the constant availability of a translation system and results in three inference steps.
The Training Data Translation approach for both languages shows a J score at the level of the condBERT baseline. While the SIM and FL scores are the same as or even higher than those of the monolingual SOTA, the STA scores drop significantly. Some toxic parts can be lost while translating the toxic side of the parallel corpus. This is an advantage for the Backtranslation approach, where we want to reduce toxicity only in the output, while for training a parallel detox corpus we lose some of the toxicity representation. However, this approach can be used as a baseline for monolingual detoxification (examples of translation outputs are provided in the Appendix).

The Adapter for the M2M100 model successfully compresses detoxification knowledge but fails to transfer it to another language. The results are completely different for the additionally fine-tuned mBART. This configuration outperforms all unsupervised baselines and the Training Data Translation approach. Still, the weak point of this approach is the STA score, since not all toxicity types can be easily transferred. However, Adapter training is the most resource-conserving approach: it does not require additional data creation and has only one inference step. The fine-tuning procedure should be cost-efficient, as we freeze the layers of the base language model and back-propagate only through the adapter layers. The Adapter approach can thus be the optimal solution for cross-lingual detoxification transfer.
Finally, according to the manual evaluation in Table 6, Backtranslation is the best choice if we want to transfer knowledge to the English language. However, for a lower-resource language, the Adapter approach seems to be more beneficial. In the Backtranslation approach for Russian, we observed a substantial loss of content. This may be caused by toxic expressions in Russian that are hard to translate precisely into English before detoxification. As a result, we can claim that the Adapter approach is the most efficient and precise way to transfer detoxification knowledge from English to other languages.

Detox&Translation
At the bottom of Table 5, we report the baseline approaches: detoxification with the monolingual detoxification SOTA, followed by translation into the target language.
We can observe that, for English, our proposed approaches for this task perform better than the baselines. While for Russian the results are slightly worse, our models require fewer computational resources during inference. Thus, we can claim that simultaneous style transfer and translation is possible with multilingual LMs.

Conclusion
We present, to the best of our knowledge, the first extensive study of cross-lingual text detoxification approaches. The automatic evaluation shows that the Backtranslation approach achieves the highest performance. However, this approach is bound to the availability of a translation system and requires three steps during inference. The Training Data Translation approach can be a good baseline for a separate detoxification system in the target language. On the other hand, the Adapter approach requires only one inference step and performs slightly worse than Backtranslation. The Adapter method showed the best manual evaluation scores when transferring from English to Russian. However, capturing the whole scope of toxicity types in a language remains an open challenge.
We also present the first study of detoxification and translation performed in one step. We show that generating a synthetic parallel corpus with NMT, where the toxic part is in one language and the non-toxic part in another, is effective for this task. Trained on such a corpus, multilingual LMs perform at the level of Backtranslation while requiring fewer computations.
All information about datasets, models, and evaluation metrics can be found online.

Limitations
One limitation of this work is the use of only two languages in our experiments, English and Russian. There is a great opportunity for improvement in experimenting with more languages and language pairs to transfer knowledge cross-lingually.
In addition, solving the detoxification task requires a toxicity classification corpus for the language: a test set and a classifier are necessary for STA evaluation. An embedding model for the language is also needed to calculate the SIM score. For FL, in this work, we use acceptability classifiers; however, such classifiers may not exist for other languages.

Ethical Considerations
Text detoxification has various applications, e.g., moderating the output of generative neural networks to prevent reputation losses of companies (think of a chatbot responding rudely). On the other hand, completely automatic text detoxification of user-generated content should be done with extreme care. Instead, a viable use case is to suggest that the user rewrite a toxic comment (e.g., to save her digital reputation, as the 'internet remembers everything'). It is crucial to leave the person the freedom to express the comment in the way she wants, given legal boundaries.

Figure 2: Training Data Translation approach: (i) translate the available dataset into the target language; (ii) train a detoxification model for the target language.

Figure 3: Adapter approach: (i) insert an Adapter layer into a multilingual LM; (ii) train only the Adapter for the detoxification task on the available corpus.

Figure 4: The simultaneous Detox&Translate approach is based on a synthetic cross-lingual parallel corpus.

Table 1: Two new text detoxification setups explored in this work compared to the monolingual setup.

Table 2: Parallel datasets for text detoxification used in our cross-lingual detoxification experiments.

Table 3: Our vs. old evaluation setups. Spearman's correlation between automatic and manual evaluation for each old and new parameter, based on system-level scores for the Russian language. All numbers denote statistically significant correlations (p-value ≤ 0.05).

Table 4: Comparison of the proposed approaches for cross-lingual detoxification transfer based on required computational and data resources. As one may observe, the Backtranslation approach requires 3 runs of seq2seq models, while other approaches are based on a single (end2end) model and require only one run.

Table 5: Automatic evaluation results. Numbers in bold indicate the best results within the sub-sections. Rows in green indicate the best models per task. The translation method used for each approach is indicated in (brackets). EN or RU denotes the training corpus language (original monolingual ParaDetox), while EN-Tr or RU-Tr denotes translated versions of ParaDetox. mBART* denotes the version of mBART fine-tuned on paraphrasing and translation data.

Table 8: Examples of translations from English to Russian.

Table 9: Examples of translations from Russian to English.