The topic of this paper is neural multi-task training for text style transfer. We present an efficient method for neutral-to-style transformation using the transformer framework. We demonstrate how to prepare a robust model utilizing large paraphrases corpora together with a small parallel style transfer corpus. We study how much style transfer data is needed for a model on the example of two transformations: neutral-to-cute on internal corpus and modern-to-antique on publicly available Bible corpora. Additionally, we propose a synthetic measure for the automatic evaluation of style transfer models. We hope our research is a step towards replacing common but limited rule-based style transfer systems by more flexible machine learning models for both public and commercial usage.
The paper presents a system developed for the SemEval-2019 competition Task 5 hat- Eval Basile et al. (2019) (team name: LU Team) and Task 6 OffensEval Zampieri et al. (2019b) (team name: NLPR@SRPOL), where we achieved 2nd position in Subtask C. The system combines in an ensemble several models (LSTM, Transformer, OpenAI’s GPT, Random forest, SVM) with various embeddings (custom, ELMo, fastText, Universal Encoder) together with additional linguistic features (number of blacklisted words, special characters, etc.). The system works with a multi-tier blacklist and a large corpus of crawled data, annotated for general offensiveness. In the paper we do an extensive analysis of our results and show how the combination of features and embedding affect the performance of the models.