From Disjoint Sets to Parallel Data to Train Seq2Seq Models for Sentiment Transfer

Paulo Cavalin, Marisa Vasconcelos, Marcelo Grave, Claudio Pinhanez, Victor Henrique Alves Ribeiro


Abstract
We present a method for creating parallel data to train Seq2Seq neural networks for sentiment transfer. Most systems for this task, which can be viewed as monolingual machine translation (MT), have relied on unsupervised methods, such as approaches inspired by Generative Adversarial Networks (GANs), to cope with the lack of parallel corpora. Given that the literature shows that Seq2Seq methods have consistently outperformed unsupervised methods in MT-related tasks, in this work we use semantic similarity computation to convert non-parallel data into a parallel corpus. That allows us to train a transformer neural network for the sentiment transfer task and compare its performance against unsupervised approaches. With experiments conducted on two well-known public datasets, Yelp and Amazon, we demonstrate that the proposed methodology consistently outperforms existing unsupervised methods in fluency, and presents competitive results in terms of sentiment conversion and content preservation. We believe that this work opens up an opportunity for Seq2Seq neural networks to be better exploited in problems to which they have not been applied owing to the lack of parallel training data.
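The core idea in the abstract, pairing sentences from two disjoint sentiment sets by semantic similarity, can be illustrated with a minimal sketch. The abstract does not say which similarity measure or matching rule the authors use, so everything here is an assumption made for illustration: the bag-of-words cosine similarity, the best-match-per-sentence rule, the `threshold` parameter, and the function names `bow`, `cosine`, and `build_parallel` are all hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector: lowercase whitespace tokens mapped to counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_parallel(source, target, threshold=0.3):
    """Pair each source sentence with its most similar target sentence.

    Pairs scoring below `threshold` are dropped. The threshold value is an
    illustrative assumption, not a setting taken from the paper.
    """
    if not target:
        return []
    target_vecs = [bow(t) for t in target]
    pairs = []
    for s in source:
        s_vec = bow(s)
        # Score every candidate and keep the highest-scoring one.
        score, best = max(
            ((cosine(s_vec, tv), t) for tv, t in zip(target_vecs, target)),
            key=lambda item: item[0],
        )
        if score >= threshold:
            pairs.append((s, best, score))
    return pairs


negative = ["the food was terrible", "the service was slow"]
positive = ["the food was delicious", "the staff was friendly and quick"]

for src, tgt, score in build_parallel(negative, positive):
    print(f"{src!r} -> {tgt!r} (similarity {score:.2f})")
```

The resulting (source, target) pairs could then serve as training examples for a supervised Seq2Seq model. A dense sentence encoder would likely capture semantic similarity better than word overlap; the bag-of-words version is used here only to keep the sketch dependency-free.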
Anthology ID:
2020.findings-emnlp.61
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
689–698
URL:
https://aclanthology.org/2020.findings-emnlp.61
DOI:
10.18653/v1/2020.findings-emnlp.61
Cite (ACL):
Paulo Cavalin, Marisa Vasconcelos, Marcelo Grave, Claudio Pinhanez, and Victor Henrique Alves Ribeiro. 2020. From Disjoint Sets to Parallel Data to Train Seq2Seq Models for Sentiment Transfer. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 689–698, Online. Association for Computational Linguistics.
Cite (Informal):
From Disjoint Sets to Parallel Data to Train Seq2Seq Models for Sentiment Transfer (Cavalin et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.61.pdf