Balancing the Effect of Training Dataset Distribution of Multiple Styles for Multi-Style Text Transfer

Debarati Das; David Ma; Dongyeop Kang

doi:10.18653/v1/2023.findings-acl.243

Abstract

Text style transfer is an exciting task within natural language generation that is often hampered by the need for high-quality paired datasets. Furthermore, training a model for multi-attribute text style transfer requires datasets with sufficient support across all combinations of the considered stylistic attributes, which adds to the challenge of training a style transfer model. This paper explores the impact of training data input diversity on the quality of the text generated by a multi-style transfer model. We construct a pseudo-parallel dataset by devising heuristics that adjust the style distribution in the training samples. We balance our training dataset using marginal and joint distributions to train our style transfer models. We observe that a balanced dataset yields more effective control over multiple styles than an imbalanced or skewed one. Through quantitative analysis, we explore the impact of multiple style distributions in training data on style-transferred output. These findings will better inform the design of style-transfer datasets.
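A minimal sketch of what measuring and balancing style distributions can look like, assuming each training sample carries one discrete label per style attribute. The attribute names (`formality`, `sentiment`), the toy data, and the function names are hypothetical illustrations, and the strategy shown (downsampling every attribute-value combination to the size of the rarest one) is a simple stand-in, not the authors' heuristic.

```python
import random
from collections import Counter, defaultdict


def marginal_distribution(samples, attr):
    """Fraction of samples taking each value of a single style attribute."""
    counts = Counter(s[attr] for s in samples)
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}


def joint_distribution(samples, attrs):
    """Fraction of samples taking each combination of values of several attributes."""
    counts = Counter(tuple(s[a] for a in attrs) for s in samples)
    total = sum(counts.values())
    return {combo: c / total for combo, c in counts.items()}


def balance_joint(samples, attrs, seed=0):
    """Downsample so every observed attribute-value combination has the same count.

    Hypothetical strategy: each combination is cut to the size of the rarest one.
    Combinations absent from the data cannot be recovered by downsampling.
    """
    buckets = defaultdict(list)
    for s in samples:
        buckets[tuple(s[a] for a in attrs)].append(s)
    n = min(len(bucket) for bucket in buckets.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    balanced = []
    for combo in sorted(buckets):
        balanced.extend(rng.sample(buckets[combo], n))
    return balanced


def toy_dataset():
    """Skewed toy data with two hypothetical binary style attributes."""
    counts = {("formal", "pos"): 6, ("formal", "neg"): 2,
              ("informal", "pos"): 3, ("informal", "neg"): 4}
    return [{"text": f"sample {i}", "formality": f, "sentiment": s}
            for (f, s), k in counts.items() for i in range(k)]


if __name__ == "__main__":
    attrs = ["formality", "sentiment"]
    data = toy_dataset()
    print(marginal_distribution(data, "formality"))  # formal 8/15, informal 7/15
    balanced = balance_joint(data, attrs)
    print(len(balanced))                              # 8 (2 per combination)
    print(joint_distribution(balanced, attrs))        # 0.25 for each of the 4 combinations
```

When every combination of attribute values is present, a jointly balanced dataset is also balanced on each marginal. The converse does not hold: a dataset containing only formal-positive and informal-negative samples in equal numbers has balanced marginals for both attributes but leaves two of the four joint combinations empty.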

Anthology ID: 2023.findings-acl.243
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3932–3943
URL: https://aclanthology.org/2023.findings-acl.243
Cite (ACL): Debarati Das, David Ma, and Dongyeop Kang. 2023. Balancing the Effect of Training Dataset Distribution of Multiple Styles for Multi-Style Text Transfer. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3932–3943, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Balancing the Effect of Training Dataset Distribution of Multiple Styles for Multi-Style Text Transfer (Das et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.243.pdf