A Call for Standardization and Validation of Text Style Transfer Evaluation

Phil Ostheimer, Mayank Kumar Nagda, Marius Kloft, Sophie Fellenz


Abstract
Text Style Transfer (TST) evaluation is, in practice, inconsistent. We therefore conduct a meta-analysis of human and automated TST evaluation and experimentation that thoroughly examines the existing literature in the field. The meta-analysis reveals a substantial standardization gap in both human and automated evaluation. We also find a validation gap: only a few automated metrics have been validated using human experiments. We scrutinize both the standardization and the validation gap in detail and reveal the resulting pitfalls. This work also paves the way to closing both gaps in TST evaluation by stating requirements to be met by future research.
Anthology ID:
2023.findings-acl.687
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10791–10815
URL:
https://aclanthology.org/2023.findings-acl.687
DOI:
10.18653/v1/2023.findings-acl.687
Cite (ACL):
Phil Ostheimer, Mayank Kumar Nagda, Marius Kloft, and Sophie Fellenz. 2023. A Call for Standardization and Validation of Text Style Transfer Evaluation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10791–10815, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Call for Standardization and Validation of Text Style Transfer Evaluation (Ostheimer et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.687.pdf