StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell L Gordon, Zaid Harchaoui, Yejin Choi


Abstract
Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is important yet challenging. Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall.To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. StyleRemix uses pre-trained Low Rank Adaptation (LoRA) modules to rewrite inputs along various stylistic axes (e.g., formality, length) while maintaining low computational costs. StyleRemix outperforms state-of-the-art baselines and much larger LLMs on an array of domains on both automatic and human evaluation.Additionally, we release AuthorMix, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.
Anthology ID:
2024.emnlp-main.241
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
4172–4206
Language:
URL:
https://aclanthology.org/2024.emnlp-main.241
DOI:
10.18653/v1/2024.emnlp-main.241
Bibkey:
Cite (ACL):
Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell L Gordon, Zaid Harchaoui, and Yejin Choi. 2024. StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 4172–4206, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements (Fisher et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.241.pdf