ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

Haitao Li; Chunxiang Jin; Chenglin Li; Wenhao Guan; Zhengxing Huang; Xie Chen

Abstract

Zero-shot text-to-speech models can clone a speaker’s timbre from a short reference audio, but they also strongly inherit the speaking style present in the reference. As a result, synthesizing speech with a desired style often requires carefully selecting reference audio, which is impractical when only limited or mismatched references are available. While recent controllable TTS methods attempt to address this issue, they typically rely on absolute style targets and discrete textual prompts, and therefore do not support continuous and reference-relative style control. We propose ReStyle-TTS, a framework that enables continuous and reference-relative style control in zero-shot TTS. Our key insight is that effective style control requires first reducing the model’s implicit dependence on reference style before introducing explicit control mechanisms. To this end, we introduce Decoupled Classifier-Free Guidance (DCFG), which independently controls text and reference guidance, reducing reliance on reference style while preserving text fidelity. On top of this, we apply style-specific LoRAs together with Orthogonal LoRA Fusion to enable continuous and disentangled multi-attribute control, and introduce a Timbre Consistency Optimization module to mitigate timbre drift caused by weakened reference guidance. Experiments show that ReStyle-TTS enables user-friendly, continuous, and relative control over pitch, energy, and multiple emotions while maintaining intelligibility and speaker timbre, and performs robustly in challenging mismatched reference–target style scenarios. Code and data are available in supplementary materials.
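
As a rough illustration of what decoupled guidance could look like, the Python sketch below combines three model predictions (unconditional, text-only, and text plus reference) using separate weights. The abstract does not state the DCFG formula, so the combination rule, the function name `decoupled_cfg`, and the parameters `w_text` and `w_ref` are all assumptions borrowed from the common multi-condition classifier-free guidance pattern, not the authors' implementation.

```python
from typing import List, Sequence


def decoupled_cfg(
    v_uncond: Sequence[float],
    v_text: Sequence[float],
    v_full: Sequence[float],
    w_text: float,
    w_ref: float,
) -> List[float]:
    """Combine three predictions with independent guidance weights.

    Assumed rule (hypothetical, not taken from the paper):
        v = v_uncond + w_text * (v_text - v_uncond) + w_ref * (v_full - v_text)

    v_uncond: prediction with neither text nor reference audio
    v_text:   prediction conditioned on text only
    v_full:   prediction conditioned on text and reference audio
    """
    if not (len(v_uncond) == len(v_text) == len(v_full)):
        raise ValueError("all predictions must have the same length")
    return [
        # Text term pushes away from the unconditional prediction;
        # reference term pushes from text-only toward text+reference.
        u + w_text * (t - u) + w_ref * (f - t)
        for u, t, f in zip(v_uncond, v_text, v_full)
    ]


if __name__ == "__main__":
    # Toy values standing in for per-frame model outputs.
    v_uncond = [0.0, 0.0]
    v_text = [1.0, 2.0]
    v_full = [3.0, 1.0]
    # Strong text guidance, weakened reference guidance.
    print(decoupled_cfg(v_uncond, v_text, v_full, w_text=2.0, w_ref=0.5))
```

Under this assumed rule, lowering `w_ref` while keeping `w_text` fixed weakens the pull of the reference style without reducing text conditioning, which is the behavior the abstract attributes to DCFG. Setting both weights to 1 recovers the fully conditioned prediction.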

Anthology ID: 2026.findings-acl.451
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9257–9269
URL: https://aclanthology.org/2026.findings-acl.451/
Cite (ACL): Haitao Li, Chunxiang Jin, Chenglin Li, Wenhao Guan, Zhengxing Huang, and Xie Chen. 2026. ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9257–9269, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis (Li et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.451.pdf
Checklist: 2026.findings-acl.451.checklist.pdf
