AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling

Jiacheng Shi; Hongfei Du; Xinyuan Song; Y. Alicia Hong; Yanfu Zhang; Ashley Gao

Abstract

Neural speech codecs provide discrete representations for speech language models, but emotional cues are often degraded during quantization. Existing codecs mainly optimize acoustic reconstruction, leaving emotional expressiveness insufficiently modeled at the representation level. We propose an emotion-guided neural speech codec that explicitly preserves emotional information while maintaining semantic fidelity and prosodic naturalness. Our framework combines emotion–semantic guided latent modulation, relation-preserving emotional–semantic distillation, and emotion-weighted semantic alignment to retain emotionally salient cues under compression. Extensive evaluations across speech reconstruction, emotion recognition, and downstream text-to-speech generation demonstrate improved emotion consistency and perceptual quality without sacrificing content accuracy.

Anthology ID: 2026.findings-acl.442
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9102–9124
URL: https://aclanthology.org/2026.findings-acl.442/
Cite (ACL): Jiacheng Shi, Hongfei Du, Xinyuan Song, Y. Alicia Hong, Yanfu Zhang, and Ashley Gao. 2026. AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9102–9124, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling (Shi et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.442.pdf
Checklist: 2026.findings-acl.442.checklist.pdf
