Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations

Deok-Hyeon Cho; Hyung-Seok Oh; Seung-Bin Kim; Seong-Whan Lee

Abstract

Nonverbal vocalizations (NVs), such as laughter and sighs, are central to the expression of affective cues in emotional speech synthesis. However, learning diverse and contextually aligned NVs remains challenging in open settings due to limited NV data and the lack of explicit supervision. Motivated by this challenge, we propose Affectron as a framework for affective and contextually aligned NV generation. Built on a small-scale open and decoupled corpus, Affectron introduces an NV-augmented training strategy that expands the distribution of NV types and insertion locations. We further incorporate NV structural masking into a speech backbone pre-trained on purely verbal speech to enable diverse and natural NV synthesis. Experimental results demonstrate that Affectron produces more expressive and diverse NVs than baseline systems while preserving the naturalness of the verbal speech stream.

Anthology ID: 2026.findings-acl.1369
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 27502–27525
URL: https://aclanthology.org/2026.findings-acl.1369/
Cite (ACL): Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, and Seong-Whan Lee. 2026. Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27502–27525, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations (Cho et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1369.pdf
Checklist: 2026.findings-acl.1369.checklist.pdf