Computational Narrative Understanding for Expressive Text-to-Speech

Gaspard Michel; Elena V. Epure; Christophe Cerisara

Abstract

Recent advances in text-to-speech (TTS) have been driven by large, multi-domain speech corpora, yet the expressive potential of audiobook data remains underexamined. We argue that human-narrated audiobooks, particularly fictional works, contain rich and diverse prosodic cues arising from the natural alternation between neutral narration and expressive character dialogue. Building on this observation, we introduce LibriQuote, a large-scale dataset containing 5.3K hours of expressive speech drawn from character quotations. Each quote is supplemented with contextual pseudo-labels for speech verbs and adverbs that characterize the intended delivery of direct speech (e.g., “he whispered softly”). We find that fine-tuning a flow-matching model on LibriQuote yields substantial improvements in expressivity and intelligibility, while training from scratch enhances the expressiveness of an autoregressive TTS model. Benchmarking on LibriQuote-test highlights significant variability across systems in generating expressive speech. We publicly release the dataset, code, and evaluation resources to facilitate reproducibility. Audio samples can be found at https://libriquote.github.io/.

Anthology ID: 2026.findings-acl.308
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6194–6215
URL: https://aclanthology.org/2026.findings-acl.308/
Cite (ACL): Gaspard Michel, Elena V. Epure, and Christophe Cerisara. 2026. Computational Narrative Understanding for Expressive Text-to-Speech. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6194–6215, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Computational Narrative Understanding for Expressive Text-to-Speech (Michel et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.308.pdf
Checklist: 2026.findings-acl.308.checklist.pdf