Romain Legrand


2024

Emotags: Computer-Assisted Verbal Labelling of Expressive Audiovisual Utterances for Expressive Multimodal TTS
Gérard Bailly | Romain Legrand | Martin Lenglet | Frédéric Elisei | Maëva Hueber | Olivier Perrotin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We developed a web app for ascribing verbal descriptions to expressive audiovisual utterances. These descriptions are limited to lists of adjectives, which are either suggested by navigating emotional latent spaces built using discriminant analysis of BERT embeddings or entered freely by subjects. We show that such verbal descriptions, collected online via Prolific at scale (310 participants and 12,620 labelled utterances to date), provide Expressive Multimodal Text-to-Speech Synthesis with precise verbal control over the desired emotional content.
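As a rough illustration of how an "emotional latent space built using discriminant analysis" can drive adjective suggestions, here is a minimal sketch assuming Fisher linear discriminant analysis (LDA) over adjective embeddings, followed by nearest-neighbour lookup around a point the user navigates to. Everything in it is hypothetical: the adjective list, the three emotion classes, the 8-dimensional random vectors that stand in for BERT embeddings, and the helper names `fake_embeddings`, `fit_lda` and `suggest`. The paper's actual vocabulary, dimensionality and labelling scheme are not described here.

```python
import numpy as np

# Hypothetical toy vocabulary: adjective -> coarse emotion class.
ADJECTIVES = {
    "joyful": "happy", "cheerful": "happy", "delighted": "happy",
    "elated": "happy", "content": "happy",
    "furious": "angry", "irritated": "angry", "hostile": "angry",
    "annoyed": "angry", "enraged": "angry",
    "gloomy": "sad", "tearful": "sad", "downcast": "sad",
    "mournful": "sad", "dejected": "sad",
}


def fake_embeddings(labels, dim=8, seed=0):
    """Hypothetical stand-in for BERT embeddings: one Gaussian cluster per class."""
    rng = np.random.default_rng(seed)
    centers = {c: rng.normal(scale=3.0, size=dim) for c in sorted(set(labels))}
    return np.stack([centers[lab] + rng.normal(scale=0.5, size=dim) for lab in labels])


def fit_lda(X, y, n_components=2):
    """Fisher LDA: directions maximizing between-class over within-class scatter."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)  # small ridge keeps Sw positive definite
    # Whiten with Sw^(-1/2), then take the top eigenvectors of the whitened Sb.
    evals, evecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    vals, vecs = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    top = np.argsort(vals)[::-1][:n_components]
    return Sw_inv_sqrt @ vecs[:, top], mean_all


def suggest(query, Z, words, k=3):
    """Return the k adjectives whose latent coordinates lie closest to query."""
    dists = np.linalg.norm(Z - query, axis=1)
    return [words[i] for i in np.argsort(dists)[:k]]


words = list(ADJECTIVES)
labels = np.array([ADJECTIVES[w] for w in words])
X = fake_embeddings(labels)
W, mu = fit_lda(X, labels)
Z = (X - mu) @ W  # 2-D "emotional latent space"

# Simulate a user navigating to the centre of the "happy" region.
happy_centroid = Z[labels == "happy"].mean(axis=0)
print(suggest(happy_centroid, Z, words))
```

With three classes, LDA yields at most two informative directions, which makes a 2-D space that a web interface could plausibly let users click around in; whether the actual app uses two dimensions is an assumption.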