EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

Maureen de Seyssel, Antony D’Avirro, Adina Williams, Emmanuel Dupoux

doi:10.18653/v1/2024.emnlp-main.30

Abstract

We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.
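To make the two-stage idea concrete, here is a minimal sketch of how a frame-level emphasis classifier's output could be turned into word-level labels and then compared between an input utterance and a model's output. It is an illustration under stated assumptions, not the authors' EmphAssess or EmphaClass implementation. Everything here is hypothetical: the function names, the mean-probability aggregation with a 0.5 threshold, the word spans given as frame ranges, and the use of precision/recall/F1 over emphasized word positions as the transfer score.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical word alignment: (start_frame, end_frame) per word, end exclusive.
Span = Tuple[int, int]


def frames_to_word_emphasis(
    frame_probs: Sequence[float], word_spans: Sequence[Span], threshold: float = 0.5
) -> List[bool]:
    """Assumed aggregation: a word counts as emphasized if the mean
    frame-level emphasis probability over its span exceeds `threshold`."""
    labels: List[bool] = []
    for start, end in word_spans:
        frames = frame_probs[start:end]
        mean = sum(frames) / len(frames) if frames else 0.0
        labels.append(mean > threshold)
    return labels


def emphasized_indices(labels: Sequence[bool]) -> Set[int]:
    """Positions of the words labelled as emphasized."""
    return {i for i, is_emph in enumerate(labels) if is_emph}


def emphasis_transfer_scores(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Dict[str, float]:
    """Hypothetical score: precision/recall/F1 of emphasized word positions
    in the output versus the input, assuming both share word indexing."""
    ref = emphasized_indices(reference)
    pred = emphasized_indices(predicted)
    tp = len(ref & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example: four words, the third one carries the emphasis.
    frame_probs = [0.1, 0.2, 0.1, 0.9, 0.8, 0.2, 0.1]
    word_spans = [(0, 2), (2, 3), (3, 5), (5, 7)]
    output_labels = frames_to_word_emphasis(frame_probs, word_spans)
    input_labels = [False, False, True, False]
    print(emphasis_transfer_scores(input_labels, output_labels))
    # {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
```

The shared word indexing only holds in a resynthesis-style setting. For speech-to-speech translation, the emphasized word would first have to be mapped to its counterpart in the target language, a step this sketch deliberately leaves out.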

Anthology ID:: 2024.emnlp-main.30
Volume:: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 495–507
URL:: https://aclanthology.org/2024.emnlp-main.30/
DOI:: 10.18653/v1/2024.emnlp-main.30
Cite (ACL):: Maureen de Seyssel, Antony D’Avirro, Adina Williams, and Emmanuel Dupoux. 2024. EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 495–507, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models (de Seyssel et al., EMNLP 2024)
PDF:: https://aclanthology.org/2024.emnlp-main.30.pdf
Software:: 2024.emnlp-main.30.software.zip