Aligning Speech Segments Beyond Pure Semantics

Kevin Heffernan, Artyom Kozhevnikov, Loic Barrault, Alexandre Mourachko, Holger Schwenk


Abstract
Multilingual parallel data for speech-to-speech translation is scarce and expensive to create from scratch. This is all the more true for expressive speech translation, which aims to preserve not only the semantics but also the overall prosody (e.g., style, emotion, rate of speech). Existing corpora contain speech utterances with the same meaning, yet their overall prosody typically differs, because human annotators are not tasked with reproducing these aspects, or because crowd-sourced efforts do not prioritize this kind of alignment. In this paper, we propose a novel alignment algorithm that automatically forms pairs of speech segments aligned not only in meaning but also in expressivity. To validate our approach, we train an expressive multilingual speech-to-speech translation system on the automatically aligned data. Our experiments show that, compared to semantic-only approaches, expressively aligned data yields large improvements in source expressivity preservation (e.g., a 43% average uplift in speech rate preservation) while maintaining content translation quality. In some scenarios, the results also indicate that this alignment algorithm can outperform standard, semantic-focused approaches even on content translation quality.
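The abstract does not spell out how semantic and expressive similarity are combined, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes each speech segment has already been encoded into a semantic embedding and a separate expressive (prosody) embedding by some pair of encoders, blends the two cosine similarities with a hypothetical mixing weight `weight`, and then greedily keeps the highest-scoring one-to-one pairs above a hypothetical `threshold`. Setting `weight=1.0` reduces it to a semantic-only baseline.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n x d) and rows of b (m x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def align_segments(src_sem, tgt_sem, src_exp, tgt_exp, weight=0.5, threshold=0.0):
    """Hypothetical greedy 1:1 alignment on a blended semantic + expressive score.

    src_sem/tgt_sem: assumed precomputed semantic embeddings (one row per segment).
    src_exp/tgt_exp: assumed precomputed expressive embeddings (one row per segment).
    weight: hypothetical mixing coefficient; 1.0 means semantics only.
    threshold: hypothetical minimum blended score for a pair to be kept.
    Returns a list of (src_index, tgt_index, score) tuples.
    """
    score = (weight * cosine_matrix(src_sem, tgt_sem)
             + (1.0 - weight) * cosine_matrix(src_exp, tgt_exp))
    pairs = []
    used_src, used_tgt = set(), set()
    # Visit all candidate pairs from highest to lowest blended score;
    # a stable sort keeps tie-breaking deterministic (row-major order).
    order = np.argsort(-score, axis=None, kind="stable")
    for flat in order:
        i, j = (int(k) for k in np.unravel_index(flat, score.shape))
        if score[i, j] < threshold:
            break  # every remaining candidate scores even lower
        if i in used_src or j in used_tgt:
            continue  # enforce one-to-one pairing
        pairs.append((i, j, float(score[i, j])))
        used_src.add(i)
        used_tgt.add(j)
    return pairs


# Toy example: both source segments mean the same as both targets,
# but only the expressive embeddings distinguish which pairs sound alike.
src_sem = np.array([[1.0, 0.0], [1.0, 0.0]])
tgt_sem = src_sem.copy()
src_exp = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_exp = np.array([[0.0, 1.0], [1.0, 0.0]])
print(align_segments(src_sem, tgt_sem, src_exp, tgt_exp, weight=0.5))
print(align_segments(src_sem, tgt_sem, src_exp, tgt_exp, weight=1.0))
```

In the toy example, `weight=0.5` pairs source 0 with target 1 and source 1 with target 0, because those pairs agree in expressivity; with `weight=1.0` all semantic scores tie and the pairing simply falls back to index order, which is the kind of prosody mismatch the abstract attributes to semantic-only alignment.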
Anthology ID: 2024.findings-acl.216
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3626–3635
URL: https://aclanthology.org/2024.findings-acl.216
DOI: 10.18653/v1/2024.findings-acl.216
Cite (ACL): Kevin Heffernan, Artyom Kozhevnikov, Loic Barrault, Alexandre Mourachko, and Holger Schwenk. 2024. Aligning Speech Segments Beyond Pure Semantics. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3626–3635, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Aligning Speech Segments Beyond Pure Semantics (Heffernan et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.216.pdf