BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

Mingda Chen; Paul-Ambroise Duquenne; Pierre Andrews; Justine Kao; Alexandre Mourachko; Holger Schwenk; Marta R. Costa-jussà

doi:10.18653/v1/2023.acl-long.504

Abstract

End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for the source input, translation output, and reference into a shared embedding space, and computes a translation quality score that can be used as a proxy for human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that, when evaluated at the sentence level, BLASER correlates significantly better with human judgment than ASR-dependent metrics, including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows that combining speech and text as inputs to BLASER does not increase the correlation with human scores, and that the best correlations are achieved when using speech, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental for text-based metrics.
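As a rough illustration of scoring in a shared embedding space, the sketch below compares a translation embedding against the source and reference embeddings. The abstract does not state the scoring function, so the use of cosine similarity, the sum of the two similarities, and the names `cosine` and `embedding_score` are all assumptions made for illustration. The vectors are hand-written placeholders, not outputs of any real speech encoder.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embedding_score(
    src: Sequence[float], mt: Sequence[float], ref: Sequence[float]
) -> float:
    """Hypothetical text-free score: how close the translation embedding (mt)
    lies to the source (src) and reference (ref) embeddings.

    Assumption: similarity is cosine and the two terms are simply added.
    """
    return cosine(src, mt) + cosine(ref, mt)


# Placeholder embeddings standing in for encoder outputs (assumed values).
source_vec = [0.2, 0.9, 0.1]
translation_vec = [0.25, 0.85, 0.05]
reference_vec = [0.3, 0.8, 0.1]

score = embedding_score(source_vec, translation_vec, reference_vec)
print(f"score = {score:.3f}")  # higher means closer to source and reference
```

Under these assumptions the score ranges from -2 to 2. The supervised variant mentioned in the abstract would instead learn a mapping from the embeddings to human rating scores, which this sketch does not attempt.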

Anthology ID: 2023.acl-long.504
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9064–9079
URL: https://aclanthology.org/2023.acl-long.504/
DOI: 10.18653/v1/2023.acl-long.504
Cite (ACL): Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, and Marta R. Costa-jussà. 2023. BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9064–9079, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric (Chen et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.504.pdf
Video: https://aclanthology.org/2023.acl-long.504.mp4
