An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation

Pengzhi Gao, Ruiqing Zhang, Zhongjun He, Hua Wu, Haifeng Wang


Abstract
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the neural machine translation (NMT) field. Can we also boost end-to-end (E2E) speech-to-text translation (ST) by leveraging consistency regularization? In this paper, we conduct empirical studies on intra-modal and cross-modal consistency and propose two training strategies, SimRegCR and SimZeroCR, for E2E ST in regular and zero-shot scenarios. Experiments on the MuST-C benchmark show that our approaches achieve state-of-the-art (SOTA) performance in most translation directions. The analyses prove that regularization brought by the intra-modal consistency, instead of the modality gap, is crucial for the regular E2E ST, and the cross-modal consistency could close the modality gap and boost the zero-shot E2E ST performance.
Anthology ID:
2024.naacl-long.14
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
242–256
URL:
https://aclanthology.org/2024.naacl-long.14
DOI:
10.18653/v1/2024.naacl-long.14
Cite (ACL):
Pengzhi Gao, Ruiqing Zhang, Zhongjun He, Hua Wu, and Haifeng Wang. 2024. An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 242–256, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation (Gao et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.14.pdf