Consistency is Key: On Data-Efficient Modality Transfer in Speech Translation

Hojin Lee, Changmin Lee, Seung-won Hwang


Abstract
End-to-end approaches have shown promising results for speech translation (ST), but they suffer from data scarcity compared to machine translation (MT). To address this, progressive training, which uses external MT data during the fine-tuning phase, has become a common practice. Despite its prevalence and computational overhead, its validity has not yet been extensively corroborated. This paper conducts an empirical investigation and finds that progressive training is ineffective. We identify the learning-forgetting trade-off as a critical obstacle, then hypothesize and verify that consistency learning (CL) breaks the dilemma of learning-forgetting. The proposed method, which combines knowledge distillation (KD) and CL, outperforms previous methods on the MuST-C dataset even without additional data, and our proposed consistency-informed KD achieves further improvements over KD+CL. Code and models are available at https://github.com/hjlee1371/consistency-s2tt.
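To make the combined objective concrete, below is a minimal sketch of what a KD+CL loss can look like. This is an illustrative reconstruction under assumptions, not the paper's exact formulation: it assumes KD is a KL term matching the ST student to a frozen MT teacher, and CL is a symmetric-KL consistency term between two ST forward passes (e.g. under different dropout masks); the function names and weights `alpha`/`beta` are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q), summed over the vocabulary axis.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def kd_cl_loss(st_logits, mt_teacher_logits, st_logits_2, alpha=0.5, beta=0.5):
    """Illustrative combined objective (hypothetical weighting):
    - KD term: match the ST student distribution to a frozen MT teacher.
    - CL term: keep ST predictions consistent across two forward passes.
    Logits have shape (batch, time, vocab).
    """
    p_student = softmax(st_logits)
    p_teacher = softmax(mt_teacher_logits)
    p_student_2 = softmax(st_logits_2)
    kd = kl_div(p_teacher, p_student).mean()
    cl = 0.5 * (kl_div(p_student, p_student_2)
                + kl_div(p_student_2, p_student)).mean()
    return alpha * kd + beta * cl
```

When both passes and the teacher agree exactly, the loss is zero; any divergence between the student's two passes or from the teacher increases it, which is the intuition behind using consistency to mitigate forgetting during fine-tuning.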
Anthology ID:
2023.findings-emnlp.905
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13572–13581
URL:
https://aclanthology.org/2023.findings-emnlp.905
DOI:
10.18653/v1/2023.findings-emnlp.905
Cite (ACL):
Hojin Lee, Changmin Lee, and Seung-won Hwang. 2023. Consistency is Key: On Data-Efficient Modality Transfer in Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13572–13581, Singapore. Association for Computational Linguistics.
Cite (Informal):
Consistency is Key: On Data-Efficient Modality Transfer in Speech Translation (Lee et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.905.pdf