Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu


Abstract
Pre-training and fine-tuning is a common paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The well-known "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap arises in the early stages of fine-tuning but has little impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when such a model is reused for a low-resource task (E2E ST), it reaches sub-optimal performance due to overfitting. In a case study, we find that regularization plays a more important role than a well-designed modality adaption method; the regularized model achieves 29.0 on en-de and 40.3 on en-fr on the MuST-C dataset.
Anthology ID: 2023.acl-short.115
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1340–1348
URL: https://aclanthology.org/2023.acl-short.115
DOI: 10.18653/v1/2023.acl-short.115
Cite (ACL): Yuchen Han, Chen Xu, Tong Xiao, and Jingbo Zhu. 2023. Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1340–1348, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation (Han et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-short.115.pdf
Video: https://aclanthology.org/2023.acl-short.115.mp4