Nianwen Si


2022

FCGCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation
Hao Zhang | Nianwen Si | Yaqi Chen | Zhen Li | Tong Niu | Xukui Yang | Dan Qu
Findings of the Association for Computational Linguistics: EMNLP 2022

It is notoriously difficult to build an end-to-end speech translation (E2E-ST) model because of the task's complexity and the scarcity of data. Existing techniques often attempt implicit knowledge transfer from a machine translation (MT) model to an ST model by imposing various constraints. In this transfer scenario, however, MT performance drops significantly, which in turn limits the final transfer effect. In this paper, we propose Fine- and Coarse-Granularity Contrastive Learning (FCGCL), which conducts explicit knowledge transfer from the MT model to the ST model. Specifically, we use multi-granularity contrastive learning to ensure that inputs from different modalities with similar semantics are encoded close together in the shared semantic space, while inputs with different semantics are kept apart. Experiments on all 8 languages of the MuST-C dataset, together with further analysis, show that our method effectively improves E2E-ST performance and achieves an average BLEU of 29.0.
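The cross-modal contrastive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss over cosine similarities, and the function names (`info_nce`, `mean_pool`), the temperature value, and the use of mean pooling for a coarse, sentence-level representation are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(speech, text, temperature=0.1):
    """Mean InfoNCE loss (assumed form, not taken from the paper).

    speech[i] and text[i] are treated as a positive pair; every other
    text[j] in the batch acts as a negative for speech[i].
    """
    total = 0.0
    for i, s in enumerate(speech):
        logits = [cosine(s, t) / temperature for t in text]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(speech)


def mean_pool(frames):
    """Average a sequence of vectors into one vector (hypothetical
    stand-in for a coarse, sentence-level representation)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


# Toy batch: two speech embeddings and their matching text embeddings.
speech_batch = [[1.0, 0.0], [0.0, 1.0]]
text_batch = [[1.0, 0.0], [0.0, 1.0]]

aligned_loss = info_nce(speech_batch, text_batch)
misaligned_loss = info_nce(speech_batch, list(reversed(text_batch)))
```

In this toy batch the loss is small when each speech vector matches its paired text vector and large when the pairs are swapped, which is the behaviour a contrastive objective is meant to reward. Under the same assumptions, a fine-granularity variant would apply the loss to individual frame or token vectors instead of pooled ones.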