Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation

Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe


Abstract
A conventional approach to improving the performance of end-to-end speech translation (E2E-ST) models is to leverage the source transcription via pre-training and joint training with automatic speech recognition (ASR) and neural machine translation (NMT) tasks. However, since the input modalities differ, it is difficult to leverage the source-language text effectively. In this work, we focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models. To exploit the full potential of the source-language information, we propose backward SeqKD, i.e., SeqKD from a target-to-source backward NMT model. To this end, we train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder. The paraphrases are generated from the translations in the bitext via back-translation. We further propose bidirectional SeqKD, which combines SeqKD from both the forward and backward NMT models. Experimental evaluations on both autoregressive and non-autoregressive models show that SeqKD in each direction consistently improves translation performance, and that the two directions are complementary regardless of model capacity.
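The recipe sketched in the abstract can be made concrete with a short example. The PyTorch code below is a minimal sketch, not the authors' implementation: it assumes the distilled targets were generated offline (forward SeqKD targets decoded by a source-to-target NMT teacher; backward SeqKD targets obtained by back-translating the bitext translations into paraphrased transcriptions), substitutes a toy GRU encoder-decoder for the paper's actual models, and invents the names STModel and LANG_TAGS, the tag values, and the tensor shapes purely for illustration.

```python
# Minimal sketch (not the authors' code) of bidirectional SeqKD training for a
# bilingual E2E-ST model with a single shared decoder. A language tag at the
# first target position tells the decoder whether to produce the translation
# (forward SeqKD target) or the paraphrased transcription (backward SeqKD
# target). All names, tags, and shapes here are hypothetical.
import torch
import torch.nn as nn

PAD = 0
LANG_TAGS = {"<2en>": 1, "<2de>": 2}  # illustrative tag tokens, one per direction

class STModel(nn.Module):
    """Toy speech encoder + shared decoder over a joint source/target vocabulary."""
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        self.encoder = nn.GRU(input_size=80, hidden_size=d_model, batch_first=True)
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        self.decoder = nn.GRU(input_size=d_model, hidden_size=d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, prev_tokens):
        _, h = self.encoder(feats)            # summarize the speech features
        dec_out, _ = self.decoder(self.embed(prev_tokens), h)
        return self.out(dec_out)              # per-step vocabulary logits

def seqkd_loss(model, feats, distilled_targets):
    """Cross-entropy against teacher-generated (distilled) token sequences."""
    logits = model(feats, distilled_targets[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        distilled_targets[:, 1:].reshape(-1),
        ignore_index=PAD,
    )

# One bidirectional-SeqKD step on random stand-in data (batch of 4 utterances,
# 50 frames of 80-dim filterbank features, distilled targets of length 12).
model = STModel()
feats = torch.randn(4, 50, 80)
fwd = torch.randint(3, 1000, (4, 12)); fwd[:, 0] = LANG_TAGS["<2de>"]  # translations
bwd = torch.randint(3, 1000, (4, 12)); bwd[:, 0] = LANG_TAGS["<2en>"]  # paraphrases
loss = seqkd_loss(model, feats, fwd) + seqkd_loss(model, feats, bwd)
loss.backward()
```

In this reading of bidirectional SeqKD, the two direction-specific losses are simply summed, so every utterance supervises the same decoder in both directions; the language tag is what keeps the single decoder bilingual.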
Anthology ID: 2021.naacl-main.150
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Editors: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1872–1881
URL: https://aclanthology.org/2021.naacl-main.150
DOI: 10.18653/v1/2021.naacl-main.150
Cite (ACL): Hirofumi Inaguma, Tatsuya Kawahara, and Shinji Watanabe. 2021. Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1872–1881, Online. Association for Computational Linguistics.
Cite (Informal): Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (Inaguma et al., NAACL 2021)
PDF: https://aclanthology.org/2021.naacl-main.150.pdf
Video: https://aclanthology.org/2021.naacl-main.150.mp4
Data: MuST-C