Data Augmentation for End-to-End Speech Translation: FBK@IWSLT ‘19

Mattia A. Di Gangi, Matteo Negri, Viet Nhat Nguyen, Amirhossein Tebbifakhr, Marco Turchi


Abstract
This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. The task consists in the “direct” translation (i.e. without intermediate discrete representation) of English speech data derived from TED Talks or lectures into German texts. Our participation had a twofold goal: i) testing our latest models, and ii) eval- uating the contribution to model training of different data augmentation techniques. On the model side, we deployed our recently proposed S-Transformer with logarithmic distance penalty, an ST-oriented adaptation of the Transformer architecture widely used in machine translation (MT). On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR). In particular, we exploited augmented data in different ways and at different stages of the process. We first trained an end-to-end ASR system and used the weights of its encoder to initialize the decoder of our ST model (transfer learning). Then, we used an English-German MT system trained on large data to translate the English side of the English-French training set into German, and used this newly-created data as additional training material. Finally, we trained our models using SpecAugment, an augmentation technique that randomly masks portions of the spectrograms in order to make them different at every training epoch. Our synthetic corpus and SpecAugment resulted in an improvement of 5 BLEU points over our baseline model on the test set of MuST-C En-De, reaching the score of 22.3 with a single end-to-end system.
Anthology ID:
2019.iwslt-1.14
Volume:
Proceedings of the 16th International Conference on Spoken Language Translation
Month:
November 2-3
Year:
2019
Address:
Hong Kong
Editors:
Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Note:
Pages:
Language:
URL:
https://aclanthology.org/2019.iwslt-1.14
DOI:
Bibkey:
Cite (ACL):
Mattia A. Di Gangi, Matteo Negri, Viet Nhat Nguyen, Amirhossein Tebbifakhr, and Marco Turchi. 2019. Data Augmentation for End-to-End Speech Translation: FBK@IWSLT ‘19. In Proceedings of the 16th International Conference on Spoken Language Translation, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
Data Augmentation for End-to-End Speech Translation: FBK@IWSLT ‘19 (Di Gangi et al., IWSLT 2019)
Copy Citation:
PDF:
https://aclanthology.org/2019.iwslt-1.14.pdf
Data
MuST-C