%0 Conference Proceedings
%T NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022
%A Hrinchuk, Oleksii
%A Noroozi, Vahid
%A Khattar, Abhinav
%A Peganov, Anton
%A Subramanian, Sandeep
%A Majumdar, Somshubra
%A Kuchaiev, Oleksii
%Y Salesky, Elizabeth
%Y Federico, Marcello
%Y Costa-jussà, Marta
%S Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland (in-person and online)
%F hrinchuk-etal-2022-nvidia
%X This paper provides an overview of NVIDIA NeMo's speech translation systems for the IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) a Conformer RNN-T automatic speech recognition model, 2) a punctuation and capitalization model based on a pre-trained T5 encoder, and 3) an ensemble of Transformer neural machine translation models fine-tuned on TED talks. Our end-to-end model has fewer parameters and consists of a Conformer encoder and a Transformer decoder. It relies on the cascade system by re-using its pre-trained ASR encoder and training on synthetic translations generated with the ensemble of NMT models. Our En->De cascade and end-to-end systems achieve 29.7 and 26.2 BLEU on the 2020 test set, respectively, both outperforming the previous year's best of 26 BLEU.
%R 10.18653/v1/2022.iwslt-1.18
%U https://aclanthology.org/2022.iwslt-1.18
%U https://doi.org/10.18653/v1/2022.iwslt-1.18
%P 225-231