Oleksii Kuchaiev


pdf bib
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022
Oleksii Hrinchuk | Vahid Noroozi | Abhinav Khattar | Anton Peganov | Sandeep Subramanian | Somshubra Majumdar | Oleksii Kuchaiev
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper provides an overview of NVIDIA NeMo’s speech translation systems for the IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) Conformer RNN-T automatic speech recognition model, 2) punctuation-capitalization model based on pre-trained T5 encoder, 3) ensemble of Transformer neural machine translation models fine-tuned on TED talks. Our end-to-end model has less parameters and consists of Conformer encoder and Transformer decoder. It relies on the cascade system by re-using its pre-trained ASR encoder and training on synthetic translations generated with the ensemble of NMT models. Our En->De cascade and end-to-end systems achieve 29.7 and 26.2 BLEU on the 2020 test set correspondingly, both outperforming the previous year’s best of 26 BLEU.


pdf bib
NVIDIA NeMo’s Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian | Oleksii Hrinchuk | Virginia Adams | Oleksii Kuchaiev
Proceedings of the Sixth Conference on Machine Translation

This paper provides an overview of NVIDIA NeMo’s neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks. Our news task submissions for English-German (En-De) and English-Russian (En-Ru) are built on top of a baseline transformer-based sequence-to-sequence model (CITATION). Specifically, we use a combination of 1) checkpoint averaging 2) model scaling 3) data augmentation with backtranslation and knowledge distillation from right-to-left factorized models 4) finetuning on test sets from previous years 5) model ensembling 6) shallow fusion decoding with transformer language models and 7) noisy channel re-ranking. Additionally, our biomedical task submission for English Russian uses a biomedically biased vocabulary and is trained from scratch on news task data, medically relevant text curated from the news task dataset, and biomedical data provided by the shared task. Our news system achieves a sacreBLEU score of 39.5 on the WMT’20 En-De test set outperforming the best submission from last year’s task of 38.8. Our biomedical task Ru-En and En-Ru systems reach BLEU scores of 43.8 and 40.3 respectively on the WMT’20 Biomedical Task Test set, outperforming the previous year’s best submissions.


pdf bib
OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models
Oleksii Kuchaiev | Boris Ginsburg | Igor Gitman | Vitaly Lavrukhin | Carl Case | Paulius Micikevicius
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

We present OpenSeq2Seq – an open-source toolkit for training sequence-to-sequence models. The main goal of our toolkit is to allow researchers to most effectively explore different sequence-to-sequence architectures. The efficiency is achieved by fully supporting distributed and mixed-precision training. OpenSeq2Seq provides building blocks for training encoder-decoder models for neural machine translation and automatic speech recognition. We plan to extend it with other modalities in the future.