Peter Polák

Also published as: Peter Polak


2022

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
Peter Polák | Ngoc-Quan Pham | Tuan Nam Nguyen | Danni Liu | Carlos Mullov | Jan Niehues | Ondřej Bojar | Alexander Waibel
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3x faster than offline in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT 2021 simultaneous system in medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available.

2021

Explainable Quality Estimation: CUNI Eval4NLP Submission
Peter Polák | Muskaan Singh | Ondřej Bojar
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

This paper describes our system for the Explainable Quality Estimation shared task of the 2nd Workshop on Evaluation & Comparison of NLP Systems. The task of quality estimation (QE, a.k.a. reference-free evaluation) is to predict the quality of MT output at inference time without access to reference translations. In this work, we first build a word-level quality estimation model, then we finetune this model for sentence-level QE. Our proposed models achieve near state-of-the-art results. In the word-level QE, we place 2nd and 3rd on the supervised Ro-En and Et-En test sets. In the sentence-level QE, we achieve a relative improvement of 8.86% (Ro-En) and 10.6% (Et-En) in terms of the Pearson correlation coefficient over the baseline model.

ELITR Multilingual Live Subtitling: Demo and Strategy
Ondřej Bojar | Dominik Macháček | Sangeet Sagar | Otakar Smrž | Jonáš Kratochvíl | Peter Polák | Ebrahim Ansari | Mohammad Mahmoudi | Rishu Kumar | Dario Franceschini | Chiara Canton | Ivan Simonini | Thai-Son Nguyen | Felix Schneider | Sebastian Stüker | Alex Waibel | Barry Haddow | Rico Sennrich | Philip Williams
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

This paper presents an automatic speech translation system aimed at live subtitling of conference presentations. We describe the overall architecture and key processing components. More importantly, we explain our strategy for building a complex system for end-users from numerous individual components, each of which has been tested only in laboratory conditions. The system is a working prototype that is routinely tested in recognizing English, Czech, and German speech and presenting it translated simultaneously into 42 target languages.

2020

CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020
Peter Polák | Sangeet Sagar | Dominik Macháček | Ondřej Bojar
Proceedings of the 17th International Conference on Spoken Language Translation

In this paper, we present our submission to the Non-Native Speech Translation Task for IWSLT 2020. Our main contribution is a proposed speech recognition pipeline that consists of an acoustic model and a phoneme-to-grapheme model. As an intermediate representation, we utilize phonemes. We demonstrate that the proposed pipeline surpasses commercially used automatic speech recognition (ASR) and submit it to the ASR track. We complement this ASR with off-the-shelf MT systems to also take part in the speech translation track.

Large Corpus of Czech Parliament Plenary Hearings
Jonas Kratochvil | Peter Polak | Ondrej Bojar
Proceedings of the 12th Language Resources and Evaluation Conference

We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximately 1200 hours of speech data and corresponding text transcriptions. The whole corpus has been segmented into short audio segments, making it suitable for both training and evaluation of automatic speech recognition (ASR) systems. The source language of the corpus is Czech, which makes it a valuable resource for future research as only a few public datasets are available in the Czech language. We complement the data release with experiments with two baseline ASR systems trained on the presented data: the more traditional approach implemented in the Kaldi ASR toolkit, which combines hidden Markov models and deep neural networks (NN), and a modern ASR architecture implemented in the Jasper toolkit, which uses deep NNs in an end-to-end fashion.