Jindřich Libovický


2022

Neural String Edit Distance
Jindřich Libovický | Alexander Fraser
Proceedings of the Sixth Workshop on Structured Prediction for NLP

We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance. We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function, allowing us to integrate it into a neural network providing a contextual representation of the input. We evaluate on cognate detection, transliteration, and grapheme-to-phoneme conversion, and show that we can trade off between performance and interpretability in a single framework. Using contextual representations, which are difficult to interpret, we match the performance of state-of-the-art string-pair matching models. Using static embeddings and a slightly different loss function, we force interpretability, at the expense of an accuracy drop.

Combining Static and Contextualised Multilingual Embeddings
Katharina Hämmerl | Jindřich Libovický | Alexander Fraser
Findings of the Association for Computational Linguistics: ACL 2022

Static and contextual multilingual embeddings have complementary strengths. Static embeddings, while less expressive than contextual language models, can be more straightforwardly aligned across multiple languages. We combine the strengths of static and contextual models to improve multilingual representations. We extract static embeddings for 40 languages from XLM-R, validate those embeddings with cross-lingual word retrieval, and then align them using VecMap. This results in high-quality, highly multilingual static embeddings. Then we apply a novel continued pre-training approach to XLM-R, leveraging the high-quality alignment of our static embeddings to better align the representation space of XLM-R. We show positive results for multiple complex semantic tasks. We release the static embeddings and the continued pre-training code. Unlike most previous work, our continued pre-training approach does not require parallel text.

Why don’t people use character-level machine translation?
Jindřich Libovický | Helmut Schmid | Alexander Fraser
Findings of the Association for Computational Linguistics: ACL 2022

We present a literature and empirical survey that critically assesses the state of the art in character-level modeling for machine translation (MT). Despite evidence in the literature that character-level systems are comparable with subword systems, they are virtually never used in competitive setups in WMT competitions. We empirically show that even with recent modeling innovations in character-level natural language processing, character-level MT systems still struggle to match their subword-based counterparts. Character-level MT systems show neither better domain robustness, nor better morphological generalization, despite being often so motivated. However, we are able to show robustness towards source side noise and that translation quality does not degrade with increasing beam size at decoding time.

Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch | Jindřich Libovický
Findings of the Association for Computational Linguistics: NAACL 2022

In most Vision-Language (VL) models, the understanding of the image structure is enabled by injecting the position information (PI) about objects in the image. In our case study of LXMERT, a state-of-the-art VL model, we probe the use of the PI in the representation and study its effect on Visual Question Answering. We show that the model is not capable of leveraging the PI for the image-text matching task on a challenge set where only position differs. Yet, our experiments with probing confirm that the PI is indeed present in the representation. We introduce two strategies to tackle this: (i) Positional Information Pre-training and (ii) Contrastive Learning on PI using Cross-Modality Matching. With these strategies, the model can correctly classify whether images with detailed PI statements match. In addition to the 2D information from bounding boxes, we introduce the object’s depth as a new feature for better object localization in space. Even though we were able to improve the model properties as defined by our probes, this has only a negligible effect on the downstream performance. Our results thus highlight an important issue of multimodal modeling: the mere presence of information detectable by a probing classifier is not a guarantee that the information is available in a cross-modal setup.

2021

The LMU Munich System for the WMT 2021 Large-Scale Multilingual Machine Translation Shared Task
Wen Lai | Jindřich Libovický | Alexander Fraser
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission of LMU Munich to the WMT 2021 multilingual machine translation task for small track #1, which studies translation between 6 languages (Croatian, Hungarian, Estonian, Serbian, Macedonian, English) in 30 directions. We investigate the extent to which bilingual translation systems can influence multilingual translation systems. More specifically, we trained 30 bilingual translation systems, covering all language pairs, and used data augmentation techniques such as back-translation and knowledge distillation to improve the multilingual translation systems. Our best translation system scores 5 to 6 BLEU higher than a strong baseline system provided by the organizers. As seen in the dynalab leaderboard, our submission is the only fully constrained submission that uses only the corpus provided by the organizers and does not use any pre-trained models.

Findings of the WMT 2021 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT
Jindřich Libovický | Alexander Fraser
Proceedings of the Sixth Conference on Machine Translation

We present the findings of the WMT2021 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT. Within the task, the community studied very low resource translation between German and Upper Sorbian, unsupervised translation between German and Lower Sorbian and low resource translation between Russian and Chuvash, all minority languages with active language communities working on preserving the languages, who are partners in the evaluation. Thanks to this, we were able to obtain most digital data available for these languages and offer them to the task participants. In total, six teams participated in the shared task. The paper discusses the background, presents the tasks and results, and discusses best practices for the future.

The LMU Munich Systems for the WMT21 Unsupervised and Very Low-Resource Translation Task
Jindřich Libovický | Alexander Fraser
Proceedings of the Sixth Conference on Machine Translation

We present our submissions to the WMT21 shared task in Unsupervised and Very Low Resource machine translation between German and Upper Sorbian, German and Lower Sorbian, and Russian and Chuvash. Our low-resource systems (German↔Upper Sorbian, Russian↔Chuvash) are pre-trained on high-resource pairs of related languages. We fine-tune those systems using the available authentic parallel data and improve by iterated back-translation. The unsupervised German↔Lower Sorbian system is initialized by the best Upper Sorbian system and improved by iterated back-translation using monolingual data only.

2020

On the Language Neutrality of Pre-trained Multilingual Representations
Jindřich Libovický | Rudolf Rosa | Alexander Fraser
Findings of the Association for Computational Linguistics: EMNLP 2020

Multilingual contextual embeddings, such as multilingual BERT and XLM-RoBERTa, have proved useful for many multilingual tasks. Previous work probed the cross-linguality of the representations indirectly using zero-shot transfer learning on morphological and syntactic tasks. We instead investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics. Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings, which are explicitly trained for language neutrality. Contextual embeddings are still only moderately language-neutral by default, so we propose two simple methods for achieving stronger language neutrality: first, by unsupervised centering of the representation for each language and second, by fitting an explicit projection on small parallel data. In addition, we show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences without using parallel data.

The LMU Munich System for the WMT20 Very Low Resource Supervised MT Task
Jindřich Libovický | Viktor Hangya | Helmut Schmid | Alexander Fraser
Proceedings of the Fifth Conference on Machine Translation

We present our systems for the WMT20 Very Low Resource MT Task for translation between German and Upper Sorbian. For training our systems, we generate synthetic data by both back- and forward-translation. Additionally, we enrich the training data with German-Czech parallel data translated from Czech to Upper Sorbian by an unsupervised statistical MT system incorporating orthographically similar word pairs and transliterations of OOV words. Our best translation system between German and Sorbian is based on transfer learning from a Czech-German system and scores 12 to 13 BLEU higher than a baseline system built using the available parallel data only.

Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems
Jindřich Libovický | Alexander Fraser
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Applying the Transformer architecture on the character level usually requires very deep architectures that are difficult and slow to train. These problems can be partially overcome by incorporating a segmentation into tokens in the model. We show that by initially training a subword model and then finetuning it on characters, we can obtain a neural machine translation model that works at the character level without requiring token segmentation. We use only the vanilla 6-layer Transformer Base architecture. Our character-level models better capture morphological phenomena and show more robustness to noise at the expense of somewhat worse overall translation quality. Our study is a significant step towards high-performance and easy-to-train character-based models that are not extremely large.

Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task
Jindřich Libovický | Zdeněk Kasner | Jindřich Helcl | Ondřej Dušek
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present our submission to the Simultaneous Translation And Paraphrase for Language Education (STAPLE) challenge. We used a standard Transformer model for translation, with a crosslingual classifier predicting correct translations on the output n-best list. To increase the diversity of the outputs, we used additional data to train the translation model, and we trained a paraphrasing model based on the Levenshtein Transformer architecture to generate further synonymous translations. The paraphrasing results were again filtered using our classifier. While the use of additional data and our classifier filter were able to improve results, the paraphrasing model produced too many invalid outputs to further improve the output quality. Our model without the paraphrasing component finished in the middle of the field for the shared task, improving over the best baseline by a margin of 10-22% weighted F1 absolute.

2019

Multimodal Abstractive Summarization for How2 Videos
Shruti Palaskar | Jindřich Libovický | Spandana Gella | Florian Metze
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we study abstractive summarization for open-domain videos. Unlike in traditional text news summarization, the goal is less to “compress” text information than to provide a fluent textual summary of information that has been collected and fused from different source modalities, in our case video and audio transcripts (or text). We show how a multi-source sequence-to-sequence model with hierarchical attention can integrate information from different modalities into a coherent output, compare various models trained with different modalities and present pilot experiments on the How2 corpus of instructional videos. We also propose a new evaluation metric (Content F1) for the abstractive summarization task that measures semantic adequacy rather than fluency of the summaries, which is covered by metrics like ROUGE and BLEU.

CUNI System for the WMT19 Robustness Task
Jindřich Helcl | Jindřich Libovický | Martin Popel
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We present our submission to the WMT19 Robustness Task. Our baseline system is the Charles University (CUNI) Transformer system trained for the WMT18 shared task on News Translation. Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

2018

Neural Monkey: The Current State and Beyond
Jindřich Helcl | Jindřich Libovický | Tom Kocmi | Tomáš Musil | Ondřej Cífka | Dušan Variš | Ondřej Bojar
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Input Combination Strategies for Multi-Source Transformer Decoder
Jindřich Libovický | Jindřich Helcl | David Mareček
Proceedings of the Third Conference on Machine Translation: Research Papers

In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied on recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single source baselines.

CUNI System for the WMT18 Multimodal Translation Task
Jindřich Helcl | Jindřich Libovický | Dušan Variš
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present our submission to the WMT18 Multimodal Translation Task. The main feature of our submission is applying a self-attentive network instead of a recurrent neural network. We evaluate two methods of incorporating the visual features in the model: first, we include the image representation as another input to the network; second, we train the model to predict the visual features and use it as an auxiliary objective. For our submission, we acquired both textual and multimodal additional data. Both of the proposed methods yield significant improvements over recurrent networks and self-attentive textual baselines.

End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
Jindřich Libovický | Jindřich Helcl
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in parallel. We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. Unlike other non-autoregressive methods which operate in several steps, our model can be trained end-to-end. We conduct experiments on the WMT English-Romanian and English-German datasets. Our models achieve a significant speedup over the autoregressive models, keeping the translation quality comparable to other non-autoregressive models.

2017

CUNI System for the WMT17 Multimodal Translation Task
Jindřich Helcl | Jindřich Libovický
Proceedings of the Second Conference on Machine Translation

Results of the WMT17 Neural MT Training Task
Ondřej Bojar | Jindřich Helcl | Tom Kocmi | Jindřich Libovický | Tomáš Musil
Proceedings of the Second Conference on Machine Translation

Attention Strategies for Multi-Source Sequence-to-Sequence Learning
Jindřich Libovický | Jindřich Helcl
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Modeling attention in neural multi-source sequence-to-sequence learning remains a relatively unexplored area, despite its usefulness in tasks that incorporate multiple source languages or modalities. We propose two novel approaches to combine the outputs of attention mechanisms over each source sequence, flat and hierarchical. We compare the proposed methods with existing techniques and present results of systematic evaluation of those methods on the WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that the proposed methods achieve competitive results on both tasks.

2016

CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks
Jindřich Libovický | Jindřich Helcl | Marek Tlustý | Ondřej Bojar | Pavel Pecina
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Neural Scoring Function for MST Parser
Jindřich Libovický
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Continuous word representations have proved to be a useful feature in many natural language processing tasks. Using fixed-dimension pre-trained word embeddings makes it possible to avoid sparse bag-of-words representations and to train models with fewer parameters. In this paper, we use fixed pre-trained word embeddings as additional features for a neural scoring function in the MST parser. With the multi-layer architecture of the scoring function we can avoid handcrafting feature conjunctions. The continuous word representations on the input also allow us to reduce the number of lexical features, make the parser more robust to out-of-vocabulary words, and reduce the total number of parameters of the model. Although its accuracy stays below the state of the art, the model size is substantially smaller than with the standard feature set. Moreover, it performs well for languages where only a smaller treebank is available and the results promise to be useful in cross-lingual parsing.

2014

IBM’s Belief Tracker: Results On Dialog State Tracking Challenge Datasets
Rudolf Kadlec | Jindřich Libovický | Jan Macek | Jan Kleindienst
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

Tolerant BLEU: a Submission to the WMT14 Metrics Task
Jindřich Libovický | Pavel Pecina
Proceedings of the Ninth Workshop on Statistical Machine Translation