Christof Monz


2021

pdf bib
NLQuAD: A Non-Factoid Long Question Answering Data Set
Amir Soleimani | Christof Monz | Marcel Worring
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We introduce NLQuAD, the first data set with baseline methods for non-factoid long question answering, a task requiring document-level language understanding. In contrast to existing span detection question answering data sets, NLQuAD has non-factoid questions that are not answerable by a short span of text and that demand multiple-sentence descriptive answers and opinions. We show the limitation of the F1 score for evaluation of long answers and introduce Intersection over Union (IoU), which measures position-sensitive overlap between the predicted and the target answer spans. To establish baseline performances, we compare BERT, RoBERTa, and Longformer models. Experimental results and human evaluations show that Longformer outperforms the other architectures, but results are still far behind a human upper bound, leaving substantial room for improvement. NLQuAD’s samples exceed the input limitation of most pre-trained Transformer-based models, encouraging future research on long sequence language models.
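
As a rough illustration of the span-level IoU mentioned in the abstract above, the following sketch computes the overlap of two answer spans. It assumes spans are half-open token index pairs (start, end); the function name `span_iou` and this span representation are assumptions made for the example and may differ from the paper's exact definition.

```python
def span_iou(pred, gold):
    """Intersection over Union of two half-open token spans (start, end)."""
    p_start, p_end = pred
    g_start, g_end = gold
    # Number of token positions covered by both spans (0 if they are disjoint).
    inter = max(0, min(p_end, g_end) - max(p_start, g_start))
    # Positions covered by at least one span.
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical spans: predicted tokens 10..49, target tokens 30..69.
print(span_iou((10, 50), (30, 70)))
```

Unlike a bag-of-words F1, this score depends on where the predicted span lies in the document, so a prediction that reuses the right words in the wrong place receives no credit.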

2020

pdf bib
Optimizing Transformer for Low-Resource Neural Machine Translation
Ali Araabi | Christof Monz
Proceedings of the 28th International Conference on Computational Linguistics

Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation. While the Transformer model has achieved significant improvements for many language pairs and has become the de facto mainstream architecture, its capability under low-resource conditions has not been fully investigated yet. Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of Transformer under low-resource conditions is highly dependent on the hyper-parameter settings. Our experiments show that using an optimized Transformer for low-resource conditions improves the translation quality by up to 7.3 BLEU points compared to using the Transformer default settings.

pdf bib
Proceedings of the Fifth Conference on Machine Translation
Loïc Barrault | Ondřej Bojar | Fethi Bougares | Rajen Chatterjee | Marta R. Costa-jussà | Christian Federmann | Mark Fishel | Alexander Fraser | Yvette Graham | Paco Guzman | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Makoto Morishita | Christof Monz | Masaaki Nagata | Toshiaki Nakazawa | Matteo Negri
Proceedings of the Fifth Conference on Machine Translation

pdf bib
Findings of the 2020 Conference on Machine Translation (WMT20)
Loïc Barrault | Magdalena Biesialska | Ondřej Bojar | Marta R. Costa-jussà | Christian Federmann | Yvette Graham | Roman Grundkiewicz | Barry Haddow | Matthias Huck | Eric Joanis | Tom Kocmi | Philipp Koehn | Chi-kiu Lo | Nikola Ljubešić | Christof Monz | Makoto Morishita | Masaaki Nagata | Toshiaki Nakazawa | Santanu Pal | Matt Post | Marcos Zampieri
Proceedings of the Fifth Conference on Machine Translation

This paper presents the results of the news translation task and the similar language translation task, both organised alongside the Conference on Machine Translation (WMT) 2020. In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories. The task was also opened up to additional test suites to probe specific aspects of translation. In the similar language translation task, participants built machine translation systems for translating between closely related pairs of languages.

pdf bib
The Unreasonable Volatility of Neural Machine Translation Models
Marzieh Fadaee | Christof Monz
Proceedings of the Fourth Workshop on Neural Generation and Translation

Recent work has shown that Neural Machine Translation (NMT) models achieve impressive performance; however, questions about understanding the behavior of these models remain unanswered. We investigate the unexpected volatility of NMT models where the input is semantically and syntactically correct. We discover that with trivial modifications of source sentences, we can identify cases where unexpected changes happen in the translation and, in the worst case, lead to mistranslations. This volatile behavior of translating extremely similar sentences in surprisingly different ways highlights the underlying generalization problem of current NMT models. We find that both RNN and Transformer models display volatile behavior in 26% and 19% of sentence variations, respectively.

2019

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

pdf bib
Findings of the 2019 Conference on Machine Translation (WMT19)
Loïc Barrault | Ondřej Bojar | Marta R. Costa-jussà | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Philipp Koehn | Shervin Malmasi | Christof Monz | Mathias Müller | Santanu Pal | Matt Post | Marcos Zampieri
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

pdf bib
An Intrinsic Nearest Neighbor Analysis of Neural Machine Translation Architectures
Hamidreza Ghader | Christof Monz
Proceedings of Machine Translation Summit XVII: Research Track

pdf bib
Improving Neural Machine Translation Using Noisy Parallel Data through Distillation
Praveen Dakwale | Christof Monz
Proceedings of Machine Translation Summit XVII: Research Track

2018

pdf bib
Examining the Tip of the Iceberg: A Data Set for Idiom Translation
Marzieh Fadaee | Arianna Bisazza | Christof Monz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Evaluation of Machine Translation Performance Across Multiple Genres and Languages
Marlies van der Wees | Arianna Bisazza | Christof Monz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Proceedings of the Third Conference on Machine Translation: Research Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Research Papers

bib
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

pdf bib
Findings of the 2018 Conference on Machine Translation (WMT18)
Ondřej Bojar | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Philipp Koehn | Christof Monz
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test sets to probe specific aspects of translation.

pdf bib
Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation
Marzieh Fadaee | Christof Monz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural Machine Translation has achieved state-of-the-art performance for several language pairs using a combination of parallel and synthetic data. Synthetic data is often generated by back-translating sentences randomly sampled from monolingual data using a reverse translation model. While back-translation has been shown to be very effective in many cases, it is not entirely clear why. In this work, we explore different aspects of back-translation, and show that words with high prediction loss during training benefit most from the addition of synthetic data. We introduce several variations of sampling strategies targeting difficult-to-predict words using prediction losses and frequencies of words. In addition, we also target the contexts of difficult words and sample sentences that are similar in context. Experimental results for the WMT news translation task show that our method improves translation quality by up to 1.7 and 1.2 BLEU points over back-translation using random sampling for German-English and English-German, respectively.

pdf bib
The Importance of Being Recurrent for Modeling Hierarchical Structure
Ke Tran | Arianna Bisazza | Christof Monz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent work has shown that recurrent neural networks (RNNs) can implicitly capture and exploit hierarchical information when trained to solve common natural language processing tasks (Blevins et al., 2018) such as language modeling (Linzen et al., 2016; Gulordava et al., 2018) and neural machine translation (Shi et al., 2016). In contrast, the ability to model structured data with non-recurrent neural networks has received little attention despite their success in many NLP tasks (Gehring et al., 2017; Vaswani et al., 2017). In this work, we compare the two architectures—recurrent versus non-recurrent—with respect to their ability to model hierarchical structure and find that recurrency is indeed important for this purpose. The code and data used in our experiments are available at https://github.com/ketranm/fan_vs_rnn

2017

pdf bib
Findings of the 2017 Conference on Machine Translation (WMT17)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Shujian Huang | Matthias Huck | Philipp Koehn | Qun Liu | Varvara Logacheva | Christof Monz | Matteo Negri | Matt Post | Raphael Rubino | Lucia Specia | Marco Turchi
Proceedings of the Second Conference on Machine Translation

pdf bib
Learning Topic-Sensitive Word Representations
Marzieh Fadaee | Arianna Bisazza | Christof Monz
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Distributed word representations are widely used for modeling words in NLP tasks. Most of the existing models generate one representation per word and do not consider different meanings of a word. We present two approaches to learn multiple topic-sensitive representations per word by using a Hierarchical Dirichlet Process. We observe that by modeling topics and integrating topic distributions for each document we obtain representations that are able to distinguish between different meanings of a given word. Our models yield statistically significant improvements for the lexical substitution task, indicating that commonly used single word representations, even when combined with contextual information, are insufficient for this task.

pdf bib
Data Augmentation for Low-Resource Neural Machine Translation
Marzieh Fadaee | Arianna Bisazza | Christof Monz
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.

pdf bib
Dynamic Data Selection for Neural Machine Translation
Marlies van der Wees | Arianna Bisazza | Christof Monz
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection (Axelrod et al., 2011) consistently performs well for PBMT, we show that gains are substantially lower for NMT. Next, we introduce ‘dynamic data selection’ for NMT, a method in which we vary the selected subset of training data between different training epochs. Our experiments show that the best results are achieved when applying a technique we call ‘gradual fine-tuning’, with improvements up to +2.6 BLEU over the original data selection approach and up to +3.1 BLEU over a general baseline.
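
The idea of varying the selected training subset between epochs, as described in the abstract above, can be sketched as follows. This is only an illustration: the abstract does not state the selection schedule, so the halving schedule, the parameter names (`start_fraction`, `shrink`, `epochs`), and the convention that a lower score means a more relevant sentence pair are all assumptions made for this example.

```python
def gradual_subsets(scored_pairs, start_fraction=1.0, shrink=0.5, epochs=4):
    """Yield one training subset per epoch, shrinking towards the top-ranked pairs.

    scored_pairs: list of (score, sentence_pair); lower score = more relevant
    (an assumed convention for this sketch).
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[0])
    fraction = start_fraction
    for _ in range(epochs):
        # Keep at least one pair so that every epoch has training data.
        keep = max(1, int(len(ranked) * fraction))
        yield [sentence for _, sentence in ranked[:keep]]
        fraction *= shrink  # assumed schedule: shrink the subset every epoch


# Hypothetical relevance scores for four sentence pairs.
data = [(0.9, "d"), (0.1, "a"), (0.5, "c"), (0.3, "b")]
for epoch, subset in enumerate(gradual_subsets(data, epochs=3), start=1):
    print(epoch, subset)
```

A training loop would run one epoch on each yielded subset, so that later epochs concentrate on the data ranked as most relevant.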

pdf bib
What does Attention in Neural Machine Translation Pay Attention to?
Hamidreza Ghader | Christof Monz
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Attention in neural machine translation provides the possibility to encode relevant parts of the source sentence at each translation step. As a result, attention is considered to be an alignment model as well. However, there is no work that specifically studies attention and provides analysis of what is being learned by attention models. Thus, the question remains how attention is similar to or different from traditional alignment. In this paper, we provide detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention is only capable of modelling translational equivalence or whether it captures more information. We show that attention is different from alignment in some cases and is capturing useful information other than alignments.

2016

pdf bib
Improving Statistical Machine Translation Performance by Oracle-BLEU Model Re-estimation
Praveen Dakwale | Christof Monz
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Which Words Matter in Defining Phrase Reordering Behavior in Statistical Machine Translation?
Hamidreza Ghader | Christof Monz
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

Lexicalized and hierarchical reordering models use relative frequencies of fully lexicalized phrase pairs to learn phrase reordering distributions. This results in unreliable estimation for infrequent phrase pairs which also tend to be longer phrases. There are some smoothing techniques used to smooth the distributions in these models. But these techniques are unable to address the similarities between phrase pairs and their reordering distributions. We propose two models to use shorter sub-phrase pairs of an original phrase pair to smooth the phrase reordering distributions. In the first model we follow the classic idea of backing off to shorter histories commonly used in language model smoothing. In the second model, we use syntactic dependencies to identify the most relevant words in a phrase to back off to. We show how these models can be easily applied to existing lexicalized and hierarchical reordering models. Our models achieve improvements of up to 0.40 BLEU points in Chinese-English translation compared to a baseline which uses a regular lexicalized reordering model and a hierarchical reordering model. The results show that not all the words inside a phrase pair are equally important in defining phrase reordering behavior and shortening towards important words will decrease the sparsity problem for long phrase pairs.

pdf bib
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

bib
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Findings of the 2016 Conference on Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Martin Popel | Matt Post | Raphael Rubino | Carolina Scarton | Lucia Specia | Marco Turchi | Karin Verspoor | Marcos Zampieri
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation
Marlies van der Wees | Arianna Bisazza | Christof Monz
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official standardization and therefore highly ambiguous, which prevents rule-based approaches from achieving good translation results. In this paper, we improve Arabizi-to-English machine translation by presenting a simple but effective Arabizi-to-Arabic transliteration pipeline that does not require knowledge by experts or native Arabic speakers. We incorporate this pipeline into a phrase-based SMT system, and show that translation quality after automatically transliterating Arabizi to Arabic yields results that are comparable to those achieved after human transliteration.

pdf bib
Recurrent Memory Networks for Language Modeling
Ke Tran | Arianna Bisazza | Christof Monz
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Ensemble Learning for Multi-Source Neural Machine Translation
Ekaterina Garmash | Christof Monz
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we describe and evaluate methods to perform ensemble prediction in neural machine translation (NMT). We compare two methods of ensemble set induction: sampling parameter initializations for an NMT system, which is a relatively established method in NMT (Sutskever et al., 2014), and NMT systems translating from different source languages into the same target language, i.e., multi-source ensembles, a method recently introduced by Firat et al. (2016). We are motivated by the observation that for different language pairs systems make different types of mistakes. We propose several methods with different degrees of parameterization to combine individual predictions of NMT systems so that they mutually compensate for each other’s mistakes and improve overall performance. We find that the biggest improvements can be obtained from a context-dependent weighting scheme for multi-source ensembles. This result offers stronger support for the linguistic motivation of using multi-source ensembles than previous approaches. Evaluation is carried out for German and French into English translation. The best multi-source ensemble method achieves an improvement of up to 2.2 BLEU points over the strongest single-source ensemble baseline, and a 2 BLEU improvement over a multi-source ensemble baseline.
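
To make the notion of ensemble prediction in the abstract above concrete, the sketch below combines the next-word distributions of several systems by a weighted average. It is a simplified illustration rather than the paper's method: the function name `combine`, the dictionary representation of distributions, and the fixed example weights are assumptions, whereas a context-dependent scheme would compute new weights at every decoding step.

```python
def combine(distributions, weights=None):
    """Weighted average of next-word distributions (dicts mapping word -> probability)."""
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n  # uniform ensemble
    total = sum(weights)
    vocab = set().union(*distributions)  # all words proposed by any system
    return {
        word: sum(w * d.get(word, 0.0) for w, d in zip(weights, distributions)) / total
        for word in vocab
    }


# Hypothetical next-word distributions from a German-English and a French-English system.
from_german = {"house": 0.7, "home": 0.3}
from_french = {"house": 0.4, "home": 0.6}

uniform = combine([from_german, from_french])
weighted = combine([from_german, from_french], weights=[0.2, 0.8])
print(max(uniform, key=uniform.get), max(weighted, key=weighted.get))
```

With uniform weights the two systems are trusted equally; shifting weight towards one source language can change which word the ensemble prefers.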

pdf bib
Measuring the Effect of Conversational Aspects on Machine Translation Quality
Marlies van der Wees | Arianna Bisazza | Christof Monz
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Research in statistical machine translation (SMT) is largely driven by formal translation tasks, while translating informal text is much more challenging. In this paper we focus on SMT for the informal genre of dialogues, which has rarely been addressed to date. Concretely, we investigate the effect of dialogue acts, speakers, gender, and text register on SMT quality when translating fictional dialogues. We first create and release a corpus of multilingual movie dialogues annotated with these four dialogue-specific aspects. When measuring translation performance for each of these variables, we find that BLEU fluctuations between their categories are often significantly larger than randomly expected. Following this finding, we hypothesize and show that SMT of fictional dialogues benefits from adaptation towards dialogue acts and registers. Finally, we find that male speakers are harder to translate and use more vulgar language than female speakers, and that vulgarity is often not preserved during translation.

2015

pdf bib
Bilingual Structured Language Models for Statistical Machine Translation
Ekaterina Garmash | Christof Monz
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Translation Model Adaptation Using Genre-Revealing Text Features
Marlies van der Wees | Arianna Bisazza | Christof Monz
Proceedings of the Second Workshop on Discourse in Machine Translation

pdf bib
Findings of the 2015 Workshop on Statistical Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Barry Haddow | Matthias Huck | Chris Hokamp | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Matt Post | Carolina Scarton | Lucia Specia | Marco Turchi
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Five Shades of Noise: Analyzing Machine Translation Errors in User-Generated Text
Marlies van der Wees | Arianna Bisazza | Christof Monz
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
What’s in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation
Marlies van der Wees | Arianna Bisazza | Wouter Weerkamp | Christof Monz
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
A distributed inflection model for translating into morphologically rich languages
Ke Tran | Arianna Bisazza | Christof Monz
Proceedings of Machine Translation Summit XV: Papers

2014

pdf bib
Maximizing Component Quality in Bilingual Word-Aligned Segmentations
Spyros Martzoukos | Christof Monz | Christophe Costa Florêncio
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Proceedings of the Ninth Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Christian Federmann | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Lucia Specia
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2014 Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Christian Federmann | Barry Haddow | Philipp Koehn | Johannes Leveling | Christof Monz | Pavel Pecina | Matt Post | Herve Saint-Amand | Radu Soricut | Lucia Specia | Aleš Tamchyna
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Class-Based Language Modeling for Translating into Morphologically Rich Languages
Arianna Bisazza | Christof Monz
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Word Translation Prediction for Morphologically Rich Languages with Bilingual Neural Networks
Ke M. Tran | Arianna Bisazza | Christof Monz
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Dependency-Based Bilingual Language Models for Reordering in Statistical Machine Translation
Ekaterina Garmash | Christof Monz
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Proceedings of the Eighth Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Chris Callison-Burch | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Herve Saint-Amand | Radu Soricut | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2013 Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Chris Callison-Burch | Christian Federmann | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Investigating Connectivity and Consistency Criteria for Phrase Pair Extraction in Statistical Machine Translation
Spyros Martzoukos | Christophe Costa Florêncio | Christof Monz
Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13)

2012

pdf bib
Proceedings of the Seventh Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Findings of the 2012 Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora
Spyros Martzoukos | Christof Monz
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Adaptation of Statistical Machine Translation Model for Cross-Lingual Information Retrieval in a Service Context
Vassilina Nikoulina | Bogomil Kovachev | Nikolaos Lagos | Christof Monz
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
User Edits Classification Using Document Revision Histories
Amit Bronner | Christof Monz
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Proceedings of the Sixth Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Omar F. Zaidan
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2011 Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Omar Zaidan
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Statistical Machine Translation with Local Language Models
Christof Monz
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Chris Callison-Burch | Philipp Koehn | Christof Monz | Kay Peterson | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Kay Peterson | Mark Przybocki | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
The QMUL system description for IWSLT 2010
Sirvan Yahyaei | Christof Monz
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
The UvA system description for IWSLT 2010
Spyros Martzoukos | Christof Monz
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Discriminative Syntactic Reranking for Statistical Machine Translation
Simon Carter | Christof Monz
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper describes a method that successfully exploits simple syntactic features for n-best translation candidate reranking using perceptrons. Our approach uses discriminative language modelling to rerank the n-best translations generated by a statistical machine translation system. The performance is evaluated for Arabic-to-English translation using NIST’s MT-Eval benchmarks. Whilst parse trees do not consistently help, we show how features extracted from a simple Part-of-Speech annotation layer outperform two competitive baselines, leading to significant BLEU improvements on three different test sets.

2009

pdf bib
Decoding by Dynamic Chunking for Statistical Machine Translation
Sirvan Yahyaei | Christof Monz
Proceedings of Machine Translation Summit XII: Papers

pdf bib
Automatic Single-Document Key Fact Extraction from Newswire Articles
Itamar Kastner | Christof Monz
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Proceedings of the Fourth Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2009 Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

pdf bib
Proceedings of the Third Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Josh Schroeder | Cameron Shaw Fordyce
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Further Meta-Evaluation of Machine Translation
Chris Callison-Burch | Cameron Fordyce | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
The QMUL system description for IWSLT 2008
Simon Carter | Christof Monz | Sirvan Yahyaei
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

The QMUL system submitted to the IWSLT 2008 evaluation campaign is a phrase-based statistical MT system implemented in C++. The decoder employs a multi-stack architecture and uses a beam to manage the search space. We participated in both the BTEC Arabic → English and Chinese → English tracks, as well as the PIVOT task. In our first submission to IWSLT, we are particularly interested in seeing how our SMT system performs with speech input, having so far only worked with and translated newswire data sets.

2007

pdf bib
Proceedings of the Second Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Cameron Shaw Fordyce | Christof Monz
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib
(Meta-) Evaluation of Machine Translation
Chris Callison-Burch | Cameron Fordyce | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Second Workshop on Statistical Machine Translation

2006

pdf bib
Proceedings on the Workshop on Statistical Machine Translation
Philipp Koehn | Christof Monz
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Manual and Automatic Evaluation of Machine Translation between European Languages
Philipp Koehn | Christof Monz
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Challenges in Building an Arabic-English GHMT System with SMT Components
Nizar Habash | Bonnie Dorr | Christof Monz
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

The research context of this paper is developing hybrid machine translation (MT) systems that exploit the advantages of linguistic rule-based and statistical MT systems. Arabic, as a morphologically rich language, is especially challenging even without addressing the hybridization question. In this paper, we describe the challenges in building an Arabic-English generation-heavy machine translation (GHMT) system and boosting it with statistical machine translation (SMT) components. We present an extensive evaluation of multiple system variants and report positive results on the advantages of hybridization.

2005

pdf bib
NeurAlign: Combining Word Alignments Using Neural Networks
Necip Fazil Ayan | Bonnie J. Dorr | Christof Monz
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Alignment Link Projection Using Transformation-Based Learning
Necip Fazil Ayan | Bonnie J. Dorr | Christof Monz
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
The Hiero Machine Translation System: Extensions, Evaluation, and Analysis
David Chiang | Adam Lopez | Nitin Madnani | Christof Monz | Philip Resnik | Michael Subotin
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the ACL Workshop on Building and Using Parallel Texts
Philipp Koehn | Joel Martin | Rada Mihalcea | Christof Monz | Ted Pedersen
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf bib
Shared Task: Statistical Machine Translation between European Languages
Philipp Koehn | Christof Monz
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf bib
A Methodology for Extrinsic Evaluation of Text Summarization: Does ROUGE Correlate?
Bonnie Dorr | Christof Monz | Stacy President | Richard Schwartz | David Zajic
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

2001

pdf bib
Document Fusion for Comprehensive Event Description
Christof Monz
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management
