Arul Menezes


2021

The Curious Case of Hallucinations in Neural Machine Translation
Vikas Raunak | Arul Menezes | Marcin Junczys-Dowmunt
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman, and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) could be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation. We have released the datasets and code to replicate our results.

2015

Pre-Computable Multi-Layer Neural Network Language Models
Jacob Devlin | Chris Quirk | Arul Menezes
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus
Lucy Vanderwende | Arul Menezes | Chris Quirk
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2014

Speech translation for everyone – breaking down the barriers
Arul Menezes
Proceedings of the 11th International Workshop on Spoken Language Translation: Keynotes

2013

Social Text Normalization using Contextual Graph Random Walks
Hany Hassan | Arul Menezes
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2008

Syntactic Models for Structural Word Insertion and Deletion during Translation
Arul Menezes | Chris Quirk
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Using Dependency Order Templates to Improve Generality in Translation
Arul Menezes | Chris Quirk
Proceedings of the Second Workshop on Statistical Machine Translation

Generative models of noisy translations with applications to parallel fragment extraction
Chris Quirk | Raghavendra Udupa U. | Arul Menezes
Proceedings of Machine Translation Summit XI: Papers

2006

Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation
Arul Menezes | Kristina Toutanova | Chris Quirk
Proceedings on the Workshop on Statistical Machine Translation

Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation
Chris Quirk | Arul Menezes
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Effectively Using Syntax for Recognizing False Entailment
Rion Snow | Lucy Vanderwende | Arul Menezes
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

MindNet: An Automatically-Created Lexical Resource
Lucy Vanderwende | Gary Kacmarcik | Hisami Suzuki | Arul Menezes
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

Microsoft Research Treelet Translation System: IWSLT Evaluation
Arul Menezes | Chris Quirk
Proceedings of the Second International Workshop on Spoken Language Translation

Dependency Treelet Translation: The Convergence of Statistical and Example-based Machine-translation?
Arul Menezes | Chris Quirk
Workshop on Example-Based Machine Translation

We describe a novel approach to machine translation that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, together with conventional SMT models, to pair the power of phrasal SMT with the linguistic generality available from a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.

Dependency Treelet Translation: Syntactically Informed Phrasal SMT
Chris Quirk | Arul Menezes | Colin Cherry
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Statistical machine translation using labeled semantic dependency graphs
Anthony Aue | Arul Menezes | Bob Moore | Chris Quirk | Eric Ringger
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2002

English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations
Chris Brockett | Takako Aikawa | Anthony Aue | Arul Menezes | Chris Quirk | Hisami Suzuki
COLING-02: Machine Translation in Asia

Better contextual translation using machine learning
Arul Menezes
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

One of the problems facing translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora is the trade-off between contextual specificity and general applicability of the mappings, which typically results in conflicting mappings without distinguishing context. We present a machine-learning approach to choosing between such mappings, using classifiers that, in effect, selectively expand the context for these mappings using features available in a linguistic representation of the source language input. We show that using these classifiers in our machine translation system significantly improves the quality of the translated output. Additionally, the set of distinguishing features selected by the classifiers provides insight into the relative importance of the various linguistic features in choosing the correct contextual translation.

2001

Achieving commercial-quality translation with example-based methods
Stephen Richardson | William Dolan | Arul Menezes | Jessie Pinkham
Proceedings of Machine Translation Summit VIII

We describe MSR-MT, a large-scale example-based machine translation system under development for several language pairs. When MSR-MT is trained on aligned English-Spanish technical prose, a blind evaluation shows that its integration of rule-based parsers, example-based processing, and statistical techniques produces translations whose quality in this domain exceeds that of uncustomized commercial MT systems.

A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora
Arul Menezes | Stephen D. Richardson
Workshop on Example-Based Machine Translation

Translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora have been hampered by the difficulty of achieving accurate alignment and acquiring high-quality mappings. We describe an algorithm that uses a best-first strategy and a small alignment grammar to significantly improve the quality of the mappings extracted. For each mapping, frequencies are computed and sufficient context is retained to distinguish competing mappings during translation. Variants of the algorithm are run against a corpus containing 200K sentence pairs and evaluated on the quality of the resulting translations.

Overcoming the customization bottleneck using example-based MT
Stephen D. Richardson | William B. Dolan | Arul Menezes | Monica Corston-Oliver
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora
Arul Menezes | Stephen D. Richardson
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation