Ashish Venugopal


2011

Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation
Ashish Venugopal | Jakob Uszkoreit | David Talbot | Franz Och | Juri Ganitkevitch
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2009

Preference Grammars: Softening Syntactic Constraints to Improve Statistical Machine Translation
Ashish Venugopal | Andreas Zollmann | Noah A. Smith | Stephan Vogel
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT
Andreas Zollmann | Ashish Venugopal | Franz Och | Jay Ponte
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

The CMU syntax-augmented machine translation system: SAMT on Hadoop with n-best alignments
Andreas Zollmann | Ashish Venugopal | Stephan Vogel
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

We present the CMU Syntax Augmented Machine Translation System that was used in the IWSLT-08 evaluation campaign. We participated in the Full-BTEC data track for Chinese-English translation, focusing on transcript translation. For this year’s evaluation, we ported the Syntax Augmented MT toolkit [1] to the Hadoop MapReduce [2] parallel processing architecture, allowing us to efficiently run experiments evaluating a novel “wider pipelines” approach to integrate evidence from N-best alignments into our translation models. We describe each step of the MapReduce pipeline as it is implemented in the open-source SAMT toolkit, and show improvements in translation quality by using N-best alignments in both hierarchical and syntax augmented translation systems.

Wider Pipelines: N-Best Alignments and Parses in MT Training
Ashish Venugopal | Andreas Zollmann | Noah A. Smith | Stephan Vogel
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these “single-best” hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
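The idea of posterior fractional counts described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fractional_counts`, the input layout (a list of log-score and event-list pairs), and the choice to count each event at most once per hypothesis are all assumptions made for illustration. Scores over the N-best list are normalized into a posterior distribution, and each event (for example, an extracted phrase pair) receives the total posterior mass of the hypotheses that contain it.

```python
import math
from collections import defaultdict


def fractional_counts(nbest):
    """Turn an N-best list into posterior fractional counts.

    nbest: list of (log_score, events) pairs, where events is an iterable of
    hashable items extracted from that hypothesis (hypothetical layout).
    """
    # Normalize log scores into a posterior over the N-best list.
    # Subtracting the maximum keeps the exponentials numerically stable.
    max_score = max(score for score, _ in nbest)
    weights = [math.exp(score - max_score) for score, _ in nbest]
    total = sum(weights)

    counts = defaultdict(float)
    for weight, (_, events) in zip(weights, nbest):
        posterior = weight / total
        # Assumption: an event is counted once per hypothesis it appears in.
        for event in set(events):
            counts[event] += posterior
    return dict(counts)


# Three hypothetical alignments with posteriors 0.6, 0.3 and 0.1.
example_nbest = [
    (math.log(0.6), [("a", "x"), ("b", "y")]),
    (math.log(0.3), [("a", "x"), ("b", "z")]),
    (math.log(0.1), [("a", "w")]),
]
example_counts = fractional_counts(example_nbest)
```

In this toy input, the pair `("a", "x")` appears in the two most probable hypotheses and so receives a larger fractional count than pairs seen in only one; a single-best pipeline would instead give a count of 1 to everything in the top hypothesis and 0 to the rest.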

2007

An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT
Ashish Venugopal | Andreas Zollmann | Stephan Vogel
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

The Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL Workshop on Statistical Machine Translation
Andreas Zollmann | Ashish Venugopal | Matthias Paulik | Stephan Vogel
Proceedings of the Second Workshop on Statistical Machine Translation

The CMU-UKA statistical machine translation systems for IWSLT 2007
Ian Lane | Andreas Zollmann | Thuy Linh Nguyen | Nguyen Bach | Ashish Venugopal | Stephan Vogel | Kay Rottmann | Ying Zhang | Alex Waibel
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework, but for each language pair a specific research problem was tackled. For Japanese→English we focused on two problems: first, punctuation recovery, and second, how to incorporate topic knowledge into the translation framework. Our Chinese→English submission focused on syntax-augmented SMT, and for the Arabic→English task we focused on incorporating morphological decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches, which proved effective for the language pairs they were evaluated on.

2006

The CMU-UKA syntax augmented machine translation system for IWSLT-06
Andreas Zollmann | Ashish Venugopal | Stephan Vogel | Alex Waibel
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

Bridging the Inflection Morphology Gap for Arabic Statistical Machine Translation
Andreas Zollmann | Ashish Venugopal | Stephan Vogel
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Syntax Augmented Machine Translation via Chart Parsing
Andreas Zollmann | Ashish Venugopal
Proceedings on the Workshop on Statistical Machine Translation

2005

Training and Evaluating Error Minimization Decision Rules for Statistical Machine Translation
Ashish Venugopal | Andreas Zollmann | Alex Waibel
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2003

Effective Phrase Translation Extraction from Alignment Models
Ashish Venugopal | Stephan Vogel | Alex Waibel
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

The CMU statistical machine translation system
Stephan Vogel | Ying Zhang | Fei Huang | Alicia Tribble | Ashish Venugopal | Bing Zhao | Alex Waibel
Proceedings of Machine Translation Summit IX: Papers

In this paper we describe the components of our statistical machine translation system. This system combines phrase-to-phrase translations extracted from a bilingual corpus using different alignment approaches. Special methods to extract and align named entities are used. We show how a manual lexicon can be incorporated into the statistical system in an optimized way. Experiments on Chinese-to-English and Arabic-to-English translation tasks are presented.