Sushant Narsale

2012

Combining Quality Prediction and System Selection for Improved Automatic Translation Output
Radu Soricut | Sushant Narsale
Proceedings of the Seventh Workshop on Statistical Machine Translation

2010

While the web provides a fantastic linguistic resource, collecting and processing data at web-scale is beyond the reach of most academic laboratories. Previous research has relied on search engines to collect online information, but this is hopelessly inefficient for building large-scale linguistic resources, such as lists of named-entity types or clusters of distributionally similar words. An alternative to processing web-scale text directly is to use the information provided in an N-gram corpus. An N-gram corpus is an efficient compression of large amounts of text. An N-gram corpus states how often each sequence of words (up to length N) occurs. We propose tools for working with enhanced web-scale N-gram corpora that include richer levels of source annotation, such as part-of-speech tags. We describe a new set of search tools that make use of these tags, and collectively lower the barrier for lexical learning and ambiguity resolution at web-scale. They will allow novel sources of information to be applied to long-standing natural language challenges.
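As a minimal illustration of what the abstract means by an N-gram corpus (a table of how often each word sequence of up to length N occurs), the following sketch counts N-grams over a tiny in-memory sample. The function name `ngram_counts`, the toy sentences, and the whitespace tokenization are assumptions made for this example; they are not the tools described in the paper, which operate on web-scale data with part-of-speech annotation.

```python
from collections import Counter


def ngram_counts(sentences, max_n):
    """Count every word sequence of length 1..max_n in tokenized sentences.

    `sentences` is an iterable of token lists. N-grams do not cross
    sentence boundaries in this sketch.
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            # Slide a window of width n across the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus, tokenized on whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the cat ate".split(),
]

counts = ngram_counts(corpus, max_n=3)
print(counts[("the",)])
print(counts[("the", "cat")])
print(counts[("the", "cat", "sat")])
```

Once the counts are stored, the original text is no longer needed to answer frequency queries, which is the sense in which the corpus acts as a compression of the text.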

JHU System Combination Scheme for WMT 2010
Sushant Narsale
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR