Erik Peterson


Unsupervised Russian POS Tagging with Appropriate Context
Li Yang | Erik Peterson | John Chen | Yana Petrova | Rohini Srihari
Proceedings of the Fifth International Workshop On Cross Lingual Information Access


Statistical Transfer Systems for French-English and German-English Machine Translation
Greg Hanneman | Edmund Huber | Abhaya Agarwal | Vamshi Ambati | Alok Parlikar | Erik Peterson | Alon Lavie
Proceedings of the Third Workshop on Statistical Machine Translation

Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages
Christian Monson | Ariadna Font Llitjós | Vamshi Ambati | Lori Levin | Alon Lavie | Alison Alvarez | Roberto Aranovich | Jaime Carbonell | Robert Frederking | Erik Peterson | Katharina Probst
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minor languages there is little machine-readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on making maximum use of two resources that are available for minority languages: linguistic structure and bilingual informants. All natural languages contain linguistic structure. Although the details of that structure vary from language to language, language universals, such as context-free syntactic structure and the paradigmatic structure of inflectional morphology, allow us to learn the specific details of a minority language. Similarly, most minority languages possess speakers who are bilingual with the major language of the area. This paper discusses our efforts to utilize linguistic structure and the translation information that bilingual informants can provide in three sub-areas of our rapid development MT program: morphology induction, syntactic transfer rule learning, and refinement of imperfect learned rules.


Semi-Automated Elicitation Corpus Generation
Alison Alvarez | Lori Levin | Robert Frederking | Erik Peterson | Jeff Good
Proceedings of Machine Translation Summit X: Posters

In this document we describe a semi-automated process for creating elicitation corpora. An elicitation corpus is translated by a bilingual consultant in order to produce high-quality, word-aligned sentence pairs. The corpus sentences are automatically generated from detailed feature structures using the GenKit generation program. The feature structures themselves are automatically generated from information that a linguist provides using our corpus specification software. This helps us to build small, flexible corpora for testing and development of machine translation systems.


A trainable transfer-based MT approach for languages with limited resources
Alon Lavie | Katharina Probst | Erik Peterson | Stephan Vogel | Lori Levin | Ariadna Font-Llitjos | Jaime Carbonell
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

Rapid prototyping of a transfer-based Hebrew-to-English machine translation system
Alon Lavie | Erik Peterson | Katharina Probst | Shuly Wintner | Yaniv Eytani
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages


Automatic rule learning for resource-limited MT
Jaime Carbonell | Katharina Probst | Erik Peterson | Christian Monson | Alon Lavie | Ralf Brown | Lori Levin
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

Machine translation of minority languages presents unique challenges, including the paucity of bilingual training data and the unavailability of linguistically trained speakers. This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers, and a seeded version-space learning algorithm formulates and refines transfer rules. A rule-generalization lattice is defined based on LFG-style f-structures, permitting generalization operators in the search for the most general rules consistent with the elicited data. The paper presents these methods and illustrates them with examples.


Design and implementation of controlled elicitation for machine translation of low-density languages
Katharina Probst | Ralf Brown | Jaime Carbonell | Alon Lavie | Lori Levin | Erik Peterson
Workshop on MT2010: Towards a Road Map for MT

NICE is a machine translation project for low-density languages. We are building a tool that will elicit a controlled corpus from a bilingual speaker who is not an expert in linguistics. The corpus is intended to cover major typological phenomena, as it is designed to work for any language. Using implicational universals, we strive to minimize the number of sentences that each informant has to translate. From the elicited sentences, we learn transfer rules with a version space algorithm. Our vision for MT in the future is one in which systems can be quickly trained for new languages by native speakers, so that speakers of minor languages can participate in education, health care, government, and the internet without having to give up their languages.


Chinese Information Extraction and Retrieval
Sean Boisen | Michael Crystal | Erik Peterson | Ralph Weischedel | John Broglio | Jamie Callan | Bruce Croft | Theresa Hand | Thomas Keenan | Mary Ellen Okurowski
TIPSTER TEXT PROGRAM PHASE II: Proceedings of a Workshop held at Vienna, Virginia, May 6-8, 1996

Approaches in MET (Multi-Lingual Entity Task)
Damaris Ayuso | Daniel Bikel | Tasha Hall | Erik Peterson | Ralph Weischedel | Patrick Jost
TIPSTER TEXT PROGRAM PHASE II: Proceedings of a Workshop held at Vienna, Virginia, May 6-8, 1996