Robert Frederking

Also published as: Robert E. Frederking


2014

Resources for the Detection of Conventionalized Metaphors in Four Languages
Lori Levin | Teruko Mitamura | Brian MacWhinney | Davida Fromm | Jaime Carbonell | Weston Feely | Robert Frederking | Anatole Gershman | Carlos Ramirez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes a suite of tools for extracting conventionalized metaphors in English, Spanish, Farsi, and Russian. The method depends on three significant resources for each language: a corpus of conventionalized metaphors, a table of conventionalized conceptual metaphors (CCM table), and a set of extraction rules. Conventionalized metaphors are expressions such as “escape from poverty” and “burden of taxation”. For each metaphor, the CCM table contains the metaphorical source domain word (such as “escape”), the target domain word (such as “poverty”), and the grammatical construction in which they can be found. The extraction rules operate on the output of a dependency parser and identify the grammatical configurations (such as a verb with a prepositional phrase complement) that are likely to contain conventional metaphors. We present results on detection rates for conventional metaphors and an analysis of the similarities and differences of source domains for conventional metaphors in the four languages.
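As a rough illustration of how a CCM table and an extraction rule might interact, here is a minimal sketch. It is not the authors' implementation: the table layout, the rule format, the `pobj` dependency label, and the assumption that the parser attaches the prepositional object directly to its governing word are all hypothetical choices made for this example.

```python
# Hypothetical CCM table: (source lemma, target lemma) -> construction name.
CCM_TABLE = {
    ("escape", "poverty"): "verb+pp",
    ("burden", "taxation"): "noun+pp",
}

# Hypothetical extraction rules: construction name ->
# (required POS of the source word, required dependency label of the target word).
RULES = {
    "verb+pp": ("VERB", "pobj"),
    "noun+pp": ("NOUN", "pobj"),
}


def find_metaphors(tokens):
    """Return (source, target, construction) triples found in one parsed sentence.

    Each token is a dict with keys: id, lemma, pos, head, deprel.
    Assumes prepositions are collapsed, so the target noun's head is the source word.
    """
    by_id = {tok["id"]: tok for tok in tokens}
    hits = []
    for tok in tokens:
        head = by_id.get(tok["head"])
        if head is None:  # root token or dangling head index
            continue
        construction = CCM_TABLE.get((head["lemma"], tok["lemma"]))
        if construction is None:
            continue
        head_pos, deprel = RULES[construction]
        if head["pos"] == head_pos and tok["deprel"] == deprel:
            hits.append((head["lemma"], tok["lemma"], construction))
    return hits


# "They escaped from poverty", with the preposition collapsed into the edge.
sentence = [
    {"id": 1, "lemma": "they", "pos": "PRON", "head": 2, "deprel": "nsubj"},
    {"id": 2, "lemma": "escape", "pos": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "lemma": "poverty", "pos": "NOUN", "head": 2, "deprel": "pobj"},
]
print(find_metaphors(sentence))
```

Keeping the lexical pairs in the table and the structural conditions in the rules mirrors the separation described in the abstract, so either resource can be extended per language without touching the other.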

The CMU METAL Farsi NLP Approach
Weston Feely | Mehdi Manshadi | Robert Frederking | Lori Levin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

While many high-quality tools are available for analyzing major languages such as English, equivalent freely-available tools for important but lower-resourced languages such as Farsi are more difficult to acquire and integrate into a useful NLP front end. We report here on an accurate and efficient Farsi analysis front end that we have assembled, which may be useful to others who wish to work with written Farsi. The pre-existing components and resources that we incorporated include the Carnegie Mellon TurboParser and TurboTagger (Martins et al., 2010) trained on the Dadegan Treebank (Rasooli et al., 2013), the Uppsala Farsi text normalizer PrePer (Seraji, 2013), the Uppsala Farsi tokenizer (Seraji et al., 2012a), and Jon Dehdari’s PerStem (Jadidinejad et al., 2010). This set of tools (combined with additional normalization and tokenization modules that we have developed and made available) achieves a dependency parsing labeled attachment score of 89.49%, unlabeled attachment score of 92.19%, and label accuracy score of 91.38% on a held-out parsing test data set. All of the components and resources used are freely available. In addition to describing the components and resources, we also explain the rationale for our choices.
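For readers unfamiliar with the three scores reported above, the following sketch shows how they are conventionally computed from aligned gold and predicted parses. The function name and input format are assumptions for this example, and details such as punctuation handling vary between evaluation scripts; the abstract does not say which conventions were used.

```python
def attachment_scores(gold, pred):
    """Compute LAS, UAS, and label accuracy for one aligned token sequence.

    gold, pred: lists of (head_index, dependency_label), one pair per token.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    # UAS: the predicted head is correct, regardless of the label.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # Label accuracy: the predicted label is correct, regardless of the head.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # LAS: both head and label are correct.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return {"LAS": las, "UAS": uas, "LA": la}


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
print(attachment_scores(gold, pred))
```

Because LAS requires both head and label to match, it can never exceed either UAS or label accuracy, which is consistent with the ordering of the three figures in the abstract.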

2012

Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Luís Marujo | Anatole Gershman | Jaime Carbonell | Robert Frederking | João P. Neto
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Fast and effective automated indexing is critical for search and personalized services. Key phrases that consist of one or more words and represent the main concepts of the document are often used for the purpose of indexing. In this paper, we investigate the use of additional semantic features and pre-processing steps to improve automatic key phrase extraction. These features include the use of signal words and Freebase categories. Some of these features lead to significant improvements in the accuracy of the results. We also experimented with two forms of document pre-processing that we call light filtering and co-reference normalization. Light filtering removes sentences that are judged peripheral to the document's main content. Co-reference normalization unifies several written forms of the same named entity into a unique form. We also needed a “Gold Standard”, a set of labeled documents for training and evaluation. While the subjective nature of key phrase selection precludes a true “Gold Standard”, we used Amazon's Mechanical Turk service to obtain a useful approximation. Our data indicates that the biggest improvements in performance were due to shallow semantic features, news categories, and rhetorical signals (nDCG 78.47% vs. 68.93%). The inclusion of deeper semantic features such as Freebase sub-categories was not beneficial by itself, but in combination with pre-processing, did cause slight improvements in the nDCG scores.
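The nDCG figures above compare a ranked list of extracted key phrases against an ideal ordering. A minimal sketch of one common formulation follows; it assumes graded relevance scores and a log2 rank discount, and the abstract does not specify which nDCG variant the authors used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance of each extracted key phrase, in the order the system ranked them.
print(ndcg([3, 2, 1]))  # already in ideal order
print(ndcg([1, 3]))     # the most relevant phrase was ranked second
```

With crowdsourced labels, the relevance of a phrase could for instance be the number of annotators who selected it, though that mapping is an assumption here rather than something stated in the abstract.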

2010

CONE: Metrics for Automatic Evaluation of Named Entity Co-Reference Resolution
Bo Lin | Rushin Shah | Robert Frederking | Anatole Gershman
Proceedings of the 2010 Named Entities Workshop

2008

Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation
Jonathan H. Clark | Robert Frederking | Lori Levin
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Michael McTear | Manny Rayner
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

Speech Translation for Triage of Emergency Phonecalls in Minority Languages
Udhyakumar Nallasamy | Alan Black | Tanja Schultz | Robert Frederking | Jerry Weltman
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

Toward Active Learning in Data Selection: Automatic Discovery of Language Features During Elicitation
Jonathan Clark | Robert Frederking | Lori Levin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Data Selection has emerged as a common issue in language technologies. We define Data Selection as the choosing of a subset of training data that is most effective for a given task. This paper describes deductive feature detection, one component of a data selection system for machine translation. Feature detection determines whether features such as tense, number, and person are expressed in a language. The database of The World Atlas of Language Structures provides a gold standard against which to evaluate feature detection. The discovered features can be used as input to a Navigator, which uses active learning to determine which piece of language data is the most important to acquire next.

NineOneOne: Recognizing and Classifying Speech for Handling Minority Language Emergency Calls
Udhyakumar Nallasamy | Alan Black | Tanja Schultz | Robert Frederking
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe NineOneOne (9-1-1), a system designed to recognize and translate Spanish emergency calls for better dispatching. We analyze the research challenges in adapting speech translation technology to the 9-1-1 domain. We report our initial research towards building the system and the results of our initial experiments.

Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages
Christian Monson | Ariadna Font Llitjós | Vamshi Ambati | Lori Levin | Alon Lavie | Alison Alvarez | Roberto Aranovich | Jaime Carbonell | Robert Frederking | Erik Peterson | Katharina Probst
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minor languages there is little machine readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on leveraging to the maximum extent two resources that are available for minority languages: linguistic structure and bilingual informants. All natural languages contain linguistic structure. And although the details of that linguistic structure vary from language to language, language universals, such as context-free syntactic structure and the paradigmatic structure of inflectional morphology, allow us to learn the specific details of a minority language. Similarly, most minority languages possess speakers who are bilingual with the major language of the area. This paper discusses our efforts to utilize linguistic structure and the translation information that bilingual informants can provide in three sub-areas of our rapid development MT program: morphology induction, syntactic transfer rule learning, and refinement of imperfect learned rules.

2007

An assessment of language elicitation without the supervision of a linguist
Alison Alvarez | Lori Levin | Robert Frederking | Jill Lehman
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2006

The MILE Corpus for Less Commonly Taught Languages
Alison Alvarez | Lori Levin | Robert Frederking | Simon Fung | Donna Gates | Jeff Good
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Proceedings of the First International Workshop on Medical Speech Translation
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Manny Rayner
Proceedings of the First International Workshop on Medical Speech Translation

2005

Semi-Automated Elicitation Corpus Generation
Alison Alvarez | Lori Levin | Robert Frederking | Erik Peterson | Jeff Good
Proceedings of Machine Translation Summit X: Posters

In this document we describe a semi-automated process for creating elicitation corpora. An elicitation corpus is translated by a bilingual consultant in order to produce high-quality, word-aligned sentence pairs. The corpus sentences are automatically generated from detailed feature structures using the GenKit generation program. Feature structures themselves are automatically generated from information that is provided by a linguist using our corpus specification software. This helps us to build small, flexible corpora for testing and development of machine translation systems.

2003

JAVELIN: A Flexible, Planner-Based Architecture for Question Answering
Eric Nyberg | Robert Frederking
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

Speechalator: Two-Way Speech-to-Speech Translation in Your Hand
Alex Waibel | Ahmed Badran | Alan W. Black | Robert Frederking | Donna Gates | Alon Lavie | Lori Levin | Kevin Lenzo | Laura Mayfield Tomokiyo | Juergen Reichert | Tanja Schultz | Dorcas Wallace | Monika Woszczyna | Jing Zhang
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

Teaching machine translation in a graduate language technologies program
Teruko Mitamura | Eric Nyberg | Robert Frederking
Workshop on Teaching Translation Technologies and Tools

This paper describes a graduate-level machine translation (MT) course taught at the Language Technologies Institute at Carnegie Mellon University. Most of the students in the course have a background in computer science. We discuss what we teach (the course syllabus), and how we teach it (lectures, homeworks, and projects). The course has evolved steadily over the past several years to incorporate refinements in the set of course topics, how they are taught, and how students “learn by doing”. The course syllabus has also evolved in response to changes in the field of MT and the role that MT plays in various social contexts.

2002

Design and Evolution of a Language Technologies Curriculum
Robert Frederking | Eric H. Nyberg | Teruko Mitamura | Jaime G. Carbonell
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

Speech Translation on a Tight Budget without Enough Data
Robert E. Frederking | Alan W. Black | Ralf D. Brown | Alexander Rudnicky | John Moody | Eric Steinbrecher
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

Field Testing the Tongues Speech-to-Speech Machine Translation System
Robert E. Frederking | Alan W. Black | Ralf D. Brown | John Moody | Eric Steinbrecher
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

The NESPOLE! speech-to-speech translation system
Alon Lavie | Lori Levin | Robert Frederking | Fabio Pianesi
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: System Descriptions

NESPOLE! is a speech-to-speech machine translation research system designed to provide fully functional speech-to-speech capabilities within real-world settings of common users involved in e-commerce applications. The project is funded jointly by the European Commission and the US NSF. The NESPOLE! system uses a client-server architecture to allow a common user, who is browsing web-pages on the internet, to connect seamlessly in real-time to an agent of the service provider, using a video-conferencing channel and with speech-to-speech translation services mediating the conversation. Shared web pages and annotated images supported via a Whiteboard application are available to enhance the communication.

2001

Adapting an Example-Based Translation System to Chinese
Ying Zhang | Ralf D. Brown | Robert E. Frederking
Proceedings of the First International Conference on Human Language Technology Research

Pre-processing of bilingual corpora for Mandarin-English EBMT
Ying Zhang | Ralf Brown | Robert Frederking | Alon Lavie
Proceedings of Machine Translation Summit VIII

Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC). It uses dynamic programming with a frequency dictionary to segment the text. Although the frequency dictionary is large, it does not completely cover the corpora. In this paper, we describe the work we have done to improve the segmentation for Mandarin and the bracketing process for English to increase the length of English phrases. A statistical dictionary is built from the aligned bilingual corpus. It is used as feedback to segmentation and bracketing to re-segment / re-bracket the corpus. The process iterates several times to achieve better results. The final results of the corpus pre-processing are a segmented/bracketed aligned bilingual corpus and a statistical dictionary. We achieved positive results by increasing the average length of Chinese terms by about 60% and of English terms by about 10%. The statistical dictionary gained about a 30% increase in coverage.
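To make the idea of dictionary-based segmentation with dynamic programming concrete, here is a small sketch. It is not the LDC segmenter: the scoring (sum of log relative frequencies), the fallback penalty for unknown single characters, the `max_len` limit, and the toy dictionary are all assumptions chosen for illustration.

```python
import math


def segment(text, freq, max_len=4):
    """Split text into the word sequence with the highest total log-frequency.

    freq: dict mapping known words to counts (hypothetical toy dictionary).
    Unknown single characters are allowed with a low fallback score, so
    text that the dictionary does not fully cover can still be segmented.
    """
    total = sum(freq.values()) or 1
    unknown_score = math.log(0.5 / total)
    best = [0.0] + [-math.inf] * len(text)  # best[i]: best score for text[:i]
    back = [0] * (len(text) + 1)            # back[i]: start of the last word
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                score = math.log(freq[word] / total)
            elif end - start == 1:
                score = unknown_score
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    end = len(text)
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


toy_freq = {"中国": 8, "中": 2, "国": 2, "人民": 6, "人": 3, "民": 1}
print(segment("中国人民", toy_freq))
```

In a feedback loop like the one the abstract describes, newly discovered multi-character terms could be added to `freq` between passes so that later passes produce longer segments; how the paper actually feeds the statistical dictionary back into segmentation is not detailed in the abstract.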

2000

WebDIPLOMAT: A Web-Based Interactive Machine Translation System
Christopher Hogan | Robert Frederking
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

A new approach to the translating telephone
Robert Frederking | Christopher Hogan | Alexander Rudnicky
Proceedings of Machine Translation Summit VII

The Translating Telephone has been a major goal of speech translation for many years. Previous approaches have attempted to work from limited-domain, fully-automatic translation towards broad-coverage, fully-automatic translation. We are approaching the problem from a different direction: starting with a broad-coverage but not fully-automatic system, and working towards full automation. We believe that working in this direction will provide us with better feedback, by observing users and collecting language data under realistic conditions, and thus may allow more rapid progress towards the same ultimate goal. Our initial approach relies on the widespread availability of Internet connections and web browsers to provide a user interface. We describe our initial work, which is an extension of the Diplomat wearable speech translator.

1998

An evaluation of the multi-engine MT architecture
Christopher Hogan | Robert E. Frederking
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

The Multi-Engine MT (MEMT) architecture combines the outputs of multiple MT engines using a statistical language model of the target language. It has been used successfully in a number of MT research systems, for both text and speech translation. Despite its perceived benefits, there has never been a rigorous, published, double-blind evaluation of the claim that the combined output of a MEMT system is in fact better than that of any one of the component MT engines. We report here the results of such an evaluation. The combined MEMT output is shown to indeed be better overall than the output of the component engines in a Croatian ↔ English MT system. This result is consistent in both translation directions, and between different raters.

1997

The DIPLOMAT Rapid Development Speech MT System
Robert E. Frederking | Ralf D. Brown | Christopher Hogan
Proceedings of Machine Translation Summit VI: Systems

Interactive Speech Translation in the DIPLOMAT Project
Robert Frederking | Alexander Rudnicky | Christopher Hogan
Spoken Language Translation

1996

The Pangloss-Lite machine translation system
Robert E. Frederking | Ralf D. Brown
Conference of the Association for Machine Translation in the Americas

1995

Applying Statistical English Language Modelling to Symbolic Machine Translation
Ralf Brown | Robert Frederking
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1994

Toward Multi-Engine Machine Translation
Sergei Nirenburg | Robert Frederking
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

Two Types of Adaptive MT Environments
Sergei Nirenburg | Robert Frederking | David Farwell | Yorick Wilks
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

Three Heads are Better than One
Robert Frederking | Sergei Nirenburg
Fourth Conference on Applied Natural Language Processing

Integrating Translations from Multiple Sources within the PANGLOSS Mark III Machine Translation System
Robert Frederking | Sergei Nirenburg | David Farwell | Steven Helmreich | Eduard Hovy | Kevin Knight | Stephen Beale | Constantino Domashnev | Donalee Attardo | Dean Grannes | Ralf Brown
Proceedings of the First Conference of the Association for Machine Translation in the Americas

PANGLOSS
Jaime Carbonell | David Farwell | Robert Frederking | Steven Helmreich | Eduard Hovy | Kevin Knight | Lori Levin | Sergei Nirenburg
Proceedings of the First Conference of the Association for Machine Translation in the Americas

1993

An MAT Tool and Its Effectiveness
Robert Frederking | Dean Grannes | Peter Cousseau | Sergei Nirenburg
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

The PANGLOSS MARK I MAT system
Robert Frederking | Ariel Cohen | Dean Grannes | Peter Cousseau | Sergei Nirenburg
Sixth Conference of the European Chapter of the Association for Computational Linguistics

1981

A Rule-based Conversation Participant
Robert E. Frederking
19th Annual Meeting of the Association for Computational Linguistics