Charles Schafer


2006

Novel Probabilistic Finite-State Transducers for Cognate and Transliteration Modeling
Charles Schafer
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

We present and empirically compare a range of novel probabilistic finite-state transducer (PFST) models targeted at two major natural language string transduction tasks, transliteration selection and cognate translation selection. Evaluation is performed on 10 distinct language pair data sets, and in each case novel models consistently and substantially outperform a well-established standard reference algorithm.

An Overview of Statistical Machine Translation
David Smith | Charles Schafer
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Tutorials

2005

Models for Inuktitut-English Word Alignment
Charles Schafer | Elliott Drábek
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

Exploiting Aggregate Properties of Bilingual Dictionaries For Distinguishing Senses of English Words and Inducing English Sense Clusters
Charles Schafer | David Yarowsky
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2003

Statistical Machine Translation Using Coercive Two-Level Syntactic Transduction
Charles Schafer | David Yarowsky
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

A two-level syntax-based approach to Arabic-English statistical machine translation
Charles Schafer | David Yarowsky
Workshop on Machine Translation for Semitic languages: issues and approaches

We formulate an original model for statistical machine translation (SMT) inspired by characteristics of the Arabic-English translation task. Our approach incorporates part-of-speech tags and linguistically motivated phrase chunks in a 2-level shallow syntactic model of reordering. We implement and evaluate this model, showing it to have advantageous properties and to be competitive with an existing SMT baseline. We also describe cross-categorial lexical translation coercion, an interesting component and side-effect of our approach. Finally, we discuss the novel implementation of decoding for this model, which saves much development work by constructing finite-state machine (FSM) representations of translation probability distributions and using generic FSM operations for search. Algorithmic details, examples and results focus on Arabic, and the paper includes a discussion of the issues and challenges of Arabic statistical machine translation.

2002

Inducing Translation Lexicons via Diverse Similarity Measures and Bridge Languages
Charles Schafer | David Yarowsky
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

Inducing Information Extraction Systems for New Languages via Cross-language Projection
Ellen Riloff | Charles Schafer | David Yarowsky
COLING 2002: The 19th International Conference on Computational Linguistics

2001

The Johns Hopkins SENSEVAL-2 System Descriptions
David Yarowsky | Silviu Cucerzan | Radu Florian | Charles Schafer | Richard Wicentowski
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems