Alon Lavie

Also published as: A. Lavie


2021

pdf bib
MT-Telescope: An interactive platform for contrastive evaluation of MT systems
Ricardo Rei | Ana C Farinha | Craig Stewart | Luisa Coheur | Alon Lavie
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

We present MT-Telescope, a visualization platform designed to facilitate comparative analysis of the output quality of two Machine Translation (MT) systems. While automated MT evaluation metrics are commonly used to evaluate MT systems at a corpus-level, our platform supports fine-grained segment-level analysis and interactive visualisations that expose the fundamental differences in the performance of the compared systems. MT-Telescope also supports dynamic corpus filtering to enable focused analysis on specific phenomena such as; translation of named entities, handling of terminology, and the impact of input segment length on translation quality. Furthermore, the platform provides a bootstrapped t-test for statistical significance as a means of evaluating the rigor of the resulting system ranking. MT-Telescope is open source, written in Python, and is built around a user friendly and dynamic web interface. Complementing other existing tools, our platform is designed to facilitate and promote the broader adoption of more rigorous analysis practices in the evaluation of MT quality.

2020

pdf bib
Unbabel’s Participation in the WMT20 Metrics Shared Task
Ricardo Rei | Craig Stewart | Ana C Farinha | Alon Lavie
Proceedings of the Fifth Conference on Machine Translation

We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate on the segmentlevel, document-level and system-level tracks on all language pairs, as well as the “QE as a Metric” track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: we train several estimator models to regress on different humangenerated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.

pdf bib
COMET: A Neural Framework for MT Evaluation
Ricardo Rei | Craig Stewart | Ana C Farinha | Alon Lavie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metric. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.

bib
COMET - Deploying a New State-of-the-art MT Evaluation Metric in Production
Craig Stewart | Ricardo Rei | Catarina Farinha | Alon Lavie
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

2016

pdf bib
Synthesizing Compound Words for Machine Translation
Austin Matthews | Eva Schlinger | Alon Lavie | Chris Dyer
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Humor Recognition and Humor Anchor Extraction
Diyi Yang | Alon Lavie | Chris Dyer | Eduard Hovy
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Learning from Post-Editing: Online Model Adaptation for Statistical Machine Translation
Michael Denkowski | Chris Dyer | Alon Lavie
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Cognitive demand and cognitive effort in post-editing
Isabel Lacruz | Michael Denkowski | Alon Lavie
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas

The pause to word ratio, the number of pauses per word in a post-edited MT segment, is an indicator of cognitive effort in post-editing (Lacruz and Shreve, 2014). We investigate how low the pause threshold can reasonably be taken, and we propose that 300 ms is a good choice, as pioneered by Schilperoord (1996). We then seek to identify a good measure of the cognitive demand imposed by MT output on the post-editor, as opposed to the cognitive effort actually exerted by the post-editor during post-editing. Measuring cognitive demand is closely related to measuring MT utility, the MT quality as perceived by the post-editor. HTER, an extrinsic edit to word ratio that does not necessarily correspond to actual edits per word performed by the post-editor, is a well-established measure of MT quality, but it does not comprehensively capture cognitive demand (Koponen, 2012). We investigate intrinsic measures of MT quality, and so of cognitive demand, through edited-error to word metrics. We find that the transfer-error to word ratio predicts cognitive effort better than mechanical-error to word ratio (Koby and Champe, 2013). We identify specific categories of cognitively challenging MT errors whose error to word ratios correlate well with cognitive effort.

pdf bib
Real time adaptive machine translation: cdec and TransCenter
Michael Denkowski | Alon Lavie | Isabel Lacruz | Chris Dyer
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas

cdec Realtime and TransCenter provide an end-to-end experimental setup for machine translation post-editing research. Realtime provides a framework for building adaptive MT systems that learn from post-editor feedback while TransCenter incorporates a web-based translation interface that connects users to these systems and logs post-editing activity. This combination allows the straightforward deployment of MT systems specifically for post-editing and analysis of translator productivity when working with adaptive systems. Both toolkits are freely available under open source licenses.

pdf bib
Locally Non-Linear Learning for Statistical Machine Translation via Discretization and Structured Regularization
Jonathan H. Clark | Chris Dyer | Alon Lavie
Transactions of the Association for Computational Linguistics, Volume 2

Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. In this work, we introduce a technique for learning arbitrary, rule-local, non-linear feature transforms that improve model expressivity, but do not sacrifice the efficient inference and learning associated with linear models. To demonstrate the value of our technique, we discard the customary log transform of lexical probabilities and drop the phrasal translation probability in favor of raw counts. We observe that our algorithm learns a variation of a log transform that leads to better translation quality compared to the explicit log transform. We conclude that non-linear responses play an important role in SMT, an observation that we hope will inform the efforts of feature engineers.

pdf bib
Real Time Adaptive Machine Translation for Post-Editing with cdec and TransCenter
Michael Denkowski | Alon Lavie | Isabel Lacruz | Chris Dyer
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

pdf bib
The CMU Machine Translation Systems at WMT 2014
Austin Matthews | Waleed Ammar | Archna Bhatia | Weston Feely | Greg Hanneman | Eva Schlinger | Swabha Swayamdipta | Yulia Tsvetkov | Alon Lavie | Chris Dyer
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Meteor Universal: Language Specific Translation Evaluation for Any Target Language
Michael Denkowski | Alon Lavie
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic
Serena Jeblee | Weston Feely | Houda Bouamor | Alon Lavie | Nizar Habash | Kemal Oflazer
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

2013

pdf bib
Improving Syntax-Augmented Machine Translation by Coarsening the Label Set
Greg Hanneman | Alon Lavie
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Grouping Language Model Boundary Words to Speed K–Best Extraction from Hypergraphs
Kenneth Heafield | Philipp Koehn | Alon Lavie
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References
Waleed Ammar | Victor Chahuneau | Michael Denkowski | Greg Hanneman | Wang Ling | Austin Matthews | Kenton Murray | Nicola Segall | Alon Lavie | Chris Dyer
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Analyzing and Predicting MT Utility and Post-Editing Productivity in Enterprise-scale Translation Projects
Alon Lavie | Olga Beregovaya | Michael Denkowski | David Clarke
Proceedings of Machine Translation Summit XIV: User track

2012

pdf bib
Language Model Rest Costs and Space-Efficient Storage
Kenneth Heafield | Philipp Koehn | Alon Lavie
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
The CMU-Avenue French-English Translation System
Michael Denkowski | Greg Hanneman | Alon Lavie
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
One System, Many Domains: Open-Domain Statistical Machine Translation via Feature Augmentation
Jonathan Clark | Alon Lavie | Chris Dyer
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

In this paper, we introduce a simple technique for incorporating domain information into a statistical machine translation system that significantly improves translation quality when test data comes from multiple domains. Our approach augments (conjoins) standard translation model and language model features with domain indicator features and requires only minimal modifications to the optimization and decoding procedures. We evaluate our method on two language pairs with varying numbers of domains, and observe significant improvements of up to 1.0 BLEU.

pdf bib
Challenges in Predicting Machine Translation Utility for Human Post-Editors
Michael Denkowski | Alon Lavie
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

As machine translation quality continues to improve, the idea of using MT to assist human translators becomes increasingly attractive. In this work, we discuss and provide empirical evidence of the challenges faced when adapting traditional MT systems to provide automatic translations for human post-editors to correct. We discuss the differences between this task and traditional adequacy-based tasks and the challenges that arise when using automatic metrics to predict the amount of effort required to post-edit translations. A series of experiments simulating a real-world localization scenario shows that current metrics under-perform on this task, even when tuned to maximize correlation with expert translator judgments, illustrating the need to rethink traditional MT pipelines when addressing the challenges of this translation task.

2011

pdf bib
Automatic Category Label Coarsening for Syntax-Based Machine Translation
Greg Hanneman | Alon Lavie
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
A General-Purpose Rule Extractor for SCFG-Based Machine Translation
Greg Hanneman | Michelle Burroughs | Alon Lavie
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
Michael Denkowski | Alon Lavie
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
CMU System Combination in WMT 2011
Kenneth Heafield | Alon Lavie
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
CMU Syntax-Based Machine Translation at WMT 2011
Greg Hanneman | Alon Lavie
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Evaluating the Output of Machine Translation Systems
Alon Lavie
Proceedings of Machine Translation Summit XIII: Tutorial Abstracts

This half-day tutorial provides a broad overview of how to evaluate translations that are produced by machine translation systems. The range of issues covered includes a broad survey of both human evaluation measures and commonly-used automated metrics, and a review of how these are used for various types of evaluation tasks, such as assessing the translation quality of MT-translated sentences, comparing the performance of alternative MT systems, or measuring the productivity gains of incorporating MT into translation workflows.

pdf bib
Unsupervised Word Alignment with Arbitrary Features
Chris Dyer | Jonathan H. Clark | Alon Lavie | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
Jonathan H. Clark | Chris Dyer | Alon Lavie | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Extending the METEOR Machine Translation Evaluation Metric to the Phrase Level
Michael Denkowski | Alon Lavie
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Exploring Normalization Techniques for Human Judgments of Machine Translation Adequacy Collected Using Amazon Mechanical Turk
Michael Denkowski | Alon Lavie
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Michael Denkowski | Hassan Al-Haj | Alon Lavie
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Improved Features and Grammar Selection for Syntax-Based MT
Greg Hanneman | Jonathan Clark | Alon Lavie
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
CMU Multi-Engine Machine Translation for WMT 2010
Kenneth Heafield | Alon Lavie
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
METEOR-NEXT and the METEOR Paraphrase Tables: Improved Evaluation Support for Five Target Languages
Michael Denkowski | Alon Lavie
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
The Impact of Arabic Morphological Segmentation on Broad-coverage English-to-Arabic Statistical Machine Translation
Hassan Al-Haj | Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

Morphologically rich languages pose a challenge for statistical machine translation (SMT). This challenge is magnified when translating into a morphologically rich language. In this work we address this challenge in the framework of a broad-coverage English-to-Arabic phrase based statistical machine translation (PBSMT). We explore the full spectrum of Arabic segmentation schemes ranging from full word form to fully segmented forms and examine the effects on system performance. Our results show a difference of 2.61 BLEU points between the best and worst segmentation schemes indicating that the choice of the segmentation scheme has a significant effect on the performance of a PBSMT system in a large data scenario. We also show that a simple segmentation scheme can perform as good as the best and more complicated segmentation scheme. We also report results on a wide set of techniques for recombining the segmented Arabic output.

pdf bib
Using Variable Decoding Weight for Language Model in Statistical Machine Translation
Behrang Mohit | Rebecca Hwa | Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper investigates varying the decoder weight of the language model (LM) when translating different parts of a sentence. We determine the condition under which the LM weight should be adapted. We find that a better translation can be achieved by varying the LM weight when decoding the most problematic spot in a sentence, which we refer to as a difficult segment. Two adaptation strategies are proposed and compared through experiments. We find that adapting a different LM weight for every difficult segment resulted in the largest improvement in translation quality.

pdf bib
Choosing the Right Evaluation for Machine Translation: an Examination of Annotator and Automatic Metric Performance on Human Judgment Tasks
Michael Denkowski | Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper examines the motivation, design, and practical results of several types of human evaluation tasks for machine translation. In addition to considering annotator performance and task informativeness over multiple evaluations, we explore the practicality of tuning automatic evaluation metrics to each judgment type in a comprehensive experiment using the METEOR-NEXT metric. We present results showing clear advantages of tuning to certain types of judgments and discuss causes of inconsistency when tuning to various judgment data, as well as sources of difficulty in the human evaluation tasks themselves.

pdf bib
Voting on N-grams for Machine Translation System Combination
Kenneth Heafield | Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

System combination exploits differences between machine translation systems to form a combined translation from several system outputs. Core to this process are features that reward n-gram matches between a candidate combination and each system output. Systems differ in performance at the n-gram level despite similar overall scores. We therefore advocate a new feature formulation: for each system and each small n, a feature counts n-gram matches between the system and candidate. We show post-evaluation improvement of 6.67 BLEU over the best system on NIST MT09 Arabic-English test data. Compared to a baseline system combination scheme from WMT 2009, we show improvement in the range of 1 BLEU point.

pdf bib
Machine Translation between Hebrew and Arabic: Needs, Challenges and Preliminary Solutions
Reshef Shilon | Nizar Habash | Alon Lavie | Shuly Wintner
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Student Research Workshop

Hebrew and Arabic are related but mutually incomprehensible languages with complex morphology and scarce parallel corpora. Machine translation between the two languages is therefore interesting and challenging. We discuss similarities and differences between Hebrew and Arabic, the benefits and challenges that they induce, respectively, and their implications for machine translation. We highlight the shortcomings of using English as a pivot language and advocate a direct, transfer-based and linguistically-informed (but still statistical, and hence scalable) approach. We report preliminary results of such a system that we are currently developing.

pdf bib
Evaluating the Output of Machine Translation Systems
Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Tutorials

pdf bib
LoonyBin: Keeping Language Technologists Sane through Automated Management of Experimental (Hyper)Workflows
Jonathan H. Clark | Alon Lavie
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Many contemporary language technology systems are characterized by long pipelines of tools with complex dependencies. Too often, these workflows are implemented by ad hoc scripts; or, worse, tools are run manually, making experiments difficult to reproduce. These practices are difficult to maintain in the face of rapidly evolving workflows while they also fail to expose and record important details about intermediate data. Further complicating these systems are hyperparameters, which often cannot be directly optimized by conventional methods, requiring users to determine which combination of values is best via trial and error. We describe LoonyBin, an open-source tool that addresses these issues by providing: 1) a visual interface for the user to create and modify workflows; 2) a well-defined mechanism for tracking metadata and provenance; 3) a script generator that compiles visual workflows into shell scripts; and 4) a new workflow representation we call a HyperWorkflow, which intuitively and succinctly encodes small experimental variations within a larger workflow.

2009

pdf bib
Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
Vamshi Ambati | Alon Lavie | Jaime Carbonell
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Machine Translation System Combination with Flexible Word Ordering
Kenneth Heafield | Greg Hanneman | Alon Lavie
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
An Improved Statistical Transfer System for French-English Machine Translation
Greg Hanneman | Vamshi Ambati | Jonathan H. Clark | Alok Parlikar | Alon Lavie
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Decoding with Syntactic and Non-Syntactic Phrases in a Syntax-Based Machine Translation System
Greg Hanneman | Alon Lavie
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

2008

pdf bib
Meteor, M-BLEU and M-TER: Evaluation Metrics for High-Correlation with Human Rankings of Machine Translation Output
Abhaya Agarwal | Alon Lavie
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Statistical Transfer Systems for French-English and German-English Machine Translation
Greg Hanneman | Edmund Huber | Abhaya Agarwal | Vamshi Ambati | Alok Parlikar | Erik Peterson | Alon Lavie
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Syntax-Driven Learning of Sub-Sentential Translation Equivalents and Translation Rules from Parsed Parallel Corpora
Alon Lavie | Alok Parlikar | Vamshi Ambati
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

pdf bib
Evaluating an Agglutinative Segmentation Model for ParaMor
Christian Monson | Alon Lavie | Jaime Carbonell | Lori Levin
Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology

pdf bib
Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages
Christian Monson | Ariadna Font Llitjós | Vamshi Ambati | Lori Levin | Alon Lavie | Alison Alvarez | Roberto Aranovich | Jaime Carbonell | Robert Frederking | Erik Peterson | Katharina Probst
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minor languages there is little machine readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on leveraging to the maximum extent two resources that are available for minority languages: linguistic structure and bilingual informants. All natural languages contain linguistic structure. And although the details of that linguistic structure vary from language to language, language universals such as context-free syntactic structure and the paradigmatic structure of inflectional morphology, allow us to learn the specific details of a minority language. Similarly, most minority languages possess speakers who are bilingual with the major language of the area. This paper discusses our efforts to utilize linguistic structure and the translation information that bilingual informants can provide in three sub-areas of our rapid development MT program: morphology induction, syntactic transfer rule learning, and refinement of imperfect learned rules.

pdf bib
Improving Syntax-Driven Translation Models by Re-structuring Divergent and Nonisomorphic Parse Tree Structures
Vamshi Ambati | Alon Lavie
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Student Research Workshop

Syntax-based approaches to statistical MT require syntax-aware methods for acquiring their underlying translation models from parallel data. This acquisition process can be driven by syntactic trees for either the source or target language, or by trees on both sides. Work to date has demonstrated that using trees for both sides suffers from severe coverage problems. This is primarily due to the highly restrictive space of constituent segmentations that the trees on two sides introduce, which adversely affects the recall of the resulting translation models. Approaches that project from trees on one side, on the other hand, have higher levels of recall, but suffer from lower precision, due to the lack of syntactically-aware word alignments. In this paper we explore the issue of lexical coverage of the translation models learned in both of these scenarios. We specifically look at how the non-isomorphic nature of the parse trees for the two languages affects recall and coverage. We then propose a novel technique for restructuring target parse trees, that generates highly isomorphic target trees that preserve the syntactic boundaries of constituents that were aligned in the original parse trees. We evaluate the translation models learned from these restructured trees and show that they are significantly better than those learned using trees on both sides and trees on one side.

2007

pdf bib
High-accuracy Annotation and Parsing of CHILDES Transcripts
Kenji Sagae | Eric Davis | Alon Lavie | Brian MacWhinney | Shuly Wintner
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition

pdf bib
METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments
Alon Lavie | Abhaya Agarwal
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib
Cross Lingual and Semantic Retrieval for Cultural Heritage Appreciation
Idan Szpektor | Ido Dagan | Alon Lavie | Danny Shacham | Shuly Wintner
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).

pdf bib
ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis
Christian Monson | Jaime Carbonell | Alon Lavie | Lori Levin
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology

pdf bib
Improving transfer-based MT systems with automatic refinements
Ariadna Font Llitjós | Jaime Carbonell | Alon Lavie
Proceedings of Machine Translation Summit XI: Papers

pdf bib
Experiments with a noun-phrase driven statistical machine translation system
Sanjika Hewavitharana | Alon Lavie | Stephan Vogel
Proceedings of Machine Translation Summit XI: Papers

2006

pdf bib
A Best-First Probabilistic Shift-Reduce Parser
Kenji Sagae | Alon Lavie
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Parser Combination by Reparsing
Kenji Sagae | Alon Lavie
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

pdf bib
BLANC: Learning Evaluation Metrics for MT
Lucian Lita | Monica Rogati | Alon Lavie
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization
Jade Goldstein | Alon Lavie | Chin-Yew Lin | Clare Voss
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

pdf bib
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
Satanjeev Banerjee | Alon Lavie
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

pdf bib
A Classifier-Based Parser with Linear Run-Time Complexity
Kenji Sagae | Alon Lavie
Proceedings of the Ninth International Workshop on Parsing Technology

pdf bib
Automatic Measurement of Syntactic Development in Child Language
Kenji Sagae | Alon Lavie | Brian MacWhinney
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
Multi-Engine Machine Translation Guided by Explicit Word Matching
Shyamsundar Jayaraman | Alon Lavie
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib
A framework for interactive and automatic refinement of transfer-based machine translation
Ariadna Font Llitjós | Jaime G. Carbonell | Alon Lavie
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf bib
Multi-engine machine translation guided by explicit word matching
Shyamsundar Jayaraman | Alon Lavie
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

pdf bib
Unsupervised Induction of Natural Language Morphology Inflection Classes
Christian Monson | Alon Lavie | Jaime Carbonell | Lori Levin
Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology

pdf bib
Rapid prototyping of a transfer-based Hebrew-to-English machine translation system
Alon Lavie | Erik Peterson | Katharina Probst | Shuly Wintner | Yaniv Eytani
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

pdf bib
The significance of recall in automatic metrics for MT evaluation
Alon Lavie | Kenji Sagae | Shyamsundar Jayaraman
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

Recent research has shown that a balanced harmonic mean (F1 measure) of unigram precision and recall outperforms the widely used BLEU and NIST metrics for Machine Translation evaluation in terms of correlation with human judgments of translation quality. We show that significantly better correlations can be achieved by placing more weight on recall than on precision. While this may seem unexpected, since BLEU and NIST focus on n-gram precision and disregard recall, our experiments show that correlation with human judgments is highest when almost all of the weight is assigned to recall. We also show that stemming is significantly beneficial not just to simpler unigram precision and recall based metrics, but also to BLEU and NIST.

pdf bib
A structurally diverse minimal corpus for eliciting structural mappings between languages
Katharina Probst | Alon Lavie
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is sufficiently small that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus. This is shown by creating an increasingly redundant corpus and observing that the information gained converges as redundancy increases.

pdf bib
A trainable transfer-based MT approach for languages with limited resources
Alon Lavie | Katharina Probst | Erik Peterson | Stephan Vogel | Lori Levin | Ariadna Font-Llitjos | Jaime Carbonell
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

pdf bib
Data Collection and Analysis of Mapudungun Morphology for Spelling Correction
Christian Monson | Lori Levin | Rodolfo Vega | Ralf Brown | Ariadna Font Llitjos | Alon Lavie | Jaime Carbonell | Eliseo Cañulef | Rosendo Huisca
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs
Kenji Sagae | Brian MacWhinney | Alon Lavie
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
Speechalator: Two-Way Speech-to-Speech Translation in Your Hand
Alex Waibel | Ahmed Badran | Alan W. Black | Robert Frederking | Donna Gates | Alon Lavie | Lori Levin | Kevin Lenzo | Laura Mayfield Tomokiyo | Juergen Reichert | Tanja Schultz | Dorcas Wallace | Monika Woszczyna | Jing Zhang
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

pdf bib
Domain Specific Speech Acts for Spoken Language Translation
Lori Levin | Chad Langley | Alon Lavie | Donna Gates | Dorcas Wallace | Kay Peterson
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

pdf bib
Parsing Domain Actions with Phrase-Level Grammars and Memory-Based Learners
Chad Langley | Alon Lavie
Proceedings of the Eighth International Conference on Parsing Technologies

In this paper, we describe an approach to analysis for spoken language translation that combines phrase-level grammar-based parsing and automatic domain action classification. The job of the analyzer is to transform utterances into a shallow semantic task-oriented interlingua representation. The goal of our hybrid approach is to provide accurate real-time analyses and to improve robustness and portability to new domains and languages.

pdf bib
Combining Rule-based and Data-driven Techniques for Grammatical Relation Extraction in Spoken Language
Kenji Sagae | Alon Lavie
Proceedings of the Eighth International Conference on Parsing Technologies

We investigate an aspect of the relationship between parsing and corpus-based methods in NLP that has received relatively little attention: coverage augmentation in rule-based parsers. In the specific task of determining grammatical relations (such as subjects and objects) in transcribed spoken language, we show that a combination of rule-based and corpus-based approaches, where a rule-based system is used as the teacher (or an automatic data annotator) to a corpus-based system, outperforms either system in isolation.

2002

pdf bib
Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers
Chad Langley | Alon Lavie | Lori Levin | Dorcas Wallace | Donna Gates | Kay Peterson
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

pdf bib
Balancing Expressiveness and Simplicity in an Interlingua for Task Based Dialogue
Lori Levin | Donna Gates | Dorcas Pianta | Roldano Cattoni | Nadia Mana | Kay Peterson | Alon Lavie | Fabio Pianesi
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

pdf bib
A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System
Alon Lavie | Florian Metze | Roldano Cattoni | Erica Costantini
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

pdf bib
Automatic rule learning for resource-limited MT
Jaime Carbonell | Katharina Probst | Erik Peterson | Christian Monson | Alon Lavie | Ralf Brown | Lori Levin
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

Machine Translation of minority languages presents unique challenges, including the paucity of bilingual training data and the unavailability of linguistically-trained speakers. This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers, and a seeded version-space learning algorithm formulates and refines transfer rules. A rule-generalization lattice is defined based on LFG-style f-structures, permitting generalization operators in the search for the most general rules consistent with the elicited data. The paper presents these methods and illustrates examples.

pdf bib
The NESPOLE! speech-to-speech translation system
Alon Lavie | Lori Levin | Robert Frederking | Fabio Pianesi
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: System Descriptions

NESPOLE! is a speech-to-speech machine translation research system designed to provide fully functional speech-to-speech capabilities within real-world settings of common users involved in e-commerce applications. The project is funded jointly by the European Commission and the US NSF. The NESPOLE! system uses a client-server architecture to allow a common user, who is browsing web-pages on the internet, to connect seamlessly in real-time to an agent of the service provider, using a video-conferencing channel and with speech-to-speech translation services mediating the conversation. Shared web pages and annotated images supported via a Whiteboard application are available to enhance the communication.

pdf bib
Rapid adaptive development of semantic analysis grammars
Alicia Tribble | Alon Lavie | Lori Levin
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2001

pdf bib
Pre-processing of bilingual corpora for Mandarin-English EBMT
Ying Zhang | Ralf Brown | Robert Frederking | Alon Lavie
Proceedings of Machine Translation Summit VIII

Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC). It uses dynamic programming with a frequency dictionary to segment the text. Although the frequency dictionary is large, it does not completely cover the corpora. In this paper, we describe the work we have done to improve the segmentation for Mandarin and the bracketing process for English to increase the length of English phrases. A statistical dictionary is built from the aligned bilingual corpus. It is used as feedback to segmentation and bracketing to re-segment / re-bracket the corpus. The process iterates several times to achieve better results. The final results of the corpus pre-processing are a segmented/bracketed aligned bilingual corpus and a statistical dictionary. We achieved positive results by increasing the average length of Chinese terms about 60% and 10% for English. The statistical dictionary gained about a 30% increase in coverage.

pdf bib
Design and implementation of controlled elicitation for machine translation of low-density languages
Katharina Probst | Ralf Brown | Jaime Carbonell | Alon Lavie | Lori Levin | Erik Peterson
Workshop on MT2010: Towards a Road Map for MT

NICE is a machine translation project for low-density languages. We are building a tool that will elicit a controlled corpus from a bilingual speaker who is not an expert in linguistics. The corpus is intended to cover major typological phenomena, as it is designed to work for any language. Using implicational universals, we strive to minimize the number of sentences that each informant has to translate. From the elicited sentences, we learn transfer rules with a version space algorithm. Our vision for MT in the future is one in which systems can be quickly trained for new languages by native speakers, so that speakers of minor languages can participate in education, health care, government, and internet without having to give up their languages.

pdf bib
Parsing the CHILDES Database: Methodology and Lessons Learned
Kenji Sagae | Alon Lavie | Brian MacWhinney
Proceedings of the Seventh International Workshop on Parsing Technologies

pdf bib
Architecture and Design Considerations in NESPOLE!: a Speech Translation System for E-commerce Applications
Alon Lavie | Chad Langley | Alex Waibel | Fabio Pianesi | Gianni Lazzari | Paolo Coletti | Loredana Taddei | Franco Balducci
Proceedings of the First International Conference on Human Language Technology Research

pdf bib
Domain Portability in Speech-to-Speech Translation
Alon Lavie | Lori Levin | Tanja Schultz | Chad Langley | Benjamin Han | Alicia Tribble | Donna Gates | Dorcas Wallace | Kay Peterson
Proceedings of the First International Conference on Human Language Technology Research

2000

pdf bib
Lessons Learned from a Task-based Evaluation of Speech-to-Speech Machine Translation
Lori Levin | Boris Bartlog | Ariadna Font Llitjos | Donna Gates | Alon Lavie | Dorcas Wallace | Taro Watanabe | Monika Woszczyna
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Shallow Discourse Genre Annotation in CallHome Spanish
Klaus Ries | Lori Levin | Liza Valle | Alon Lavie | Alex Waibel
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Optimal Ambiguity Packing in Context-free Parsers with Interleaved Unification
Alon Lavie | Carolyn Penstein Rosé
Proceedings of the Sixth International Workshop on Parsing Technologies

Ambiguity packing is a well known technique for enhancing the efficiency of context-free parsers. However, in the case of unification-augmented context-free parsers where parsing is interleaved with feature unification, the propagation of feature structures imposes difficulties on the ability of the parser to effectively perform ambiguity packing. We demonstrate that a clever heuristic for prioritizing the execution order of grammar rules and parsing actions can achieve a high level of ambiguity packing that is provably optimal. We present empirical evaluations of the proposed technique, performed with both a Generalized LR parser and a chart parser, that demonstrate its effectiveness.

pdf bib
Evaluation of a Practical Interlingua for Task-Oriented Dialogue
Lori Levin | Donna Gates | Alon Lavie | Fabio Pianesi | Dorcas Wallace | Taro Watanabe
NAACL-ANLP 2000 Workshop: Applied Interlinguas: Practical Applications of Interlingual Approaches to NLP

1999

pdf bib
Tagging of Speech Acts and Dialogue Games in Spanish Call Home
Lori Levin | Klaus Ries | Ann Thyme-Gobbel | Alon Lavie
Towards Standards and Tools for Discourse Tagging

1998

pdf bib
A modular approach to spoken language translation for large domains
Monika Woszczcyna | Matthew Broadhead | Donna Gates | Marsal Gavaldá | Alon Lavie | Lori Levin | Alex Waibel
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

The MT engine of the JANUS speech-to-speech translation system is designed around four main principles: 1) an interlingua approach that allows the efficient addition of new languages, 2) the use of semantic grammars that yield low cost high quality translations for limited domains, 3) modular grammars that support easy expansion into new domains, and 4) efficient integration of multiple grammars using multi-domain parse lattices and domain re-scoring. Within the framework of the C-STAR-II speech-to-speech translation effort, these principles are tested against the challenge of providing translation for a number of domains and language pairs with the additional restriction of a common interchange format.

1997

pdf bib
An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process
Carolyn Penstein Rose | Alon Lavie
Second Conference on Empirical Methods in Natural Language Processing

pdf bib
Expanding the Domain of a Multi-lingual Speech-to-Speech Translation System
Alon Lavie | Lori Levin | Puming Zhan | Maite Taboada | Donna Gates | Mirella Lapata | Cortis Clark | Matthew Broadhead | Alex Waibel
Spoken Language Translation

1996

pdf bib
JANUS: multi-lingual translation of spontaneous speech in limited domain
Alon Lavie | Lori Levin | Alex Waibel | Donna Gates | Marsal Gavalda | Laura Mayfield
Conference of the Association for Machine Translation in the Americas

pdf bib
Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain
Alon Lavie | Donna Gates | Marsal Gavalda | Laura Mayfield | Alex Waibel | Lori Levin
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1995

pdf bib
Using Context in Machine Translation of Spoken Language
Lori Levin | Oren Glickman | Yan Qu | Carolyn P. Rose | Donna Gates | Alon Lavie | Alex Waibel | Carol Van Ess-Dykema
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1994

pdf bib
An Integrated Heuristic Scheme for Partial Parse Evaluation
Alon Lavie
32nd Annual Meeting of the Association for Computational Linguistics

1993

pdf bib
Recent Advances in Janus: A Speech Translation System
M. Woszczyna | N. Coccaro | A. Eisele | A. Lavie | A. McNair | T. Polzin | I. Rogina | C. P. Rose | T. Sloboda | M. Tomita | J. Tsutsumi | N. Aoki-Waibel | A. Waibel | W. Ward
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

pdf bib
GLR* – An Efficient Noise-skipping Parsing Algorithm For Context Free Grammars
Alon Lavie | Masaru Tomita
Proceedings of the Third International Workshop on Parsing Technologies

This paper describes GLR*, a parser that can parse any input sentence by ignoring unrecognizable parts of the sentence. In case the standard parsing procedure fails to parse an input sentence, the parser nondeterministically skips some word(s) in the sentence, and returns the parse with fewest skipped words. Therefore, the parser will return some parse(s) with any input sentence, unless no part of the sentence can be recognized at all. The problem can be defined in the following way: Given a context-free grammar G and a sentence S, find and parse S' – the largest subset of words of S, such that S' ∈ L(G). The algorithm described in this paper is a modification of the Generalized LR (Tomita) parsing algorithm [Tomita, 1986] . The parser accommodates the skipping of words by allowing shift operations to be performed from inactive state nodes of the Graph Structured Stack. A heuristic similar to beam search makes the algorithm computationally tractable. There have been several other approaches to the problem of robust parsing, most of which are special purpose algorithms [Carbonell and Hayes, 1984] , [Ward, 1991] and others. Because our approach is a modification to a standard context-free parsing algorithm, all the techniques and grammars developed for the standard parser can be applied as they are. Also, in case the input sentence is by itself grammatical, our parser behaves exactly as the standard GLR parser. The modified parser, GLR*, has been implemented and integrated with the latest version of the Generalized LR Parser/Compiler [Tomita et al , 1988], [Tomita, 1990]. We discuss an application of the GLR* parser to spontaneous speech understanding and present some preliminary tests on the utility of the GLR* parser in such settings.
Search
Co-authors