Steve DeNeefe


2024

Domain adapted machine translation: What does catastrophic forgetting forget and why?
Danielle Saunders | Steve DeNeefe
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Neural Machine Translation (NMT) models can be specialized by domain adaptation, often involving fine-tuning on a dataset of interest. This process risks catastrophic forgetting: rapid loss of generic translation quality. Forgetting has been widely observed, with many mitigation methods proposed. However, the causes of forgetting and the relationship between forgetting and adaptation data are underexplored. This paper takes a novel approach to understanding catastrophic forgetting during NMT adaptation by investigating the impact of the data. We provide a first investigation of what is forgotten, and why. We examine the relationship between forgetting and the in-domain data, and show that the amount and type of forgetting is linked to that data’s target vocabulary coverage. Our findings pave the way toward better informed NMT domain adaptation.

2023

AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
Melissa Roemmele | Kyle Shaffer | Katrina Olsen | Yiyi Wang | Steve DeNeefe
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.

2021

AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents
Melissa Roemmele | Deep Sidhpura | Steve DeNeefe | Ling Tsou
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

One strategy for facilitating reading comprehension is to present information in a question-and-answer format. We demo a system that integrates the tasks of question answering (QA) and question generation (QG) in order to produce Q&A items that convey the content of multi-paragraph documents. We report some experiments for QA and QG that yield improvements on both tasks, and assess how they interact to produce a list of Q&A items for a text. The demo is accessible at qna.sdl.com.

2011

Two Easy Improvements to Lexical Weighting
David Chiang | Steve DeNeefe | Michael Pust
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

A Decoder for Probabilistic Synchronous Tree Insertion Grammars
Steve DeNeefe | Kevin Knight | Heiko Vogler
Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing

2009

Synchronous Tree Adjoining Machine Translation
Steve DeNeefe | Kevin Knight
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms
David Chiang | Steve DeNeefe | Yee Seng Chan | Hwee Tou Ng
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Overcoming Vocabulary Sparsity in MT Using Lattices
Steve DeNeefe | Ulf Hermjakob | Kevin Knight
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Source languages with complex word-formation rules present a challenge for statistical machine translation (SMT). In this paper, we take on three facets of this challenge: (1) common stems are fragmented into many different forms in training data, (2) rare and unknown words are frequent in test data, and (3) spelling variation creates additional sparseness problems. We present a novel, lightweight technique for dealing with this fragmentation, based on bilingual data, and we also present a combination of linguistic and statistical techniques for dealing with rare and unknown words. Taking these techniques together, we demonstrate +1.3 and +1.6 BLEU increases on top of strong baselines for Arabic-English machine translation.

2007

What Can Syntax-Based MT Learn from Phrase-Based MT?
Steve DeNeefe | Kevin Knight | Wei Wang | Daniel Marcu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Scalable Inference and Training of Context-Rich Syntactic Translation Models
Michel Galley | Jonathan Graehl | Kevin Knight | Daniel Marcu | Steve DeNeefe | Wei Wang | Ignacio Thayer
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

ISI’s 2005 Statistical Machine Translation Entries
Steve DeNeefe | Kevin Knight
Proceedings of the Second International Workshop on Spoken Language Translation

Interactively Exploring a Machine Translation Model
Steve DeNeefe | Kevin Knight | Hayward H. Chan
Proceedings of the ACL Interactive Poster and Demonstration Sessions