Daniel Marcu


2017

Biomedical Event Extraction using Abstract Meaning Representation
Sudha Rao | Daniel Marcu | Kevin Knight | Hal Daumé III
BioNLP 2017

We propose a novel Abstract Meaning Representation (AMR)-based approach to identifying molecular events/interactions in biomedical text. Our key contributions are: (1) an empirical validation of our hypothesis that an event is a subgraph of the AMR graph, (2) a neural network-based model that identifies such an event subgraph given an AMR, and (3) a distant-supervision-based approach to gathering additional training data. We evaluate our approach on the 2013 Genia Event Extraction dataset and show promising results.

2016

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning
Boliang Zhang | Xiaoman Pan | Tianlu Wang | Ashish Vaswani | Heng Ji | Kevin Knight | Daniel Marcu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural Language Communication with Robots
Yonatan Bisk | Deniz Yuret | Daniel Marcu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Neural Hidden Markov Models
Ke M. Tran | Yonatan Bisk | Ashish Vaswani | Daniel Marcu | Kevin Knight
Proceedings of the Workshop on Structured Prediction for NLP

Extracting Structured Scholarly Information from the Machine Translation Literature
Eunsol Choi | Matic Horvat | Jonathan May | Kevin Knight | Daniel Marcu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Understanding the experimental results of a scientific paper is crucial to understanding its contribution and to comparing it with related work. We introduce a structured, queryable representation for experimental results and a baseline system that automatically populates this representation. The representation can answer compositional questions such as: “Which are the best published results reported on the NIST 09 Chinese to English dataset?” and “What are the most important methods for speeding up phrase-based decoding?” Answering such questions usually involves lengthy literature surveys. Current machine reading for academic papers does not usually consider the actual experiments, but mostly focuses on understanding abstracts. We describe annotation work to create an initial ⟨scientific paper; experimental results representation⟩ corpus. The corpus is composed of 67 papers which were manually annotated with a structured representation of experimental results by domain experts. Additionally, we present a baseline algorithm that characterizes the difficulty of the inference task.

2015

Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation
Michael Pust | Ulf Hermjakob | Kevin Knight | Daniel Marcu | Jonathan May
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2012

HyTER: Meaning-Equivalent Semantics for Translation Evaluation
Markus Dreyer | Daniel Marcu
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Automatic Parallel Fragment Extraction from Noisy Data
Jason Riesa | Daniel Marcu
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A New Method for Automatic Translation Scoring-HyTER
Daniel Marcu
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Government MT User Program

It is common knowledge that translation is an ambiguous, 1-to-n mapping process, but to date, our community has produced no empirical estimates of this ambiguity. We have developed an annotation tool that enables us to create representations that compactly encode an exponential number of correct translations for a sentence. Our findings show that naturally occurring sentences have billions of translations. Having access to such large sets of meaning-equivalent translations enables us to develop a new metric, HyTER, for translation accuracy. Using data from the most recent Open MT NIST evaluation, we show that our metric provides better estimates of machine and human translation accuracy than alternative evaluation metrics, and we discuss how HyTER representations can be used to inform a data-driven inquiry into natural language semantics.

2011

Meaning-equivalent semantics for understanding, generation, translation, and evaluation
Daniel Marcu
Proceedings of the 8th International Workshop on Spoken Language Translation: Keynotes

Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Jason Riesa | Ann Irvine | Daniel Marcu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Creating Value at the Boundary Between Humans and Machines
Daniel Marcu
Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT to the User: Research on Integrating MT in the Translation Industry

For a long time, machine translation and professional translation vendors have had a contentious relationship. However, new tools, computing platforms, and business models are changing the fundamentals of this relationship. I will review the main trends in the area while emphasizing both past causes of failure and the main drivers of success.

Hierarchical Search for Word Alignment
Jason Riesa | Daniel Marcu
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Trusted Translations Deliver Compelling Results for the Travel Industry
Daniel Marcu
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program

Utilizing Automated Translation with Quality Scores to Increase Productivity
Daniel Marcu | Kathleen Egan | Chuck Simmons | Ning-Ning Mahlmann
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

Automated translation can assist with a variety of translation needs in government, from speeding up access to information for intelligence work to helping human translators increase their productivity. However, government entities need a mechanism that tells them whether they can trust the output of automated translation solutions. In this presentation, Language Weaver will present a new capability, "TrustScore": an automated scoring algorithm that communicates how good an automated translation is, using a meaningful metric. With this capability, each translation is automatically assigned a TrustScore from 1 to 5. A score of 1 indicates that the translation is unintelligible; a score of 3 indicates that meaning has been conveyed and that the translated content is actionable. A score approaching 4 or higher indicates that meaning and nuance have been carried through. This automatic prediction of quality has been validated by testing across significant numbers of data points in different companies and on different types of content. After outlining TrustScore and how it works, Language Weaver will discuss how a scoring mechanism like TrustScore could be used in a government translation productivity workflow to assist linguists with day-to-day translation work, enabling them to benefit further from their investments in automated translation software. Language Weaver will also share how TrustScore is used in commercial deployments to publish information cost-effectively in near real time.

Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
Wei Wang | Jonathan May | Kevin Knight | Daniel Marcu
Computational Linguistics, Volume 36, Number 2, June 2010

2008

Language Translation Solutions for Community Content
Daniel Marcu
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

2007

Squibs and Discussions: Measuring Word Alignment Quality for Statistical Machine Translation
Alexander Fraser | Daniel Marcu
Computational Linguistics, Volume 33, Number 3, September 2007

Getting the Structure Right for Word Alignment: LEAF
Alexander Fraser | Daniel Marcu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
Wei Wang | Kevin Knight | Daniel Marcu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

What Can Syntax-Based MT Learn from Phrase-Based MT?
Steve DeNeefe | Kevin Knight | Wei Wang | Daniel Marcu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora
Dragos Stefan Munteanu | Daniel Marcu
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Bayesian Query-Focused Summarization
Hal Daumé III | Daniel Marcu
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Semi-Supervised Training for Statistical Word Alignment
Alexander Fraser | Daniel Marcu
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Scalable Inference and Training of Context-Rich Syntactic Translation Models
Michel Galley | Jonathan Graehl | Kevin Knight | Daniel Marcu | Steve DeNeefe | Wei Wang | Ignacio Thayer
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Stochastic Language Generation Using WIDL-Expressions and its Application in Machine Translation and Summarization
Radu Soricut | Daniel Marcu
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Discourse Generation Using Utility-Trained Coherence Models
Radu Soricut | Daniel Marcu
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

The Potential and Limitations of MT Paradigm
Daniel Marcu | Alan Melby
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Invited Talks

Capitalizing Machine Translation
Wei Wang | Kevin Knight | Daniel Marcu
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

SPMT: Statistical Machine Translation with Syntactified Target Language Phrases
Daniel Marcu | Wei Wang | Abdessamad Echihabi | Kevin Knight
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

ISI’s Participation in the Romanian-English Alignment Task
Alexander Fraser | Daniel Marcu
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Dragos Stefan Munteanu | Daniel Marcu
Computational Linguistics, Volume 31, Number 4, December 2005

Induction of Word and Phrase Alignments for Automatic Document Summarization
Hal Daumé III | Daniel Marcu
Computational Linguistics, Volume 31, Number 4, December 2005

A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model
Hal Daumé III | Daniel Marcu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Translation Exercise Assistant: Automated Generation of Translation
Jill Burstein | Daniel Marcu
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

Towards Developing Generation Algorithms for Text-to-Text Applications
Radu Soricut | Daniel Marcu
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Transonics: A Practical Speech-to-Speech Translator for English-Farsi Medical Dialogs
Robert Belvin | Emil Ettelaie | Sudeep Gandhe | Panayiotis Georgiou | Kevin Knight | Daniel Marcu | Scott Millward | Shrikanth Narayanan | Howard Neely | David Traum
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2004

The ISI/USC MT system
Emil Ettelaie | Kevin Knight | Daniel Marcu | Dragos Stefan Munteanu | Franz J. Och | Ignacio Thayer | Quamrul Tipu
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

(Invited presentation) The Perils and Rewards of Developing Restricted Domain Applications
Daniel Marcu
Proceedings of the Conference on Question Answering in Restricted Domains

Generic Sentence Fusion is an Ill-Defined Summarization Task
Hal Daumé III | Daniel Marcu
Text Summarization Branches Out

Language Weaver Arabic->English MT
Daniel Marcu | Alex Fraser | William Wong | Kevin Knight
Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages

A Phrase-Based HMM Approach to Document/Abstract Alignment
Hal Daumé III | Daniel Marcu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

NP Bracketing by Maximum Entropy Tagging and SVM Reranking
Hal Daumé III | Daniel Marcu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

Evaluating Multiple Aspects of Coherence in Student Essays
Derrick Higgins | Jill Burstein | Daniel Marcu | Claudia Gentile
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora
Dragos Stefan Munteanu | Alexander Fraser | Daniel Marcu
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

What’s in a translation rule?
Michel Galley | Mark Hopkins | Kevin Knight | Daniel Marcu
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2003

Statistical Phrase-Based Translation
Philipp Koehn | Franz J. Och | Daniel Marcu
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences
Bo Pang | Kevin Knight | Daniel Marcu
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Sentence Level Discourse Parsing using Syntactic and Lexical Information
Radu Soricut | Daniel Marcu
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Cognates Can Improve Statistical Translation Models
Grzegorz Kondrak | Daniel Marcu | Kevin Knight
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

A Noisy-Channel Approach to Question Answering
Abdessamad Echihabi | Daniel Marcu
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Language Weaver: the next generation of machine translation
Bryce Benjamin | Laurie Gerber | Kevin Knight | Daniel Marcu
Proceedings of Machine Translation Summit IX: System Presentations

We introduce a new generation of commercial translation software, based primarily on statistical learning and statistical language models.

2002

A Phrase-Based, Joint Probability Model for Statistical Machine Translation
Daniel Marcu | Daniel Wong
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

Processing Comparable Corpora With Bilingual Suffix Trees
Dragos Stefan Munteanu | Daniel Marcu
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks
Hal Daumé III | Kevin Knight | Irene Langkilde-Geary | Daniel Marcu | Kenji Yamada
Proceedings of the International Natural Language Generation Conference

An Unsupervised Approach to Recognizing Discourse Relations
Daniel Marcu | Abdessamad Echihabi
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

A Noisy-Channel Model for Document Compression
Hal Daumé III | Daniel Marcu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Using a large monolingual corpus to improve translation accuracy
Radu Soricut | Kevin Knight | Daniel Marcu
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

The existence of a phrase in a large monolingual corpus is very useful information, and so is its frequency. We introduce an alternative approach to automatic translation of phrases/sentences that operationalizes this observation. We use a statistical machine translation system to produce alternative translations and a large monolingual corpus to (re)rank these translations. Our results show that this combination yields better translations, especially when translating out-of-domain phrases/sentences. Our approach can also be used to automatically construct parallel corpora from monolingual resources.

Translation by the numbers: Language Weaver
Bryce Benjamin | Kevin Knight | Daniel Marcu
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: System Descriptions

Pre-market prototype - to be available commercially in the second or third quarter of 2003.

2001

Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory
Lynn Carlson | Daniel Marcu | Mary Ellen Okurovsky
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

Towards Automatic Classification of Discourse Elements in Essays
Jill Burstein | Daniel Marcu | Slava Andreyev | Martin Chodorow
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Fast Decoding and Optimal Decoding for Machine Translation
Ulrich Germann | Michael Jahr | Kevin Knight | Daniel Marcu | Kenji Yamada
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Towards a Unified Approach to Memory- and Statistical-Based Machine Translation
Daniel Marcu
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

An empirical study of multilingual natural language generation: What Should a Text Planner Do?
Daniel Marcu | Lynn Carlson | Maki Watanabe
INLG’2000 Proceedings of the First International Conference on Natural Language Generation

Benefits of Modularity in an Automated Essay Scoring System
Jill Burstein | Daniel Marcu
Proceedings of the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems

The Automatic Translation of Discourse Structures
Daniel Marcu | Lynn Carlson | Maki Watanabe
1st Meeting of the North American Chapter of the Association for Computational Linguistics

The rhetorical parsing of unrestricted texts: a surface-based approach
Daniel Marcu
Computational Linguistics, Volume 26, Number 3, September 2000

An Empirical Investigation of the Relation Between Discourse Structure and Co-Reference
Dan Cristea | Nancy Ide | Daniel Marcu | Valentin Tablan
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Extending a Formal and Computational Model of Rhetorical Structure Theory with Intentional Structures à la Grosz and Sidner
Daniel Marcu
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1999

A Decision-Based Approach to Rhetorical Parsing
Daniel Marcu
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

Discourse Structure and Co-Reference: An Empirical Study
Dan Cristea | Nancy Ide | Daniel Marcu | Valentin Tablan
The Relation of Discourse/Dialogue Structure and Reference

Experiments in Constructing a Corpus of Discourse Trees
Daniel Marcu | Estibaliz Amorrortu | Magdalena Romera
Towards Standards and Tools for Discourse Tagging

1998

A surface-based approach to identifying discourse markers and elementary textual units in unrestricted texts
Daniel Marcu
Discourse Relations and Discourse Markers

Improving summarization through rhetorical parsing tuning
Daniel Marcu
Sixth Workshop on Very Large Corpora

1997

From discourse structures to text summaries
Daniel Marcu
Intelligent Scalable Text Summarization

The Rhetorical Parsing of Unrestricted Natural Language Texts
Daniel Marcu
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1995

A Uniform Treatment of Pragmatic Inferences in Simple and Complex Utterances and Sequences of Utterances
Daniel Marcu | Graeme Hirst
33rd Annual Meeting of the Association for Computational Linguistics