Oren Etzioni - ACL Anthology

2021

Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing in response to help the Guesser. This back-and-forth often uses canonical scenes, visual metaphor, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and are able to employ world knowledge in language models to play with words unseen during training.

2020

The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for COVID-19.

2018

We describe a deployed, scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities, and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction to familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

2016

IKE - An Interactive Tool for Knowledge Extraction
Bhavana Dalvi | Sumithra Bhakthavatsalam | Chris Clark | Peter Clark | Oren Etzioni | Anthony Fader | Dirk Groeneveld
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

Exploring Markov Logic Networks for Question Answering
Tushar Khot | Niranjan Balasubramanian | Eric Gribkoff | Ashish Sabharwal | Peter Clark | Oren Etzioni
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Solving Geometry Problems: Combining Text and Diagram Interpretation
Minjoon Seo | Hannaneh Hajishirzi | Ali Farhadi | Oren Etzioni | Clint Malcolm
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Parsing Algebraic Word Problems into Equations
Rik Koncel-Kedziorski | Hannaneh Hajishirzi | Ashish Sabharwal | Oren Etzioni | Siena Dumas Ang
Transactions of the Association for Computational Linguistics, Volume 3

This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees. We use integer linear programming to generate equation trees and score their likelihood by learning local and global discriminative models. These models are trained on a small set of word problems and their answers, without any manual annotation, in order to choose the equation that best matches the problem text. We refer to the overall system as Alges. We compare Alges with previous work and show that it covers the full gamut of arithmetic operations whereas Hosseini et al. (2014) only handle addition and subtraction. In addition, Alges overcomes the brittleness of the Kushman et al. (2014) approach on single-equation problems, yielding a 15% to 50% reduction in error.
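
To make the idea of generating and scoring equation trees concrete, here is a minimal Python sketch. It is not the Alges implementation: brute-force enumeration stands in for the integer linear programming step, and an exact check against the known answer stands in for the learned scoring models. The function names `trees` and `candidates_matching` and the example numbers are illustrative assumptions.

```python
from itertools import product
import operator

# The four arithmetic operations an equation tree may use at internal nodes.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def trees(nums):
    """Yield (expression, value) for every binary tree over nums, kept in order."""
    if len(nums) == 1:
        yield str(nums[0]), float(nums[0])
        return
    # Split the quantities into a left and a right subtree at every position.
    for split in range(1, len(nums)):
        left, right = trees(nums[:split]), trees(nums[split:])
        for (ls, lv), (rs, rv) in product(left, right):
            for sym, fn in OPS.items():
                if sym == "/" and rv == 0:
                    continue  # skip division by zero
                yield f"({ls} {sym} {rs})", fn(lv, rv)


def candidates_matching(nums, answer, tol=1e-9):
    """Return the expressions whose value equals the known answer."""
    return [expr for expr, value in trees(nums) if abs(value - answer) < tol]


print(candidates_matching([3, 4, 5], 23))  # ['(3 + (4 * 5))']
```

Supervising only with the final answer, as in this sketch, mirrors the abstract's point that training needs word problems and their answers but no manually annotated equations.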

2014

Learning to Solve Arithmetic Word Problems with Verb Categorization
Mohammad Javad Hosseini | Hannaneh Hajishirzi | Oren Etzioni | Nate Kushman
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Chinese Open Relation Extraction for Knowledge Acquisition
Yuen-Hsien Tseng | Lung-Hao Lee | Shu-Yen Lin | Bo-Shun Liao | Mei-Jun Liu | Hsin-Hsi Chen | Oren Etzioni | Anthony Fader
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

Generating Coherent Event Schemas at Scale
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Towards Coherent Multi-Document Summarization
Janara Christensen | Mausam | Stephen Soderland | Oren Etzioni
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Paraphrase-Driven Learning for Open Question Answering
Anthony Fader | Luke Zettlemoyer | Oren Etzioni
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter | Luke Zettlemoyer | Mausam | Oren Etzioni
Transactions of the Association for Computational Linguistics, Volume 1

Distant supervision algorithms learn information extraction models given only large, readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing in the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision-recall curve in the binary case.
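
The heuristic labeling that this abstract argues against can be sketched in a few lines of Python. This is a simplified illustration of the baseline assumption (a fact absent from the database is treated as not expressed in the text), not the latent-variable model the paper proposes. The function name `heuristic_labels`, the `NONE` label, and the toy data are assumptions made for the example.

```python
def heuristic_labels(kb_facts, mentions):
    """Label each entity-pair mention using only the knowledge base.

    kb_facts: set of (entity1, relation, entity2) triples.
    mentions: list of (entity1, entity2, sentence) tuples.
    A pair with no matching fact in the database is labeled "NONE".
    """
    labeled = []
    for e1, e2, sentence in mentions:
        relations = sorted(r for (a, r, b) in kb_facts if a == e1 and b == e2)
        labeled.append((e1, e2, sentence, relations or ["NONE"]))
    return labeled


# Toy data: the database is missing the second fact.
kb = {("Barack Obama", "born_in", "Honolulu")}
mentions = [
    ("Barack Obama", "Honolulu", "Barack Obama was born in Honolulu."),
    ("Ada Lovelace", "London", "Ada Lovelace was born in London."),
]
for row in heuristic_labels(kb, mentions):
    print(row)
```

The second sentence plainly expresses a birthplace, yet it receives the `NONE` label because the database lacks that fact. This is the kind of missing-data error the abstract describes.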

2012

Open Language Learning for Information Extraction
Mausam | Michael Schmitz | Stephen Soderland | Robert Bart | Oren Etzioni
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Constructing a Textual KB from a Biology TextBook
Peter Clark | Phil Harrison | Niranjan Balasubramanian | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Entity Linking at Web Scale
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Rel-grams: A Probabilistic Model of Relations in Text
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

2011

Named Entity Recognition in Tweets: An Experimental Study
Alan Ritter | Sam Clark | Mausam | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Identifying Relations for Open Information Extraction
Anthony Fader | Stephen Soderland | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Learning First-Order Horn Clauses from Web Text
Stefan Schoenmackers | Jesse Davis | Oren Etzioni | Daniel Weld
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Identifying Functional Relations in Web Text
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

A Latent Dirichlet Allocation Method for Selectional Preferences
Alan Ritter | Mausam | Oren Etzioni
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Extracting Sequences from the Web
Anthony Fader | Stephen Soderland | Oren Etzioni
Proceedings of the ACL 2010 Conference Short Papers

Semantic Role Labeling for Open Information Extraction
Janara Christensen | Mausam | Stephen Soderland | Oren Etzioni
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

Machine Reading at the University of Washington
Hoifung Poon | Janara Christensen | Pedro Domingos | Oren Etzioni | Raphael Hoffmann | Chloe Kiddon | Thomas Lin | Xiao Ling | Mausam | Alan Ritter | Stefan Schoenmackers | Stephen Soderland | Dan Weld | Fei Wu | Congle Zhang
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

2009

Lemmatic Machine Translation
Stephen Soderland | Christopher Lim | Mausam | Bo Qin | Oren Etzioni | Jonathan Pool
Proceedings of Machine Translation Summit XII: Papers

Compiling a Massive, Multilingual Dictionary via Probabilistic Inference
Mausam | Stephen Soderland | Oren Etzioni | Daniel Weld | Michael Skinner | Jeff Bilmes
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

A Rose is a Roos is a Ruusu: Querying Translations for Web Image Search
Janara Christensen | Mausam | Oren Etzioni
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

It’s a Contradiction – no, it’s not: A Case Study using Functional Relations
Alan Ritter | Stephen Soderland | Doug Downey | Oren Etzioni
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Scaling Textual Inference to the Web
Stefan Schoenmackers | Oren Etzioni | Daniel Weld
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

The Tradeoffs Between Open and Traditional Relation Extraction
Michele Banko | Oren Etzioni
Proceedings of ACL-08: HLT

2007

Lexical translation with application to image searching on the web
Oren Etzioni | Kobi Reiter | Stephen Soderland | Marcus Sammer
Proceedings of Machine Translation Summit XI: Papers

Unsupervised Resolution of Objects and Relations on the Web
Alexander Yates | Oren Etzioni
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

TextRunner: Open Information Extraction on the Web
Alexander Yates | Michele Banko | Matthew Broadhead | Michael Cafarella | Oren Etzioni | Stephen Soderland
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

Sparse Information Extraction: Unsupervised Language Models to the Rescue
Doug Downey | Stefan Schoenmackers | Oren Etzioni
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Ambiguity Reduction for Machine Translation: Human-Computer Collaboration
Marcus Sammer | Kobi Reiter | Stephen Soderland | Katrin Kirchhoff | Oren Etzioni
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Statistical Machine Translation (SMT) accuracy degrades when there is only a limited amount of training data, or when the training data is not from the same domain or genre of text as the target application. However, cross-domain applications are typical of many real-world tasks. We demonstrate that SMT accuracy can be improved in a cross-domain application by using a controlled language (CL) interface to help reduce lexical ambiguity in the input text. Our system, CL-MT, presents a monolingual user with a choice of word senses for each content word in the input text. CL-MT temporarily adjusts the underlying SMT system's phrase table, boosting the scores of translations that include the word senses preferred by the user and lowering scores for disfavored translations. We demonstrate that this improves translation adequacy in 33.8% of the sentences in Spanish-to-English translation of news stories, where the SMT system was trained on proceedings of the European Parliament.
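
The phrase-table adjustment described in this abstract might look roughly like the following Python sketch. The abstract does not give the actual scoring scheme, so the multiplicative `boost` and `penalty` factors, the dictionary layout of the phrase table, and the function name `adjust_phrase_table` are all assumptions made for illustration.

```python
def adjust_phrase_table(phrase_table, preferred, boost=2.0, penalty=0.5):
    """Return a rescored copy of the phrase table; the original is left untouched.

    phrase_table: {source phrase: {translation: score}}.
    preferred:    {source phrase: set of translations matching the user's chosen sense}.
    """
    adjusted = {}
    for source, options in phrase_table.items():
        chosen = preferred.get(source)
        if chosen is None:
            # The user expressed no preference for this phrase: keep scores as they are.
            adjusted[source] = dict(options)
            continue
        adjusted[source] = {
            target: score * (boost if target in chosen else penalty)
            for target, score in options.items()
        }
    return adjusted


# Hypothetical entry: Spanish "banco" can mean a financial bank or a bench.
table = {"banco": {"bank": 0.6, "bench": 0.4}, "casa": {"house": 0.9}}
adjusted = adjust_phrase_table(table, {"banco": {"bench"}})
print(max(adjusted["banco"], key=adjusted["banco"].get))  # bench
```

Returning a copy rather than editing the table in place reflects the abstract's wording that the adjustment is temporary.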

Detecting Parser Errors Using Web-based Semantic Filters
Alexander Yates | Stefan Schoenmackers | Oren Etzioni
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

BE: A search engine for NLP research
Mike Cafarella | Oren Etzioni
Proceedings of the 2nd International Workshop on Web as Corpus

Expanding the Recall of Relation Extraction by Bootstrapping
Junji Tomita | Stephen Soderland | Oren Etzioni
Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)

2005

Extracting Product Features and Opinions from Reviews
Ana-Maria Popescu | Oren Etzioni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

KnowItNow: Fast, Scalable Information Extraction from the Web
Michael J. Cafarella | Doug Downey | Stephen Soderland | Oren Etzioni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

OPINE: Extracting Product Features and Opinions from Reviews
Ana-Maria Popescu | Bao Nguyen | Oren Etzioni
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
Ana-Maria Popescu | Alex Armanasu | Oren Etzioni | David Ko | Alexander Yates
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Co-authors

Daniel S. Weld 5

Niranjan Balasubramanian 4

Janara Christensen 4

Hannaneh Hajishirzi 4

Alexander Yates 4

Ana-Maria Popescu 3

Michele Banko 2

Michael J. Cafarella 2

Dirk Groeneveld 2

Sebastian Kohlmeier 2

Ashish Sabharwal 2

Marcus Sammer 2

Sam Skjonsberg 2

Luke Zettlemoyer 2

Siena Dumas Ang 1

Alex Armanasu 1

Chandra Bhagavatula 1

Sumithra Bhakthavatsalam 1

Derrick Bonafilia 1

Matthew Broadhead 1

Mike Cafarella 1

Yoganand Chandrasekhar 1

Hsin-Hsi Chen 1

Jonghyun Choi 1

Christopher Clark 1

Miles Crawford 1

Bhavana Dalvi 1

Pedro Domingos 1

Jason Dunkelberger 1

Ahmed Elgohary 1

Sergey Feldman 1

Eric Gribkoff 1

Phil Harrison 1

Alvaro Herrasti 1

Raphael Hoffmann 1

Mohammad Javad Hosseini 1

Yannis Katsis 1

Aniruddha Kembhavi 1

Chloé Kiddon 1

Rodney Michael Kinney 1

Rodney Kinney 1

Katrin Kirchhoff 1

Rik Koncel-Kedziorski 1

Christopher Lim 1

Clint Malcolm 1

William Merrill 1

Dewey A. Murdick 1

Matthew E. Peters 1

Jonathan Pool 1

Douglas M. Raymond 1

Devvret Rishi 1

Jordi Salvador 1

Michael Schmitz 1

Carissa Schoenick 1

Dustin Schwenk 1

Jerry Sheehan 1

Michael Skinner 1

Brandon Stilson 1

Yuen-Hsien Tseng 1

Nancy Xin Ru Wang 1

Christopher Wilhelm 1

Chris Wilhelm 1

Jiangjiang Yang 1

Madeleine van Zuylen 1
