Oren Etzioni


2021

Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
Christopher Clark | Jordi Salvador | Dustin Schwenk | Derrick Bonafilia | Mark Yatskar | Eric Kolve | Alvaro Herrasti | Jonghyun Choi | Sachin Mehta | Sam Skjonsberg | Carissa Schoenick | Aaron Sarnat | Hannaneh Hajishirzi | Aniruddha Kembhavi | Oren Etzioni | Ali Farhadi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing to help the Guesser in response. This back-and-forth often uses canonical scenes, visual metaphor, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and are able to employ world knowledge in language models to play with words unseen during training.

2020

CORD-19: The COVID-19 Open Research Dataset
Lucy Lu Wang | Kyle Lo | Yoganand Chandrasekhar | Russell Reas | Jiangjiang Yang | Doug Burdick | Darrin Eide | Kathryn Funk | Yannis Katsis | Rodney Michael Kinney | Yunyao Li | Ziyang Liu | William Merrill | Paul Mooney | Dewey A. Murdick | Devvret Rishi | Jerry Sheehan | Zhihong Shen | Brandon Stilson | Alex D. Wade | Kuansan Wang | Nancy Xin Ru Wang | Christopher Wilhelm | Boya Xie | Douglas M. Raymond | Daniel S. Weld | Oren Etzioni | Sebastian Kohlmeier
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for COVID-19.

2018

Construction of the Literature Graph in Semantic Scholar
Waleed Ammar | Dirk Groeneveld | Chandra Bhagavatula | Iz Beltagy | Miles Crawford | Doug Downey | Jason Dunkelberger | Ahmed Elgohary | Sergey Feldman | Vu Ha | Rodney Kinney | Sebastian Kohlmeier | Kyle Lo | Tyler Murray | Hsu-Han Ooi | Matthew Peters | Joanna Power | Sam Skjonsberg | Lucy Lu Wang | Chris Wilhelm | Zheng Yuan | Madeleine van Zuylen | Oren Etzioni
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We describe a deployed, scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction to familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

2016

IKE - An Interactive Tool for Knowledge Extraction
Bhavana Dalvi | Sumithra Bhakthavatsalam | Chris Clark | Peter Clark | Oren Etzioni | Anthony Fader | Dirk Groeneveld
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

Exploring Markov Logic Networks for Question Answering
Tushar Khot | Niranjan Balasubramanian | Eric Gribkoff | Ashish Sabharwal | Peter Clark | Oren Etzioni
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Solving Geometry Problems: Combining Text and Diagram Interpretation
Minjoon Seo | Hannaneh Hajishirzi | Ali Farhadi | Oren Etzioni | Clint Malcolm
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Parsing Algebraic Word Problems into Equations
Rik Koncel-Kedziorski | Hannaneh Hajishirzi | Ashish Sabharwal | Oren Etzioni | Siena Dumas Ang
Transactions of the Association for Computational Linguistics, Volume 3

This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees. We use integer linear programming to generate equation trees and score their likelihood by learning local and global discriminative models. These models are trained on a small set of word problems and their answers, without any manual annotation, in order to choose the equation that best matches the problem text. We refer to the overall system as Alges. We compare Alges with previous work and show that it covers the full gamut of arithmetic operations whereas Hosseini et al. (2014) only handle addition and subtraction. In addition, Alges overcomes the brittleness of the Kushman et al. (2014) approach on single-equation problems, yielding a 15% to 50% reduction in error.
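
The equation-tree representation mentioned in this abstract can be illustrated with a small sketch. This is not the Alges implementation: the `Node` class and the `evaluate` and `satisfies` helpers are hypothetical names introduced here, and the sketch only shows how an equation might be stored as a binary tree and checked against a candidate value for the unknown.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    """Internal node of a (hypothetical) equation tree."""
    op: str            # one of "+", "-", "*", "/", "="
    left: "Tree"
    right: "Tree"


# A leaf is either a number or the name of an unknown (assumption of this sketch).
Tree = Union[Node, float, str]


def evaluate(tree: Tree, bindings: Dict[str, float]) -> float:
    """Evaluate an arithmetic subtree, looking up unknowns in `bindings`."""
    if isinstance(tree, Node):
        left = evaluate(tree.left, bindings)
        right = evaluate(tree.right, bindings)
        if tree.op == "+":
            return left + right
        if tree.op == "-":
            return left - right
        if tree.op == "*":
            return left * right
        if tree.op == "/":
            return left / right
        raise ValueError(f"unsupported operator: {tree.op}")
    if isinstance(tree, str):
        return bindings[tree]
    return float(tree)


def satisfies(equation: Node, bindings: Dict[str, float], tol: float = 1e-9) -> bool:
    """Check whether both sides of an '=' root agree under `bindings`."""
    if equation.op != "=":
        raise ValueError("root of an equation tree must be '='")
    lhs = evaluate(equation.left, bindings)
    rhs = evaluate(equation.right, bindings)
    return abs(lhs - rhs) <= tol


# x + 3 = 8, checked against the candidate answer x = 5
equation = Node("=", Node("+", "x", 3.0), 8.0)
print(satisfies(equation, {"x": 5.0}))  # True
```

In the abstract's terms, a system would generate many such candidate trees and score them; the sketch covers only the representation and the answer check, not generation or scoring.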

2014

Learning to Solve Arithmetic Word Problems with Verb Categorization
Mohammad Javad Hosseini | Hannaneh Hajishirzi | Oren Etzioni | Nate Kushman
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Chinese Open Relation Extraction for Knowledge Acquisition
Yuen-Hsien Tseng | Lung-Hao Lee | Shu-Yen Lin | Bo-Shun Liao | Mei-Jun Liu | Hsin-Hsi Chen | Oren Etzioni | Anthony Fader
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

Generating Coherent Event Schemas at Scale
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Paraphrase-Driven Learning for Open Question Answering
Anthony Fader | Luke Zettlemoyer | Oren Etzioni
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Towards Coherent Multi-Document Summarization
Janara Christensen | Mausam | Stephen Soderland | Oren Etzioni
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Modeling Missing Data in Distant Supervision for Information Extraction
Alan Ritter | Luke Zettlemoyer | Mausam | Oren Etzioni
Transactions of the Association for Computational Linguistics, Volume 1

Distant supervision algorithms learn information extraction models given only large, readily available databases and text collections. Most previous work has used heuristics for generating labeled data, for example assuming that facts not contained in the database are not mentioned in the text, and facts in the database must be mentioned at least once. In this paper, we propose a new latent-variable approach that models missing data. This provides a natural way to incorporate side information, for instance modeling the intuition that text will often mention rare entities which are likely to be missing in the database. Despite the added complexity introduced by reasoning about missing data, we demonstrate that a carefully designed local search approach to inference is very accurate and scales to large datasets. Experiments demonstrate improved performance for binary and unary relation extraction when compared to learning with heuristic labels, including on average a 27% increase in area under the precision-recall curve in the binary case.

2012

Constructing a Textual KB from a Biology TextBook
Peter Clark | Phil Harrison | Niranjan Balasubramanian | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Entity Linking at Web Scale
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Rel-grams: A Probabilistic Model of Relations in Text
Niranjan Balasubramanian | Stephen Soderland | Mausam | Oren Etzioni
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

Open Language Learning for Information Extraction
Mausam | Michael Schmitz | Stephen Soderland | Robert Bart | Oren Etzioni
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Named Entity Recognition in Tweets: An Experimental Study
Alan Ritter | Sam Clark | Mausam | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Identifying Relations for Open Information Extraction
Anthony Fader | Stephen Soderland | Oren Etzioni
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

A Latent Dirichlet Allocation Method for Selectional Preferences
Alan Ritter | Mausam | Oren Etzioni
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Extracting Sequences from the Web
Anthony Fader | Stephen Soderland | Oren Etzioni
Proceedings of the ACL 2010 Conference Short Papers

Learning First-Order Horn Clauses from Web Text
Stefan Schoenmackers | Jesse Davis | Oren Etzioni | Daniel Weld
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Identifying Functional Relations in Web Text
Thomas Lin | Mausam | Oren Etzioni
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Semantic Role Labeling for Open Information Extraction
Janara Christensen | Mausam | Stephen Soderland | Oren Etzioni
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

Machine Reading at the University of Washington
Hoifung Poon | Janara Christensen | Pedro Domingos | Oren Etzioni | Raphael Hoffmann | Chloe Kiddon | Thomas Lin | Xiao Ling | Mausam | Alan Ritter | Stefan Schoenmackers | Stephen Soderland | Dan Weld | Fei Wu | Congle Zhang
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

2009

Compiling a Massive, Multilingual Dictionary via Probabilistic Inference
Mausam | Stephen Soderland | Oren Etzioni | Daniel Weld | Michael Skinner | Jeff Bilmes
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

A Rose is a Roos is a Ruusu: Querying Translations for Web Image Search
Janara Christensen | Mausam | Oren Etzioni
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Lemmatic Machine Translation
Stephen Soderland | Christopher Lim | Mausam | Bo Qin | Oren Etzioni | Jonathan Pool
Proceedings of Machine Translation Summit XII: Papers

2008

The Tradeoffs Between Open and Traditional Relation Extraction
Michele Banko | Oren Etzioni
Proceedings of ACL-08: HLT

It’s a Contradiction – no, it’s not: A Case Study using Functional Relations
Alan Ritter | Stephen Soderland | Doug Downey | Oren Etzioni
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Scaling Textual Inference to the Web
Stefan Schoenmackers | Oren Etzioni | Daniel Weld
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Unsupervised Resolution of Objects and Relations on the Web
Alexander Yates | Oren Etzioni
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

TextRunner: Open Information Extraction on the Web
Alexander Yates | Michele Banko | Matthew Broadhead | Michael Cafarella | Oren Etzioni | Stephen Soderland
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

Sparse Information Extraction: Unsupervised Language Models to the Rescue
Doug Downey | Stefan Schoenmackers | Oren Etzioni
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Lexical translation with application to image searching on the web
Oren Etzioni | Kobi Reiter | Stephen Soderland | Marcus Sammer
Proceedings of Machine Translation Summit XI: Papers

2006

Ambiguity Reduction for Machine Translation: Human-Computer Collaboration
Marcus Sammer | Kobi Reiter | Stephen Soderland | Katrin Kirchhoff | Oren Etzioni
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Statistical Machine Translation (SMT) accuracy degrades when there is only a limited amount of training data, or when the training data is not from the same domain or genre of text as the target application. However, cross-domain applications are typical of many real-world tasks. We demonstrate that SMT accuracy can be improved in a cross-domain application by using a controlled language (CL) interface to help reduce lexical ambiguity in the input text. Our system, CL-MT, presents a monolingual user with a choice of word senses for each content word in the input text. CL-MT temporarily adjusts the underlying SMT system's phrase table, boosting the scores of translations that include the word senses preferred by the user and lowering scores for disfavored translations. We demonstrate that this improves translation adequacy in 33.8% of the sentences in Spanish-to-English translation of news stories, where the SMT system was trained on proceedings of the European Parliament.
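
The phrase-table adjustment described in this abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the dictionary layout, the function name `adjust_phrase_table`, and the multiplicative `boost` and `penalty` factors are hypothetical and are not taken from CL-MT. The adjusted table is built as a copy, so the underlying table is left unchanged, mirroring the "temporary" adjustment in the abstract.

```python
from typing import Dict, Set

# Hypothetical layout: source word -> {candidate translation -> score}
PhraseTable = Dict[str, Dict[str, float]]


def adjust_phrase_table(
    table: PhraseTable,
    preferred: Dict[str, Set[str]],
    disfavored: Dict[str, Set[str]],
    boost: float = 2.0,      # assumed factor, not from the paper
    penalty: float = 0.5,    # assumed factor, not from the paper
) -> PhraseTable:
    """Return a re-scored copy of `table`; the original is not modified."""
    adjusted: PhraseTable = {}
    for source, options in table.items():
        liked = preferred.get(source, set())
        disliked = disfavored.get(source, set())
        adjusted[source] = {}
        for target, score in options.items():
            if target in liked:
                score *= boost        # raise translations matching the chosen sense
            elif target in disliked:
                score *= penalty      # lower translations of rejected senses
            adjusted[source][target] = score
    return adjusted


# Toy example: the user indicates that "banco" means a seat, not a financial institution.
table = {"banco": {"bank": 0.6, "bench": 0.4}}
adjusted = adjust_phrase_table(
    table,
    preferred={"banco": {"bench"}},
    disfavored={"banco": {"bank"}},
)
best = max(adjusted["banco"], key=adjusted["banco"].get)
print(best)  # bench
```

A real phrase table stores multi-word phrases with several feature scores per entry; the single score per word used here keeps the sketch short.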

Detecting Parser Errors Using Web-based Semantic Filters
Alexander Yates | Stefan Schoenmackers | Oren Etzioni
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

BE: A search engine for NLP research
Mike Cafarella | Oren Etzioni
Proceedings of the 2nd International Workshop on Web as Corpus

Expanding the Recall of Relation Extraction by Bootstrapping
Junji Tomita | Stephen Soderland | Oren Etzioni
Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)

2005

Extracting Product Features and Opinions from Reviews
Ana-Maria Popescu | Oren Etzioni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

KnowItNow: Fast, Scalable Information Extraction from the Web
Michael J. Cafarella | Doug Downey | Stephen Soderland | Oren Etzioni
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

OPINE: Extracting Product Features and Opinions from Reviews
Ana-Maria Popescu | Bao Nguyen | Oren Etzioni
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
Ana-Maria Popescu | Alex Armanasu | Oren Etzioni | David Ko | Alexander Yates
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics
