Alon Halevy


pdf bib
Database reasoning over text
James Thorne | Majid Yazdani | Marzieh Saeidi | Fabrizio Silvestri | Sebastian Riedel | Alon Halevy
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as “List/Count all female athletes who were born in 20th century”, which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer models perform very well for small databases, they exhibit limitations in processing noisy data, numerical operations, and queries that aggregate facts. We propose a modular architecture to answer these database-style queries over multiple spans from text and aggregating these at scale. We evaluate the architecture using WikiNLDB, a novel dataset for exploring such queries. Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts can be encoded. In direct comparison on small databases, our approach increases overall answer accuracy from 85% to 90%. On larger databases, our approach retains its accuracy whereas transformer baselines could not encode the context.


pdf bib
Open Information Extraction from Question-Answer Pairs
Nikita Bhutani | Yoshihiko Suhara | Wang-Chiew Tan | Alon Halevy | H. V. Jagadish
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Open Information Extraction (OpenIE) extracts meaningful structured tuples from free-form text. Most previous work on OpenIE considers extracting data from one sentence at a time. We describe NeurON, a system for extracting tuples from question-answer pairs. One of the main motivations for NeurON is to be able to extend knowledge bases in a way that considers precisely the information that users care about. NeurON addresses several challenges. First, an answer text is often hard to understand without knowing the question, and second, relevant information can span multiple sentences. To address these, NeurON formulates extraction as a multi-source sequence-to-sequence learning task, wherein it combines distributed representations of a question and an answer to generate knowledge facts. We describe experiments on two real-world datasets that demonstrate that NeurON can find a significant number of new and interesting facts to extend a knowledge base compared to state-of-the-art OpenIE methods.


pdf bib
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Akari Asai | Sara Evensen | Behzad Golshan | Alon Halevy | Vivian Li | Andrei Lopatenko | Daniela Stepanov | Yoshihiko Suhara | Wang-Chiew Tan | Yinzhan Xu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
FrameIt: Ontology Discovery for Noisy User-Generated Text
Dan Iter | Alon Halevy | Wang-Chiew Tan
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

A common need of NLP applications is to extract structured data from text corpora in order to perform analytics or trigger an appropriate action. The ontology defining the structure is typically application dependent and in many cases it is not known a priori. We describe the FrameIt System that provides a workflow for (1) quickly discovering an ontology to model a text corpus and (2) learning an SRL model that extracts the instances of the ontology from sentences in the corpus. FrameIt exploits data that is obtained in the ontology discovery phase as weak supervision data to bootstrap the SRL model and then enables the user to refine the model with active learning. We present empirical results and qualitative analysis of the performance of FrameIt on three corpora of noisy user-generated text.


pdf bib
ReNoun: Fact Extraction for Nominal Attributes
Mohamed Yahya | Steven Whang | Rahul Gupta | Alon Halevy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)