Andrew Hickl

2008

This paper describes the creation of a state-of-the-art answer type detection system capable of recognizing more than 200 different expected answer types with greater than 85% precision and recall. After describing how we constructed a new, multi-tiered answer type hierarchy from the set of entity types recognized by Language Computer Corporation’s CICEROLITE named entity recognition system, we describe how we used this hierarchy to annotate a new corpus of more than 10,000 English factoid questions. We show how an answer type detection system trained on this corpus can be used to enhance the accuracy of a state-of-the-art question-answering system (Hickl et al., 2007; Hickl et al., 2006b) by more than 7% overall.
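To illustrate the kind of mapping an answer type detector produces, here is a toy sketch. The two-tier hierarchy and keyword cues below are hypothetical stand-ins (not CICEROLITE's actual taxonomy or a trained classifier); the point is only the input/output shape: a factoid question in, a (coarse, fine) expected answer type out.

```python
# Illustrative sketch only: a toy answer-type detector over a small,
# hypothetical two-tier hierarchy (not LCC's actual CICEROLITE taxonomy).
ANSWER_TYPE_HIERARCHY = {
    "HUMAN": ["PERSON", "ORGANIZATION"],
    "LOCATION": ["CITY", "COUNTRY"],
    "NUMERIC": ["DATE", "COUNT"],
}

# Hypothetical keyword cues standing in for a trained classifier.
CUES = {
    "how many": "COUNT",
    "who": "PERSON",
    "where": "CITY",
    "when": "DATE",
}

def detect_answer_type(question: str) -> tuple:
    """Return a (coarse, fine) expected answer type for a factoid question."""
    q = question.lower()
    for cue, fine in CUES.items():
        if cue in q:
            # Walk the hierarchy upward from the fine type to its coarse parent.
            coarse = next(c for c, fines in ANSWER_TYPE_HIERARCHY.items()
                          if fine in fines)
            return (coarse, fine)
    return ("UNKNOWN", "UNKNOWN")
```

A real system would replace the keyword table with a classifier trained on the annotated question corpus, but the hierarchy lookup works the same way.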
This paper explores how a battery of unsupervised techniques can be used to create large, high-quality corpora for textual inference applications, such as systems for recognizing textual entailment (TE) and textual contradiction (TC). We show that it is possible to automatically generate sets of positive and negative instances of textual entailment and contradiction from textual corpora with greater than 90% precision. We describe how we generated more than 1 million TE pairs, along with a corresponding set of 500,000 TC pairs, from the documents found in the 2 GB AQUAINT-2 newswire corpus.
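One simple unsupervised heuristic of this kind, sketched below purely for illustration (it is an assumption, not necessarily the paper's method), pairs a news article's headline with its lead sentence as a positive TE example, since the lead sentence typically entails the headline; mismatched headline/lead combinations serve as crude negative examples.

```python
# Illustrative sketch of unsupervised TE pair generation from newswire.
# Assumption: articles arrive as (headline, lead_sentence) tuples.

def make_te_pairs(articles):
    """Build (text, hypothesis, label) triples from (headline, lead) articles.

    Positives: a lead sentence paired with its own headline.
    Negatives: a lead sentence paired with a *different* article's headline,
    a crude stand-in for mining non-entailment/contradiction pairs.
    """
    pairs = []
    for i, (headline, lead) in enumerate(articles):
        pairs.append((lead, headline, True))           # positive TE pair
        other = articles[(i + 1) % len(articles)][0]   # mismatched headline
        pairs.append((lead, other, False))             # negative pair
    return pairs
```

Scaled to a corpus the size of AQUAINT-2, heuristics like this are what make million-pair training sets feasible without manual annotation.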

2007

2006

This paper describes the development of CiceroArabic, the first wide-coverage named entity recognition (NER) system for Modern Standard Arabic. Capable of classifying 18 different named entity classes with over 85% F-measure, CiceroArabic utilizes a new 800,000-word annotated Arabic newswire corpus to achieve high performance without the need for hand-crafted rules or morphological information. In addition to describing results from our system, we show that accurate named entity annotation for a large number of semantic classes is feasible, even for very large corpora, and we discuss new techniques designed to boost agreement and consistency among annotators over a long-term annotation effort.
Generating answers to complex questions in the form of multi-document summaries requires access to question decomposition methods. In this paper we present three methods for decomposing complex questions and evaluate their impact on the responsiveness of the answers they enable.
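As a rough illustration of what a decomposer does (this naive conjunction split is not one of the paper's three methods, just a sketch of the input/output shape), a complex question can be broken into simpler sub-questions that a factoid QA system can answer individually:

```python
import re

# Illustrative only: decompose a complex question by splitting on
# coordinating "and", re-attaching the wh-word where a fragment lacks one.

def decompose(question: str) -> list:
    """Split a multi-part question into simpler sub-questions."""
    question = question.rstrip("?")
    parts = re.split(r",?\s+and\s+", question)
    wh = question.split()[0]  # wh-phrase of the first clause, e.g. "When"
    subs = []
    for part in parts:
        part = part.strip()
        if not re.match(r"(?i)(who|what|when|where|why|how)\b", part):
            part = f"{wh} {part}"
        subs.append(part + "?")
    return subs
```

The answers to the sub-questions would then be aggregated into a multi-document summary responding to the original question.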
