2018
pdf
bib
abs
Comparing Bayesian Models of Annotation
Silviu Paun
|
Bob Carpenter
|
Jon Chamberlain
|
Dirk Hovy
|
Udo Kruschwitz
|
Massimo Poesio
Transactions of the Association for Computational Linguistics, Volume 6
The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold standard labels, (2) annotator accuracies and biases, and (3) item difficulties and error patterns. Traditionally, majority voting was used for 1, and coefficients of agreement for 2 and 3. Lately, model-based analyses of corpus annotations have proven better at all three tasks. But there has been relatively little work comparing them on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation.
2014
pdf
bib
abs
The Benefits of a Model of Annotation
Rebecca J. Passonneau
|
Bob Carpenter
Transactions of the Association for Computational Linguistics, Volume 2
Standard agreement measures for interannotator reliability are neither necessary nor sufficient to ensure a high quality corpus. In a case study of word sense annotation, conventional methods for evaluating labels from trained annotators are contrasted with a probabilistic annotation model applied to crowdsourced data. The annotation model provides far more information, including a certainty measure for each gold standard label; the crowdsourced data was collected at less than half the cost of the conventional approach.
2013
pdf
bib
The Benefits of a Model of Annotation
Rebecca J. Passonneau
|
Bob Carpenter
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
2008
pdf
bib
Software Engineering, Testing, and Quality Assurance for Natural Language Processing
K. Bretonnel Cohen
|
Bob Carpenter
Software Engineering, Testing, and Quality Assurance for Natural Language Processing
2007
pdf
bib
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)
Bob Carpenter
|
Amanda Stent
|
Jason D. Williams
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)
2006
pdf
bib
Character Language Models for Chinese Word Segmentation and Named Entity Recognition
Bob Carpenter
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing
2005
pdf
bib
Scaling High-Order Character Language Models to Gigabytes
Bob Carpenter
Proceedings of Workshop on Software
pdf
bib
Switch Graphs for Parsing Type Logical Grammars
Bob Carpenter
|
Glyn Morrill
Proceedings of the Ninth International Workshop on Parsing Technology
2004
pdf
bib
Head-Driven Parsing for Word Lattices
Christopher Collins
|
Bob Carpenter
|
Gerald Penn
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)
2003
pdf
bib
Alias-i Threat Trackers
Breck Baldwin
|
Bob Carpenter
|
Aaron Ross
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations
1999
pdf
bib
Vector-based Natural Language Call Routing
Jennifer Chu-Carroll
|
Bob Carpenter
Computational Linguistics, Volume 25, Number 3, September 1999
1998
pdf
bib
Dialogue Management in Vector-Based Call Routing
Jennifer Chu-Carroll
|
Bob Carpenter
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics
pdf
bib
Dialogue Management in Vector-Based Call Routing
Jennifer Chu-Carroll
|
Bob Carpenter
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1
1997
pdf
bib
abs
Probabilistic Parsing using Left Corner Language Models
Christopher D. Manning
|
Bob Carpenter
Proceedings of the Fifth International Workshop on Parsing Technologies
We introduce a novel parser based on a probabilistic version of a left-corner parser. The left-corner strategy is attractive because rule probabilities can be conditioned on both top-down goals and bottom-up derivations. We develop the underlying theory and explain how a grammar can be induced from analyzed data. We show that the left-corner approach provides an advantage over simple top-down probabilistic context-free grammars in parsing the Wall Street Journal using a grammar induced from the Penn Treebank. We also conclude that the Penn Treebank provides a fairly weak test bed due to the flatness of its bracketings and to the obvious overgeneration and undergeneration of its induced grammar.
1995
pdf
bib
abs
An Abstract Machine for Attribute-Value Logics
Bob Carpenter
|
Yan Qu
Proceedings of the Fourth International Workshop on Parsing Technologies
A direct abstract machine implementation of the core attribute-value logic operations is shown to decrease the number of operations and conserve the amount of storage required when compared to interpreters or indirect compilers. In this paper, we describe the fundamental data structures and compilation techniques that we have employed to develop a unification and constraint-resolution engine capable of performance rivaling that of directly compiled Prolog terms while greatly exceeding Prolog in flexibility, expressiveness and modularity. In this paper, we will discuss the core architecture of our machine. We begin with a survey of the data structures supporting the small set of attribute-value logic instructions. These instructions manipulate feature structures by means of features, equality, and typing, and manipulate the program state by search and sequencing operations. We further show how these core operations can be integrated with a broad range of standard parsing techniques. Feature structures improve upon Prolog terms by allowing data to be organized by feature rather than by position. This encourages modular program development through the use of sparse structural descriptions which can be logically conjoined into larger units and directly executed. Standard linguistic representations, even of relatively simple local syntactic and semantic structures, typically run to hundreds of substructures. The type discipline we impose organizes information in an object-oriented manner by the multiple inheritance of classes and their associated features and type value constraints. In practice, this allows the construction of large-scale grammars in a relatively short period of time. At run-time, eager copying and structure-sharing is replaced with lazy, incremental, and localized branch and write operations. In order to allow for applications with parallel search, incremental backtracking can be localized to disjunctive choice points within the description of a single structure, thus supporting the kind of conditional mutual consistency checks used in modern grammatical theories such as HPSG, GB, and LFG. Further attention is paid to the byte-coding of instructions and their efficient indexing and subsequent retrieval, all of which is keyed on type information.
pdf
bib
Computational phonology: A constraint-based approach
Deirdre Wheeler
|
Bob Carpenter
Computational Linguistics, Volume 21, Number 4, December 1995
1994
pdf
bib
Constraint-based Morpho-phonology
Michael Mastroianni
|
Bob Carpenter
Computational Phonology
1993
pdf
bib
abs
Compiling Typed Attribute-Value Logic Grammars
Bob Carpenter
Proceedings of the Third International Workshop on Parsing Technologies
The unification-based approach to processing attribute-value logic grammars, similar to Prolog interpretation, has become the standard. We propose an alternative, embodied in the Attribute-Logic Engine (ALE) (Carpenter 1993), based on the Warren Abstract Machine (WAM) approach to compiling Prolog (Aït-Kaci 1991). Phrase structure grammars with procedural attachments, similar to Definite Clause Grammars (DCG) (Pereira and Warren 1980), are specified using a typed version of Rounds-Kasper logic (Carpenter 1992). We argue for the benefits of a strong and total version of typing in terms of both clarity and efficiency. Finally, we discuss the compilation of grammars into a few efficient low-level instructions for the basic feature structure operations.
1991
pdf
bib
The Generative Power of Categorial Grammars and Head-Driven Phrase Structure Grammars with Lexical Rules
Bob Carpenter
Computational Linguistics, Volume 17, Number 3, September 1991
pdf
bib
Inclusion, Disjointness and Choice: The Logic of Linguistic Classification
Bob Carpenter
|
Carl Pollard
29th Annual Meeting of the Association for Computational Linguistics