2010
A Probabilistic Morphological Analyzer for Syriac
Peter McClanahan | George Busby | Robbie Haertel | Kristian Heal | Deryle Lonsdale | Kevin Seppi | Eric Ringger
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Automatic Diacritization for Low-Resource Languages Using a Hybrid Word and Consonant CMM
Robbie Haertel | Peter McClanahan | Eric K. Ringger
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Tag Dictionaries Accelerate Manual Annotation
Marc Carmen | Paul Felt | Robbie Haertel | Deryle Lonsdale | Peter McClanahan | Owen Merkling | Eric Ringger | Kevin Seppi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Expert human input can contribute in various ways to facilitate automatic annotation of natural language text. For example, a part-of-speech tagger can be trained on labeled input provided offline by experts. In addition, expert input can be solicited by way of active learning to make the most of annotator expertise. However, hiring individuals to perform manual annotation is costly both in terms of money and time. This paper reports on a user study that was performed to determine the degree of effect that a part-of-speech dictionary has on a group of subjects performing the annotation task. The user study was conducted using a modular, web-based interface created specifically for text annotation tasks. The user study found that for both native and non-native English speakers a dictionary with greater than 60% coverage was effective at reducing annotation time and increasing annotator accuracy. On the basis of this study, we predict that using a part-of-speech tag dictionary with coverage greater than 60% can reduce the cost of annotation in terms of both time and money.
2008
Assessing the Costs of Sampling Methods in Active Learning for Annotation
Robbie Haertel | Eric Ringger | Kevin Seppi | James Carroll | Peter McClanahan
Proceedings of ACL-08: HLT, Short Papers
Assessing the Costs of Machine-Assisted Corpus Annotation through a User Study
Eric Ringger | Marc Carmen | Robbie Haertel | Kevin Seppi | Deryle Lonsdale | Peter McClanahan | James Carroll | Noel Ellison
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Fixed, limited budgets often constrain the amount of expert annotation that can go into the construction of annotated corpora. Estimating the cost of annotation is the first step toward using annotation resources wisely. We present here a study of the cost of annotation. This study includes the participation of annotators at various skill levels and with varying backgrounds. Conducted over the web, the study consists of tests that simulate machine-assisted pre-annotation, requiring correction by the annotator rather than annotation from scratch. The study also includes tests representative of an annotation scenario involving Active Learning as it progresses from a naïve model to a knowledgeable model; in particular, annotators encounter pre-annotation of varying degrees of accuracy. The annotation interface lists tags considered likely by the annotation model in preference to other tags. We present the experimental parameters of the study and report both descriptive and inferential statistics on the results of the study. We conclude with a model for estimating the hourly cost of annotation for annotators of various skill levels. We also present models for two granularities of annotation: sentence at a time and word at a time.
2007
Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation
Eric Ringger | Peter McClanahan | Robbie Haertel | George Busby | Marc Carmen | James Carroll | Kevin Seppi | Deryle Lonsdale
Proceedings of the Linguistic Annotation Workshop