Tae Yano


2012

Textual Predictors of Bill Survival in Congressional Committees
Tae Yano | Noah A. Smith | John D. Wilkerson
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Structured Databases of Named Entities from Bayesian Nonparametrics
Jacob Eisenstein | Tae Yano | William Cohen | Noah Smith | Eric Xing
Proceedings of the First Workshop on Unsupervised Learning in NLP

2010

Shedding (a Thousand Points of) Light on Biased Language
Tae Yano | Philip Resnik | Noah A. Smith
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

2009

Predicting Response to Political Blog Posts with Topic Models
Tae Yano | William W. Cohen | Noah A. Smith
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Relation between Agreement Measures on Human Labeling and Machine Learning Performance: Results from an Art History Domain
Rebecca Passonneau | Tom Lippincott | Tae Yano | Judith Klavans
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We discuss factors that affect human agreement on a semantic labeling task in the art history domain, based on the results of four experiments where we varied the number of labels annotators could assign, the number of annotators, the type and amount of training they received, and the size of the text span being labeled. Using the labelings from one experiment involving seven annotators, we investigate the relation between interannotator agreement and machine learning performance. We construct binary classifiers and vary the training and test data by swapping the labelings from the seven annotators. First, we find that performance is often quite good despite lower-than-recommended interannotator agreement. Second, we find that on average, learning performance for a given functional semantic category correlates with the overall agreement among the seven annotators for that category. Third, we find that learning performance on the data from a given annotator does not correlate with the quality of that annotator’s labeling. We offer recommendations for the use of labeled data in machine learning, and argue that learners should attempt to accommodate human variation. We also note implications for large-scale corpus annotation projects that deal with similarly subjective phenomena.
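
The sketch below illustrates the kind of analysis the abstract describes: compute per-category interannotator agreement, train a binary classifier per category on each annotator's labeling, and correlate the two. It is a minimal, hypothetical illustration only; the synthetic data, variable names, agreement measure (mean pairwise Cohen's kappa), feature representation (bag-of-words counts), and classifier (logistic regression) are assumptions, not details taken from the paper.

```python
from itertools import combinations

import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for the art-history data: 300 short "text spans",
# 3 functional categories, and 7 annotators whose binary labels are noisy
# copies of a hidden true labeling (noise rate differs per category).
n_docs, n_annotators = 300, 7
categories = {"material": 0.05, "period": 0.15, "technique": 0.30}
true_labels = {c: rng.integers(0, 2, n_docs) for c in categories}

texts = []
for i in range(n_docs):
    words = list(rng.choice(["paint", "canvas", "era", "style", "color"], size=6))
    for c in categories:
        if true_labels[c][i]:
            words.append(f"cue_{c}")  # lexical cue so there is something to learn
    texts.append(" ".join(words))

labelings = {
    c: [np.where(rng.random(n_docs) < noise, 1 - true_labels[c], true_labels[c])
        for _ in range(n_annotators)]
    for c, noise in categories.items()
}

X = CountVectorizer().fit_transform(texts)

agreement, performance = [], []
for c, annots in labelings.items():
    # Agreement for this category: mean pairwise Cohen's kappa over the 7 annotators.
    agreement.append(np.mean([cohen_kappa_score(a, b)
                              for a, b in combinations(annots, 2)]))
    # Performance: mean cross-validated F1 of a binary classifier trained
    # (and evaluated) separately on each annotator's labeling.
    performance.append(np.mean([
        cross_val_score(LogisticRegression(max_iter=1000), X, y,
                        cv=5, scoring="f1").mean()
        for y in annots]))
    print(f"{c:10s} kappa={agreement[-1]:.2f}  F1={performance[-1]:.2f}")

r, _ = pearsonr(agreement, performance)
print(f"correlation between per-category agreement and F1: r={r:.2f}")
```

With this toy setup, categories labeled more noisily show both lower kappa and lower F1, so the printed correlation is positive, mirroring the abstract's second finding in shape only; the actual experiments used human labelings of art-history text, not synthetic data.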