Validity, Agreement, Consensuality and Annotated Data Quality
Anaëlle Baledent | Yann Mathet | Antoine Widlöcher | Christophe Couronne | Jean-Luc Manguin
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Reference annotated (or gold-standard) datasets are required for various common tasks such as training for machine learning systems or system validation. They are necessary to analyse or compare occurrences or items annotated by experts, or to compare objects resulting from any computational process to objects annotated by experts. But, even if reference annotated gold-standard corpora are required, their production is known as a difficult problem, from both a theoretical and practical point of view. Many studies devoted to theses issues conclude that multi-annotation is most of the time a necessity. That inter-annotator agreement measure, which is required to check the reliability of data and the reproducibility of an annotation task, and thus to establish a gold standard, is another thorny problem. Fine analysis of available metrics for this specific task then becomes essential. Our work is part of this effort and more precisely focuses on several problems, which are rarely discussed, although they are intrinsically linked with the interpretation of metrics. In particular, we focus here on the complex relations between agreement and reference (of which agreement among annotators is supposed to be an indicator), and the emergence of consensus. We also introduce the notion of consensuality as another relevant indicator.