2022
A Methodology for the Comparison of Human Judgments With Metrics for Coreference Resolution
Mariya Borovikova | Loïc Grobol | Anaïs Halftermeyer | Sylvie Billot
Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)
We propose a method for investigating the interpretability of metrics used for the coreference resolution task through comparisons with human judgments. We provide a corpus with annotations of different error types and human evaluations of their gravity. Our preliminary analysis shows that, compared to humans, metrics considerably overlook several error types and overlook errors in general. This study is conducted on French texts, but the methodology is language-independent.
2020
ODIL_Syntax: a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees
Ilaine Wang | Aurore Pelletier | Jean-Yves Antoine | Anaïs Halftermeyer
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper describes ODIL Syntax, a French treebank built on spontaneous speech transcripts. The syntactic structure of every speech turn is represented by constituent trees, through a procedure which combines an automatic annotation provided by a parser (here, the Stanford Parser) and a manual revision. ODIL Syntax respects the annotation scheme designed for the French TreeBank (FTB), with the addition of some annotation guidelines that aim at representing specific features of the spoken language such as speech disfluencies. The corpus will be freely distributed by January 2020 under a Creative Commons licence. It will ground a further semantic enrichment dedicated to the representation of temporal entities and temporal relations, as a second phase of the ODIL@Temporal project. The paper details the annotation scheme we followed with an emphasis on the representation of speech disfluencies. We then present the annotation procedure that was carried out on the Contemplata annotation platform. In the last section, we provide some distributional characteristics of the annotated corpus (POS distribution, multiword expressions).
Contemplata, a Free Platform for Constituency Treebank Annotation
Jakub Waszczuk | Ilaine Wang | Jean-Yves Antoine | Anaïs Halftermeyer
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper describes Contemplata, an annotation platform that offers a generic solution for treebank building as well as treebank enrichment with relations between syntactic nodes. Contemplata is dedicated to the annotation of constituency trees. The framework includes support for syntactic parsers, which provide automatic annotations to be manually revised. The balanced strategy of annotation between automatic parsing and manual revision reduces the annotator workload, which favours data reliability. The paper presents the software architecture of Contemplata, describes its practical use and finally gives two examples of annotation projects that were conducted on the platform.
2019
Redonner du sens à l’accord interannotateur : vers une interprétation des mesures d’accord en termes de reproductibilité de l’annotation [Interpreting inter-annotator agreement measures : towards an interpretation in terms of annotation reproducibility]
Dany Bregeon | Jean-Yves Antoine | Jeanne Villaneau | Anaïs Halftermeyer
Traitement Automatique des Langues, Volume 60, Numéro 2 : Corpus annotés [Annotated corpora]