Juliette Kahn


2018

pdf bib
Matics Software Suite: New Tools for Evaluation and Data Exploration
Olivier Galibert | Guillaume Bernard | Agnes Delaborde | Sabrina Lecadre | Juliette Kahn
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
FABIOLE, a Speech Database for Forensic Speaker Comparison
Moez Ajili | Jean-François Bonastre | Juliette Kahn | Solange Rossato | Guillaume Bernard
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

A speech database has been collected to highlight the importance of the “speaker factor” in forensic voice comparison. FABIOLE was created during the FABIOLE project, funded by the French Research Agency (ANR) from 2013 to 2016. The corpus consists of more than 3,000 excerpts spoken by 130 native French male speakers. The speakers are divided into two categories: 30 target speakers, each with 100 excerpts, and 100 “impostors”, each with only one excerpt. The data were collected from 10 different French radio and television shows; each utterance turn has a minimum duration of 30 s and good speech quality. The data set is mainly used for investigating the speaker factor in forensic voice comparison and for interpreting unsolved issues such as the relationship between speaker characteristics and system behavior. In this paper, we present the FABIOLE database. Then, preliminary experiments are performed to evaluate the effect of the “speaker factor” and of the show on the behavior of a voice comparison system.

pdf bib
Generating Task-Pertinent sorted Error Lists for Speech Recognition
Olivier Galibert | Mohamed Ameur Ben Jannet | Juliette Kahn | Sophie Rosset
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Automatic Speech Recognition (ASR) is one of the most widely used components in spoken language processing applications. ASR errors vary in importance with respect to the application, making error analysis key to improving speech processing applications: knowing which errors are most serious for the applicative case is critical to building better systems. When ASR is used as a first step towards Named Entity Recognition (NER) in speech, error seriousness is usually determined by frequency, owing to the use of WER as the metric for evaluating ASR output, despite the emergence of more relevant measures in the literature. We propose to use a different evaluation metric from the literature in order to classify ASR errors according to their seriousness for NER. Our results show that ASR errors are ranked differently in importance depending on the evaluation metric used. A more detailed analysis shows that the estimation of error impact given by the ATENE metric is better adapted to the NER task than the estimation based only on WER, the most commonly used frequency metric.
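For readers unfamiliar with the WER baseline discussed above, a minimal sketch of its standard definition (substitutions + deletions + insertions over reference length, via word-level edit distance) is given below; this is an illustration of the common formula, not the authors' scoring implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N, computed as the word-level
    edit distance between reference and hypothesis, divided by the
    number of reference words N."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

Because every error type contributes equally here, a frequent but harmless substitution counts the same as a rare error that destroys a named entity, which is exactly the limitation that task-aware metrics such as ATENE address.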

pdf bib
Comparaison de listes d’erreurs de transcription automatique de la parole : quelle complémentarité entre les différentes métriques ? (Comparing error lists for ASR systems : contribution of different metrics)
Olivier Galibert | Juliette Kahn | Sophie Rosset
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

The work we present here belongs to the field of evaluating automatic speech recognition systems with a view to their use in a downstream task, here named entity recognition. More broadly, the question we ask is “what can an evaluation metric provide beyond a score?”. We are particularly interested in system errors, in their analysis, and possibly in exploiting what we know about these errors. In this work we study the ordered error lists generated from different metrics and analyze what emerges from them. We applied the same method to the outputs of different speech recognition systems. Our experiments show that some metrics provide information that is more relevant for a given task and that carries across different systems.

pdf bib
LNE-Visu : a tool to explore and visualize multimedia data
Guillaume Bernard | Juliette Kahn | Olivier Galibert | Rémi Regnier | Séverine Demeyer
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

LNE-Visu is a tool to explore and visualize multimedia data, created for the LNE evaluation campaigns. Three functionalities are available: exploring and selecting data, visualizing and listening to data, and applying significance tests.

2014

pdf bib
Human annotation of ASR error regions: Is “gravity” a sharable concept for human annotators?
Daniel Luzzati | Cyril Grouin | Ioana Vasilescu | Martine Adda-Decker | Eric Bilinski | Nathalie Camelin | Juliette Kahn | Carole Lailler | Lori Lamel | Sophie Rosset
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper is concerned with human assessments of the severity of errors in ASR outputs. We deliberately designed no guidelines, so that each annotator involved in the study could judge the “seriousness” of an ASR error using their own scientific background. Eight human annotators took part in an annotation task on three distinct corpora, one of which was annotated twice, without the annotators being told of the duplication. None of the computed results (inter-annotator agreement, edit distance, majority annotation) show any strong correlation between the considered criteria and the level of seriousness, which underlines the difficulty for a human of determining whether an ASR error is serious or not.

pdf bib
ETER : a new metric for the evaluation of hierarchical named entity recognition
Mohamed Ben Jannet | Martine Adda-Decker | Olivier Galibert | Juliette Kahn | Sophie Rosset
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper addresses the question of hierarchical named entity evaluation. In particular, we focus on metrics to deal with complex named entity structures such as those introduced within the QUAERO project. The intended goal is to propose a smart way of evaluating partially correctly detected complex entities, beyond the scope of traditional metrics. None of the existing metrics is fully adequate to evaluate the proposed QUAERO task, which involves entity detection, classification, and decomposition. We discuss the strong and weak points of the existing metrics. We then introduce a new metric, the Entity Tree Error Rate (ETER), to evaluate hierarchical and structured named entity detection, classification, and decomposition. The ETER metric builds upon the commonly accepted SER metric, but it takes the complex entity structure into account by measuring errors not only at the slot (or complex entity) level but also at the basic (atomic) entity level. We compare our new metric to the standard one, first on a set of examples and then on real data selected from the ETAPE evaluation results.

2012

pdf bib
Manual Corpus Annotation: Giving Meaning to the Evaluation Metrics
Yann Mathet | Antoine Widlöcher | Karën Fort | Claire François | Olivier Galibert | Cyril Grouin | Juliette Kahn | Sophie Rosset | Pierre Zweigenbaum
Proceedings of COLING 2012: Posters

pdf bib
Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers
Sophie Rosset | Cyril Grouin | Karën Fort | Olivier Galibert | Juliette Kahn | Pierre Zweigenbaum
Proceedings of the Sixth Linguistic Annotation Workshop

pdf bib
The REPERE Corpus : a multimodal corpus for person recognition
Aude Giraudel | Matthieu Carré | Valérie Mapelli | Juliette Kahn | Olivier Galibert | Ludovic Quintard
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The REPERE Challenge aims to support research on people recognition in multimodal conditions. To assess technological progress, annual evaluation campaigns will be organized from 2012 to 2014. In this context, the REPERE corpus, a French video corpus with multimodal annotation, has been developed. This paper presents the datasets collected for the dry run test that took place at the beginning of 2012. The specific annotation tools and guidelines are described in detail. At the time of writing, 6 hours of data have been collected and annotated. The last section presents analyses of the annotation distribution and of the interaction between modalities in the corpus.

pdf bib
Vérification du locuteur : variations de performance (Speaker verification : results variation) [in French]
Juliette Kahn | Nicolas Scheffer | Solange Rossato | Jean-François Bonastre
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

pdf bib
REPERE : premiers résultats d’un défi autour de la reconnaissance multimodale des personnes (REPERE : preliminary results of a multimodal person recognition challenge) [in French]
Juliette Kahn | Aude Giraudel | Matthieu Carré | Olivier Galibert | Ludovic Quintard
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP