Marie Tahon


2022

Overlaps and Gender Analysis in the Context of Broadcast Media
Martin Lebourdais | Marie Tahon | Antoine Laurent | Sylvain Meignier | Anthony Larcher
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Our main goal is to study the interactions between speakers according to their gender and role in broadcast media. In this paper, we propose an extensive study of gender and overlap annotations in various speech corpora mainly dedicated to diarisation or transcription tasks. We point out the heterogeneity of the annotation guidelines for both overlapping speech and gender categories. We also analyse how the type of speech content (casual speech, meetings, debates, interviews, etc.) affects the distribution of overlapping speech segments. On a small dataset of 93 recordings from the French channel LCP, we aim to characterise the interactions between speakers according to their gender. Finally, we propose a method for highlighting the areas of speech that are most active in terms of interactions between speakers. Such a visualisation tool could improve the efficiency of qualitative studies conducted by researchers in the human sciences.

A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification.
Rémi Uro | David Doukhan | Albert Rilliard | Laetitia Larcher | Anissa-Claire Adgharouamane | Marie Tahon | Antoine Laurent
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a semi-automatic approach to creating a diachronic corpus of voices balanced for speaker age, gender and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at the French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers, of which only 874 have been found so far). For each speaker, speech excerpts were extracted from audiovisual documents using an automatic pipeline consisting of speech detection, removal of background music and overlapped speech, and speaker diarization. This pipeline was used to present clean speaker segments to human annotators, who identified the target speakers. It proved highly effective, cutting down manual processing by a factor of ten. An evaluation of the quality of the automatic processing and of the final output is provided. It shows that the automatic processing compares well with up-to-date processes, and that the output provides high-quality speech for most of the selected excerpts. This method is therefore recommended for creating large corpora of known target speakers.

2020

Continuous Prediction of Satisfaction and Frustration in Call Center Conversations (AlloSat: A New Call Center French Corpus for Affect Analysis)
Manon Macary | Marie Tahon | Yannick Estève | Anthony Rousseau
Proceedings of the 6th joint conference Journées d'Études sur la Parole (JEP, 33rd edition), Traitement Automatique des Langues Naturelles (TALN, 27th edition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22nd edition). Volume 1: Journées d'Études sur la Parole

We present a new corpus, named AlloSat, composed of French conversations extracted from call centers and continuously annotated for frustration and satisfaction. In the call center context, a conversation generally aims to resolve the caller's request. This corpus was set up to develop new systems able to model the continuous aspect of semantic and paralinguistic information at the conversation level. We focus on the paralinguistic level, more precisely on the expression of emotions. To our knowledge, most emotional corpora contain annotations in discrete categories or in continuous dimensions such as activation or valence. We hypothesize that these dimensions are not sufficiently related to our context. To address this issue, we propose a corpus that enables real-time knowledge of the frustration/satisfaction axis. AlloSat comprises 303 conversations totalling approximately 37 hours of audio, all recorded in real-life environments and collected by Allo-Media (a company specialising in automatic call analysis). Initial classification experiments show that the evolution of the frustration/satisfaction axis can be predicted automatically for each conversation.

Towards Interactive Annotation for Hesitation in Conversational Speech
Jane Wottawa | Marie Tahon | Apolline Marin | Nicolas Audibert
Proceedings of the Twelfth Language Resources and Evaluation Conference

Manual annotation of speech corpora is expensive in both human resources and time. Furthermore, recognizing affect in spontaneous, non-acted speech is a challenge for both humans and machines. The aim of the present study is to automate the labeling of hesitant speech as a marker of expressed uncertainty. To this end, the NCCFr corpus was manually annotated for ‘degree of hesitation’ on a continuous scale from -3 to 3, and for the affective dimensions ‘activation’, ‘valence’ and ‘control’. In total, 5834 chunks of the NCCFr corpus were manually annotated, and acoustic analyses were carried out based on these annotations. Regression models were also trained to automatically predict hesitation for speech chunks that lack a manual annotation. Preliminary results show that the number of filled pauses and vowel duration both increase with the degree of hesitation, and that automatic prediction of the hesitation degree reaches an encouraging RMSE of 1.6.

AlloSat: A New Call Center French Corpus for Satisfaction and Frustration Analysis
Manon Macary | Marie Tahon | Yannick Estève | Anthony Rousseau
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a new corpus, named AlloSat, composed of real-life call center conversations in French that are continuously annotated for frustration and satisfaction. This corpus has been set up to develop new systems able to model the continuous aspect of semantic and paralinguistic information at the conversation level. The present work focuses on the paralinguistic level, more precisely on the expression of emotions. In the call center industry, a conversation usually aims to resolve the caller’s request. As far as we know, most emotional databases contain static annotations in discrete categories or in dimensions such as activation or valence. We hypothesize that these dimensions are not sufficiently task-related. Moreover, static annotations do not make it possible to explore the temporal evolution of emotional states. To address this issue, we propose a corpus with a rich annotation scheme that enables real-time investigation of the frustration/satisfaction axis. AlloSat comprises 303 conversations totalling approximately 37 hours of audio, all recorded in real-life environments and collected by Allo-Media (an intelligent call tracking company). Initial regression experiments using audio features show that the evolution of the frustration/satisfaction axis can be retrieved automatically at the conversation level.

2018

SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.
Aghilas Sini | Damien Lolive | Gaëlle Vidal | Marie Tahon | Élisabeth Delais-Roussarie
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2012

Corpus of Children Voices for Mid-level Markers and Affect Bursts Analysis
Marie Tahon | Agnes Delaborde | Laurence Devillers
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This article presents a corpus of children playing games in interaction with the humanoid robot Nao: the children have to express emotions while the robot tells a story. This corpus was collected to design an affective interactive system driven by an interactional and emotional representation of the user. We evaluate several mid-level markers used in our system: reaction time, speech duration and intensity level. We also examine the presence of affect bursts, which are quite numerous in our corpus, probably because of the young age of the children and the absence of predefined lexical content.