MEETING: A corpus of French meeting-style conversations

Julie Hunter, Hiroyoshi Yamasaki, Océane Granier, Jérôme Louradour, Roxane Bertrand, Kate Thompson, Laurent Prévot


Abstract
We present the MEETING corpus, a dataset of roughly 95 hours of spontaneous meeting-style conversations in French. The corpus is designed to serve as a foundation for downstream tasks such as meeting summarization. In its current state, it offers 25 hours of manually corrected transcripts that are aligned with the audio signal, making it a valuable resource for evaluating ASR and speaker recognition systems. It also includes automatic transcripts and alignments of the whole corpus which can be used for downstream NLP tasks. The aim of this paper is to describe the conception, production and annotation of the corpus up to the transcription level as well as to provide statistics that shed light on the main linguistic features of the corpus.
Anthology ID:
2024.jeptalnrecital-taln.35
Volume:
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position
Month:
7
Year:
2024
Address:
Toulouse, France
Editors:
Mathieu Balaguer, Nihed Bendahman, Lydia-Mai Ho-dac, Julie Mauclair, Jose G Moreno, Julien Pinquier
Venue:
JEP/TALN/RECITAL
SIG:
Publisher:
ATALA and AFPC
Note:
Pages:
508–529
Language:
URL:
https://aclanthology.org/2024.jeptalnrecital-taln.35
DOI:
Bibkey:
Cite (ACL):
Julie Hunter, Hiroyoshi Yamasaki, Océane Granier, Jérôme Louradour, Roxane Bertrand, Kate Thompson, and Laurent Prévot. 2024. MEETING: A corpus of French meeting-style conversations. In Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position, pages 508–529, Toulouse, France. ATALA and AFPC.
Cite (Informal):
MEETING: A corpus of French meeting-style conversations (Hunter et al., JEP/TALN/RECITAL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.jeptalnrecital-taln.35.pdf