Salima Mdhaffar


pdf bib
ON-TRAC Consortium Systems for the IWSLT 2023 Dialectal and Low-resource Speech Translation Tasks
Antoine Laurent | Souhir Gahbiche | Ha Nguyen | Haroun Elleuch | Fethi Bougares | Antoine Thiol | Hugo Riguidel | Salima Mdhaffar | Gaëlle Laperrière | Lucas Maison | Sameer Khurana | Yannick Estève
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper describes the ON-TRAC consortium speech translation systems developed for IWSLT 2023 evaluation campaign. Overall, we participated in three speech translation tracks featured in the low-resource and dialect speech translation shared tasks, namely; i) spoken Tamasheq to written French, ii) spoken Pashto to written French, and iii) spoken Tunisian to written English. All our primary submissions are based on the end-to-end speech-to-text neural architecture using a pretrained SAMU-XLSR model as a speech encoder and a mbart model as a decoder. The SAMU-XLSR model is built from the XLS-R 128 in order to generate language agnostic sentence-level embeddings. This building is driven by the LaBSE model trained on multilingual text dataset. This architecture allows us to improve the input speech representations and achieve significant improvements compared to conventional end-to-end speech translation systems.


pdf bib
The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools
Gaëlle Laperrière | Valentin Pelloin | Antoine Caubrière | Salima Mdhaffar | Nathalie Camelin | Sahar Ghannay | Bassam Jabaian | Yannick Estève
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the emergence of neural end-to-end approaches for spoken language understanding (SLU), a growing number of studies have been presented during these last three years on this topic. The major part of these works addresses the spoken language understanding domain through a simple task like speech intent detection. In this context, new benchmark datasets have also been produced and shared with the community related to this task. In this paper, we focus on the French MEDIA SLU dataset, distributed since 2005 and used as a benchmark dataset for a large number of research works. This dataset has been shown as being the most challenging one among those accessible to the research community. Distributed by ELRA, this corpus is free for academic research since 2019. Unfortunately, the MEDIA dataset is not really used beyond the French research community. To facilitate its use, a complete recipe, including data preparation, training and evaluation scripts, has been built and integrated to SpeechBrain, an already popular open-source and all-in-one conversational AI toolkit based on PyTorch. This recipe is presented in this paper. In addition, based on the feedback of some researchers who have worked on this dataset for several years, some corrections have been brought to the initial manual annotation: the new version of the data will also be integrated into the ELRA catalogue, as the original one. More, a significant amount of data collected during the construction of the MEDIA corpus in the 2000s was never used until now: we present the first results reached on this subset — also included in the MEDIA SpeechBrain recipe — , that will be used for now as the MEDIA test2. Last, we discuss evaluation issues.

pdf bib
Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supersivion for Spoken Language Understanding
Salima Mdhaffar | Valentin Pelloin | Antoine Caubrière | Gaëlle Laperriere | Sahar Ghannay | Bassam Jabaian | Nathalie Camelin | Yannick Estève
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Pretrained models through self-supervised learning have been recently introduced for both acoustic and language modeling. Applied to spoken language understanding tasks, these models have shown their great potential by improving the state-of-the-art performances on challenging benchmark datasets. In this paper, we present an error analysis reached by the use of such models on the French MEDIA benchmark dataset, known as being one of the most challenging benchmarks for the slot filling task among all the benchmarks accessible to the entire research community. One year ago, the state-of-art system reached a Concept Error Rate (CER) of 13.6% through the use of a end-to-end neural architecture. Some months later, a cascade approach based on the sequential use of a fine-tuned wav2vec2.0 model and a fine-tuned BERT model reaches a CER of 11.2%. This significant improvement raises questions about the type of errors that remain difficult to treat, but also about those that have been corrected using these models pre-trained through self-supervision learning on a large amount of data. This study brings some answers in order to better understand the limits of such models and open new perspectives to continue improving the performance.


pdf bib
ON-TRAC’ systems for the IWSLT 2021 low-resource speech translation and multilingual speech translation shared tasks
Hang Le | Florentin Barbier | Ha Nguyen | Natalia Tomashenko | Salima Mdhaffar | Souhir Gabiche Gahbiche | Benjamin Lecouteux | Didier Schwab | Yannick Estève
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2021, low-resource speech translation and multilingual speech translation. The ON-TRAC Consortium is composed of researchers from three French academic laboratories and an industrial partner: LIA (Avignon Université), LIG (Université Grenoble Alpes), LIUM (Le Mans Université), and researchers from Airbus. A pipeline approach was explored for the low-resource speech translation task, using a hybrid HMM/TDNN automatic speech recognition system fed by wav2vec features, coupled to an NMT system. For the multilingual speech translation task, we investigated the us of a dual-decoder Transformer that jointly transcribes and translates an input speech. This model was trained in order to translate from multiple source languages to multiple target ones.


pdf bib
A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study
Salima Mdhaffar | Yannick Estève | Antoine Laurent | Nicolas Hernandez | Richard Dufour | Delphine Charlet | Geraldine Damnati | Solen Quiniou | Nathalie Camelin
Proceedings of the Twelfth Language Resources and Evaluation Conference

This corpus is part of the PASTEL (Performing Automated Speech Transcription for Enhancing Learning) project aiming to explore the potential of synchronous speech transcription and application in specific teaching situations. It includes 10 hours of different lectures, manually transcribed and segmented. The main interest of this corpus lies in its multimodal aspect: in addition to speech, the courses were filmed and the written presentation supports (slides) are made available. The dataset may then serve researches in multiple fields, from speech and language to image and video processing. The dataset will be freely available to the research community. In this paper, we first describe in details the annotation protocol, including a detailed analysis of the manually labeled data. Then, we propose some possible use cases of the corpus with baseline results. The use cases concern scientific fields from both speech and text processing, with language model adaptation, thematic segmentation and transcription to slide alignment.


pdf bib
Apport de l’adaptation automatique des modèles de langage pour la reconnaissance de la parole: évaluation qualitative extrinsèque dans un contexte de traitement de cours magistraux (Contribution of automatic adaptation of language models for speech recognition : extrinsic qualitative evaluation in a context of educational courses)
Salima Mdhaffar | Yannick Estève | Nicolas Hernandez | Antoine Laurent | Solen Quiniou
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Malgré les faiblesses connues de cette métrique, les performances de différents systèmes de reconnaissance automatique de la parole sont généralement comparées à l’aide du taux d’erreur sur les mots. Les transcriptions automatiques de ces systèmes sont de plus en plus exploitables et utilisées dans des systèmes complexes de traitement automatique du langage naturel, par exemple pour la traduction automatique, l’indexation, la recherche documentaire... Des études récentes ont proposé des métriques permettant de comparer la qualité des transcriptions automatiques de différents systèmes en fonction de la tâche visée. Dans cette étude nous souhaitons mesurer, qualitativement, l’apport de l’adaptation automatique des modèles de langage au domaine visé par un cours magistral. Les transcriptions du discours de l’enseignant peuvent servir de support à la navigation dans le document vidéo du cours magistral ou permettre l’enrichissement de son contenu pédagogique. C’est à-travers le prisme de ces deux tâches que nous évaluons l’apport de l’adaptation du modèle de langage. Les expériences ont été menées sur un corpus de cours magistraux et montrent combien le taux d’erreur sur les mots est une métrique insuffisante qui masque les apports effectifs de l’adaptation des modèles de langage.


pdf bib
Le corpus PASTEL pour le traitement automatique de cours magistraux (PASTEL corpus for automatic processing of lectures)
Salima Mdhaffar | Antoine Laurent | Yannick Estève
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Le projet PASTEL étudie l’acceptabilité et l’utilisabilité des transcriptions automatiques dans le cadre d’enseignements magistraux. Il s’agit d’outiller les apprenants pour enrichir de manière synchrone et automatique les informations auxquelles ils peuvent avoir accès durant la séance. Cet enrichissement s’appuie sur des traitements automatiques du langage naturel effectués sur les transcriptions automatiques. Nous présentons dans cet article un travail portant sur l’annotation d’enregistrements de cours magistraux enregistrés dans le cadre du projet CominOpenCourseware. Ces annotations visent à effectuer des expériences de transcription automatique, segmentation thématique, appariement automatique en temps réel avec des ressources externes... Ce corpus comprend plus de neuf heures de parole annotées. Nous présentons également des expériences préliminaires réalisées pour évaluer l’adaptation automatique de notre système de reconnaissance de la parole.