2020
Quel type de systèmes utiliser pour la transcription automatique du français ? Les HMM font de la résistance (What system for the automatic transcription of French in audiovisual broadcasts ?)
Paul Deléglise | Carole Lailler
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole
Building on their success in machine translation, end-to-end systems, whose output is a sequence of characters, have seen their use extended to automatic speech transcription. Many comparisons have since been carried out on royalty-free English corpora of read speech. Here we compare two state-of-the-art systems, not on read speech but on a corpus of French audiovisual broadcasts exhibiting varying degrees of spontaneity. The first is an end-to-end system and the second is a hybrid (HMM/DNN) system. Since the end-to-end system needs a dedicated lexicon and language model to obtain satisfactory results, it is interesting to note that the tighter integration of these components in hybrid (HMM/DNN) systems yields better performance, particularly in French, where context is essential for capturing an utterance.
2016
Estimation de la qualité d’un système de reconnaissance de la parole pour une tâche de compréhension (Quality estimation of a Speech Recognition System for a Spoken Language Understanding task)
Olivier Galibert | Nathalie Camelin | Paul Deléglise | Sophie Rosset
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP
We address the evaluation of the quality of speech recognition systems in the context of an understanding task. The aim of this work is to provide a tool for selecting the automatic speech recognition system best suited to a given dialogue system. We compare several metrics, notably WER, NE-WER and ATENE, a metric recently proposed for evaluating speech recognition systems in the context of a named entity recognition task. The latter metric showed a better correlation with the results of the overall task than all the other metrics tested. Our measurements indicate a very strong correlation with the ATENE measure and a weaker one with WER.
Des Réseaux de Neurones avec Mécanisme d’Attention pour la Compréhension de la Parole (Exploring the use of Attention-Based Recurrent Neural Networks For Spoken Language Understanding )
Edwin Simonnet | Paul Deléglise | Nathalie Camelin | Yannick Estève
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP
This study examines the contribution of a bidirectional encoder/decoder recurrent neural network (RNN) with an attention mechanism to a spoken language understanding task. The first experiments, carried out on the ATIS corpus, confirm the quality of the state-of-the-art RNN system used in this paper by comparing its results with those recently published in the literature. Additional experiments show that RNNs with an attention mechanism outperform the RNNs recently proposed for the semantic concept tagging task. On the MEDIA corpus, a state-of-the-art French corpus for understanding dedicated to hotel booking and tourist information, the experiments show that a bidirectional RNN reaches an F-measure of 79.51, while the same system with the attention mechanism reaches an F-measure of 80.27.
Evaluation of acoustic word embeddings
Sahar Ghannay | Yannick Estève | Nathalie Camelin | Paul Deleglise
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP
Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks
Carole Lailler | Anaïs Landeau | Frédéric Béchet | Yannick Estève | Paul Deléglise
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this article, we present the RATP-DECODA Corpus, which is composed of 67 hours of speech from telephone conversations of a Customer Care Service (CCS). This corpus is already available online at http://sldr.org/sldr000847/fr in its first version. However, many enhancements have been made in order to allow the development of automatic techniques to transcribe conversations and to capture their meaning. These enhancements fall into two categories: firstly, we have increased the size of the corpus with manual transcriptions from a new operational day; secondly, we have added new linguistic annotations to the whole corpus (either manually or through automatic processing) in order to perform various linguistic tasks, from syntactic and semantic parsing to dialog act tagging and dialog summarization.
2015
The LIUM ASR and SLT systems for IWSLT 2015
Mercedes Garcia Martínez | Loïc Barrault | Anthony Rousseau | Paul Deléglise | Yannick Estève
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign
2014
LIUM English-to-French spoken language translation system and the Vecsys/LIUM automatic speech recognition system for Italian language for IWSLT 2014
Anthony Rousseau | Loïc Barrault | Paul Deléglise | Yannick Estève | Holger Schwenk | Samir Bennacef | Armando Muscariello | Stephan Vanni
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the Spoken Language Translation system developed by the LIUM for the IWSLT 2014 evaluation campaign. We participated in two of the proposed tasks: (i) the Automatic Speech Recognition task (ASR) in two languages, Italian with the Vecsys company, and English alone, (ii) the English to French Spoken Language Translation task (SLT). We present the approaches and specificities found in our systems, as well as the results from the evaluation campaign.
Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks
Anthony Rousseau | Paul Deléglise | Yannick Estève
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we present improvements made to the TED-LIUM corpus we released in 2012. These enhancements fall into two categories. First, we describe how we filtered publicly available monolingual data and used it to estimate well-suited language models (LMs), using open-source tools. Then, we describe the process of selection we applied to new acoustic data from TED talks, providing additions to our previously released corpus. Finally, we report some experiments we made around these improvements.
2012
Avancées dans le domaine de la transcription automatique par décodage guidé (Improvements on driven decoding system combination) [in French]
Fethi Bougares | Yannick Estève | Paul Deléglise | Mickael Rouvier | Georges Linarès
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP
TED-LIUM: an Automatic Speech Recognition dedicated corpus
Anthony Rousseau | Paul Deléglise | Yannick Estève
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents the corpus developed by the LIUM for Automatic Speech Recognition (ASR), based on the TED Talks. This corpus was built during the IWSLT 2011 Evaluation Campaign, and is composed of 118 hours of speech with its accompanying automatically aligned transcripts. We describe the content of the corpus, how the data was collected and processed, how it will be made publicly available and how we built an ASR system using this data, leading to a WER score of 17.4%. The official results we obtained at the IWSLT 2011 evaluation campaign are also discussed.
2011
LIUM’s systems for the IWSLT 2011 speech translation tasks
Anthony Rousseau | Fethi Bougares | Paul Deléglise | Holger Schwenk | Yannick Estève
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the three systems developed by the LIUM for the IWSLT 2011 evaluation campaign. We participated in three of the proposed tasks, namely the Automatic Speech Recognition task (ASR), the ASR system combination task (ASR_SC) and the Spoken Language Translation task (SLT), since these tasks are all related to speech translation. We present the approaches and specificities we developed on each task.
2010
LIUM’s statistical machine translation system for IWSLT 2010
Anthony Rousseau | Loïc Barrault | Paul Deléglise | Yannick Estève
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the two systems developed by the LIUM laboratory for the 2010 IWSLT evaluation campaign. We participated in the new English to French TALK task. We developed two systems, one for each evaluation condition, both being statistical phrase-based systems using the Moses toolkit. Several approaches were investigated.
2008
Combined Systems for Automatic Phonetic Transcription of Proper Nouns
Antoine Laurent | Téva Merlin | Sylvain Meignier | Yannick Estève | Paul Deléglise
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Large vocabulary automatic speech recognition (ASR) technologies perform well in known, controlled contexts. However, recognition of proper nouns is commonly considered a difficult task. Accurate phonetic transcription of a proper noun is difficult to obtain, although it can be one of the most important resources for a recognition system. In this article, we propose methods of automatic phonetic transcription applied to proper nouns. The methods are based on combinations of the rule-based phonetic transcription generator LIA_PHON and an acoustic-phonetic decoding system. On the ESTER corpus, we observed that the combined systems obtain better results than our reference system (LIA_PHON). The WER (Word Error Rate) decreased on segments of speech containing proper nouns, without negatively affecting the results on the rest of the corpus. On the same corpus, the Proper Noun Error Rate (PNER, which is a WER computed on proper nouns only) decreased with our new system.
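The WER cited in this abstract is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the scoring tool used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text normalization
    (case, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A PNER in the sense described above would restrict the error count to proper nouns; how that restriction is applied is not specified in the abstract, so it is left out of this sketch.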
2006
Automatic Detection of Well Recognized Words in Automatic Speech Transcriptions
Julie Mauclair | Yannick Estève | Simon Petit-Renaud | Paul Deléglise
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This work addresses the use of confidence measures for extracting well recognized words with a very low error rate from automatically transcribed segments in an unsupervised way. We present and compare several confidence measures and propose a method to merge them into a new one. We study its ability to extract correctly recognized word segments relative to the number of rejected words. We apply this fusion measure to select audio segments composed of words with a high confidence score. These segments come from an automatic transcription of French broadcast news produced by our speech recognition system based on the CMU Sphinx3.3 decoder. Injecting new data resulting from unsupervised processing of raw audio recordings into the training corpus of acoustic models gives a statistically significant improvement (95% confidence interval) in terms of word error rate. Experiments have been carried out on the corpus used during ESTER, the French evaluation campaign.
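The selection step described in this abstract, keeping only segments made up of words with a high confidence score, can be illustrated with a short sketch. The data layout (a segment as a list of word and score pairs), the function name `select_confident_segments` and the threshold value are all hypothetical; the fusion measure proposed in the paper is not reproduced here, and the scores are simply assumed to be given.

```python
from typing import List, Tuple

# Hypothetical layout: a segment is a list of (word, confidence) pairs,
# with confidence scores assumed to lie in [0, 1].
Segment = List[Tuple[str, float]]


def select_confident_segments(
    segments: List[Segment], threshold: float = 0.9
) -> List[Segment]:
    """Keep the non-empty segments whose words all reach the threshold."""
    return [
        segment
        for segment in segments
        if segment and all(score >= threshold for _, score in segment)
    ]


if __name__ == "__main__":
    transcribed = [
        [("bonjour", 0.95), ("monde", 0.97)],
        [("euh", 0.40), ("oui", 0.99)],
    ]
    for segment in select_confident_segments(transcribed):
        print(" ".join(word for word, _ in segment))
```

Raising the threshold trades quantity of selected data for a lower expected error rate, which is the trade-off between correct words kept and words rejected that the abstract mentions.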