Jean-Luc Gauvain


2024

ALADAN at IWSLT24 Low-resource Arabic Dialectal Speech Translation Task
Waad Ben Kheder | Josef Jon | André Beyer | Abdel Messaoudi | Rabea Affan | Claude Barras | Maxim Tychonov | Jean-Luc Gauvain
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper presents ALADAN’s approach to the IWSLT 2024 Dialectal and Low-resource shared task, focusing on Levantine Arabic (apc) and Tunisian Arabic (aeb) to English speech translation (ST). Addressing challenges such as the lack of standardized orthography and limited training data, we propose a solution for data normalization in Dialectal Arabic, employing a modified Levenshtein distance and Word2vec models to find orthographic variants of the same word. Our system consists of a cascade ST system integrating two ASR systems (TDNN-F and Zipformer) and two NMT modules derived from pre-trained models (NLLB-200 1.3B distilled model and CohereAI’s Command-R). Additionally, we explore the integration of unsupervised textual and audio data, highlighting the importance of multi-dialectal datasets for both ASR and NMT tasks. Our system achieves a BLEU score of 31.5 for Levantine Arabic on the official validation set.
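
The abstract above does not say how the Levenshtein distance is modified. As an illustration only, the sketch below shows one common kind of modification: a weighted edit distance in which substitutions between easily confused characters cost less than other edits. The function name, the `cheap_pairs` set, the 0.5 cost, and the transliterated example words are all assumptions made for this sketch, not details from the paper, and the Word2vec filtering step is omitted.

```python
def weighted_levenshtein(a, b, cheap_pairs, cheap_cost=0.5):
    """Edit distance where substituting a pair listed in cheap_pairs
    costs cheap_cost instead of 1 (hypothetical weighting scheme)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0
            elif (ca, cb) in cheap_pairs or (cb, ca) in cheap_pairs:
                sub = cheap_cost
            else:
                sub = 1.0
            curr.append(min(prev[j] + 1.0,        # deletion
                            curr[j - 1] + 1.0,    # insertion
                            prev[j - 1] + sub))   # substitution or match
        prev = curr
    return prev[-1]


def variant_candidates(word, vocabulary, cheap_pairs, max_distance=0.5):
    """Return vocabulary entries within max_distance of word (excluding word)."""
    return [v for v in vocabulary
            if v != word
            and weighted_levenshtein(word, v, cheap_pairs) <= max_distance]


# Hypothetical confusable pair and transliterated toy vocabulary.
CHEAP = {("a", "e")}
VOCAB = ["marhaba", "merhaba", "markaba"]
candidates = variant_candidates("marhaba", VOCAB, CHEAP)
```

With these toy inputs, `candidates` keeps only `"merhaba"`, since the a/e substitution costs 0.5 while the h/k substitution costs the full 1.0. In a real pipeline the surviving candidates would still need a semantic check, which is the role the abstract assigns to Word2vec.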

2014

LIMSI English-French speech translation system
Natalia Segal | Hélène Bonneau-Maynard | Quoc Khanh Do | Alexandre Allauzen | Jean-Luc Gauvain | Lori Lamel | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper documents the systems developed by LIMSI for the IWSLT 2014 speech translation task (English→French). The main objective of this participation was twofold: adapting different components of the ASR baseline system to the peculiarities of TED talks and improving the machine translation quality on the automatic speech recognition output data. For the latter task, various techniques have been considered: punctuation and number normalization, adaptation to ASR errors, as well as the use of structured output layer neural network models for speech data.

2011

Speech recognition for machine translation in Quaero
Lori Lamel | Sandrine Courcinous | Julien Despres | Jean-Luc Gauvain | Yvan Josse | Kevin Kilgour | Florian Kraft | Viet-Bac Le | Hermann Ney | Markus Nußbaum-Thom | Ilya Oparin | Tim Schlippe | Ralf Schlüter | Tanja Schultz | Thiago Fraga da Silva | Sebastian Stüker | Martin Sundermeyer | Bianca Vieru | Ngoc Thang Vu | Alexander Waibel | Cécile Woehrling
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the speech-to-text systems that produced the automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a ROVER combination of systems from three teams (KIT, RWTH, LIMSI+VR) for the French and German languages. The case-sensitive word error rates (WER) of the combined systems were 20.8% and 18.1%, respectively, on the 2010 evaluation data, corresponding to relative WER reductions of 14.6% and 17.4%, respectively, over the best component system.
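
The relative reductions quoted above imply the WER of the best single system, which the abstract does not state. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the resulting values (roughly 24.4% for French and 21.9% for German) are derived from the quoted figures, not reported in the source.

```python
def implied_baseline_wer(combined_wer, relative_reduction):
    """Invert relative_reduction = (baseline - combined) / baseline."""
    return combined_wer / (1.0 - relative_reduction)


# Figures quoted in the abstract: combined WER (%) and relative reduction.
french_baseline = implied_baseline_wer(20.8, 0.146)
german_baseline = implied_baseline_wer(18.1, 0.174)
print(f"French: {french_baseline:.1f}%  German: {german_baseline:.1f}%")
```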

2010

The Quaero program: multilingual and multimedia technologies
Jean-Luc Gauvain
Proceedings of the 7th International Workshop on Spoken Language Translation: Plenaries

2008

Limsi’s Statistical Translation Systems for WMT’08
Daniel Déchelotte | Gilles Adda | Alexandre Allauzen | Hélène Bonneau-Maynard | Olivier Galibert | Jean-Luc Gauvain | Philippe Langlais | François Yvon
Proceedings of the Third Workshop on Statistical Machine Translation

2006

Continuous Space Language Models for Statistical Machine Translation
Holger Schwenk | Daniel Dechelotte | Jean-Luc Gauvain
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Training Neural Network Language Models on Very Large Corpora
Holger Schwenk | Jean-Luc Gauvain
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2001

Portability Issues for Speech Recognition Technologies
Lori Lamel | Fabrice Lefevre | Jean-Luc Gauvain | Gilles Adda
Proceedings of the First International Conference on Human Language Technology Research

Invited Talk: Processing Broadcast Audio for Information Access
Jean-Luc Gauvain | Lori Lamel | Gilles Adda | Martine Adda-Decker | Claude Barras | Langzhou Chen | Yannick de Kercadio
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

1993

Identification of Non-Linguistic Speech Features
Jean-Luc Gauvain | Lori F. Lamel
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

1992

Progress Report on the Chronus System: ATIS Benchmark Results
Roberto Pieraccini | Evelyne Tzoukermann | Zakhar Gorelov | Esther Levin | Chin-Hui Lee | Jean-Luc Gauvain
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

MAP Estimation of Continuous Density HMM : Theory and Applications
Jean-Luc Gauvain | Chin-Hui Lee
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

Speaker-Independent Phone Recognition Using BREF
Jean-Luc Gauvain | Lori F. Lamel
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

1991

Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models
Jean-Luc Gauvain | Chin-Hui Lee
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991