Viet-Bac Le


2011

pdf bib
Speech recognition for machine translation in Quaero
Lori Lamel | Sandrine Courcinous | Julien Despres | Jean-Luc Gauvain | Yvan Josse | Kevin Kilgour | Florian Kraft | Viet-Bac Le | Hermann Ney | Markus Nußbaum-Thom | Ilya Oparin | Tim Schlippe | Ralf Schlüter | Tanja Schultz | Thiago Fraga da Silva | Sebastian Stüker | Martin Sundermeyer | Bianca Vieru | Ngoc Thang Vu | Alexander Waibel | Cécile Woehrling
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the speech-to-text systems used to provide automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a Rover combination of systems from three teams ( KIT, RWTH, LIMSI+VR) for the French and German languages. The casesensitive word error rates (WER) of the combined systems were respectively 20.8% and 18.1% on the 2010 evaluation data, relative WER reductions of 14.6% and 17.4% respectively over the best component system.

2009

pdf bib
Mining a Comparable Text Corpus for a Vietnamese-French Statistical Machine Translation System
Thi-Ngoc-Diep Do | Viet-Bac Le | Brigitte Bigi | Laurent Besacier | Eric Castelli
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Exploitation d’un corpus bilingue pour la création d’un système de traduction probabiliste Vietnamien - Français
Thi-Ngoc-Diep Do | Viet-Bac Le | Brigitte Bigi | Laurent Besacier | Eric Castelli
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente nos premiers travaux en vue de la construction d’un système de traduction probabiliste pour le couple de langue vietnamien-français. La langue vietnamienne étant considérée comme une langue peu dotée, une des difficultés réside dans la constitution des corpus parallèles, indispensable à l’apprentissage des modèles. Nous nous concentrons sur la constitution d’un grand corpus parallèle vietnamien-français. La méthode d’identification automatique des paires de documents parallèles fondée sur la date de publication, les mots spéciaux et les scores d’alignements des phrases est appliquée. Cet article présente également la construction d’un premier système de traduction automatique probabiliste vietnamienfrançais et français-vietnamien à partir de ce corpus et discute l’opportunité d’utiliser des unités lexicales ou sous-lexicales pour le vietnamien (syllabes, mots, ou leurs combinaisons). Les performances du système sont encourageantes et se comparent avantageusement à celles du système de Google.

2007

pdf bib
The LIG Arabic/English speech translation system at IWSLT07
Laurent Besacier | Amar Mahdhaoui | Viet-Bac Le
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper is a description of the system presented by the LIG laboratory to the IWSLT07 speech translation evaluation. The LIG participated, for the first time this year, in the Arabic to English speech translation task. For translation, we used a conventional statistical phrase-based system developed using the moses open source decoder. Our baseline MT system is described and we discuss particularly the use of an additional bilingual dictionary which seems useful when few training data is available. The main contribution of this paper concerns the proposal of a lattice decomposition algorithm that allows transforming a word lattice into a sub word lattice compatible with our MT model that uses word segmentation on the Arabic part. The lattice is then transformed into a confusion network which can be directly decoded into moses. The results show that this method outperforms the conventional 1-best translation which consists in translating only the most probable ASR hypothesis. The best BLEU score, from ASR output obtained on IWSLT06 evaluation data is 0.2253. The results confirm the interest of full CN decoding for speech translation, compared to traditional ASR 1-best approach. Our primary system was ranked 7/14 for IWSLT07 AE ASR task with a BLEU score of 0.3804.

2004

pdf bib
Spoken and Written Language Resources for Vietnamese
Viet-Bac Le | Do-Dat Tran | Eric Castelli | Laurent Besacier | Jean-François Serignat
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)