Hélène Bonneau-Maynard

Also published as: H. Bonneau-Maynard, Hélène Maynard


2018

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Pierre Godard | Gilles Adda | Martine Adda-Decker | Juan Benjumea | Laurent Besacier | Jamison Cooper-Leavitt | Guy-Noel Kouarata | Lori Lamel | Hélène Maynard | Markus Mueller | Annie Rialland | Sebastian Stueker | François Yvon | Marcely Zanon-Boito
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
Pierre Godard | Laurent Besacier | François Yvon | Martine Adda-Decker | Gilles Adda | Hélène Maynard | Annie Rialland
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Computational Language Documentation attempts to make the most recent research in speech and language technologies available to linguists working on language preservation and documentation. In this paper, we pursue two main goals along these lines. The first is to improve upon a strong baseline for the unsupervised word discovery task on two very low-resource Bantu languages, taking advantage of the expertise of linguists on these particular languages. The second consists in exploring the Adaptor Grammar framework as a decision and prediction tool for linguists studying a new language. We experiment with 162 grammar configurations for each language and show that using Adaptor Grammars for word segmentation enables us to test hypotheses about a language. Specializing a generic grammar with language-specific knowledge leads to great improvements for the word discovery task, ultimately achieving a leap of about 30% token F-score from the results of a strong baseline.

2016

Investigating gender adaptation for speech translation
Rachel Bawden | Guillaume Wisniewski | Hélène Maynard
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

In this paper we investigate the impact of the integration of context into dialogue translation. We present a new contextual parallel corpus of television subtitles and show how taking into account speaker gender can significantly improve machine translation quality in terms of BLEU and METEOR scores. We perform a manual analysis, which suggests that these improvements are not necessarily related to the morphological consequences of speaker gender, but to more general linguistic divergences.

2014

LIMSI English-French speech translation system
Natalia Segal | Hélène Bonneau-Maynard | Quoc Khanh Do | Alexandre Allauzen | Jean-Luc Gauvain | Lori Lamel | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper documents the systems developed by LIMSI for the IWSLT 2014 speech translation task (English→French). The main objective of this participation was twofold: adapting different components of the ASR baseline system to the peculiarities of TED talks and improving the machine translation quality on the automatic speech recognition output data. For the latter task, various techniques have been considered: punctuation and number normalization, adaptation to ASR errors, as well as the use of structured output layer neural network models for speech data.

Topic Adaptation for the Automatic Translation of News Articles (Adaptation thématique pour la traduction automatique de dépêches de presse) [in French]
Souhir Gahbiche-Braham | Hélène Bonneau-Maynard | François Yvon
Proceedings of TALN 2014 (Volume 1: Long Papers)

2012

Repérage des entités nommées pour l’arabe : adaptation non-supervisée et combinaison de systèmes (Named Entity Recognition for Arabic : Unsupervised adaptation and Systems combination) [in French]
Souhir Gahbiche-Braham | Hélène Bonneau-Maynard | Thomas Lavergne | François Yvon
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier
Souhir Gahbiche-Braham | Hélène Bonneau-Maynard | Thomas Lavergne | François Yvon
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Arabic is a morphologically rich language, and Arabic texts abound in complex word forms built by concatenation of multiple subparts, corresponding for instance to prepositions, articles, roots, prefixes, or suffixes. The development of Arabic Natural Language Processing applications, such as Machine Translation (MT) tools, thus requires some kind of morphological analysis. In this paper, we compare various strategies for performing such preprocessing, using generic machine learning techniques. The resulting tool is compared with two open domain alternatives in the context of a statistical MT task and is shown to be faster than its competitors, with no significant difference in MT quality.

2011

Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation
Souhir Gahbiche-Braham | Hélène Bonneau-Maynard | François Yvon
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

LIMSI @ WMT11
Alexandre Allauzen | Hélène Bonneau-Maynard | Hai-Son Le | Aurélien Max | Guillaume Wisniewski | François Yvon | Gilles Adda | Josep Maria Crego | Adrien Lardilleux | Thomas Lavergne | Artem Sokolov
Proceedings of the Sixth Workshop on Statistical Machine Translation

2008

Training and Evaluation of POS Taggers on the French MULTITAG Corpus
Alexandre Allauzen | Hélène Bonneau-Maynard
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The explicit introduction of morphosyntactic information into statistical machine translation approaches is receiving considerable attention. The current freely available Part of Speech (POS) taggers for the French language are based on a limited tagset which does not account for some inflectional particularities. Moreover, there is a lack of a unified framework for training and evaluating these kinds of linguistic resources. Therefore, in this paper, three standard POS taggers (Treetagger, Brill’s tagger and the standard HMM POS tagger) are trained and evaluated under the same conditions on the French MULTITAG corpus. This POS-tagged corpus provides a tagset richer than the usual ones, including gender and number distinctions, for example. Experimental results show significant differences in performance between the taggers. According to the tagging accuracy estimated with a tagset of 300 items, taggers may be ranked as follows: Treetagger (95.7%), Brill’s tagger (94.6%), HMM tagger (93.4%). Examples of translation outputs illustrate how considering gender and number distinctions in the POS tagset can be relevant.

Limsi’s Statistical Translation Systems for WMT’08
Daniel Déchelotte | Gilles Adda | Alexandre Allauzen | Hélène Bonneau-Maynard | Olivier Galibert | Jean-Luc Gauvain | Philippe Langlais | François Yvon
Proceedings of the Third Workshop on Statistical Machine Translation

2007

Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation
Hélène Bonneau-Maynard | Alexandre Allauzen | Daniel Déchelotte | Holger Schwenk
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

Modèles statistiques enrichis par la syntaxe pour la traduction automatique (Syntax-Enriched Statistical Models for Machine Translation) [in French]
Holger Schwenk | Daniel Déchelotte | Hélène Bonneau-Maynard | Alexandre Allauzen
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Phrase-based statistical machine translation is a promising approach. In this paper, we present two complementary developments. The first models the target language in a continuous space. The second integrates morpho-syntactic categories into the units handled by the translation model. Both approaches are evaluated on the Tc-Star task. The most interesting results are obtained by combining the two methods.

A state-of-the-art statistical machine translation system based on Moses
Daniel Déchelotte | Holger Schwenk | Hélène Bonneau-Maynard | Alexandre Allauzen | Gilles Adda
Proceedings of Machine Translation Summit XI: Papers

2006

Results of the French Evalda-Media evaluation campaign for literal understanding
H. Bonneau-Maynard | C. Ayache | F. Bechet | A. Denis | A. Kuhn | F. Lefevre | D. Mostefa | M. Quignard | S. Rosset | C. Servan | J. Villaneau
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The aim of the Media-Evalda project is to evaluate the understanding capabilities of dialog systems. This paper presents the Media protocol for speech understanding evaluation and describes the results of the June 2005 literal evaluation campaign. Five systems, both symbolic and corpus-based, participated in the evaluation, which is based on a common semantic representation. Different scorings have been performed on the system results. The understanding error rate for the Full scoring ranges, depending on the system, from 29% to 41.3%. A diagnostic analysis of these results is proposed.

2004

The French MEDIA/EVALDA Project: the Evaluation of the Understanding Capability of Spoken Language Dialogue Systems
Laurence Devillers | Hélène Maynard | Sophie Rosset | Patrick Paroubek | Kevin McTait | D. Mostefa | Khalid Choukri | Laurent Charnay | Caroline Bousquet | Nadine Vigouroux | Frédéric Béchet | Laurent Romary | Jean-Yves Antoine | J. Villaneau | Myriam Vergnes | J. Goulian
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The aim of the MEDIA project is to design and test a methodology for the evaluation of context-dependent and independent spoken dialogue systems. We propose an evaluation paradigm based on the use of test suites from real-world corpora and a common semantic representation and common metrics. This paradigm should allow us to diagnose the context-sensitive understanding capability of dialogue systems. This paradigm will be used within an evaluation campaign involving several sites all of which will carry out the task of querying information from a database.

2003

The PEACE SLDS understanding evaluation paradigm of the French MEDIA campaign
Laurence Devillers | Hélène Maynard | Patrick Paroubek | Sophie Rosset
Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?

2002

Annotations for Dynamic Diagnosis of the Dialog State
Laurence Devillers | Sophie Rosset | Hélène Bonneau-Maynard | Lori Lamel
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Predictive Performance of Dialog Systems
H. Bonneau-Maynard | L. Devillers | S. Rosset
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)