Olivier Boëffard

Also published as: Olivier Boeffard

2025

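Évaluation de la description automatique de scènes audio par la tâche d’Audio Question Answering (Evaluating Automatic Audio Scene Description through the Audio Question Answering Task) [in French]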
Marcel Gibier | Raphaël Duroselle | Pierre Serrano | Olivier Boëffard | Jean-François Bonastre
Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM)

We explore the evaluation of automatic audio scene description through an indirect approach based on question answering over audio documents. In the absence of robust, automatic evaluation metrics for automatic audio scene description, we rely on the MMAU benchmark, a set of multiple-choice questions on varied audio excerpts. We introduce a cascade architecture that outperforms some reference models of comparable size. However, our results highlight limitations of the MMAU benchmark, notably a textual bias and a limited ability to assess the joint integration of speech and sound-event information. We suggest directions for improvement so that future evaluations better reflect the challenges of automatic audio scene description.

2015

Large Linguistic Corpus Reduction with SCP Algorithms
Nelly Barbot | Olivier Boëffard | Jonathan Chevelu | Arnaud Delhay
Computational Linguistics, Volume 41, Issue 3 - September 2015

2012

Vers une annotation automatique de corpus audio pour la synthèse de parole (Towards Fully Automatic Annotation of Audio Books for Text-To-Speech (TTS) Synthesis) [in French]
Olivier Boëffard | Laure Charonnat | Sébastien Le Maguer | Damien Lolive | Gaëlle Vidal
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

Évaluation segmentale du système de synthèse HTS pour le français (Segmental evaluation of HTS) [in French]
Sébastien Le Maguer | Nelly Barbot | Olivier Boeffard
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

Towards Fully Automatic Annotation of Audio Books for TTS
Olivier Boeffard | Laure Charonnat | Sébastien Le Maguer | Damien Lolive
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Building speech corpora is a first and crucial step for every text-to-speech synthesis system. Nowadays, the use of statistical models requires very large corpora that must be recorded, transcribed, annotated and segmented to be usable. The variety of corpora needed for recent applications (content, style, etc.) makes the use of existing digital audio resources very attractive. Among the available resources, audiobooks are of particular interest because of their quality. Within this framework, we propose a complete acquisition, segmentation and annotation chain for audiobooks that aims to be fully automatic. The proposed process relies on a data structure, Roots, that establishes the relations between the different annotation levels represented as sequences of items. This methodology has been applied successfully to 11 hours of speech extracted from an audiobook. A manual check on part of the corpus shows the efficiency of the process.

Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
Nelly Barbot | Olivier Boeffard | Arnaud Delhay
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Set covering algorithms are efficient tools for solving an optimal linguistic corpus reduction problem. The optimality of such a process is directly related to the descriptive features of the sentences of a reference corpus. This article experimentally examines the behaviour of three algorithms: a greedy approach and a Lagrangian-relaxation-based one giving importance to rare events, and a third one that considers the Kullback-Leibler divergence between a reference distribution and the ongoing distribution of events. The analysis of the content of the reduced corpora shows that the first two approaches remain the most effective at compressing a corpus while guaranteeing a minimal content. The variant that minimises the Kullback-Leibler divergence yields a distribution of events close to the reference distribution, as expected; however, the price for this solution is a much larger corpus. In the proposed experiments, we have also evaluated a mixed approach that adds a random complement to the smallest coverings.
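As an illustration of the divergence criterion mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It computes the Kullback-Leibler divergence between a reference distribution of events and the distribution in a corpus under construction. The function name `kl_divergence`, the direction of the divergence (reference relative to current), and the additive smoothing constant `epsilon` are all assumptions made for this example.

```python
import math


def kl_divergence(reference_counts, current_counts, epsilon=1e-9):
    """Return D_KL(reference || current) computed from raw event counts.

    Both arguments map an event (for example a diphone label) to a count.
    A small epsilon is added to every count so that events missing from
    one side do not cause a division by zero (an assumption of this sketch).
    """
    events = set(reference_counts) | set(current_counts)
    ref_total = sum(reference_counts.get(e, 0) + epsilon for e in events)
    cur_total = sum(current_counts.get(e, 0) + epsilon for e in events)

    divergence = 0.0
    for e in events:
        p = (reference_counts.get(e, 0) + epsilon) / ref_total
        q = (current_counts.get(e, 0) + epsilon) / cur_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical toy counts: the current selection over-represents event "a".
reference = {"a": 50, "b": 50}
current = {"a": 90, "b": 10}
print(kl_divergence(reference, current))
```

A selection strategy driven by this criterion would, at each step, favour the sentence whose addition lowers the divergence the most, which is consistent with the abstract's observation that matching the reference distribution costs a larger corpus.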

2008

Automatic Phone Segmentation of Expressive Speech
Laure Charonnat | Gaëlle Vidal | Olivier Boeffard
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In order to improve the flexibility and the precision of an automatic phone segmentation system for a type of expressive speech, the dubbing into French of fiction movies, we developed both the phonetic labelling process and the alignment process. The automatic labelling system relies on an automatic grapheme-to-phoneme conversion including all the variants of the phonetic chain and on HMM modelling. In this article, we distinguish three sets of phone models: a set of context-independent models, a set of left- and right-context-dependent models, and finally a mixture of the two that combines phone and triphone models according to the alignment precision obtained for each broad phonetic class. The three sets of models are evaluated on a test corpus. On the one hand, we notice a slight decrease in the phonetic labelling score, mainly due to pause insertions; on the other hand, the mixed set of models gives the best alignment precision.

Comparing Set-Covering Strategies for Optimal Corpus Design
Jonathan Chevelu | Nelly Barbot | Olivier Boeffard | Arnaud Delhay
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This article addresses the problem of the linguistic content of a speech corpus. Depending on the target task, the phonological and linguistic content of the corpus is controlled by collecting a set of sentences that covers a preset description of phonological attributes while keeping the overall duration as small as possible. This goal is classically achieved by greedy algorithms, which however do not guarantee the optimality of the desired cover. In recent work, a Lagrangian-based algorithm, called LamSCP, has been used to extract coverings of diphonemes from a large corpus in French, giving better results than a greedy algorithm. We propose to continue comparing both algorithms in terms of shortest duration, stability and robustness by building multi-represented diphoneme or triphoneme coverings. These coverings correspond to very large scale optimization problems, drawn from a corpus in English. In every experiment, LamSCP improves on the greedy results by 3.9 to 9.7 percent.
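The greedy baseline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the function name `greedy_cover`, the data layout, and the selection score (newly covered units per second of speech) are hypothetical choices, and each unit is required only once rather than multi-represented.

```python
def greedy_cover(sentences, required):
    """Pick sentences until every required unit is covered.

    sentences: dict mapping a sentence id to (duration_in_seconds, set_of_units)
    required:  set of units (for example diphonemes) that must be covered
    Returns the list of chosen sentence ids in selection order.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        best_id, best_ratio = None, 0.0
        for sid, (duration, units) in sentences.items():
            if sid in chosen:
                continue
            gain = len(units & uncovered)
            if gain == 0:
                continue
            # Assumed score: newly covered units per second of speech.
            ratio = gain / duration
            if ratio > best_ratio:
                best_id, best_ratio = sid, ratio
        if best_id is None:
            raise ValueError("some required units appear in no sentence")
        chosen.append(best_id)
        uncovered -= sentences[best_id][1]
    return chosen


# Hypothetical toy corpus: four sentences and three required diphonemes.
corpus = {
    "s1": (2.0, {"a-b", "b-c"}),
    "s2": (1.0, {"a-b"}),
    "s3": (1.5, {"b-c", "c-d"}),
    "s4": (4.0, {"a-b", "b-c", "c-d"}),
}
selection = greedy_cover(corpus, {"a-b", "b-c", "c-d"})
print(selection, sum(corpus[s][0] for s in selection))
```

Because each step commits to the locally best sentence, the result is a valid cover but not necessarily the shortest one, which is the gap that a Lagrangian relaxation method such as LamSCP is reported to narrow.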

WEB-Based Listening Test System for Speech Synthesis and Speech Conversion Evaluation
Laurent Blin | Olivier Boeffard | Vincent Barreaud
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this article, we propose a web-based listening test system that can be used with a wide range of listeners. Our main goals were to make the configuration of the tests as simple and flexible as possible, to simplify the recruiting of test subjects and, of course, to keep track of the results using a relational database. This first version of our system can perform the most widely used listening tests in the speech processing community (AB-BA, ABX and MOS tests). It can also easily evolve to offer other tests implemented by the tester by means of a module interface. This scenario is explored in this article, which presents an implementation of a module for Comparison Mean Opinion Score (CMOS) tests and reports on such an experiment. This test allowed us to extract from the BREF120 corpus a pair of voices with distinct supra-segmental characteristics. This system is offered to the speech synthesis and speech conversion community under a free license.

2005

Evaluation des Modèles de Langage n-gram et n/m-multigram (Evaluation of n-gram and n/m-multigram Language Models) [in French]
Pierre Alain | Olivier Boeffard
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article presents an evaluation of statistical language models carried out on French. We sought to compare the performance of exotic language models with that of more conventional fixed-horizon n-gram models. The experiments show that variable-horizon n-gram models can reduce the perplexity of a fixed-horizon n-gram model by more than 10% on average. The n/m-multigram models require adaptation to be competitive.

2004

2002

The Greedy Algorithm and its Application to the Construction of a Continuous Speech Database
Hélène François | Olivier Boëffard
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)