Frédéric Vernier


2022

pdf bib
Aesop’s fable “The North Wind and the Sun” Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages
Elena Knyazeva | Philippe Boula de Mareüil | Frédéric Vernier
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes a method of semi-automatic word spotting in minority languages, from one and the same Aesop fable “The North Wind and the Sun” translated in Romance languages/dialects from Hexagonal (i.e. Metropolitan) France and languages from French Polynesia. The first task consisted of finding out how a dozen words such as “wind” and “sun” were translated in over 200 versions collected in the field — taking advantage of orthographic similarity, word position and context. Occurrences of the translations were then extracted from the phone-aligned recordings. The results were judged accurate in 96–97% of cases, both on the development corpus and a test set of unseen data. Corrected alignments were then mapped and basemaps were drawn to make various linguistic phenomena immediately visible. The paper exemplifies how regular expressions may be used for this purpose. The final result, which takes the form of an online speaking atlas (enriching the https://atlas.limsi.fr website), enables us to illustrate lexical, morphological or phonetic variation.

2018

pdf bib
A Speaking Atlas of the Regional Languages of France
Philippe Boula de Mareüil | Albert Rilliard | Frédéric Vernier
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
Cartopho : un site web de cartographie de variantes de prononciation en français (Cartopho: a website for mapping pronunciation variants in French)
Philippe Boula de Mareüil | Jean-Philippe Goldman | Albert Rilliard | Yves Scherrer | Frédéric Vernier
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

Le présent travail se propose de renouveler les traditionnels atlas dialectologiques pour cartographier les variantes de prononciation en français, à travers un site internet. La toile est utilisée non seulement pour collecter des données, mais encore pour disséminer les résultats auprès des chercheurs et du grand public. La méthodologie utilisée, à base de crowdsourcing (ou « production participative »), nous a permis de recueillir des informations auprès de 2500 francophones d’Europe (France, Belgique, Suisse). Une plateforme dynamique à l’interface conviviale a ensuite été développée pour cartographier la prononciation de 70 mots dans les différentes régions des pays concernés (des mots notamment à voyelle moyenne ou dont la consonne finale peut être prononcée ou non). Les options de visualisation par département/canton/province ou par région, combinant plusieurs traits de prononciation et ensembles de mots, sous forme de pastilles colorées, de hachures, etc. sont présentées dans cet article. On peut ainsi observer immédiatement un /E/ plus fermé (ainsi qu’un /O/ plus ouvert) dans le Nord-Pas-de-Calais et le sud de la France, pour des mots comme parfait ou rose, un /Œ/ plus fermé en Suisse pour un mot comme gueule, par exemple.

pdf bib
Providing and Analyzing NLP Terms for our Community
Gil Francopoulo | Joseph Mariani | Patrick Paroubek | Frédéric Vernier
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

By its own nature, the Natural Language Processing (NLP) community is a priori the best equipped to study the evolution of its own publications, but works in this direction are rare and only recently have we seen a few attempts at charting the field. In this paper, we use the algorithms, resources, standards, tools and common practices of the NLP field to build a list of terms characteristic of ongoing research, by mining a large corpus of scientific publications, aiming at the largest possible exhaustivity and covering the largest possible time span. Study of the evolution of this term list through time reveals interesting insights on the dynamics of field and the availability of the term database and of the corpus (for a large part) make possible many further comparative studies in addition to providing a test field for a new graphic interface designed to perform visual time analytics of large sized thesauri.