Ioana Vasilescu

Also published as: I. Vasilescu


pdf bib
Linguistic Nudges and Verbal Interaction with Robots, Smart-Speakers, and Humans
Natalia Kalashnikova | Ioana Vasilescu | Laurence Devillers
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes a data collection methodology and emotion annotation of dyadic interactions between a human, a Pepper robot, a Google Home smart-speaker, or another human. The collected 16 hours of audio recordings were used to analyze the propensity to change someone’s opinions about ecological behavior regarding the type of conversational agent, the kind of nudges, and the speaker’s emotional state. We describe the statistics of data collection and annotation. We also report the first results, which showed that humans change their opinions on more questions with a human than with a device, even against mainstream ideas. We observe a correlation between a certain emotional state and the interlocutor and a human’s propensity to be influenced. We also reported the results of the studies that investigated the effect of human likeness on speech using our data.

pdf bib
Using Speech Technology to Test Theories of Phonetic and Phonological Typology
Anisia Popescu | Lori Lamel | Ioana Vasilescu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The present paper uses speech technology derived tools and methodologies to test theories about phonetic typology. We specifically look at how the two-way laryngeal contrast (voiced /b, d, g, v, z/ vs. voiceless /p, t, k, f, s/ obstruents) is implemented in European Portuguese, a language that has been suggested to exhibit a different voicing system than its sister Romance languages, more similar to the one found for Germanic languages. A large European Portuguese corpus was force aligned using (1) different combinations of parallel Portuguese (original), Italian (Romance language) and German (Germanic language) acoustic phone models and letting an ASR system choose the best fitting one, and (2) pronunciation variants (/b, d, g, v, z/ produced as either [b, d, g, v, z] or [p, t, k, f, s]) for obstruent consonants. Results support previous accounts in the literature that European Portuguese is diverging from the traditional voicing system known for Romance language, towards a hybrid system where stops and fricatives are specified for different voicing features.


pdf bib
Typological classification of European Portuguese fricatives: a cross-language forced alignment and pronunciation variants study
Anisia Popescu | Lori Lamel | Ioana Vasilescu
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

pdf bib
Effet de l’anthropomorphisme des machines sur le français adressé aux robots: Étude du débit de parole et de la fluence
Natalia Kalashnikova | Mathilde Hutin | Ioana Vasilescu | Laurence Devillers
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 4 : articles déjà soumis ou acceptés en conférence internationale

“Robot-directed speech” désigne la parole adressée à un appareil robotique, des petites enceintes domestiques aux robots humanoïdes grandeur-nature. Les études passées ont analysé les propriétés phonétiques et linguistiques de ce type de parole ou encore l’effet de l’anthropomorphisme des appareils sur la sociabilité des interactions, mais l’effet de l’anthropomorphisme sur les réalisations linguistiques n’a encore jamais été exploré. Notre étude propose de combler ce manque avec l’analyse d’un paramètre phonétique (débit de parole) et d’un paramètre linguistique (fréquence des pauses remplies) sur la parole adressée à l’enceinte vs. au robot humanoïde vs. à l’humain. Les données de 71 francophones natifs indiquent que les énoncés adressés aux humains sont plus longs, plus rapides et plus dysfluents que ceux adressés à l’enceinte et au robot. La parole adressée à l’enceinte et au robot est significativement différente de la parole adressée à l’humain, mais pas l’une de l’autre, indiquant l’existence d’un type particulier de la parole adressée aux machines.


pdf bib
Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives
Tanvi Dinkar | Chloé Clavel | Ioana Vasilescu
Traitement Automatique des Langues, Volume 63, Numéro 3 : Etats de l'art en TAL [Review articles in NLP]

pdf bib
Using a Knowledge Base to Automatically Annotate Speech Corpora and to Identify Sociolinguistic Variation
Yaru Wu | Fabian Suchanek | Ioana Vasilescu | Lori Lamel | Martine Adda-Decker
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Speech characteristics vary from speaker to speaker. While some variation phenomena are due to the overall communication setting, others are due to diastratic factors such as gender, provenance, age, and social background. The analysis of these factors, although relevant for both linguistic and speech technology communities, is hampered by the need to annotate existing corpora or to recruit, categorise, and record volunteers as a function of targeted profiles. This paper presents a methodology that uses a knowledge base to provide speaker-specific information. This can facilitate the enrichment of existing corpora with new annotations extracted from the knowledge base. The method also helps the large scale analysis by automatically extracting instances of speech variation to correlate with diastratic features. We apply our method to an over 120-hour corpus of broadcast speech in French and investigate variation patterns linked to reduction phenomena and/or specific to connected speech such as disfluencies. We find significant differences in speech rate, the use of filler words, and the rate of non-canonical realisations of frequent segments as a function of different professional categories and age groups.

pdf bib
Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages
Yaru Wu | Mathilde Hutin | Ioana Vasilescu | Lori Lamel | Martine Adda-Decker
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper builds upon recent work in leveraging the corpora and tools originally used to develop speech technologies for corpus-based linguistic studies. We address the non-canonical realization of consonants in connected speech and we focus on voicing alternation phenomena of stops in 5 standard varieties of Romance languages (French, Italian, Spanish, Portuguese, Romanian). For these languages, both large scale corpora and speech recognition systems were available for the study. We use forced alignment with pronunciation variants and machine learning techniques to examine to what extent such frequent phenomena characterize languages and what are the most triggering factors. The results confirm that voicing alternations occur in all Romance languages. Automatic classification underlines that surrounding contexts and segment duration are recurring contributing factors for modeling voicing alternation. The results of this study also demonstrate the new role that machine learning techniques such as classification algorithms can play in helping to extract linguistic knowledge from speech and to suggest interesting research directions.

pdf bib
Corpus Design for Studying Linguistic Nudges in Human-Computer Spoken Interactions
Natalia Kalashnikova | Serge Pajak | Fabrice Le Guel | Ioana Vasilescu | Gemma Serrano | Laurence Devillers
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we present the methodology of corpus design that will be used to study the comparison of influence between linguistic nudges with positive or negative influences and three conversational agents: robot, smart speaker, and human. We recruited forty-nine participants to form six groups. The conversational agents first asked the participants about their willingness to adopt five ecological habits and invest time and money in ecological problems. The participants were then asked the same questions but preceded by one linguistic nudge with positive or negative influence. The comparison of standard deviation and mean metrics of differences between these two notes (before the nudge and after) showed that participants were mainly affected by nudges with positive influence, even though several nudges with negative influence decreased the average note. In addition, participants from all groups were willing to spend more money than time on ecological problems. In general, our experiment’s early results suggest that a machine agent can influence participants to the same degree as a human agent. A better understanding of the power of influence of different conversational machines and the potential of influence of nudges of different polarities will lead to the development of ethical norms of human-computer interactions.


pdf bib
Lénition et fortition des occlusives en coda finale dans deux langues romanes : le français et le roumain (Lenition and fortition of word-final stops in two Romance languages: French and Romanian)
Mathilde Hutin | Adèle Jatteau | Ioana Vasilescu | Lori Lamel | Martine Adda-Decker
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

L’exploration automatisée de grands corpus permet d’analyser plus finement la relation entre motifs de variation phonétique synchronique et changements diachroniques : les erreurs dans les transcriptions automatiques sont riches d’enseignements sur la variation contextuelle en parole continue et sur les possibles mutations systémiques sur le point d’apparaître. Dès lors, il est intéressant de se pencher sur des phénomènes phonologiques largement attestés dans les langues en diachronie comme en synchronie pour établir leur émergence ou non dans des langues qui n’y sont pas encore sujettes. La présente étude propose donc d’utiliser l’alignement forcé avec variantes de prononciation pour observer les alternances de voisement en coda finale de mot dans deux langues romanes : le français et le roumain. Il sera mis en évidence, notamment, que voisement et dévoisement non-canoniques des codas françaises comme roumaines ne sont pas le fruit du hasard mais bien des instances de dévoisement final et d’assimilation régressive de trait laryngal, qu’il s’agisse de voisement ou de non-voisement.

pdf bib
Alternances de voisement et processus de lénition et de fortition : une étude automatisée de grands corpus en cinq langues romanes [Voicing alternations in relation with lenition and fortition phenomena: an automated study of large corpora in five Romance languages]
Ioana Vasilescu | Yaru Wu | Adèle Jatteau | Martine Adda-Decker | Lori Lamel
Traitement Automatique des Langues, Volume 61, Numéro 1 : Varia [Varia]

pdf bib
Lenition and Fortition of Stop Codas in Romanian
Mathilde Hutin | Oana Niculescu | Ioana Vasilescu | Lori Lamel | Martine Adda-Decker
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

The present paper aims at providing a first study of lenition- and fortition-type phenomena in coda position in Romanian, a language that can be considered as less-resourced. Our data show that there are two contexts for devoicing in Romanian: before a voiceless obstruent, which means that there is regressive voicelessness assimilation in the language, and before pause, which means that there is a tendency towards final devoicing proper. The data also show that non-canonical voicing is an instance of voicing assimilation, as it is observed mainly before voiced consonants (voiced obstruents and sonorants alike). Two conclusions can be drawn from our analyses. First, from a phonetic point of view, the two devoicing phenomena exhibit the same behavior regarding place of articulation of the coda, while voicing assimilation displays the reverse tendency. In particular, alveolars, which tend to devoice the most, also voice the least. Second, the two assimilation processes have similarities that could distinguish them from final devoicing as such. Final devoicing seems to be sensitive to speech style and gender of the speaker, while assimilation processes do not. This may indicate that the two kinds of processes are phonologized at two different degrees in the language, assimilation being more accepted and generalized than final devoicing.


pdf bib
Réalisation phonétique et contraste phonologique marginal : une étude automatique des voyelles du roumain (Phonetic realization and marginal phonemic contrast : an automatic study of the Romanian vowels)
Ioana Vasilescu | Margaret Renwick | Camille Dutrey | Lori Lamel | Biana Vieru
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

Cet article est dédié à l’analyse acoustique des voyelles du roumain : des productions en parole continue sont comparées à des prononciations “de laboratoire”. Les objectifs sont : (1) décrire les traits acoustiques des voyelles en fonction du style de parole ; (2) estimer la relation entre traits acoustiques et contrastes phonémiques de la langue ; (3) estimer dans quelle mesure l’étude de l’oral apporte des éclairages au sujet des attributs phonémiques des voyelles centrales [2] et [1], dont le statut (phonèmes vs allophones) est controversé. Nous montrons que les traits acoustiques sont comparables pour la parole journalistique vs contrôlée pour l’ensemble de l’inventaire sauf [2] et [1]. Dans la parole contrôlée [2] et [1] sont distinctes, mais confondues en faveur du timbre [2] à l’oral. La confusion de timbres n’est pas source d’inintelligibilité car [2] et [1] sont en distribution quasicomplémentaire. Ce résultat apporte des éclairages sur la question du contraste phonémique graduel et marginal (Goldsmith, 1995; Scobbie & Stuart-Smith, 2008; Hall, 2013).


pdf bib
Morpho-Syntactic Study of Errors from Speech Recognition System
Maria Goryainova | Cyril Grouin | Sophie Rosset | Ioana Vasilescu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The study provides an original standpoint of the speech transcription errors by focusing on the morpho-syntactic features of the erroneous chunks and of the surrounding left and right context. The typology concerns the forms, the lemmas and the POS involved in erroneous chunks, and in the surrounding contexts. Comparison with error free contexts are also provided. The study is conducted on French. Morpho-syntactic analysis underlines that three main classes are particularly represented in the erroneous chunks: (i) grammatical words (to, of, the), (ii) auxiliary verbs (has, is), and (iii) modal verbs (should, must). Such items are widely encountered in the ASR outputs as frequent candidates to transcription errors. The analysis of the context points out that some left 3-grams contexts (e.g., repetitions, that is disfluencies, bracketing formulas such as “c’est”, etc.) may be better predictors than others. Finally, the surface analysis conducted through a Levensthein distance analysis, highlighted that the most common distance is of 2 characters and mainly involves differences between inflected forms of a unique item.

pdf bib
Human annotation of ASR error regions: Is “gravity” a sharable concept for human annotators?
Daniel Luzzati | Cyril Grouin | Ioana Vasilescu | Martine Adda-Decker | Eric Bilinski | Nathalie Camelin | Juliette Kahn | Carole Lailler | Lori Lamel | Sophie Rosset
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper is concerned with human assessments of the severity of errors in ASR outputs. We did not design any guidelines so that each annotator involved in the study could consider the “seriousness” of an ASR error using their own scientific background. Eight human annotators were involved in an annotation task on three distinct corpora, one of the corpora being annotated twice, hiding this annotation in duplicate to the annotators. None of the computed results (inter-annotator agreement, edit distance, majority annotation) allow any strong correlation between the considered criteria and the level of seriousness to be shown, which underlines the difficulty for a human to determine whether a ASR error is serious or not.


pdf bib
Quel est l’apport de la détection d’entités nommées pour l’extraction d’information en domaine restreint ? (What is the contribution of named entities detection for information extraction in restricted domain ?) [in French]
Camille Dutrey | Chloé Clavel | Sophie Rosset | Ioana Vasilescu | Martine Adda-Decker
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib
Cross-lingual studies of ASR errors: paradigms for perceptual evaluations
Ioana Vasilescu | Martine Adda-Decker | Lori Lamel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

It is well-known that human listeners significantly outperform machines when it comes to transcribing speech. This paper presents a progress report of the joint research in the automatic vs human speech transcription and of the perceptual experiments developed at LIMSI that aims to increase our understanding of automatic speech recognition errors. Two paradigms are described here in which human listeners are asked to transcribe speech segments containing words that are frequently misrecognized by the system. In particular, we sought to gain information about the impact of increased context to help humans disambiguate problematic lexical items, typically homophone or near-homophone words. The long-term aim of this research is to improve the modeling of ambiguous contexts so as to reduce automatic transcription errors.


pdf bib
On the Role of Discourse Markers in Interactive Spoken Question Answering Systems
Ioana Vasilescu | Sophie Rosset | Martine Adda-Decker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a preliminary analysis of the role of some discourse markers and the vocalic hesitation ""euh"" in a corpus of spoken human utterances collected with the Ritel system, an open domain and spoken dialog system. The frequency and contextual combinatory of classical discourse markers and of the vocalic hesitation have been studied. This analysis pointed out some specificity in terms of combinatory of the analyzed items. The classical discourse markers seem to help initiating larger discursive blocks both at initial and medial positions of the on-going turns. The vocalic hesitation stand also for marking the user's embarrassments and wish to close the dialog.


pdf bib
Caractéristiques acoustiques et prosodiques des hésitations vocaliques dans trois langues [Acoustic and prosodic characteristics of vocalic hesitations in three languages]
Ioana Vasilescu | Martine Adda-Decker | Rena Nemoto
Traitement Automatique des Langues, Volume 49, Numéro 3 : Recherches actuelles en phonologie et en phonétique : interfaces avec le traitement automatique des langues [Current Research in Phonology and Phonetics: Interfaces with Natural-Language Processing]

pdf bib
Speech Errors on Frequently Observed Homophones in French: Perceptual Evaluation vs Automatic Classification
Rena Nemoto | Ioana Vasilescu | Martine Adda-Decker
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The present contribution aims at increasing our understanding of automatic speech recognition (ASR) errors involving frequent homophone or almost homophone words by confronting them to perceptual results. The long-term aim is to improve acoustic modelling of these items to reduce automatic transcription errors. A first question of interest addressed in this paper is whether homophone words such as “et” (and); and “est” (to be), for which ASR systems rely on language model weights, can be discriminated in a perceptual transcription test with similar n-gram constraints. A second question concerns the acoustic separability of the two homophone words using appropriate acoustic and prosodic attributes. The perceptual test reveals that even though automatic and perceptual errors correlate positively, human listeners deal with local ambiguity more efficiently than the ASR system in conditions which attempt to approximate the information available for decision for a 4-gram language model. The corresponding acoustic analysis shows that the two homophone words may be distinguished thanks to some relevant acoustic and prosodic attributes. A first experiment in automatic classification of the two words using data mining techniques highlights the role of the prosodic (duration and voicing) and contextual information (pauses co-occurrence) in distinguishing the two words. Current results, even though preliminary, suggests that new levels of information, so far unexplored in pronunciations’ modelling for ASR, may be considered in order to efficiently factorize the word variants observed in speech and to improve the automatic speech transcription.


pdf bib
Fear-type emotions of the SAFE Corpus: annotation issues
Chloé Clavel | Ioana Vasilescu | Laurence Devillers | Thibaut Ehrette | Gaël Richard
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The present research focuses on annotation issues in the context of the acoustic detection of fear-type emotions for surveillance applications. The emotional speech material used for this study comes from the previously collected SAFE Database (Situation Analysis in a Fictional and Emotional Database) which consists of audio-visual sequences extracted from movie fictions. A generic annotation scheme was developed to annotate the various emotional manifestations contained in the corpus. The annotation was carried out by two labellers and the two annotations strategies are confronted. It emerges that the borderline between emotion and neutral vary according to the labeller. An acoustic validation by a third labeller allows at analysing the two strategies. Two human strategies are then observed: a first one, context-oriented which mixes audio and contextual (video) information in emotion categorization; and a second one, based mainly on audio information. The k-means clustering confirms the role of audio cues in human annotation strategies. It particularly helps in evaluating those strategies from the point of view of a detection system based on audio cues.


pdf bib
Reliability of Lexical and Prosodic Cues in Two Real-life Spoken Dialog Corpora
L. Devillers | I. Vasilescu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)