Agata Savary


2020

pdf bib
Verbal Multiword Expression Identification: Do We Need a Sledgehammer to Crack a Nut?
Caroline Pasquer | Agata Savary | Carlos Ramisch | Jean-Yves Antoine
Proceedings of the 28th International Conference on Computational Linguistics

Automatic identification of multiword expressions (MWEs), like ‘to cut corners’ (to do an incomplete job), is a pre-requisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. This paper deals with a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. A simple language-independent system based on a combination of filters competes with the best systems from a recent shared task: it obtains the best averaged F-score over 11 languages (0.6653) and even the best score for both seen and unseen VMWEs due to the high proportion of seen VMWEs in texts. This highlights the fact that focusing on the identification of seen VMWEs could be a strategy to improve VMWE identification in general.

pdf bib
DOING@DEFT : cascade de CRF pour l’annotation d’entités cliniques imbriquées (DOING@DEFT : cascade of CRF for the annotation of nested clinical entities)
Anne-Lyse Minard | Andréane Roques | Nicolas Hiot | Mirian Halfeld Ferrari Alves | Agata Savary
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes

Cet article présente le système développé par l’équipe DOING pour la campagne d’évaluation DEFT 2020 portant sur la similarité sémantique et l’extraction d’information fine. L’équipe a participé uniquement à la tâche 3 : “extraction d’information”. Nous avons utilisé une cascade de CRF pour annoter les différentes informations à repérer. Nous nous sommes concentrés sur la question de l’imbrication des entités et de la pertinence d’un type d’entité pour apprendre à reconnaître un autre. Nous avons également testé l’utilisation d’une ressource externe, MedDRA, pour améliorer les performances du système et d’un pipeline plus complexe mais ne gérant pas l’imbrication des entités. Nous avons soumis 3 runs et nous obtenons en moyenne sur toutes les classes des F-mesures de 0,64, 0,65 et 0,61.

pdf bib
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Stella Markantonatou | John McCrae | Jelena Mitrović | Carole Tiberius | Carlos Ramisch | Ashwini Vaidya | Petya Osenova | Agata Savary
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

pdf bib
Polish corpus of verbal multiword expressions
Agata Savary | Jakub Waszczuk
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This paper describes a manually annotated corpus of verbal multi-word expressions in Polish. It is among the 4 biggest datasets in release 1.2 of the PARSEME multiligual corpus. We describe the data sources, as well as the annotation process and its outcomes. We also present interesting phenomena encountered during the annotation task and put forward enhancements for the PARSEME annotation guidelines.

pdf bib
Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions
Carlos Ramisch | Agata Savary | Bruno Guillaume | Jakub Waszczuk | Marie Candito | Ashwini Vaidya | Verginica Barbu Mititelu | Archna Bhatia | Uxoa Iñurrieta | Voula Giouli | Tunga Güngör | Menghan Jiang | Timm Lichte | Chaya Liebeskind | Johanna Monti | Renata Ramisch | Sara Stymne | Abigail Walsh | Hongzhi Xu
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs). Lessons learned from previous editions indicate that VMWEs have low ambiguity, and that the major challenge lies in identifying test instances never seen in the training data. Therefore, this edition focuses on unseen VMWEs. We have split annotated corpora so that the test corpora contain around 300 unseen VMWEs, and we provide non-annotated raw corpora to be used by complementary discovery methods. We released annotated and raw corpora in 14 languages, and this semi-supervised challenge attracted 7 teams who submitted 9 system results. This paper describes the effort of corpus creation, the task design, and the results obtained by the participating systems, especially their performance on unseen expressions.

pdf bib
Seen2Unseen at PARSEME Shared Task 2020: All Roads do not Lead to Unseen Verb-Noun VMWEs
Caroline Pasquer | Agata Savary | Carlos Ramisch | Jean-Yves Antoine
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

We describe the Seen2Unseen system that participated in edition 1.2 of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs). The identification of VMWEs that do not appear in the provided training corpora (called unseen VMWEs) – with a focus here on verb-noun VMWEs – is based on mutual information and lexical substitution or translation of seen VMWEs. We present the architecture of the system, report results for 14 languages, and propose an error analysis.

pdf bib
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Agata Savary | Yue Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2019

pdf bib
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Agata Savary | Carla Parra Escartín | Francis Bond | Jelena Mitrović | Verginica Barbu Mititelu
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

pdf bib
Without lexicons, multiword expression identification will never fly: A position statement
Agata Savary | Silvio Cordeiro | Carlos Ramisch
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

Because most multiword expressions (MWEs), especially verbal ones, are semantically non-compositional, their automatic identification in running text is a prerequisite for semantically-oriented downstream applications. However, recent developments, driven notably by the PARSEME shared task on automatic identification of verbal MWEs, show that this task is harder than related tasks, despite recent contributions both in multilingual corpus annotation and in computational models. In this paper, we analyse possible reasons for this state of affairs. They lie in the nature of the MWE phenomenon, as well as in its distributional properties. We also offer a comparative analysis of the state-of-the-art systems, which exhibit particularly strong sensitivity to unseen data. On this basis, we claim that, in order to make strong headway in MWE identification, the community should bend its mind into coupling identification of MWEs with their discovery, via syntactic MWE lexicons. Such lexicons need not necessarily achieve a linguistically complete modelling of MWEs’ behavior, but they should provide minimal morphosyntactic information to cover some potential uses, so as to complement existing MWE-annotated corpora. We define requirements for such minimal NLP-oriented lexicon, and we propose a roadmap for the MWE community driven by these requirements.

pdf bib
Démonstrateur en-ligne du projet ANR PARSEME-FR sur les expressions polylexicales (On-line demonstrator of the PARSEME-FR project on multiword expressions)
Marine Schmitt | Elise Moreau | Mathieu Constant | Agata Savary
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume IV : Démonstrations

Nous présentons le démonstrateur en-ligne du projet ANR PARSEME-FR dédié aux expressions polylexicales. Il inclut différents outils d’identification de telles expressions et un outil d’exploration des ressources linguistiques de ce projet.

2018

pdf bib
If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions
Caroline Pasquer | Agata Savary | Carlos Ramisch | Jean-Yves Antoine
Proceedings of the 27th International Conference on Computational Linguistics

Multiword expressions, especially verbal ones (VMWEs), show idiosyncratic variability, which is challenging for NLP applications, hence the need for VMWE identification. We focus on the task of variant identification, i.e. identifying variants of previously seen VMWEs, whatever their surface form. We model the problem as a classification task. Syntactic subtrees with previously seen combinations of lemmas are first extracted, and then classified on the basis of features relevant to morpho-syntactic variation of VMWEs. Feature values are both absolute, i.e. hold for a particular VMWE candidate, and relative, i.e. based on comparing a candidate with previously seen VMWEs. This approach outperforms a baseline by 4 percent points of F-measure on a French corpus.

pdf bib
Towards a Variability Measure for Multiword Expressions
Caroline Pasquer | Agata Savary | Jean-Yves Antoine | Carlos Ramisch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

One of the most outstanding properties of multiword expressions (MWEs), especially verbal ones (VMWEs), important both in theoretical models and applications, is their idiosyncratic variability. Some MWEs are always continuous, while some others admit certain types of insertions. Components of some MWEs are rarely or never modified, while some others admit either specific or unrestricted modification. This unpredictable variability profile of MWEs hinders modeling and processing them as “words-with-spaces” on the one hand, and as regular syntactic structures on the other hand. Since variability of MWEs is a matter of scale rather than a binary property, we propose a 2-dimensional language-independent measure of variability dedicated to verbal MWEs based on syntactic and discontinuity-related clues. We assess its relevance with respect to a linguistic benchmark and its utility for the tasks of VMWE classification and variant identification on a French corpus.

pdf bib
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Agata Savary | Carlos Ramisch | Jena D. Hwang | Nathan Schneider | Melanie Andresen | Sameer Pradhan | Miriam R. L. Petruck
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

pdf bib
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

pdf bib
VarIDE at PARSEME Shared Task 2018: Are Variants Really as Alike as Two Peas in a Pod?
Caroline Pasquer | Carlos Ramisch | Agata Savary | Jean-Yves Antoine
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

We describe the VarIDE system (standing for Variant IDEntification) which participated in the edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs). Our system focuses on the task of VMWE variant identification by using morphosyntactic information in the training data to predict if candidates extracted from the test corpus could be idiomatic, thanks to a naive Bayes classifier. We report results for 19 languages.

2017

pdf bib
Projecting Multiword Expression Resources on a Polish Treebank
Agata Savary | Jakub Waszczuk
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing

Multiword expressions (MWEs) are linguistic objects containing two or more words and showing idiosyncratic behavior at different levels. Treebanks with annotated MWEs enable studies of such properties, as well as training and evaluation of MWE-aware parsers. However, few treebanks contain full-fledged MWE annotations. We show how this gap can be bridged in Polish by projecting 3 MWE resources on a constituency treebank.

pdf bib
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Stella Markantonatou | Carlos Ramisch | Agata Savary | Veronika Vincze
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

pdf bib
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.

pdf bib
Multiword Expression-Aware A* TAG Parsing Revisited
Jakub Waszczuk | Agata Savary | Yannick Parmentier
Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms

pdf bib
Temporal@ODIL project: Adapting ISO-TimeML to syntactic treebanks for the temporal annotation of spoken speech
Jean-Yves Antoine | Jakub Wasczuk | Anaïs Lefeuvre-Haftermeyer | Lotfi Abouda | Emmanuel Schang | Agata Savary
Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13)

pdf bib
Literal readings of multiword expressions: as scarce as hen’s teeth
Agata Savary | Silvio Ricardo Cordeiro
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

pdf bib
Annotation d’expressions polylexicales verbales en français (Annotation of verbal multiword expressions in French)
Marie Candito | Mathieu Constant | Carlos Ramisch | Agata Savary | Yannick Parmentier | Caroline Pasquer | Jean-Yves Antoine
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

Nous décrivons la partie française des données produites dans le cadre de la campagne multilingue PARSEME sur l’identification d’expressions polylexicales verbales (Savary et al., 2017). Les expressions couvertes pour le français sont les expressions verbales idiomatiques, les verbes intrinsèquement pronominaux et une généralisation des constructions à verbe support. Ces phénomènes ont été annotés sur le corpus French-UD (Nivre et al., 2016) et le corpus Sequoia (Candito & Seddah, 2012), soit un corpus de 22 645 phrases, pour un total de 4 962 expressions annotées. On obtient un ratio d’une expression annotée tous les 100 tokens environ, avec un fort taux d’expressions discontinues (40%).

2016

pdf bib
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
Diana Bogantes | Eric Rodríguez | Alejandro Arauco | Alejandro Rodríguez | Agata Savary
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes a pilot study in lexical encoding of multi-word expressions (MWEs) in 4 Latin American dialects of Spanish: Costa Rican, Colombian, Mexican and Peruvian. We describe the variability of MWE usage across dialects. We adapt an existing data model to a dialect-aware encoding, so as to represent dialect-related specificities, while avoiding redundancy of the data common for all dialects. A dozen of linguistic properties of MWEs can be expressed in this model, both on the level of a whole MWE and of its individual components. We describe the resulting lexical resource containing several dozens of MWEs in four dialects and we propose a method for constructing a web corpus as a support for crowdsourcing examples of MWE occurrences. The resource is available under an open license and paves the way towards a large-scale dialect-aware language resource construction, which should prove useful in both traditional and novel NLP applications.

pdf bib
PARSEME Survey on MWE Resources
Gyri Smørdal Losnegaard | Federico Sangati | Carla Parra Escartín | Agata Savary | Sascha Bargmann | Johanna Monti
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 Cost Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multiword datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult to find. In many cases, language resources such as corpora, treebanks, or lexical databases include multiwords as part of their data or take them into account in their annotations. However, these resources need to be centralized to make them accessible. The aim of this survey is to create a portal where researchers can easily find multiword(-aware) language resources for their research. We report on the design of the survey and analyze the data gathered so far. We also discuss the problems we have detected upon examination of the data as well as possible ways of enhancing the survey.

pdf bib
MWEs in Treebanks: From Survey to Guidelines
Victoria Rosén | Koenraad De Smedt | Gyri Smørdal Losnegaard | Eduard Bejček | Agata Savary | Petya Osenova
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

By means of an online survey, we have investigated ways in which various types of multiword expressions are annotated in existing treebanks. The results indicate that there is considerable variation in treatments across treebanks and thereby also, to some extent, across languages and across theoretical frameworks. The comparison is focused on the annotation of light verb constructions and verbal idioms. The survey shows that the light verb constructions either get special annotations as such, or are treated as ordinary verbs, while VP idioms are handled through different strategies. Based on insights from our investigation, we propose some general guidelines for annotating multiword expressions in treebanks. The recommendations address the following application-based needs: distinguishing MWEs from similar but compositional constructions; searching distinct types of MWEs in treebanks; awareness of literal and nonliteral meanings; and normalization of the MWE representation. The cross-lingually and cross-theoretically focused survey is intended as an aid to accessing treebanks and an aid for further work on treebank annotation.

pdf bib
Covering various Needs in Temporal Annotation: a Proposal of Extension of ISO TimeML that Preserves Upward Compatibility
Anaïs Lefeuvre-Halftermeyer | Jean-Yves Antoine | Alain Couillault | Emmanuel Schang | Lotfi Abouda | Agata Savary | Denis Maurel | Iris Eshkol | Delphine Battistelli
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper reports a critical analysis of the ISO TimeML standard, in the light of several experiences of temporal annotation that were conducted on spoken French. It shows that the norm suffers from weaknesses that should be corrected to fit a larger variety of needs inNLP and in corpus linguistics. We present our proposition of some improvements of the norm before it will be revised by the ISO Committee in 2017. These modifications concern mainly (1) Enrichments of well identified features of the norm: temporal function of TIMEX time expressions, additional types for TLINK temporal relations; (2) Deeper modifications concerning the units or features annotated: clarification between time and tense for EVENT units, coherence of representation between temporal signals (the SIGNAL unit) and TIMEX modifiers (the MOD feature); (3) A recommendation to perform temporal annotation on top of a syntactic (rather than lexical) layer (temporal annotation on a treebank).

pdf bib
Promoting multiword expressions in A* TAG parsing
Jakub Waszczuk | Agata Savary | Yannick Parmentier
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in case of true positive MWE occurrences, while avoiding parsing failures in case of false positives.

2014

pdf bib
Proceedings of the 10th Workshop on Multiword Expressions (MWE)
Valia Kordoni | Markus Egg | Agata Savary | Eric Wehrli | Stefan Evert
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

pdf bib
Polish Coreference Corpus in Numbers
Maciej Ogrodniczuk | Mateusz Kopeć | Agata Savary
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper attempts a preliminary interpretation of the occurrence of different types of linguistic constructs in the manually-annotated Polish Coreference Corpus by providing analyses of various statistical properties related to mentions, clusters and near-identity links. Among others, frequency of mentions, zero subjects and singleton clusters is presented, as well as the average mention and cluster size. We also show that some coreference clustering constraints, such as gender or number agreement, are frequently not valid in case of Polish. The need for lemmatization for automatic coreference resolution is supported by an empirical study. Correlation between cluster and mention count within a text is investigated, with short characteristics of outlier cases. We also examine this correlation in each of the 14 text domains present in the corpus and show that none of them has abnormal frequency of outlier texts regarding the cluster/mention ratio. Finally, we report on our negative experiences concerning the annotation of the near-identity relation. In the conclusion we put forward some guidelines for the future research in the area.

pdf bib
Tense and Time Annotations : a Contribution to TimeML Improvement (Annotation de la temporalité en corpus : contribution à l’amélioration de la norme TimeML) [in French]
Anaïs Lefeuvre | Jean-Yves Antoine | Agata Savary | Emmanuel Schang | Lotfi Abouda | Denis Maurel | Iris Eshkol
Proceedings of TALN 2014 (Volume 2: Short Papers)

2012

pdf bib
SEJFEK - a Lexicon and a Shallow Grammar of Polish Economic Multi-Word Units
Agata Savary | Bartosz Zaborowski | Aleksandra Krawczyk-Wieczorek | Filip Makowiecki
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

2010

pdf bib
Computational Lexicography of Multi-Word Units. How Efficient Can It Be?
Filip Graliński | Agata Savary | Monika Czerepowicka | Filip Makowiecki
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

pdf bib
Towards the Annotation of Named Entities in the National Corpus of Polish
Agata Savary | Jakub Waszczuk | Adam Przepiórkowski
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present the named entity annotation task within the on-going project of the National Corpus of Polish. To the best of our knowledge, this is the first attempt at a large-scale corpus annotation of Polish named entities. We describe the scope and the TEI-inspired hierarchy of named entities admitted for this task, as well as the TEI-conformant multi-level stand-off annotation format. We also discuss some methodological strategies including the annotation of embedded, coordinated and discontinuous names. Our annotation platform consists of two main tools interconnected by converting facilities. A rule-based natural language processing platform SProUT is used for the automatic pre-annotation of named entities, due to the previously created Polish extraction grammars adapted to the annotation task. A customizable graphical tree editor TrEd, extended to our needs, provides an ergonomic environment for manual correction of annotations. Despite some difficult cases encountered in the early annotation phase, about 2,600 named entities in 1,800 corpus sentences have presently been annotated, which allowed to validate the project methodology and tools.

pdf bib
Détection hors contexte des émotions à partir du contenu linguistique d’énoncés oraux : le système EmoLogus
Marc Le Tallec | Jeanne Villaneau | Jean-Yves Antoine | Agata Savary | Arielle Syssau-Vaccarella
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Le projet EmotiRob, soutenu par l’agence nationale de la recherche, s’est donné pour objectif de détecter des émotions dans un contexte d’application original : la réalisation d’un robot compagnon émotionnel pour des enfants fragilisés. Nous présentons dans cet article le système qui caractérise l’émotion induite par le contenu linguistique des propos de l’enfant. Il se base sur un principe de compositionnalité des émotions, avec une valeur émotionnelle fixe attribuée aux mots lexicaux, tandis que les verbes et les adjectifs agissent comme des fonctions dont le résultat dépend de la valeur émotionnelle de leurs arguments. L’article présente la méthode de calcul utilisée, ainsi que la norme lexicale émotionnelle correspondante. Une analyse quantitative et qualitative des premières expérimentations présente les différences entre les sorties du module de détection et l’annotation d’experts, montrant des résultats satisfaisants, avec la bonne détection de la valence émotionnelle dans plus de 90% des cas.

2009

pdf bib
Détection des émotions à partir du contenu linguistique d’énoncés oraux : application à un robot compagnon pour enfants fragilisés
Marc Le Tallec | Jeanne Villaneau | Jean-Yves Antoine | Agata Savary | Arielle Syssau-Vaccarella
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Le projet ANR Emotirob aborde la question de la détection des émotions sous un cadre original : concevoir un robot compagnon émotionnel pour enfants fragilisés. Notre approche consiste à combiner détection linguistique et prosodie. Nos expériences montrent qu’un sujet humain peut estimer de manière fiable la valence émotionnelle d’un énoncé à partir de son contenu propositionnel. Nous avons donc développé un premier modèle de détection linguistique qui repose sur le principe de compositionnalité des émotions : les mots simples ont une valence émotionnelle donnée et les prédicats modifient la valence de leurs arguments. Après une description succincte du système logique de compréhension dont les sorties sont utilisées pour le calcul global de l’émotion, cet article présente la construction d’une norme émotionnelle lexicale de référence, ainsi que d’une ontologie de classes émotionnelles de prédicats, pour des enfants de 5 et 7 ans.
Search
Co-authors