Chloé Clavel


2022

"You might think about slightly revising the title”: Identifying Hedges in Peer-tutoring Interactions
Yann Raphalen | Chloé Clavel | Justine Cassell
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hedges play an important role in the management of rapport. In peer tutoring, they are notably used by tutors in dyads experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with teenagers in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features and the benefits of such a hybrid model approach.

2021

Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo | Chloé Clavel | Chouchang Yack | Giovanna Varni
Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

From local hesitations to global impressions of a speaker’s feeling of knowing
Tanvi Dinkar | Beatrice Biancardi | Chloé Clavel
Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

Meta-learning for Classifying Previously Unseen Data Source into Previously Unseen Emotional Categories
Gaël Guibon | Matthieu Labeau | Hélène Flamein | Luce Lefeuvre | Chloé Clavel
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing

In this paper, we place ourselves in a classification scenario in which the target classes and data type are not accessible during training. We use a meta-learning approach to determine whether meta-trained information from common social network data with fine-grained emotion labels can achieve competitive performance on messages labeled with different emotion categories. We leverage few-shot learning to match this scenario and adopt metric-learning-based meta-learning, setting up Prototypical Networks with a Transformer encoder trained in an episodic fashion. This approach proves effective for capturing meta-information from a source emotional tag set to predict previously unseen emotional tags. Even though shifting the data type triggers an expected performance drop, our meta-learning approach achieves decent results compared with the fully supervised one.
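The core classification step of a Prototypical Network can be sketched as follows: each class prototype is the mean of its support embeddings, and a query is scored by a softmax over negative squared Euclidean distances to the prototypes. This is a minimal illustration under stated assumptions, not the authors' code: embeddings are assumed to be precomputed by some encoder (the Transformer and the episodic training loop are omitted), and the function names and toy emotion labels are hypothetical.

```python
import math
from collections import defaultdict

def prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: list of (embedding, label) pairs; embeddings are lists of floats
    assumed to come from an encoder that is not shown here.
    """
    sums, counts = {}, defaultdict(int)
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in s] for label, s in sums.items()}

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, protos):
    """Softmax over negative squared distances from the query to each prototype."""
    labels = list(protos)
    logits = [-sq_dist(query, protos[lab]) for lab in labels]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {lab: e / total for lab, e in zip(labels, exps)}

# Toy episode with two hypothetical emotion classes and 2-D embeddings.
support = [([1.0, 0.0], "joy"), ([0.8, 0.2], "joy"),
           ([0.0, 1.0], "anger"), ([0.2, 0.8], "anger")]
protos = prototypes(support)
probs = classify([0.9, 0.2], protos)
print(max(probs, key=probs.get))  # joy
```

Because prototypes are recomputed from whatever support set an episode provides, the same scoring rule applies unchanged to label sets never seen during training, which is the property the abstract relies on.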

A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations
Pierre Colombo | Pablo Piantanida | Chloé Clavel
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Learning disentangled representations of textual data is essential for many natural language tasks such as fair classification, style transfer and sentence generation, among others. The dominant existing approaches for text data either rely on training an adversary (discriminator) that aims at making attribute values difficult to infer from the latent code, or rely on minimising variational bounds of the mutual information between the latent code and the attribute value. However, the available methods cannot provide fine-grained control of the degree (or force) of disentanglement. Moreover, although adversarial methods are remarkably simple, a fair amount of information about the undesired attribute still remains once training is completed, even when the adversary seems to perform perfectly well during training. This paper introduces a novel variational upper bound to the mutual information between an attribute and the latent code of an encoder. Our bound aims at controlling the approximation error via the Rényi divergence, leading to better disentangled representations and, in particular, more precise control of the desired degree of disentanglement than state-of-the-art methods proposed for textual data. Furthermore, it does not suffer from the degeneracy of other losses in multi-class scenarios. We show the superiority of this method on fair classification and textual style transfer tasks. Additionally, we provide new insights illustrating various trade-offs in style transfer between learning disentangled representations and the quality of the generated sentences.
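The abstract states that the bound controls its approximation error via the Rényi divergence. The bound itself is not reproduced here. As a reference point only, the sketch below implements the standard definition of the Rényi divergence of order α between two discrete distributions, D_α(P‖Q) = log(Σ p_i^α q_i^(1−α)) / (α − 1), and checks that it approaches the KL divergence as α → 1. The function names and the toy distributions are illustrative.

```python
import math

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P || Q) between discrete distributions, in nats.

    Defined here for alpha > 0 and alpha != 1; assumes q_i > 0 wherever p_i > 0.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(P || Q), the alpha -> 1 limit of the above."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
print(round(kl_divergence(p, q), 4))                                      # 0.5108
print(abs(renyi_divergence(p, q, 1.0001) - kl_divergence(p, q)) < 1e-3)   # True
print(renyi_divergence(p, q, 0.5) <= renyi_divergence(p, q, 2.0))         # True
```

The order α acts as a tunable knob (the divergence is non-decreasing in α), which gives some intuition for why a Rényi-based bound can expose a degree-of-disentanglement control, although how the paper uses it is not shown here.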

Méta-apprentissage : classification de messages en catégories émotionnelles inconnues en entraînement (Meta-learning : Classifying Messages into Unseen Emotional Categories)
Gaël Guibon | Matthieu Labeau | Hélène Flamein | Luce Lefeuvre | Chloé Clavel
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

In this article, we reproduce a learning scenario in which the target data are not accessible and only related data are. We use a meta-learning approach to determine whether meta-information learned from social media messages finely annotated with emotions can yield good performance when applied to messages from conversations, labeled with emotions at a different granularity. We take advantage of few-shot learning to set up this scenario. This approach proves effective at capturing the meta-information of one set of emotion labels in order to predict labels previously unknown to the model. Although varying the data type causes a drop in performance, our meta-learning approach achieves decent results compared with the supervised learning baseline.

Improving Multimodal fusion via Mutual Dependency Maximisation
Pierre Colombo | Emile Chapuis | Matthieu Labeau | Chloé Clavel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics. Acknowledging that humans communicate through a variety of channels (i.e., visual, acoustic, linguistic), multimodal systems aim at integrating different unimodal representations into a synthetic one. So far, considerable effort has been devoted to developing complex architectures that fuse these modalities. However, such systems are mainly trained by minimising simple losses such as L1 or cross-entropy. In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 points of accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and CMU-MOSEI. Our method not only achieves a new state of the art on both datasets but also produces representations that are more robust to modality drops. Finally, a by-product of our methods is a statistical network that can be used to interpret the high-dimensional representations learnt by the model.

Few-Shot Emotion Recognition in Conversation with Sequential Prototypical Networks
Gaël Guibon | Matthieu Labeau | Hélène Flamein | Luce Lefeuvre | Chloé Clavel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Several recent studies of dyadic human-human interactions have focused on conversations without specific business objectives. However, many companies might benefit from studies dedicated to more specific settings such as after-sales services or customer satisfaction surveys. In this work, we place ourselves in the scope of a live-chat customer service in which we want to detect emotions and their evolution in the conversation flow. This context raises multiple challenges, ranging from exploiting restricted, small and mostly unlabeled datasets to finding and adapting methods for such a context. We tackle these challenges using Few-Shot Learning, hypothesizing that it can serve conversational emotion classification for different languages and sparse labels. We contribute a variation of Prototypical Networks for sequence labeling in conversation, which we name ProtoSeq. We test this method on two datasets in different languages: daily conversations in English and customer service chat conversations in French. When applied to emotion classification in conversations, our method proved competitive with other methods.

Code-switched inspired losses for spoken dialog representations
Pierre Colombo | Emile Chapuis | Matthieu Labeau | Chloé Clavel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Spoken dialogue systems need to be able to handle both multiple languages and multilinguality inside a conversation (e.g., in the case of code-switching). In this work, we introduce new pretraining losses tailored to learn generic multilingual spoken dialogue representations. The goal of these losses is to expose the model to code-switched language. In order to scale up training, we automatically build a pretraining corpus composed of multilingual conversations in five different languages (French, Italian, English, German and Spanish) from OpenSubtitles, a huge multilingual corpus composed of 24.3G tokens. We test the generic representations on MIAM, a new benchmark composed of five dialogue act corpora in the same five languages, as well as on two novel multilingual tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that our new losses achieve better performance in both monolingual and multilingual settings.

Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Pierre Colombo | Guillaume Staerman | Chloé Clavel | Pablo Piantanida
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We introduce BaryScore, a new metric for evaluating text generation based on deep contextualized embeddings (e.g., BERT, RoBERTa, ELMo). This metric is motivated by a new framework relying on optimal transport tools, i.e., the Wasserstein distance and barycenter. By modelling the layer output of deep contextualized embeddings as a probability distribution rather than as a vector embedding, this framework provides a natural way to aggregate the different outputs through the Wasserstein space topology. In addition, it provides theoretical grounds for our metric and offers an alternative to available solutions (e.g., MoverScore and BERTScore). Numerical evaluation is performed on four different tasks: machine translation, summarization, data2text generation and image captioning. Our results show that BaryScore outperforms other BERT-based metrics and exhibits more consistent behaviour, in particular for text summarization.
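BaryScore itself operates on multi-dimensional contextual embeddings from several layers. The sketch below only illustrates the optimal-transport idea in one dimension, where both the Wasserstein-2 distance and the Wasserstein barycenter have closed forms based on sorted samples: the barycenter of equally weighted empirical distributions is the average of their sorted samples (quantiles). It is a deliberate simplification, not the metric's implementation, and the scalar "layer outputs" are made-up values.

```python
def w2_1d(a, b):
    """Wasserstein-2 distance between two 1-D empirical distributions with the
    same number of equally weighted samples: pair up the sorted samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample counts")
    sa, sb = sorted(a), sorted(b)
    return (sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa)) ** 0.5

def barycenter_1d(dists, weights=None):
    """W2 barycenter in 1-D: weighted average of the sorted samples (quantiles).
    All distributions are assumed to have the same number of samples."""
    n = len(dists)
    weights = weights or [1.0 / n] * n
    sorted_d = [sorted(d) for d in dists]
    size = len(sorted_d[0])
    return [sum(w * d[i] for w, d in zip(weights, sorted_d)) for i in range(size)]

# Made-up scalar "layer outputs" (one value per token) for two layers.
candidate_layers = [[0.1, 0.5, 0.9], [0.2, 0.4, 1.0]]
reference_layers = [[0.0, 0.6, 0.8], [0.3, 0.5, 0.9]]
score = w2_1d(barycenter_1d(candidate_layers), barycenter_1d(reference_layers))
print(round(score, 4))  # 0.0816 (a distance: smaller means closer)
```

Aggregating layers through a barycenter, rather than averaging raw vectors, keeps the comparison within the Wasserstein geometry, which is the motivation the abstract gives for the framework.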

2020

Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
Emile Chapuis | Pierre Colombo | Matteo Manica | Matthieu Labeau | Chloé Clavel
Findings of the Association for Computational Linguistics: EMNLP 2020

Sequence labelling tasks such as Dialog Act and Emotion/Sentiment identification are a key component of spoken dialog systems. In this work, we propose a new approach to learn generic representations adapted to spoken dialog, which we evaluate on a new benchmark we call Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE). SILICONE is model-agnostic and contains 10 different datasets of various sizes. We obtain our representations with a hierarchical encoder based on transformer architectures, for which we extend two well-known pre-training objectives. Pre-training is performed on OpenSubtitles, a large corpus of spoken dialog containing over 2.3 billion tokens. We demonstrate that hierarchical encoders achieve competitive results with consistently fewer parameters than state-of-the-art models, and we show their importance for both pre-training and fine-tuning.

Multimodal Analysis of Cohesion in Multi-party Interactions
Reshmashree Bangalore Kantharaju | Caroline Langlet | Mukesh Barange | Chloé Clavel | Catherine Pelachaud
Proceedings of the 12th Language Resources and Evaluation Conference

Group cohesion is an emergent phenomenon that describes the tendency of the group members’ shared commitment to group tasks and the interpersonal attraction among them. This paper presents a multimodal analysis of group cohesion using a corpus of multi-party interactions. We use 16 two-minute segments from the AMI corpus annotated with cohesion. We define three layers of modalities: non-verbal social cues, dialogue acts and interruptions. The initial analysis is performed at the individual level; we then combine the different modalities to observe their impact on the perceived level of cohesion. Results indicate that laughter and interruptions occur more often in highly cohesive segments. We also observe that dialogue acts and head nods, taken individually, did not have an impact on the level of cohesion; in combination, however, they did have an impact on the perceived level of cohesion. Overall, the analysis shows that multimodal cues are crucial for accurate analysis of group cohesion.

The POTUS Corpus, a Database of Weekly Addresses for the Study of Stance in Politics and Virtual Agents
Thomas Janssoone | Kévin Bailly | Gaël Richard | Chloé Clavel
Proceedings of the 12th Language Resources and Evaluation Conference

One of the main challenges in the field of Embodied Conversational Agents (ECAs) is to generate socially believable agents. The common strategy for agent behaviour synthesis is to rely on the analysis of a dedicated corpus. Such a corpus is composed of multimedia files of socio-emotional behaviors that have been annotated by external observers. The underlying idea is to identify interaction information for the agent’s socio-emotional behavior by checking whether the intended socio-emotional behavior is actually perceived by humans. The annotations can then be used as learning classes for machine learning algorithms applied to the social signals. This paper introduces the POTUS Corpus, composed of high-quality audio-video files of political addresses to the American people. Two protagonists are present in this database. First, it includes speeches of former president Barack Obama to the American people. Second, it provides videos of these same speeches given by a virtual agent named Rodrigue. The ECA reproduces the original address as closely as possible using social signals automatically extracted from the original. Both are annotated for social attitudes, providing information about the stance observed in each file. The corpus also provides the social signals automatically extracted from Obama’s addresses and used to generate Rodrigue’s.

The importance of fillers for text representations of speech transcripts
Tanvi Dinkar | Pierre Colombo | Matthieu Labeau | Chloé Clavel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While being an essential component of spoken language, fillers (e.g. “um” or “uh”) often remain overlooked in Spoken Language Understanding (SLU) tasks. We explore the possibility of representing them with deep contextualised embeddings, showing improvements on modelling spoken language and two downstream tasks — predicting a speaker’s stance and expressed confidence.

2019

From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining
Alexandre Garcia | Pierre Colombo | Florence d’Alché-Buc | Slim Essid | Chloé Clavel
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The task of predicting fine-grained user opinion based on spontaneous spoken language is a key problem arising in the development of Computational Agents as well as in the development of social-network-based opinion miners. Unfortunately, gathering reliable data on which a model can be trained is notoriously difficult, and existing works rely only on coarsely labeled opinions. In this work, we aim at bridging the gap separating fine-grained opinion models already developed for written language and coarse-grained models developed for spontaneous multimodal opinion mining. We take advantage of the implicit hierarchical structure of opinions to build a joint fine- and coarse-grained opinion model that exploits different views of the opinion expression. The resulting model shares some properties with attention-based models and is shown to provide competitive results on a recently released multimodal corpus with fine-grained annotations.

2017

Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction
Guillaume Dubuisson Duplessis | Chloé Clavel | Frédéric Landragin
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

This work aims at characterising verbal alignment processes in order to improve the communicative capabilities of virtual agents. We propose computationally inexpensive measures of verbal alignment based on expression repetition in dyadic textual dialogues. Using these measures, we present a contrastive study between Human-Human and Human-Agent dialogues on a negotiation task. We exhibit quantitative differences in the strength and orientation of verbal alignment, showing the ability of our approach to characterise important aspects of verbal alignment.
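To make the idea of a repetition-based alignment measure concrete, here is a minimal, hypothetical measure in the same spirit. It is not one of the measures defined in the paper: it simply reports the share of one participant's utterances that reuse a word n-gram (an "expression") the other participant has already produced. Whitespace tokenisation, the n-gram range, the function names and the toy negotiation dialogue are all assumptions.

```python
def ngrams(tokens, n):
    """Set of word n-grams of length n in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def other_repetition_ratio(dialogue, speaker, min_n=2, max_n=4):
    """Share of `speaker`'s utterances that reuse an expression (an n-gram of
    length min_n..max_n) previously produced by the other participant.

    dialogue: ordered list of (speaker_id, utterance_text) pairs (a dyad).
    """
    seen_other = set()  # expressions produced so far by the other participant
    hits = total = 0
    for who, text in dialogue:
        tokens = text.lower().split()
        grams = set()
        for n in range(min_n, max_n + 1):
            grams |= ngrams(tokens, n)
        if who == speaker:
            total += 1
            if grams & seen_other:
                hits += 1
        else:
            seen_other |= grams
    return hits / total if total else 0.0

dialogue = [
    ("H", "i can offer you the red car"),
    ("A", "the red car sounds good"),
    ("H", "great deal then"),
    ("A", "what about a lower price"),
]
print(other_repetition_ratio(dialogue, "A"))  # 0.5
print(other_repetition_ratio(dialogue, "H"))  # 0.0
```

Computing the ratio separately for each participant gives a crude notion of orientation (who aligns to whom), while its magnitude gives a crude notion of strength.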

2015

Improving social relationships in face-to-face human-agent interactions: when the agent wants to know user’s likes and dislikes
Caroline Langlet | Chloé Clavel
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Modelling agent’s questions for analysing user’s affects, appreciations and judgements in human-agent interaction (Modélisation des questions de l’agent pour l’analyse des affects, jugements et appréciations de l’utilisateur dans les interactions humain-agent) [in French]
Caroline Langlet | Chloé Clavel
Proceedings of TALN 2014 (Volume 2: Short Papers)

Comparative analysis of verbal alignment in human-human and human-agent interactions
Sabrina Campano | Jessica Durand | Chloé Clavel
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Engagement is an important feature in human-human and human-agent interaction. In this paper, we investigate lexical alignment as a cue of engagement, relying on two different corpora: CID and SEMAINE. Our final goal is to build a virtual conversational character that could use alignment strategies to maintain the user’s engagement. To do so, we investigate two alignment processes: shared vocabulary and other-repetitions. A quantitative and qualitative approach is proposed to characterize these aspects in human-human (CID) and human-operator (SEMAINE) interactions. Our results show that these processes are observable in both corpora, indicating a stable pattern that can be further modelled in conversational agents.

2012

Quel est l’apport de la détection d’entités nommées pour l’extraction d’information en domaine restreint ? (What is the contribution of named entities detection for information extraction in restricted domain ?) [in French]
Camille Dutrey | Chloé Clavel | Sophie Rosset | Ioana Vasilescu | Martine Adda-Decker
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

2010

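L’apport des concepts métiers pour la classification des questions ouvertes d’enquête (The contribution of business concepts to the classification of open-ended survey questions) [in French]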
Ludivine Kuznik | Anne-Laure Guénet | Anne Peradotto | Chloé Clavel
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

EDF uses text mining techniques to optimize its customer relationship management by analyzing answers to open-ended questions in satisfaction surveys and transcripts of call-center conversations. In this article, we present the various application constraints related to the use of text mining tools for analyzing customer data. After reviewing the different tools available on the market, we identified the Skill Cartridge™ technology provided by the company TEMIS as the best suited to our needs. This technology enables semantic modeling of concepts related to the reasons for dissatisfaction. The contribution of this modeling is illustrated on a task of classifying satisfaction survey answers intended to assess the loyalty of EDF customers. Semantic modeling yielded a clear improvement in classification scores (F-measure = 75.5%), notably for the categories corresponding to satisfaction and discontent.
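The F-measure reported in the abstract is most commonly the balanced F1 score, the harmonic mean of precision and recall; the abstract does not say which variant was used, so that is an assumption here. A small helper for the general F-beta form, with made-up precision and recall values in the usage line since the abstract reports only the F-measure:

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the balanced F-measure (harmonic mean of P and R)."""
    if precision == 0 and recall == 0:
        return 0.0  # convention: undefined 0/0 treated as 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical precision/recall values, not taken from the paper.
print(round(f_measure(0.8, 0.6), 4))  # 0.6857
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a high F-measure requires both precision and recall to be reasonably high.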

2006

Fear-type emotions of the SAFE Corpus: annotation issues
Chloé Clavel | Ioana Vasilescu | Laurence Devillers | Thibaut Ehrette | Gaël Richard
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The present research focuses on annotation issues in the context of the acoustic detection of fear-type emotions for surveillance applications. The emotional speech material used for this study comes from the previously collected SAFE Database (Situation Analysis in a Fictional and Emotional Database), which consists of audio-visual sequences extracted from fiction movies. A generic annotation scheme was developed to annotate the various emotional manifestations contained in the corpus. The annotation was carried out by two labellers, and their two annotation strategies are compared. It emerges that the borderline between emotion and neutral varies according to the labeller. An acoustic validation by a third labeller makes it possible to analyse the two strategies. Two human strategies are then observed: a first, context-oriented one, which mixes audio and contextual (video) information in emotion categorization; and a second one, based mainly on audio information. The k-means clustering confirms the role of audio cues in human annotation strategies. It particularly helps in evaluating those strategies from the point of view of a detection system based on audio cues.
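The k-means step mentioned above groups segments using audio cues alone, and the resulting groups can then be set against the human annotation strategies. A plain Lloyd's-algorithm sketch of that clustering step follows. The two features per segment (e.g. a normalised mean pitch and a normalised energy), the data and the deterministic initialisation are assumptions for illustration; the paper's actual acoustic features and setup are not described here.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length feature tuples.

    Initial centroids are simply the first k points, a deterministic choice
    made for this sketch (k-means++ or random restarts are common instead).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c, p=p: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical per-segment features: (normalised mean pitch, normalised energy).
segments = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.1), (0.85, 0.95), (0.2, 0.15), (0.95, 0.9)]
labels, centroids = kmeans(segments, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Comparing such audio-only clusters with each labeller's emotion/neutral decisions is one way to see which annotation strategy an audio-based detector would agree with more closely.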