Seyed Mahed Mousavi


2023

What’s New? Identifying the Unfolding of New Events in a Narrative
Seyed Mahed Mousavi | Shohei Tanaka | Gabriel Roccabruna | Koichiro Yoshino | Satoshi Nakamura | Giuseppe Riccardi
Proceedings of the 5th Workshop on Narrative Understanding

Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel, challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. An event is categorized as new based on the discourse context and on whether it can be inferred through commonsense reasoning. We used human annotators to annotate a publicly available corpus of narratives with new events at the sentence level. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.

Understanding Emotion Valence is a Joint Deep Learning Task
Gabriel Roccabruna | Seyed Mahed Mousavi | Giuseppe Riccardi
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

The valence analysis of speakers' utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) in single-task, two-step, and joint settings for the valence and EC prediction tasks. We compare and evaluate the performance of generative (GPT-2) and discriminative (BERT) architectures in each setting. We observed that providing the ground-truth label of one task improves the models' prediction performance on the other task. We further observed that the discriminative model achieves the best trade-off between the valence and EC prediction tasks in the joint prediction setting. As a result, we attain a single model that performs both tasks, thus saving computational resources at training and inference time.

Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?
Seyed Mahed Mousavi | Simone Caldarella | Giuseppe Riccardi
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)

Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should uniquely interact with the users over multiple sessions and long periods of time (e.g. weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness and engagement of the user.

2022

Evaluation of Response Generation Models: Shouldn’t It Be Shareable and Replicable?
Seyed Mahed Mousavi | Gabriel Roccabruna | Michela Lorandi | Simone Caldarella | Giuseppe Riccardi
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

Human Evaluation (HE) of automatically generated responses is necessary for the advancement of human-machine dialogue research. Current automatic evaluation measures are poor surrogates, at best. There are no agreed-upon HE protocols, and they are difficult to develop. As a result, researchers either perform non-replicable, non-transparent, and inconsistent procedures or, worse, limit themselves to automated metrics. We propose to standardize the human evaluation of response generation models by publicly sharing a detailed protocol. The proposal includes the task design, annotator recruitment, task execution, and annotation reporting. The protocol and process can be used as-is, as a whole, in part, or modified and extended by the research community. We validate the protocol by evaluating two conversationally fine-tuned state-of-the-art models (GPT-2 and T5) on the complex task of personalized response generation. We invite the community to use this protocol (or its future community-amended versions) as a transparent, replicable, and comparable approach to HE of generated responses.

Can Emotion Carriers Explain Automatic Sentiment Prediction? A Study on Personal Narratives
Seyed Mahed Mousavi | Gabriel Roccabruna | Aniruddha Tammewar | Steve Azzolin | Giuseppe Riccardi
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Deep Neural Network (DNN) models have achieved acceptable performance in sentiment prediction of written text. However, the output of these machine learning (ML) models cannot be natively interpreted. In this paper, we study how the sentiment polarity predictions of DNNs can be explained and compare these explanations to human explanations. We crowdsource a corpus of Personal Narratives and ask human judges to annotate them with polarity and to select the corresponding token chunks (the Emotion Carriers, EC) that convey the narrators' emotions in the text. The interpretations of the neural ML models are obtained through the Integrated Gradients method, and we compare them with the human annotators' interpretations. The results of our comparative analysis indicate that while the ML model mostly focuses on the explicit appearance of emotion-laden words (e.g., happy, frustrated), human annotators predominantly focus their attention on the manifestation of emotions through ECs that denote events, persons, and objects which activate the narrator's emotional state.

2021

Would you like to tell me more? Generating a corpus of psychotherapy dialogues
Seyed Mahed Mousavi | Alessandra Cervone | Morena Danieli | Giuseppe Riccardi
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations

The acquisition of a dialogue corpus is a key step in the process of training a dialogue model. In this context, corpus acquisitions have been designed either for open-domain information retrieval or for slot-filling (e.g., restaurant booking) tasks. However, there has been scarce research on the problem of collecting personal conversations with users over a long period of time. In this paper, we focus on the types of dialogues that are required for mental health applications. One of these types is the follow-up dialogue that a psychotherapist would initiate when reviewing the progress of a Cognitive Behavioral Therapy (CBT) intervention. The dialogues are elicited through textual stimuli presented to dialogue writers. We propose an automatic algorithm that generates textual stimuli from personal narratives collected during psychotherapy interventions. The automatically generated stimuli are presented as seeds to dialogue writers, who follow principled guidelines. We analyze the linguistic quality of the collected corpus and compare the performance of psychotherapists and non-expert dialogue writers. Moreover, we report the human evaluation of a corpus-based response-selection model.