Giuseppe Riccardi - ACL Anthology

Giuseppe Riccardi

2026

Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It
Seyed Mahed Mousavi | Edoardo Cecchinato | Lucia Horníková | Giuseppe Riccardi
Findings of the Association for Computational Linguistics: EACL 2026

We conduct a systematic audit of three widely used social reasoning benchmarks, SocialIQa, FauxPas-EAI, and ToMi, and uncover pervasive flaws in both benchmark items and evaluation methodology. Using five LLMs (GPT-3, 3.5, 4, o1, and LLaMA 3.1) as diagnostic tools, we identify structural, semantic, and pragmatic issues in benchmark design (e.g., duplicated items, ambiguous wording, and implausible answers), as well as scoring procedures that prioritize output form over the reasoning process. Through systematic human annotation and re-evaluation on cleaned benchmark subsets, we find that model scores often improve due to erratic surface wording variations rather than improved reasoning. In fact, further analyses show that model performance is highly sensitive to minor input variations such as context availability and phrasing, revealing that high scores may reflect alignment with format-specific cues rather than consistent inference based on the input. These findings challenge the validity of current benchmark-based claims about social reasoning in LLMs, and highlight the need for evaluation protocols that assess reasoning as a process of drawing inference from available information, rather than as static output selection. We release audited data and evaluation tools to support more interpretable and diagnostic assessments of model reasoning.

2025

Can LLMs Help Recollect and Elaborate on Our Personal Experiences?
Gabriel Roccabruna | Olha Khomyn | Michele Yin | Giuseppe Riccardi
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

CIVET: Systematic Evaluation of Understanding in VLMs
Massimo Rizzoli | Simone Alghisi | Olha Khomyn | Gabriel Roccabruna | Seyed Mahed Mousavi | Giuseppe Riccardi
Findings of the Association for Computational Linguistics: EMNLP 2025

While Vision-Language Models (VLMs) have achieved competitive performance in various tasks, their comprehension of the underlying structure and semantics of a scene remains understudied. To investigate the understanding of VLMs, we study their capability regarding object properties and relations in a controlled and interpretable manner. To this end, we introduce CIVET, a novel and extensible framework for systematiC evaluatIon Via controllEd sTimuli. CIVET addresses the lack of standardized systematic evaluation for assessing VLMs’ understanding, enabling researchers to test hypotheses with statistical rigor. With CIVET, we evaluate five state-of-the-art VLMs on exhaustive sets of stimuli, free from annotation noise, dataset-specific biases, and uncontrolled scene complexity. Our findings reveal that 1) current VLMs can accurately recognize only a limited set of basic object properties; 2) their performance heavily depends on the position of the object in the scene; 3) they struggle to understand basic relations among objects. Furthermore, a comparative evaluation with human annotators reveals that VLMs still fall short of achieving human-level accuracy.

2024

Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification?
Gabriel Roccabruna | Massimo Rizzoli | Giuseppe Riccardi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The automatic detection of temporal relations among events has been mainly investigated with encoder-only models such as RoBERTa. Large Language Models (LLMs) have recently shown promising performance in temporal reasoning tasks such as temporal question answering. Nevertheless, recent studies have tested LLMs’ performance in detecting temporal relations with closed-source models only, limiting the interpretability of those results. In this work, we investigate LLMs’ performance and decision process in the Temporal Relation Classification task. First, we assess the performance of seven open- and closed-source LLMs, experimenting with in-context learning and lightweight fine-tuning approaches. Results show that LLMs with in-context learning significantly underperform smaller encoder-only models based on RoBERTa. Then, we delve into the possible reasons for this gap by applying explainability methods. The outcome suggests a limitation of LLMs in this task due to their autoregressive nature, which causes them to focus only on the last part of the sequence. Additionally, we evaluate the word embeddings of these two models to better understand their pre-training differences. The code and the fine-tuned models can be found respectively on GitHub.

DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs
Seyed Mahed Mousavi | Simone Alghisi | Giuseppe Riccardi
Findings of the Association for Computational Linguistics: EMNLP 2024

LLMs acquire knowledge from massive data snapshots collected at different timestamps. Their knowledge is then commonly evaluated using static benchmarks. However, factual knowledge is generally subject to time-sensitive changes, and static benchmarks cannot address those cases. We present an approach to dynamically evaluate the knowledge in LLMs and their time-sensitiveness against Wikidata, a publicly available up-to-date knowledge graph. We evaluate the time-sensitive knowledge in twenty-four private and open-source LLMs, as well as the effectiveness of four editing methods in updating the outdated facts. Our results show that 1) outdatedness is a critical problem across state-of-the-art LLMs; 2) LLMs output inconsistent answers when prompted with slight variations of the question prompt; and 3) the performance of the state-of-the-art knowledge editing algorithms is very limited, as they cannot reduce the cases of outdatedness and output inconsistency.
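
As a rough illustration of the outdatedness check described above, the sketch below labels a model answer by comparing it with a dated value history for one fact. It is not the authors' implementation: the `DatedValue` record, the `classify_answer` function, the three labels, and the placeholder names are all assumptions made for this example, and the history is written by hand rather than retrieved from Wikidata.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatedValue:
    """One value of a time-sensitive fact (hypothetical record layout)."""
    value: str
    start: date
    end: Optional[date]  # None means the value is still current


def classify_answer(answer: str, history: List[DatedValue]) -> str:
    """Label an answer as 'up-to-date', 'outdated', or 'irrelevant'."""
    text = answer.lower()
    # Check current values first, so an answer that mentions both the
    # current and a past value is counted as up to date.
    for item in history:
        if item.end is None and item.value.lower() in text:
            return "up-to-date"
    for item in history:
        if item.end is not None and item.value.lower() in text:
            return "outdated"
    return "irrelevant"


# Placeholder history for an invented fact; the names are not real data.
history = [
    DatedValue("Alice Example", date(2015, 1, 1), date(2020, 1, 1)),
    DatedValue("Bob Example", date(2020, 1, 1), None),
]

labels = [
    classify_answer("The current CEO is Bob Example.", history),
    classify_answer("I believe it is Alice Example.", history),
    classify_answer("I am not sure.", history),
]
print(labels)
```

Running the same check over several paraphrases of one question and comparing the resulting labels would be one simple way to surface the output inconsistency mentioned in the abstract.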

Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
Simone Alghisi | Massimo Rizzoli | Gabriel Roccabruna | Seyed Mahed Mousavi | Giuseppe Riccardi
Proceedings of the 17th International Natural Language Generation Conference

We study the limitations of Large Language Models (LLMs) for the task of response generation in human-machine dialogue. Several techniques have been proposed in the literature for different dialogue types (e.g., Open-Domain). However, the evaluations of these techniques have been limited in terms of base LLMs, dialogue types and evaluation metrics. In this work, we extensively analyze different LLM adaptation techniques when applied to different dialogue types. We have selected two base LLMs, Llama-2 and Mistral, and four dialogue types: Open-Domain, Knowledge-Grounded, Task-Oriented, and Question Answering. We evaluate the performance of in-context learning and fine-tuning techniques across datasets selected for each dialogue type. We assess the impact of incorporating external knowledge to ground the generation in both scenarios of Retrieval-Augmented Generation (RAG) and gold knowledge. We adopt consistent evaluation and explainability criteria for automatic metrics and human evaluation protocols. Our analysis shows that there is no universal best technique for adapting large language models, as the efficacy of each technique depends on both the base LLM and the specific type of dialogue. Last but not least, the assessment of the best adaptation technique should include human evaluation to avoid false expectations and outcomes derived from automatic metrics.

2023

Understanding Emotion Valence is a Joint Deep Learning Task
Gabriel Roccabruna | Seyed Mahed Mousavi | Giuseppe Riccardi
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

The valence analysis of speakers’ utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) for single-task, two-step, and joint settings for the valence and EC prediction tasks. We compare and evaluate the performance of generative (GPT-2) and discriminative (BERT) architectures in each setting. We observed that providing the ground truth label of one task improves the prediction performance of the models in the other task. We further observed that the discriminative model achieves the best trade-off of valence and EC prediction tasks in the joint prediction setting. As a result, we attain a single model that performs both tasks, thus, saving computation resources at training and inference times.

What’s New? Identifying the Unfolding of New Events in a Narrative
Seyed Mahed Mousavi | Shohei Tanaka | Gabriel Roccabruna | Koichiro Yoshino | Satoshi Nakamura | Giuseppe Riccardi
Proceedings of the 5th Workshop on Narrative Understanding

Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. The event is categorized as new with respect to the discourse context and whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.

Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?
Seyed Mahed Mousavi | Simone Caldarella | Giuseppe Riccardi
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)

Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should uniquely interact with the users over multiple sessions and long periods of time (e.g. weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness and engagement of the user.

2022

Can Emotion Carriers Explain Automatic Sentiment Prediction? A Study on Personal Narratives
Seyed Mahed Mousavi | Gabriel Roccabruna | Aniruddha Tammewar | Steve Azzolin | Giuseppe Riccardi
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Deep Neural Network (DNN) models have achieved acceptable performance in sentiment prediction of written text. However, the output of these machine learning (ML) models cannot be natively interpreted. In this paper, we study how the sentiment polarity predictions by DNNs can be explained and compare them to humans’ explanations. We crowdsource a corpus of Personal Narratives and ask human judges to annotate them with polarity and select the corresponding token chunks - the Emotion Carriers (EC) - that convey narrators’ emotions in the text. The interpretations of ML neural models are carried out through the Integrated Gradients method and we compare them with human annotators’ interpretations. The results of our comparative analysis indicate that while the ML model mostly focuses on the explicit appearance of emotion-laden words (e.g. happy, frustrated), the human annotator predominantly focuses on the manifestation of emotions through ECs that denote events, persons, and objects which activate the narrator’s emotional state.

Multi-source Multi-domain Sentiment Analysis with BERT-based Models
Gabriel Roccabruna | Steve Azzolin | Giuseppe Riccardi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Sentiment analysis is one of the most widely studied tasks in natural language processing. While BERT-based models have achieved state-of-the-art results in this task, little attention has been given to their performance variability across class labels, multi-source and multi-domain corpora. In this paper, we present an improved state-of-the-art and comparatively evaluate BERT-based models for sentiment analysis on Italian corpora. The proposed model is evaluated over eight sentiment analysis corpora from different domains (social media, finance, e-commerce, health, travel) and sources (Twitter, YouTube, Facebook, Amazon, Tripadvisor, Opera and Personal Healthcare Agent) on the prediction of positive, negative and neutral classes. Our findings suggest that BERT-based models are confident in predicting positive and negative examples but not as much with neutral examples. We release the sentiment analysis model as well as a new financial-domain sentiment corpus.

Annotation of Valence Unfolding in Spoken Personal Narratives
Aniruddha Tammewar | Franziska Braun | Gabriel Roccabruna | Sebastian Bayerl | Korbinian Riedhammer | Giuseppe Riccardi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Personal Narrative (PN) is the recollection of individuals’ life experiences, events, and thoughts along with the associated emotions in the form of a story. Compared to other genres such as social media texts or microblogs, where people write about experienced events or products, the spoken PNs are complex to analyze and understand. They are usually long and unstructured, involving multiple and related events, characters as well as thoughts and emotions associated with events, objects, and persons. In spoken PNs, emotions are conveyed by changing the speech signal characteristics as well as the lexical content of the narrative. In this work, we annotate a corpus of spoken personal narratives, with the emotion valence using discrete values. The PNs are segmented into speech segments, and the annotators annotate them in the discourse context, with values on a 5-point bipolar scale ranging from -2 to +2 (0 for neutral). In this way, we capture the unfolding of the PNs events and changes in the emotional state of the narrator. We perform an in-depth analysis of the inter-annotator agreement, the relation between the label distribution w.r.t. the stimulus (positive/negative) used for the elicitation of the narrative, and compare the segment-level annotations to a baseline continuous annotation. We find that the neutral score plays an important role in the agreement. We observe that it is easy to differentiate the positive from the negative valence while the confusion with the neutral label is high. Keywords: Personal Narratives, Emotion Annotation, Segment Level Annotation

Evaluation of Response Generation Models: Shouldn’t It Be Shareable and Replicable?
Seyed Mahed Mousavi | Gabriel Roccabruna | Michela Lorandi | Simone Caldarella | Giuseppe Riccardi
Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

Human Evaluation (HE) of automatically generated responses is necessary for the advancement of human-machine dialogue research. Current automatic evaluation measures are poor surrogates, at best. There are no agreed-upon HE protocols and it is difficult to develop them. As a result, researchers either perform non-replicable, non-transparent and inconsistent procedures or, worse, limit themselves to automated metrics. We propose to standardize the human evaluation of response generation models by publicly sharing a detailed protocol. The proposal includes the task design, annotators recruitment, task execution, and annotation reporting. Such protocol and process can be used as-is, as-a-whole, in-part, or modified and extended by the research community. We validate the protocol by evaluating two conversationally fine-tuned state-of-the-art models (GPT-2 and T5) for the complex task of personalized response generation. We invite the community to use this protocol - or its future community amended versions - as a transparent, replicable, and comparable approach to HE of generated responses.

2021

An Unsupervised Approach to Extract Life-Events from Personal Narratives in the Mental Health Domain
Seyed Mahed Mousavi | Roberto Negro | Giuseppe Riccardi
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

Would you like to tell me more? Generating a corpus of psychotherapy dialogues
Seyed Mahed Mousavi | Alessandra Cervone | Morena Danieli | Giuseppe Riccardi
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations

The acquisition of a dialogue corpus is a key step in the process of training a dialogue model. In this context, corpora acquisitions have been designed either for open-domain information retrieval or slot-filling (e.g. restaurant booking) tasks. However, there has been scarce research in the problem of collecting personal conversations with users over a long period of time. In this paper we focus on the types of dialogues that are required for mental health applications. One of these types is the follow-up dialogue that a psychotherapist would initiate in reviewing the progress of a Cognitive Behavioral Therapy (CBT) intervention. The elicitation of the dialogues is achieved through textual stimuli presented to dialogue writers. We propose an automatic algorithm that generates textual stimuli from personal narratives collected during psychotherapy interventions. The automatically generated stimuli are presented as a seed to dialogue writers following principled guidelines. We analyze the linguistic quality of the collected corpus and compare the performances of psychotherapists and non-expert dialogue writers. Moreover, we report the human evaluation of a corpus-based response-selection model.

2020

Annotation of Emotion Carriers in Personal Narratives
Aniruddha Tammewar | Alessandra Cervone | Eva-Maria Messner | Giuseppe Riccardi
Proceedings of the Twelfth Language Resources and Evaluation Conference

We are interested in the problem of understanding personal narratives (PN) - spoken or written - recollections of facts, events, and thoughts. For PNs, we define emotion carriers as the speech or text segments that best explain the emotional state of the narrator. Such segments may span from single to multiple words, containing for example verb or noun phrases. Advanced automatic understanding of PNs requires not only the prediction of the narrator’s emotional state but also the identification of which events (e.g. the loss of a relative or the visit of grandpa) or people (e.g. the old group of high school mates) carry the emotion manifested during the personal recollection. This work proposes and evaluates an annotation model for identifying emotion carriers in spoken personal narratives. Compared to other text genres such as news and microblogs, spoken PNs are particularly challenging because a narrative is usually unstructured, involving multiple sub-events and characters as well as thoughts and associated emotions perceived by the narrator. In this work, we experiment with annotating emotion carriers in speech transcriptions from the Ulm State-of-Mind in Speech (USoMS) corpus, a dataset of PNs in German. We believe this resource could be used for experiments in the automatic extraction of emotion carriers from PN, a task that could provide further advancements in narrative understanding.

Multifunctional ISO Standard Dialogue Act Tagging in Italian
Gabriel Roccabruna | Alessandra Cervone | Giuseppe Riccardi
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Is this Dialogue Coherent? Learning from Dialogue Acts and Entities
Alessandra Cervone | Giuseppe Riccardi
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this work, we investigate the human perception of coherence in open-domain dialogues. In particular, we address the problem of annotating and modeling the coherence of next-turn candidates while considering the entire history of the dialogue. First, we create the Switchboard Coherence (SWBD-Coh) corpus, a dataset of human-human spoken dialogues annotated with turn coherence ratings, where next-turn candidate utterances ratings are provided considering the full dialogue context. Our statistical analysis of the corpus indicates how turn coherence perception is affected by patterns of distribution of entities previously introduced and the Dialogue Acts used. Second, we experiment with different architectures to model entities, Dialogue Acts and their combination and evaluate their performance in predicting human coherence ratings on SWBD-Coh. We find that models combining both DA and entity information yield the best performances both for response selection and turn coherence rating.

2019

Affective Behaviour Analysis of On-line User Interactions: Are On-line Support Groups More Therapeutic than Twitter?
Giuliano Tortoreto | Evgeny Stepanov | Alessandra Cervone | Mateusz Dubiel | Giuseppe Riccardi
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

The increase in the prevalence of mental health problems has coincided with a growing popularity of health related social networking sites. Regardless of their therapeutic potential, on-line support groups (OSGs) can also have negative effects on patients. In this work we propose a novel methodology to automatically verify the presence of therapeutic factors in social networking websites by using Natural Language Processing (NLP) techniques. The methodology is evaluated on on-line asynchronous multi-party conversations collected from an OSG and Twitter. The results of the analysis indicate that therapeutic factors occur more frequently in OSG conversations than in Twitter conversations. Moreover, the analysis of OSG conversations reveals that the users of that platform are supportive, and interactions are likely to lead to the improvement of their emotional state. We believe that our method provides a stepping stone towards automatic analysis of emotional states of users of online platforms. Possible applications of the method include provision of guidelines that highlight potential implications of using such platforms on users’ mental health, and/or support in the analysis of their impact on specific individuals.

2018

ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents
Stefano Mezza | Alessandra Cervone | Evgeny Stepanov | Giuliano Tortoreto | Giuseppe Riccardi
Proceedings of the 27th International Conference on Computational Linguistics

Dialogue Act (DA) tagging is crucial for spoken language understanding systems, as it provides a general representation of speakers’ intents, not bound to a particular dialogue system. Unfortunately, publicly available data sets with DA annotation are all based on different annotation schemes and thus incompatible with each other. Moreover, their schemes often do not cover all aspects necessary for open-domain human-machine interaction. In this paper, we propose a methodology to map several publicly available corpora to a subset of the ISO standard, in order to create a large task-independent training corpus for DA classification. We show the feasibility of using this corpus to train a domain-independent DA tagger testing it on out-of-domain conversational data, and argue the importance of training on multiple corpora to achieve robustness across different DA categories.

Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development
Jacopo Gobbi | Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Automatically Predicting User Ratings for Conversational Systems
Alessandra Cervone | Enrico Gambi | Giuliano Tortoreto | Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2017

Irony Detection: from the Twittersphere to the News Space
Alessandra Cervone | Evgeny Stepanov | Fabio Celli | Giuseppe Riccardi
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Automatic Community Creation for Abstractive Spoken Conversations Summarization
Karan Singla | Evgeny Stepanov | Ali Orkan Bayer | Giuseppe Carenini | Giuseppe Riccardi
Proceedings of the Workshop on New Frontiers in Summarization

Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment with automatic community creation using cosine similarity on different levels of representation: raw text, WordNet SynSet IDs, and word embeddings. We show that the abstractive summarization systems with automatic communities significantly outperform previously published results on both English and Italian corpora.
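
To make the idea of community creation concrete, here is a minimal sketch of linking summary sentences to conversation sentences by cosine similarity over raw-text bag-of-words vectors. It only illustrates the general technique under stated assumptions: the function names, the tokenizer, the 0.35 threshold, and the toy sentences are hypothetical and are not taken from the paper, which also uses WordNet SynSet IDs and word embeddings as representations.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_communities(summary_sentences, conversation_sentences, threshold=0.35):
    """Map each summary sentence index to the indices of similar conversation sentences."""
    conv_vectors = [bag_of_words(s) for s in conversation_sentences]
    communities = {}
    for i, summary in enumerate(summary_sentences):
        vec = bag_of_words(summary)
        # The threshold is an assumed cut-off chosen for this toy example.
        communities[i] = [
            j for j, cv in enumerate(conv_vectors) if cosine(vec, cv) >= threshold
        ]
    return communities


# Toy data invented for illustration only.
conversation = [
    "I would like to change my flight to Monday",
    "Sure, let me check the availability",
    "The weather is nice today",
]
summary = ["The customer asks to change the flight"]

communities = build_communities(summary, conversation)
print(communities)
```

Swapping `bag_of_words` for a function that returns SynSet counts or averaged word embeddings would give the other representation levels, with the cosine step adapted to dense vectors in the latter case.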

Functions of Silences towards Information Flow in Spoken Conversation
Shammur Absar Chowdhury | Evgeny Stepanov | Morena Danieli | Giuseppe Riccardi
Proceedings of the Workshop on Speech-Centric Natural Language Processing

Silence is an integral part of the most frequent turn-taking phenomena in spoken conversations. Silence is sized and placed within the conversation flow and it is coordinated by the speakers along with the other speech acts. The objective of this analytical study is twofold: to explore the functions of silence with duration of one second and above, towards information flow in a dyadic conversation utilizing the sequences of dialog acts present in the turns surrounding the silence itself; and to design a feature space useful for clustering the silences using a hierarchical concept formation algorithm. The resulting clusters are manually grouped into functional categories based on their similarities. It is observed that silence plays an important role in response preparation and can also indicate speakers’ hesitation or indecisiveness. It is also observed that sometimes long silences can be used deliberately to get a forced response from another speaker, thus making silence a multi-functional and important catalyst towards information flow.

2016

Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
Morena Danieli | Balamurali A R | Evgeny Stepanov | Benoit Favre | Frederic Bechet | Giuseppe Riccardi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Annotating and predicting behavioural aspects in conversations is becoming critical in the conversational analytics industry. In this paper we look into inter-annotator agreement of agent behaviour dimensions on two call center corpora. We find that the task can be annotated consistently over time, but that subjectivity issues impact the quality of the annotation. The reformulation of some of the annotated dimensions is suggested in order to improve agreement.

Transfer of Corpus-Specific Dialogue Act Annotation to ISO Standard: Is it worth it?
Shammur Absar Chowdhury | Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Spoken conversation corpora often adapt existing Dialogue Act (DA) annotation specifications, such as DAMSL, DIT++, etc., to task-specific needs, yielding incompatible annotations and thus limiting corpora re-usability. The recently accepted ISO standard for DA annotation – Dialogue Act Markup Language (DiAML) – is designed as domain and application independent. Moreover, the clear separation of dialogue dimensions and communicative functions, coupled with the hierarchical organization of the latter, allows for classification at different levels of granularity. However, re-annotating existing corpora with the new scheme might require significant effort. In this paper we test the utility of the ISO standard through comparative evaluation of the corpus-specific legacy and the semi-automatically transferred DiAML DA annotations on a supervised dialogue act classification task. To test the domain independence of the resulting annotations, we perform cross-domain and data aggregation evaluation. Compared to the legacy annotation scheme, on the Italian LUNA Human-Human corpus, the DiAML annotation scheme exhibits better cross-domain and data aggregation classification performance, while maintaining comparable in-domain performance.

Do We Really Need All Those Rich Linguistic Features? A Neural Network-Based Approach to Implicit Sense Labeling
Niko Schenk | Christian Chiarcos | Kathrin Donandt | Samuel Rönnqvist | Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the CoNLL-16 shared task

Multilevel Annotation of Agreement and Disagreement in Italian News Blogs
Fabio Celli | Giuseppe Riccardi | Firoj Alam
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present a corpus of news blog conversations in Italian annotated with gold standard agreement/disagreement relations (ADRs) at message and sentence levels. This is the first resource of this kind in Italian. The analysis of ADRs at the two levels shows that agreement annotated at message level is consistent and generally reflected at sentence level; moreover, the argumentation structure of disagreement is more complex than that of agreement. The manual error analysis revealed that this resource is useful not only for the analysis of argumentation, but also for the detection of irony/sarcasm in online debates. The corpus and annotation tool are available for research purposes on request.

Predicting Brexit: Classifying Agreement is Better than Sentiment and Pollsters
Fabio Celli | Evgeny Stepanov | Massimo Poesio | Giuseppe Riccardi
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

On June 23rd 2016, the UK held the referendum which ratified the exit from the EU. While most of the traditional pollsters failed to forecast the final vote, there were online systems that hit the result with high accuracy using opinion mining techniques and big data. Starting one month before, we collected and monitored millions of posts about the referendum from social media conversations, and exploited Natural Language Processing techniques to predict the referendum outcome. In this paper we discuss the methods used by traditional pollsters and compare them to the predictions based on different opinion mining techniques. We find that opinion mining based on agreement/disagreement classification works better than opinion mining based on polarity classification in the forecast of the referendum outcome.

How Interlocutors Coordinate with each other within Emotional Segments?
Firoj Alam | Shammur Absar Chowdhury | Morena Danieli | Giuseppe Riccardi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we aim to investigate the coordination of interlocutors’ behavior in different emotional segments. Conversational coordination between the interlocutors is the tendency of speakers to predict and adjust to each other in an ongoing conversation. In order to find such coordination, we investigated 1) lexical similarities between the speakers in each emotional segment, 2) correlation between the interlocutors using psycholinguistic features, such as linguistic styles, psychological processes, and personal concerns, among others, and 3) the relation of interlocutors’ turn-taking behaviors, such as competitiveness. To study the degree of coordination in different emotional segments, we conducted our experiments using real dyadic conversations collected from call centers, in which the agent’s emotional states include empathy and the customer’s emotional states include anger and frustration. Our findings suggest that the most coordination occurs between the interlocutors inside anger segments, whereas little coordination was observed when the agent was empathic, even though an increase in the amount of non-competitive overlaps was observed. We found no significant difference between anger and frustration segments in terms of turn-taking behaviors. However, the length of pauses significantly decreases in the segment preceding anger, whereas it increases in the segment preceding frustration.

The Social Mood of News: Self-reported Annotations to Design Automatic Mood Detection Systems
Firoj Alam | Fabio Celli | Evgeny A. Stepanov | Arindam Ghosh | Giuseppe Riccardi
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

In this paper, we address the issue of automatic prediction of readers’ mood from newspaper articles and comments. As online newspapers are becoming more and more similar to social media platforms, users can provide affective feedback, such as mood and emotion. We have exploited the self-reported annotation of mood categories obtained from the metadata of the Italian online newspaper corriere.it to design and evaluate a system for predicting five different mood categories from news articles and comments: indignation, disappointment, worry, satisfaction, and amusement. The outcome of our experiments shows that overall, bag-of-word-ngrams perform better compared to all other feature sets; however, stylometric features perform better for the mood score prediction of articles. Our study shows that self-reported annotations can be used to design automatic mood prediction systems.

UniTN End-to-End Discourse Parser for CoNLL 2016 Shared Task
Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the CoNLL-16 shared task

2015

Call Centre Conversation Summarization: A Pilot Task at Multiling 2015
Benoit Favre | Evgeny Stepanov | Jérémy Trione | Frédéric Béchet | Giuseppe Riccardi
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

The UniTN Discourse Parser in CoNLL 2015 Shared Task: Token-level Sequence Labeling with Argument-specific Models
Evgeny Stepanov | Giuseppe Riccardi | Ali Orkan Bayer
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

2014

The Development of the Multilingual LUNA Corpus for Spoken Language System Porting
Evgeny Stepanov | Giuseppe Riccardi | Ali Orkan Bayer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The development of annotated corpora is a critical process in the development of speech applications for multiple target languages. While the technology to develop a monolingual speech application has reached satisfactory results (in terms of performance and effort), porting an existing application from a source language to a target language is still a very expensive task. In this paper we address the problem of creating multilingual aligned corpora and their evaluation in the context of a spoken language understanding (SLU) porting task. We discuss the challenges of the manual creation of multilingual corpora, and present the algorithms for the creation of multilingual SLU via Statistical Machine Translation (SMT).

Towards Cross-Domain PDTB-Style Discourse Parsing
Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

2013

Comparative Evaluation of Argument Extraction Algorithms in Discourse Relation Parsing
Evgeny Stepanov | Giuseppe Riccardi
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

2012

Up from Limited Dialog Systems!
Giuseppe Riccardi | Philipp Cimiano | Alexandros Potamianos | Christina Unger
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

Global Features for Shallow Discourse Parsing
Sucheta Ghosh | Giuseppe Riccardi | Richard Johansson
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Improving the Recall of a Discourse Parser by Constraint-based Postprocessing
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These methods use a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points.

2011

Shallow Discourse Parsing with Conditional Random Fields
Sucheta Ghosh | Richard Johansson | Giuseppe Riccardi | Sara Tonelli
Proceedings of 5th International Joint Conference on Natural Language Processing

Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
Alessandro Moschitti | Jennifer Chu-Carroll | Siddharth Patwardhan | James Fan | Giuseppe Riccardi
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Cooperative User Models in Statistical Dialog Simulators
Meritxell González | Silvia Quarteroni | Giuseppe Riccardi | Sebastian Varges
Proceedings of the SIGDIAL 2010 Conference

Investigating Clarification Strategies in a Hybrid POMDP Dialog Manager
Sebastian Varges | Silvia Quarteroni | Giuseppe Riccardi | Alexei Ivanov
Proceedings of the SIGDIAL 2010 Conference

Annotation of Discourse Relations for Conversational Spoken Dialogs
Sara Tonelli | Giuseppe Riccardi | Rashmi Prasad | Aravind Joshi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we make a qualitative and quantitative analysis of discourse relations within the LUNA conversational spoken dialog corpus. In particular, we first describe the Penn Discourse Treebank (PDTB) and then we detail the adaptation of its annotation scheme to the LUNA corpus of Italian task-oriented dialogs in the domain of software/hardware assistance. We discuss similarities and differences between our approach and the PDTB paradigm and point out the peculiarities of spontaneous dialogs w.r.t. written text, which motivated some changes in the annotation strategy. In particular, we introduced the annotation of relations between non-contiguous arguments and we modified the sense hierarchy in order to take into account the important role of pragmatics in dialogs. In the final part of the paper, we present a comparison between the sense and connective frequency in a representative subset of the LUNA corpus and in the PDTB. Such analysis confirmed the differences between the two corpora and corroborates our choice to introduce dialog-specific adaptations.

Kernel-based Reranking for Named-Entity Extraction
Truc-Vien T. Nguyen | Alessandro Moschitti | Giuseppe Riccardi
Coling 2010: Posters

2009

Convolution Kernels on Constituent, Dependency and Sequential Structures for Relation Extraction
Truc-Vien T. Nguyen | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Leveraging POMDPs Trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System
Sebastian Varges | Silvia Quarteroni | Giuseppe Riccardi | Alexei Ivanov | Pierluigi Roberti
Proceedings of the SIGDIAL 2009 Conference

Re-Ranking Models for Spoken Language Understanding
Marco Dinarelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Shallow Semantic Parsing for Spoken Language Understanding
Bonaventura Coppola | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Combining POMDPs trained with User Simulations and Rule-based Dialogue Management in a Spoken Dialogue System
Sebastian Varges | Silvia Quarteroni | Giuseppe Riccardi | Alexei V. Ivanov | Pierluigi Roberti
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

Re-Ranking Models Based-on Small Training Data for Spoken Language Understanding
Marco Dinarelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Annotating Spoken Dialogs: From Speech Segments to Dialog Acts and Frame Semantics
Marco Dinarelli | Silvia Quarteroni | Sara Tonelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language

2008

Active Annotation in the LUNA Italian Corpus of Spontaneous Dialogues
Christian Raymond | Kepa Joseba Rodriguez | Giuseppe Riccardi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present an active approach to annotating an Italian corpus of conversational human-human and Wizard-of-Oz dialogues with lexical and semantic labels. This procedure consists in the use of a machine learner to assist human annotators in the labeling task. The computer-assisted process engages human annotators to check and correct the automatic annotation rather than starting the annotation from un-annotated data. The active learning procedure is combined with annotation error detection to control the reliability of the annotation. With the goal of converging as fast as possible to reliable automatic annotations while minimizing the human effort, we follow the active learning paradigm, which selects for annotation the most informative training examples required to achieve a better level of performance. We show that this procedure allows quick convergence on correct annotations and thus minimizes the cost of human supervision.

Persistent Information State in a Data-Centric Architecture
Sebastian Varges | Giuseppe Riccardi | Silvia Quarteroni
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

2007

Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus
Kepa Joseba Rodríguez | Stefanie Dipper | Michael Götze | Massimo Poesio | Giuseppe Riccardi | Christian Raymond | Joanna Rabiega-Wiśniewska
Proceedings of the Linguistic Annotation Workshop

2004

Mining Spoken Dialogue Corpora for System Evaluation and Modeling
Frederic Bechet | Giuseppe Riccardi | Dilek Hakkani-Tur
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2002

Bootstrapping Bilingual Data using Consensus Translation for a Multilingual Instant Messaging System
Srinivas Bangalore | Vanessa Murdock | Giuseppe Riccardi
COLING 2002: The 19th International Conference on Computational Linguistics

2001

A Finite-State Approach to Machine Translation
Srinivas Bangalore | Giuseppe Riccardi
Second Meeting of the North American Chapter of the Association for Computational Linguistics

2000

Stochastic Finite-State models for Spoken Language Machine Translation
Srinivas Bangalore | Giuseppe Riccardi
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems

1998

Automatic Acquisition of Phrase Grammars for Stochastic Language Modeling
Giuseppe Riccardi | Srinivas Bangalore
Sixth Workshop on Very Large Corpora

Co-authors

Silvia Quarteroni 6

Sebastian Varges 5

Srinivas Bangalore 4

Frederic Bechet 4

Shammur Absar Chowdhury 4

Morena Danieli 4

Simone Alghisi 3

Ali Orkan Bayer 3

Marco Dinarelli 3

Sucheta Ghosh 3

Alexei V. Ivanov 3

Richard Johansson 3

Massimo Rizzoli 3

Aniruddha Tammewar 3

Giuliano Tortoreto 3

Steve Azzolin 2

Simone Caldarella 2

Dilek Hakkani-Tur 2

Truc-Vien T. Nguyen 2

Massimo Poesio 2

Christian Raymond 2

Pierluigi Roberti 2

Kepa Joseba Rodriguez 2

Koichiro Yoshino 2

Balamurali AR 1

Sebastian Bayerl 1

Raffaella Bernardi 1

Franziska Braun 1

Zoraida Callejas 1

Giuseppe Carenini 1

Edoardo Cecchinato 1

Yun-Nung Chen 1

Christian Chiarcos 1

Jennifer Chu-Carroll 1

Philipp Cimiano 1

Bonaventura Coppola 1

Géraldine Damnati 1

Giuseppe "Pino" Di Fabbrizio 1

Stefanie Dipper 1

Kathrin Donandt 1

Mateusz Dubiel 1

Luis Fernando D’Haro 1

Arindam Ghosh 1

Meritxell Gonzàlez 1

Joakim Gustafson 1

Michael Götze 1

Lucia Horníková 1

Michael Johnston 1

Aravind Joshi 1

Tatsuya Kawahara 1

Michela Lorandi 1

John Mendonça 1

Eva-Maria Messner 1

Stefano Mezza 1

Vanessa Murdock 1

Satoshi Nakamura 1

Roberto Negro 1

Alexandros Papangelis 1

Siddharth Patwardhan 1

Alexandros Potamianos 1

Rashmi Prasad 1

Joanna Rabiega-Wiśniewska 1

Korbinian Riedhammer 1

Samuel Rönnqvist 1

Shohei Tanaka 1

M. Inés Torres 1

Jérémy Trione 1

Christina Unger 1

Venues