Bernardo Magnini

Also published as: B. Magnini

2025

“I Understand, but...”: Towards a Comprehensive Account of the Explainee’s Voice in Explanatory Dialogues
Andrea Zaninello | Petar Bodlovic | Marcin Lewinski | Bernardo Magnini
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

pdf bib abs

We introduce MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos. MAIA differs from other available video benchmarks in its design, its reasoning categories, the metric it uses, and the language and culture of the videos. MAIA evaluates Vision Language Models (VLMs) on two aligned tasks, a visual statement verification task and an open-ended visual question-answering task, both on the same set of video-related questions. It considers twelve reasoning categories that aim to disentangle language and vision relations by highlighting the role of the visual input. Thanks to its carefully thought-out design, it simultaneously evaluates VLMs’ consistency and their visually grounded natural language comprehension and generation through an aggregated metric, which reveals low scores that highlight the models’ fragility. Last but not least, the video collection has been carefully selected to reflect Italian culture, and the language data are produced by native speakers. Data available at *[GitHub](https://github.com/Caput97/MAIA-Multimodal_AI_Assessment.git).*

Investigating Proactivity in Task-Oriented Dialogues
Sofia Brenna | Elisabetta Jezek | Bernardo Magnini
Dialogue & Discourse Volume 16

This paper investigates proactivity, a characteristic phenomenon of collaborative human-human interaction, where a participant in the dialogue offers the addressee some useful information that was not explicitly requested. More precisely, a proactive behaviour is: (i) self-prompted and not simply reactive, that is, the speaker does not act merely in response to the requests the other participant has made; (ii) in some way effective for the achievement of the dialogue goal, since the speaker has a long-term, goal-directed behaviour that predicts future states and needs. Proactivity has been poorly investigated from a theoretical point of view, and there is a general need for empirical data for both quantitative and qualitative research. The paper provides an extensive analysis of proactivity in several human-human task-oriented dialogue corpora, selected to cover different characteristics: interaction types (chat exchanges and telephone calls), collection modalities (natural setting and Wizard of Oz), and two languages (Italian and English). The main result is the D-Pro Corpus, a new resource manually annotated at the utterance level with proactivity and dialogue acts, which makes it possible to investigate proactivity in the context of task-oriented dialogues.
There are several findings from our empirical investigation of proactivity: (i) about 20% of turns in our corpus are proactive, showing that this is a widespread and relevant phenomenon; (ii) we confirm the non-reactive nature of proactivity, highlighting a pattern in which a turn in the dialogue triggers a reaction in a following turn and a proactive utterance is then added to that turn; (iii) we show that only a limited number of dialogue acts are actually involved in expressing proactivity, and we discuss the theoretical implications of this finding; (iv) we empirically confirm that proactivity plays a crucial role in recovering from goal-failure situations, contributing to the effectiveness of the whole dialogue; (v) we support the intuition that proactive utterances are not uniformly distributed throughout the dialogue. Our empirical findings and the D-Pro Corpus provide relevant insights for deeper theoretical investigations, as well as crucial resources for improving proactivity in current task-oriented dialogue systems.

Converting Annotated Clinical Cases into Structured Case Report Forms
Pietro Ferrazzi | Alberto Lavelli | Bernardo Magnini
Proceedings of the 24th Workshop on Biomedical Language Processing

Case Report Forms (CRFs) are widely used in medical research, as they ensure accuracy, reliability, and validity of results in clinical studies. However, publicly available, well-annotated CRF datasets are scarce, limiting the development of CRF slot filling systems able to fill in a CRF from clinical notes. To mitigate this scarcity, we propose taking advantage of available datasets annotated for information extraction tasks and converting them into structured CRFs. We present a semi-automatic conversion methodology, which has been applied to the E3C dataset in two languages (English and Italian), resulting in a new, high-quality dataset for CRF slot filling. Through several experiments on the created dataset, we report that slot filling achieves 59.7% for Italian and 67.3% for English with a closed Large Language Model (zero-shot), and lower performance with three families of open-source models, showing that filling CRFs is challenging even for recent state-of-the-art LLMs.

Explanations explained. Influence of Free-text Explanations on LLMs and the Role of Implicit Knowledge
Andrea Zaninello | Roberto Dessi | Malvina Nissim | Bernardo Magnini
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

In this work, we investigate the relationship between the quality of explanations produced by different models and the amount of implicit knowledge they are able to provide beyond the input. We approximate explanation quality via accuracy on a downstream task with a standardized pipeline (GEISER) and study its correlation with three different association measures, each capturing a different aspect of implicitness, defined as a combination of relevance and novelty. We conduct experiments with three SOTA LLMs on four tasks involving implicit knowledge, with explanations either confirming or contradicting the correct label. Our results demonstrate that providing quality explanations consistently improves the accuracy of LLM predictions, even when the models are not explicitly trained to take explanations as input, and underline the correlation between the implicit content delivered by an explanation and its effectiveness.

Task-Oriented Dialogue Systems through Function Calling
Tiziano Labruna | Giovanni Bonetta | Bernardo Magnini
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating dialogues and handling a broad range of user queries. However, their effectiveness as end-to-end Task-Oriented Dialogue (TOD) systems remains limited due to their reliance on static parametric memory, which fails to accommodate evolving knowledge bases (KBs). This paper investigates a scalable function-calling approach that enables LLMs to retrieve only the necessary KB entries via schema-guided queries, rather than embedding the entire KB into each prompt. This selective retrieval strategy reduces prompt size and inference time while improving factual accuracy in system responses. We evaluate our method on the MultiWOZ 2.3 dataset and compare it against a full-KB baseline that injects the entire KB into every prompt. Experimental results show that our approach consistently outperforms the full-KB method in accuracy, while requiring significantly fewer input tokens and considerably less computation time, especially when the KB size increases.

2024

Understanding High-complexity Technical Documents with State-of-Art Models
Bernardo Magnini | Roberto Zanoli
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Technical documents, particularly those in civil engineering, contain crucial information that supports critical decision-making in construction, transportation and infrastructure projects. Large language models (LLMs) offer a promising solution for automating the extraction and comprehension of technical documents, potentially transforming our interaction with technical information. However, LLMs may encounter significant challenges when processing technical documents due to their complex structure, specialized terminology and reliance on graphical and visual elements. Moreover, LLMs are known to sometimes produce unexpected or incorrect analyses, a phenomenon referred to as hallucination. This study explores the potential of state-of-the-art LLMs, specifically GPT-4omni, to automate the comprehension of technical documents. The evaluation was performed on two types of PDF documents. The first type is selectable-text PDFs, whose text is extractable and editable; these are civil engineering documents from the Italian state railways. The second type is scanned PDFs, whose text is derived from scanning and OCR; these concern the design of an outdoor swimming pool. These documents include textual and visual elements such as tables, figures and photos. Our findings suggest that GPT-4omni has a high potential for real-world use, although it may still be susceptible to producing misleading information.

Dynamic Task-Oriented Dialogue: A Comparative Study of Llama-2 and Bert in Slot Value Generation
Tiziano Labruna | Sofia Brenna | Bernardo Magnini
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Recent advancements in instruction-based language models have demonstrated exceptional performance across various natural language processing tasks. We present a comprehensive analysis of the performance of two open-source language models, BERT and Llama-2, in the context of dynamic task-oriented dialogues. Focusing on the Restaurant domain and using the MultiWOZ 2.4 dataset, our investigation centers on the models’ ability to generate predictions for masked slot values within text. The dynamic aspect is introduced through simulated domain changes, mirroring real-world scenarios where new slot values are incrementally added to a domain over time. This study contributes to the understanding of instruction-based models’ effectiveness in dynamic natural language understanding tasks compared to traditional language models, and emphasizes the significance of open-source, reproducible models in advancing research within the academic community.

Are You a Good Assistant? Assessing LLM Trustability in Task-oriented Dialogues
Tiziano Labruna | Sofia Brenna | Giovanni Bonetta | Bernardo Magnini
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Despite the impressive capabilities of recent Large Language Models (LLMs) to generate human-like text, their ability to produce contextually appropriate content for specific communicative situations is still a matter of debate. This issue is particularly crucial when LLMs are employed as assistants to help solve tasks or achieve goals within a given conversational domain. In such scenarios, the assistant is expected to access specific knowledge (e.g., a database of restaurants, a calendar of appointments) that is not directly accessible to the user and must be consistently utilised to accomplish the task. In this paper, we conduct experiments to evaluate the trustworthiness of automatic assistants in task-oriented dialogues. Our findings indicate that state-of-the-art open-source LLMs still face significant challenges in maintaining logical consistency with a knowledge base of facts, highlighting the need for further advancements in this area.

GEESE - Generating and Evaluating Explanations for Semantic Entailment: A CALAMITA Challenge
Andrea Zaninello | Bernardo Magnini
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

In the GEESE challenge, we present a pipeline to evaluate generated explanations for the task of Recognizing Textual Entailment (RTE) in Italian. The challenge focuses on evaluating the impact of generated explanations on the predictive performance of language models. Using a dataset enriched with human-written explanations, we employ two large language models (LLMs) to generate and utilize explanations for semantic relationships between sentence pairs. Our methodology assesses the quality of generated explanations by measuring changes in prediction accuracy when explanations are provided. Through reproducible experimentation, we establish benchmarks against various baseline approaches, demonstrating the potential of explanation injection to enhance model interpretability and performance.

Towards Cost-effective Multi-style Conversations: A Pilot Study in Task-oriented Dialogue Generation
Tiziano Labruna | Bernardo Magnini
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Conversations exhibit significant variation when different styles are employed by participants, often leading to subpar performance when a dialogue model is exclusively trained on single-style datasets. We present a cost-effective methodology for generating multi-style conversations, which can be used in the development of conversational agents. This methodology only assumes the availability of a conversational domain, such as a knowledge base, and leverages the generative capabilities of large language models. In a pilot study focused on the generation aspect of task-oriented dialogues, we extended the well-known MultiWOZ dataset to encompass multi-style variations. Our findings highlight two key experimental outcomes: (i) these novel resources pose challenges for current single-style models, and (ii) multi-style resources enhance the dialogue model’s resilience to stylistic variations.

Get the Best out of 1B LLMs: Insights from Information Extraction on Clinical Documents
Saeed Farzi | Soumitra Ghosh | Alberto Lavelli | Bernardo Magnini
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

While the popularity of large, versatile language models like ChatGPT continues to rise, the landscape shifts when considering open-source models tailored to specific domains. Moreover, many areas, such as clinical documents, suffer from a scarcity of training data, often amounting to only a few hundred instances. Additionally, in certain settings, such as hospitals, cloud-based solutions pose privacy concerns, necessitating the deployment of language models on traditional hardware, such as single GPUs or powerful CPUs. To address these complexities, we conduct extensive experiments on both clinical entity detection and relation extraction in clinical documents using 1B parameter models. Our study delves into traditional fine-tuning, continuous pre-training in the medical domain, and instruction-tuning methods, providing valuable insights into their effectiveness in a multilingual setting. Our results underscore the importance of domain-specific models and pre-training for clinical natural language processing tasks. Furthermore, data augmentation using cross-lingual information improves performance in most cases, highlighting the potential for multilingual enhancements.

pdf bib abs

Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical text benchmarks, they have been pre-trained and evaluated with a focus on a single language (mostly English). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models on the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.

2023

A smashed glass cannot be full: Generation of Commonsense Explanations through Prompt-based Few-shot Learning
Andrea Zaninello | Bernardo Magnini
Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)

We assume that providing explanations is a process that elicits implicit knowledge in human communication, and propose a general methodology to generate commonsense explanations from pairs of semantically related sentences. We take advantage of both prompting applied to large, encoder-decoder pre-trained language models, and few-shot learning techniques, such as pattern-exploiting training. Experiments run on the e-SNLI dataset show that the proposed method achieves state-of-the-art results on the explanation generation task, with a substantial reduction of labelled data. The obtained results open new perspectives on a number of tasks involving the elicitation of implicit knowledge.

Addressing Domain Changes in Task-oriented Conversational Agents through Dialogue Adaptation
Tiziano Labruna | Bernardo Magnini
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Recent task-oriented dialogue systems are trained on annotated dialogues, which, in turn, reflect certain domain information (e.g., restaurants or hotels in a given region). However, when such domain knowledge changes (e.g., new restaurants open), the initial dialogue model may become obsolete, decreasing the overall performance of the system. Through a number of experiments, we show, for instance, that adding 50% of new slot values reduces dialogue state tracker performance by about 55%. In light of such evidence, we suggest that automatic adaptation of training dialogues is a valuable option for re-training obsolete models. We experimented with a dialogue adaptation approach based on fine-tuning a generative language model on domain changes, showing that the performance drop can be significantly reduced.

Preface to the CLiC-it 2023 Proceedings
Federico Boschetti | Gianluca E. Lebani | Bernardo Magnini | Nicole Novielli
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)
Federico Boschetti | Gianluca E. Lebani | Bernardo Magnini | Nicole Novielli
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Textual Entailment with Natural Language Explanations: The Italian e-RTE-3 Dataset
Andrea Zaninello | Sofia Brenna | Bernardo Magnini
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Testing ChatGPT for Stability and Reasoning: A Case Study Using Italian Medical Specialty Tests
Silvia Casola | Tiziano Labruna | Alberto Lavelli | Bernardo Magnini
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

2021

Addressing Slot-Value Changes in Task-oriented Dialogue Systems through Dialogue Domain Adaptation
Tiziano Labruna | Bernardo Magnini
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Recent task-oriented dialogue systems learn a model from annotated dialogues, and such dialogues are in turn collected and annotated so that they are consistent with certain domain knowledge. However, in real scenarios, domain knowledge is subject to frequent changes, and initial training dialogues may soon become obsolete, resulting in a significant decrease in model performance. In this paper, we investigate the relationship between training dialogues and domain knowledge, and propose Dialogue Domain Adaptation, a methodology aimed at adapting initial training dialogues to changes that occur in the domain knowledge. We focus on slot-value changes (e.g., when new slot values are available to describe domain entities) and define an experimental setting for dialogue domain adaptation. First, we show that current state-of-the-art models for dialogue state tracking are still not robust to slot-value changes in the domain knowledge. Then, we compare different domain adaptation strategies, showing that simple techniques are effective in reducing the gap between training dialogues and domain knowledge.

Recent Neural Methods on Dialogue State Tracking for Task-Oriented Dialogue Systems: A Survey
Vevake Balaraman | Seyedmostafa Sheikhalishahi | Bernardo Magnini
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

This paper aims to provide a comprehensive overview of recent developments in dialogue state tracking (DST) for task-oriented conversational systems. We introduce the task, the main datasets that have been exploited as well as their evaluation metrics, and we analyze several proposed approaches. We distinguish between static ontology DST models, which predict a fixed set of dialogue states, and dynamic ontology models, which can predict dialogue states even when the ontology changes. We also discuss the models’ ability to track either single or multiple domains and to scale to new domains, both in terms of knowledge transfer and zero-shot learning. We cover the period from 2013 to 2020, showing a significant increase in multi-domain methods, most of them using pre-trained language models.

Investigating Continued Pretraining for Zero-Shot Cross-Lingual Spoken Language Understanding
Samuel Louvan | Silvia Casola | Bernardo Magnini
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

From Cambridge to Pisa: A Journey into Cross-Lingual Dialogue Domain Adaptation for Conversational Agents
Tiziano Labruna | Bernardo Magnini
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

Simple Data Augmentation for Multilingual NLU in Task Oriented Dialogue Systems
Samuel Louvan | Bernardo Magnini
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

How Far Can We Go with Data Selection? A Case Study on Semantic Sequence Tagging Tasks
Samuel Louvan | Bernardo Magnini
Proceedings of the First Workshop on Insights from Negative Results in NLP

Although several works have addressed the role of data selection in improving transfer learning for various NLP tasks, there is no consensus about its real benefits and, more generally, there is a lack of shared practices on how it can best be applied. We propose a systematic approach aimed at evaluating data selection in scenarios of increasing complexity. Specifically, we compare the case in which source and target tasks are the same while source and target domains are different, against the more challenging scenario where both tasks and domains are different. We run a number of experiments on semantic sequence tagging tasks, which are relatively less investigated in data selection, and conclude that data selection is more beneficial when the tasks are the same, while in the case of different (although related) tasks from distant domains, a combination of data selection and multi-task learning is ineffective in most cases.

Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey
Samuel Louvan | Bernardo Magnini
Proceedings of the 28th International Conference on Computational Linguistics

In recent years, fostered by deep learning technologies and by the high demand for conversational AI, various approaches have been proposed that address the capacity to elicit and understand users’ needs in task-oriented dialogue systems. We focus on two core tasks, slot filling (SF) and intent classification (IC), and survey how neural models have rapidly evolved to address natural language understanding in dialogue systems. We introduce three neural architectures: independent models, which model SF and IC separately; joint models, which exploit the mutual benefit of the two tasks simultaneously; and transfer learning models, which scale the model to new domains. We discuss the current state of research in SF and IC, and highlight challenges that still require attention.

Becoming JILDA
Irene Sucameli | Alessandro Lenci | Bernardo Magnini | Maria Simi | Manuela Speranza
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Investigating Proactivity in Task-Oriented Dialogues
Vevake Balaraman | Bernardo Magnini
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian Language
Bernardo Magnini | Alberto Lavelli | Simone Magnolini
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a comparison between deep learning and traditional machine learning methods for various NLP tasks in Italian. We carried out experiments using available datasets (e.g., from the Evalita shared tasks) on two sequence tagging tasks (i.e., named entity recognition and nominal entity recognition) and four classification tasks (i.e., lexical relations among words, semantic relations among sentences, sentiment analysis and text classification). We show that deep learning approaches outperform traditional machine learning algorithms in sequence tagging, while for classification tasks that heavily rely on semantics, approaches based on feature engineering are still competitive. We think that a similar analysis could be carried out for other languages to provide an assessment of machine learning and deep learning models across different languages.

pdf bib abs

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI – including many opportunities, synergies but also misconceptions – has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

Simple is Better! Lightweight Data Augmentation for Low Resource Slot Filling and Intent Classification
Samuel Louvan | Bernardo Magnini
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

The E3C Project: Collection and Annotation of a Multilingual Corpus of Clinical Cases
Bernardo Magnini | Begoña Altuna | Alberto Lavelli | Manuela Speranza | Roberto Zanoli
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

2019

Leveraging Non-Conversational Tasks for Low Resource Slot Filling: Does it help?
Samuel Louvan | Bernardo Magnini
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Slot filling is a core operation for utterance understanding in task-oriented dialogue systems. Slots are typically domain-specific, and adding new domains to a dialogue system involves data- and time-intensive processes. A popular technique to address the problem is transfer learning, which assumes the availability of a large slot filling dataset for the source domain that can help slot filling on a target domain with fewer data. In this work, instead, we propose to leverage source tasks based on semantically related non-conversational resources (e.g., semantic sequence tagging datasets), as they are both cheaper to obtain and reusable across several slot filling domains. We show that using auxiliary non-conversational tasks in a multi-task learning setup consistently improves low-resource slot filling performance.

How to Use Gazetteers for Entity Recognition with Neural Models
Simone Magnolini | Valerio Piccioni | Vevake Balaraman | Marco Guerini | Bernardo Magnini
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

FASTDial: Abstracting Dialogue Policies for Fast Development of Task Oriented Agents
Serra Sinem Tekiroglu | Bernardo Magnini | Marco Guerini
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a novel abstraction framework called FASTDial for designing task-oriented dialogue agents, built on top of the OpenDial toolkit. This framework is meant to facilitate the prototyping and development of dialogue systems from scratch, even by non-tech-savvy users, especially when limited training data is available. To this end, we use a generic and simple frame-slots data structure with pre-defined dialogue policies that allows for fast design and implementation at the price of some reduction in flexibility. Moreover, it minimizes programming effort and domain expert training time by hiding away many implementation details. We provide a system demonstration screencast video at the following link: https://vimeo.com/329840716

2018

What’s in a Food Name: Knowledge Induction from Gazetteers of Food Main Ingredient
Bernardo Magnini | Vevake Balaraman | Simone Magnolini | Marco Guerini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

From General to Specific: Leveraging Named Entity Recognition for Slot Filling in Conversational Language Understanding
Samuel Louvan | Bernardo Magnini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Exploring Named Entity Recognition As an Auxiliary Task for Slot Filling in Conversational Language Understanding
Samuel Louvan | Bernardo Magnini
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

Slot filling is a crucial task in the Natural Language Understanding (NLU) component of a dialogue system. Most approaches for this task rely solely on domain-specific datasets for training. We propose a joint model of slot filling and Named Entity Recognition (NER) in a multi-task learning (MTL) setup. Our experiments on three slot filling datasets show that using NER as an auxiliary task improves slot filling performance and achieves competitive performance compared with the state of the art. In particular, NER is effective when supervised at the lower layer of the model. For low-resource scenarios, we found that MTL is effective for one dataset.

pdf bib

Enriching a Lexicon of Discourse Connectives with Corpus-based Data
Anna Feltracco | Elisabetta Jezek | Bernardo Magnini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs

A Methodology for Evaluating Interaction Strategies of Task-Oriented Conversational Agents
Marco Guerini | Sara Falcone | Bernardo Magnini
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

In task-oriented conversational agents, more attention has usually been devoted to assessing task effectiveness than to how the task is achieved. However, conversational agents are moving towards more complex and human-like interaction capabilities (e.g. the ability to use a formal/informal register or to show empathetic behavior), for which standard evaluation methodologies may not suffice. In this paper, we provide a novel methodology to assess, in a completely controlled way, the impact of an agent’s interaction strategies on the quality of experience. The methodology is based on a within-subject design, where two slightly different transcripts of the same interaction with a conversational agent are presented to the user. Through a series of pilot experiments, we show that this methodology allows fast and cheap experimentation and evaluation, focusing on aspects that are overlooked by current methods.

pdf bib

Effective Communication without Verbs? Sure! Identification of Nominal Utterances in Italian Social Media Texts
Gloria Comandini | Manuela Speranza | Bernardo Magnini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

pdf bib

Lexical Opposition in Discourse Contrast
Anna Feltracco | Bernardo Magnini | Elisabetta Jezek
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

pdf bib abs

Toward zero-shot Entity Recognition in Task-oriented Conversational Agents
Marco Guerini | Simone Magnolini | Vevake Balaraman | Bernardo Magnini
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We present a domain-portable zero-shot learning approach for entity recognition in task-oriented conversational agents, which does not assume any annotated sentences at training time. Rather, we derive a neural model of the entity names based only on available gazetteers, and then apply the model to recognize new entities in the context of user utterances. To evaluate our working hypothesis, we focus on nominal entities, which are widely used in e-commerce to name products. Through a set of experiments in two languages (English and Italian) and three different domains (furniture, food, clothing), we show that the neural gazetteer-based approach outperforms several competitive baselines, with minimal requirements in terms of linguistic features.

pdf bib

KRAUTS: A German Temporally Annotated News Corpus
Jannik Strötgen | Anne-Lyse Minard | Lukas Lange | Manuela Speranza | Bernardo Magnini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib

Sanremo’s Winner Is... Category-driven Selection Strategies for Active Learning
Anne-Lyse Minard | Manuela Speranza | Mohammed R. H. Qwaider | Bernardo Magnini
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

pdf bib

Contrast-Ita Bank: A corpus for Italian Annotated with Discourse Contrast Relations
Anna Feltracco | Bernardo Magnini | Elisabetta Jezek
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

pdf bib

Find Problems before They Find You with AnnotatorPro’s Monitoring Functionalities
Mohammed R. H. Qwaider | Anne-Lyse Minard | Manuela Speranza | Bernardo Magnini
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

pdf bib

Tagging Semantic Types for Verb Argument Positions
Francesca Della Moretta | Anna Feltracco | Elisabetta Jezek | Bernardo Magnini
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

2016

pdf bib abs

Acquiring Opposition Relations among Italian Verb Senses using Crowdsourcing
Anna Feltracco | Simone Magnolini | Elisabetta Jezek | Bernardo Magnini
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe an experiment for the acquisition of opposition relations among Italian verb senses, based on a crowdsourcing methodology. The goal of the experiment is to discuss whether the types of opposition we distinguish (i.e. complementarity, antonymy, converseness and reversiveness) are actually perceived by the crowd. In particular, we collect data for Italian using the crowdsourcing platform CrowdFlower. We ask annotators to judge the type of opposition between pairs of sentences, previously judged as opposite, that differ only in their verb: the verb in the first sentence is the opposite of the verb in the second. The data corroborate the hypothesis that some opposition relations exclude each other, while others interact and are recognized as compatible by the contributors.

pdf bib

FBK-HLT-NLP at SemEval-2016 Task 2: A Multitask, Deep Learning Approach for Interpretable Semantic Textual Similarity
Simone Magnolini | Anna Feltracco | Bernardo Magnini
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib abs

TextPro-AL: An Active Learning Platform for Flexible and Efficient Production of Training Data for NLP Tasks
Bernardo Magnini | Anne-Lyse Minard | Mohammed R. H. Qwaider | Manuela Speranza
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents TextPro-AL (Active Learning for Text Processing), a platform where human annotators can work efficiently to produce high-quality training data for new domains and new languages by exploiting Active Learning methodologies. TextPro-AL is a web-based application integrating four components: a machine-learning-based NLP pipeline, an annotation editor for task definition and text annotation, an incremental re-training procedure based on active learning selection from a large pool of unannotated data, and a graphical visualization of the learning status of the system.
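The abstract does not state which selection criterion TextPro-AL uses. As an assumption, the sketch below uses least-confidence sampling, one common active learning strategy: from the pool of unannotated items, pick the `k` items on which the current model's top predicted label is least confident, and route them to annotators. `select_batch` and the `predict_proba` callback are hypothetical names, not part of TextPro-AL.

```python
from typing import Callable, Dict, List, Sequence


def select_batch(
    pool: Sequence[str],
    predict_proba: Callable[[str], Dict[str, float]],
    k: int,
) -> List[str]:
    """Least-confidence sampling (assumed strategy, not TextPro-AL's documented one).

    predict_proba maps an item to a {label: probability} dict from the current model.
    Returns the k items whose highest label probability is lowest.
    """
    # Score each item by the model's confidence in its best label;
    # the pool index breaks ties so the ordering is deterministic.
    scored = [(max(predict_proba(x).values()), i, x) for i, x in enumerate(pool)]
    scored.sort()  # least confident first
    return [x for _, _, x in scored[:k]]
```

After annotators label the selected items, the model would be retrained and the loop repeated, which matches the "incremental re-training" component described above only in spirit.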

pdf bib abs

Using WordNet to Build Lexical Sets for Italian Verbs
Anna Feltracco | Lorenzo Gatti | Elisabetta Jezek | Bernardo Magnini | Simone Magnolini
Proceedings of the 8th Global WordNet Conference (GWC)

We present a methodology for building lexical sets for argument slots of Italian verbs. We start from an inventory of semantically typed Italian verb frames and, through a mapping to WordNet, we automatically annotate the sets of fillers for the argument positions in a corpus of sentences. We evaluate both a baseline algorithm and a syntax-driven algorithm and show that the latter performs significantly better in terms of precision.

2015

pdf bib

Predicting Correlations Between Lexical Alignments and Semantic Inferences
Simone Magnolini | Bernardo Magnini
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib

Opposition Relations among Verb Frames
Anna Feltracco | Elisabetta Jezek | Bernardo Magnini
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf bib

Book Reviews: Recognizing Textual Entailment: Models and Applications by Ido Dagan, Dan Roth, Mark Sammons and Fabio Massimo Zanzotto
Bernardo Magnini
Computational Linguistics, Volume 41, Issue 1 - March 2015

2014

pdf bib abs

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for language technology (LT) topics. This paper documents the initiative’s work throughout Europe to boost progress and innovation in our field.

pdf bib abs

Decomposing Semantic Inference
Elena Cabrio | Bernardo Magnini
Linguistic Issues in Language Technology, Volume 9, 2014 - Perspectives on Semantic Representations for Textual Inference

Besides formal approaches to semantic inference that rely on a logical representation of meaning, the notion of Textual Entailment (TE) has been proposed as an applied framework to capture major semantic inference needs across applications in Computational Linguistics. Although several approaches have been tried and evaluation campaigns have shown improvements in TE, there is renewed interest in the research community in a deeper and better understanding of the core phenomena involved in textual inference. Pursuing this direction, we are convinced that crucial progress will derive from decomposing the complexity of the TE task into basic phenomena and studying their combination. In this paper, we carry out an in-depth analysis of TE data sets, investigating the relations between two relevant aspects of semantic inferences: the logical dimension, i.e. the capacity of the inference to prove the conclusion from its premises, and the linguistic dimension, i.e. the linguistic devices used to accomplish the goal of the inference. We propose a decomposition approach over TE pairs, where single linguistic phenomena are isolated in what we call atomic inference pairs, and we show that at this level of granularity the actual correlation between the linguistic and the logical dimensions of semantic inferences emerges and can be empirically observed.

pdf bib abs

T-PAS: A resource of Typed Predicate Argument Structures for linguistic analysis and semantic processing
Elisabetta Jezek | Bernardo Magnini | Anna Feltracco | Alessia Bianchini | Octavian Popescu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The goal of this paper is to introduce T-PAS, a resource of typed predicate argument structures for Italian, acquired from corpora by manual clustering of distributional information about Italian verbs, to be used for linguistic analysis and semantic processing tasks. T-PAS is the first resource for Italian in which the semantic selection properties and sense-in-context distinctions of verbs are characterized fully on empirical grounds. In the paper, we first describe the process of pattern acquisition and corpus annotation (section 2) and its ongoing evaluation (section 3). We then demonstrate the benefits of pattern tagging for NLP purposes (section 4), and discuss current efforts to improve the annotation of the corpus (section 5). We conclude by reporting on ongoing experiments using semi-automatic techniques for extending coverage (section 6).

pdf bib abs

This paper presents META-SHARE (www.meta-share.eu), an open language resource infrastructure, and its usage since its Europe-wide deployment in early 2013. META-SHARE is a network of repositories that store language resources (data, tools and processing services) documented with high-quality metadata, aggregated in central inventories allowing for uniform search and access. META-SHARE was developed by META-NET (www.meta-net.eu) and aims to serve as an important component of a language technology marketplace for researchers, developers, professionals and industrial players, catering for the full development cycle of language technology, from research through to innovative products and services. The observed usage in its initial steps, the steadily increasing number of network nodes, resources, users, queries, views and downloads are all encouraging and considered as supportive of the choices made so far. In tandem, take-up activities like direct linking and processing of datasets by language processing services as well as metadata transformation to RDF are expected to open new avenues for data and resources linking and boost the organic growth of the infrastructure while facilitating language technology deployment by much wider research communities and industrial sectors.

2013

pdf bib

Entailment graphs for text exploration
Ido Dagan | Bernardo Magnini
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

pdf bib

Bridges Across the Language Divide — EU-BRIDGE Excitement: Exploring Customer Interactions through Textual EntailMENT
Ido Dagan | Bernardo Magnini | Guenter Neumann | Sebastian Pado
Proceedings of Machine Translation Summit XIV: European projects

pdf bib

Excitement: Exploring Customer Interactions through Textual EntailMENT
Ido Dagan | Bernardo Magnini | Guenter Neumann | Sebastian Pado
Proceedings of Machine Translation Summit XIV: European projects

2012

pdf bib abs

Uncertainty language permeates biomedical research and is fundamental for the computer interpretation of unstructured text. And yet, to our knowledge, a coherent, cognitively based theory to interpret Uncertainty language and guide Natural Language Processing does not exist. The aim of our project was therefore to detect and annotate Uncertainty markers (which play a significant role in building knowledge or beliefs in readers' minds) in a biomedical research corpus. Our corpus includes 80 manually annotated articles from the British Medical Journal, randomly sampled from a 168-year period. Uncertainty markers have been classified according to a theoretical framework based on a combined linguistic and cognitive theory, and the corpus was manually annotated according to these principles. We performed preliminary experiments to assess the manually annotated corpus and establish a baseline for the automatic detection of Uncertainty markers. The results of the experiments show that most of the Uncertainty markers can be recognized with good accuracy.

pdf bib abs

The KnowledgeStore: an Entity-Based Storage System
Roldano Cattoni | Francesco Corcoglioniti | Christian Girardi | Bernardo Magnini | Luciano Serafini | Roberto Zanoli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the KnowledgeStore, a large-scale infrastructure for the combined storage and interlinking of multimedia resources and ontological knowledge. Information in the KnowledgeStore is organized around entities, such as persons, organizations and locations. The system makes it possible (i) to import background knowledge about entities, in the form of annotated RDF triples; (ii) to associate resources with entities by automatically recognizing, coreferring and linking mentions of named entities; and (iii) to derive new entities based on knowledge extracted from mentions. The KnowledgeStore builds on state-of-the-art technologies for language processing, including document tagging, named entity extraction and cross-document coreference. Its design provides for a tight integration of linguistic and semantic features, and eases the further processing of information by explicitly representing the contexts where knowledge and mentions are valid or relevant. We describe the system and report on the creation of a large-scale KnowledgeStore instance for storing and integrating multimedia content and background knowledge relevant to the Trentino region of Italy.

pdf bib

Extracting Context-Rich Entailment Rules from Wikipedia Revision History
Elena Cabrio | Bernardo Magnini | Angelina Ivanova
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

2011

pdf bib

Towards Component-Based Textual Entailment
Elena Cabrio | Bernardo Magnini
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

pdf bib

Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading
Rutu Mulkar-Mehta | James Allen | Jerry Hobbs | Eduard Hovy | Bernardo Magnini | Chris Manning
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

pdf bib

Toward Qualitative Evaluation of Textual Entailment Systems
Elena Cabrio | Bernardo Magnini
Coling 2010: Posters

pdf bib

Contradiction-focused qualitative evaluation of textual entailment
Bernardo Magnini | Elena Cabrio
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

pdf bib abs

Evaluating Multilingual Question Answering Systems at CLEF
Pamela Forner | Danilo Giampiccolo | Bernardo Magnini | Anselmo Peñas | Álvaro Rodrigo | Richard Sutcliffe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper offers an overview of the key issues raised during seven years of activity of the Multilingual Question Answering Track at the Cross Language Evaluation Forum (CLEF). The general aim of the Multilingual Question Answering Track has been to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages, also drawing attention to a number of challenging issues for research in multilingual QA. The paper gives a brief description of how the task has evolved over the years and of the way in which the data sets have been created, also presenting a brief summary of the different types of questions developed. The document collections adopted in the competitions are also sketched, and some data about participation are provided. Moreover, the main measures used to evaluate system performance are explained, and an overall analysis of the results achieved is presented.

pdf bib abs

Building Textual Entailment Specialized Data Sets: a Methodology for Isolating Linguistic Phenomena Relevant to Inference
Luisa Bentivogli | Elena Cabrio | Ido Dagan | Danilo Giampiccolo | Medea Lo Leggio | Bernardo Magnini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a methodology for the creation of specialized data sets for Textual Entailment, made of monothematic Text-Hypothesis pairs (i.e. pairs in which only one linguistic phenomenon relevant to the entailment relation is highlighted and isolated). The expected benefits derive from the intuition that investigating the linguistic phenomena separately, i.e. decomposing the complexity of the TE problem, would yield an improvement in the development of specific strategies to cope with them. The annotation procedure assumes that humans have knowledge about the linguistic phenomena relevant to inference, and a classification of such phenomena into both fine-grained and macro categories is suggested. We experimented with the proposed methodology on a sample of pairs taken from the RTE-5 data set, and investigated critical issues arising when entailment, contradiction or unknown pairs are considered. The result is a new resource, which can be profitably used both to advance the comprehension of the linguistic phenomena relevant to entailment judgments and to make a first step towards the creation of large-scale specialized data sets.

2009

pdf bib

Optimizing Textual Entailment Recognition Using Particle Swarm Optimization
Yashar Mehdad | Bernardo Magnini
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

pdf bib abs

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework where participants' systems could be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind are, in fact, a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English but also for other languages.

pdf bib abs

This paper presents the QALL-ME benchmark, a multilingual resource of annotated spoken requests in the tourism domain, freely available for research purposes. The languages currently involved in the project are Italian, English, Spanish and German. The paper introduces a semantic annotation scheme for spoken information access requests, specifically derived from Question Answering (QA) research. In addition to pragmatic and semantic annotations, we propose three QA-based annotation levels (the Expected Answer Type, the Expected Answer Quantifier and the Question Topical Target of a request) to fully capture the content of a request and extract the sought-after information. The QALL-ME benchmark is developed under the EU-FP6 QALL-ME project, which aims at the realization of a shared and distributed infrastructure for QA systems on mobile devices (e.g. mobile phones). Users formulate questions in free natural language, and the system returns the actual sequence of words that constitutes the answer from a collection of information sources (e.g. documents, databases). Within this framework, the benchmark has the twofold purpose of training machine-learning-based applications for QA and testing their actual performance with a rapid turnaround in a controlled laboratory setting.

2007

pdf bib

SemEval-2007 Task 01: Evaluating WSD on Cross-Language Information Retrieval
Eneko Agirre | Bernardo Magnini | Oier Lopez de Lacalle | Arantxa Otegi | German Rigau | Piek Vossen
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib

The Third PASCAL Recognizing Textual Entailment Challenge
Danilo Giampiccolo | Bernardo Magnini | Ido Dagan | Bill Dolan
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

pdf bib

IRST-BP: Web People Search Using Name Entities
Octavian Popescu | Bernardo Magnini
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf bib abs

This paper presents an overview of the Multilingual Question Answering evaluation campaigns that have been organized at CLEF (Cross Language Evaluation Forum) since 2003. Over the years, the competition has registered a steady increase in the number of participants and languages involved. In fact, from the original eight groups that participated in the 2003 QA track, the number of competitors rose to twenty-four in 2005. System performance has also steadily improved: the average of the best performances in 2005 saw an increase of 10% with respect to the previous year.

pdf bib

Representing and Accessing Multilevel Linguistic Annotation using the MEANING Format
Emanuele Pianta | Luisa Bentivogli | Christian Girardi | Bernardo Magnini
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

pdf bib

Weakly Supervised Approaches for Ontology Population
Hristo Tanev | Bernardo Magnini
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib abs

Building a Large-Scale Repository of Textual Entailment Rules
Milen Kouylekov | Bernardo Magnini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Entailment rules are rules where the left-hand side (LHS) specifies some knowledge that entails the knowledge expressed in the right-hand side (RHS) of the rule, with some degree of confidence. Simple entailment rules can be combined into complex entailment chains, which in turn are at the basis of entailment-based reasoning, recently proposed as a pervasive and application-independent approach to Natural Language Understanding. We present the first release of a large-scale repository of entailment rules at the lexical level, derived from a number of available resources, including WordNet and a word similarity database. Experiments on the PASCAL-RTE dataset show that this resource plays a crucial role in recognizing textual entailment.
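To illustrate how simple LHS → RHS rules can be combined into chains, here is a minimal Python sketch. It assumes, for illustration only, that a chain's confidence is the product of its rules' confidences; the repository described above may score chains differently, and `best_chain` is a hypothetical name. Because confidences lie in (0, 1], the product can only shrink along a chain, so a best-first (Dijkstra-style) search returns the most confident chain.

```python
import heapq
from typing import Dict, List, Tuple

# LHS word -> list of (RHS word, confidence) pairs.
Rules = Dict[str, List[Tuple[str, float]]]


def best_chain(rules: Rules, source: str, target: str) -> Tuple[float, List[str]]:
    """Most confident rule chain from source to target, as (confidence, path).

    Assumes chain confidence = product of rule confidences in (0, 1].
    Returns (0.0, []) when no chain exists.
    """
    # Max-heap via negated confidence; each entry carries the path so far.
    heap: List[Tuple[float, List[str]]] = [(-1.0, [source])]
    best: Dict[str, float] = {source: 1.0}
    while heap:
        neg_conf, path = heapq.heappop(heap)
        conf, node = -neg_conf, path[-1]
        if node == target:
            return conf, path
        if conf < best.get(node, 0.0):
            continue  # stale entry: a better chain to node was already found
        for rhs, rule_conf in rules.get(node, []):
            new_conf = conf * rule_conf
            if new_conf > best.get(rhs, 0.0):
                best[rhs] = new_conf
                heapq.heappush(heap, (-new_conf, path + [rhs]))
    return 0.0, []
```

For example, with invented toy rules buy → acquire (0.9), acquire → own (0.8), buy → get (0.6) and get → own (0.95), `best_chain(rules, "buy", "own")` returns the chain `["buy", "acquire", "own"]` with a confidence of about 0.72, preferred over the route through `get` (about 0.57).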

pdf bib

Ontology Population from Textual Mentions: Task Definition and Benchmark
Bernardo Magnini | Emanuele Pianta | Octavian Popescu | Manuela Speranza
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

pdf bib abs

In this paper, we present work in progress on the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. The first level is represented by temporal expressions, the second level by different types of entities (i.e. persons, organizations, locations and geo-political entities), and the third level by relations between entities (e.g. the affiliation relation connecting a person to an organization). So far, I-CAB has been manually annotated with temporal expressions, person entities and organization entities. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. As the ACE guidelines were originally developed for English, part of the effort consisted in adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to include a wider range of entities, such as conjunctions.