Elena Cabrio


2023

pdf bib
Playing the Part of the Sharp Bully: Generating Adversarial Examples for Implicit Hate Speech Detection
Nicolás Benjamín Ocampo | Elena Cabrio | Serena Villata
Findings of the Association for Computational Linguistics: ACL 2023

Research on abusive content detection on social media has primarily focused on explicit forms of hate speech (HS), that are often identifiable by recognizing hateful words and expressions. Messages containing linguistically subtle and implicit forms of hate speech still constitute an open challenge for automatic hate speech detection. In this paper, we propose a new framework for generating adversarial implicit HS short-text messages using Auto-regressive Language Models. Moreover, we propose a strategy to group the generated implicit messages in complexity levels (EASY, MEDIUM, and HARD categories) characterizing how challenging these messages are for supervised classifiers. Finally, relying on (Dinan et al., 2019; Vidgen et al., 2021), we propose a “build it, break it, fix it”, training scheme using HARD messages showing how iteratively retraining on HARD messages substantially leverages SOTA models’ performances on implicit HS benchmarks.

pdf bib
Unmasking the Hidden Meaning: Bridging Implicit and Explicit Hate Speech Embedding Representations
Nicolás Benjamín Ocampo | Elena Cabrio | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2023

Research on automatic hate speech (HS) detection has mainly focused on identifying explicit forms of hateful expressions on user-generated content. Recently, a few works have started to investigate methods to address more implicit and subtle abusive content. However, despite these efforts, automated systems still struggle to correctly recognize implicit and more veiled forms of HS. As these systems heavily rely on proper textual representations for classification, it is crucial to investigate the differences in embedding implicit and explicit messages. Our contribution to address this challenging task is fourfold. First, we present a comparative analysis of transformer-based models, evaluating their performance across five datasets containing implicit HS messages. Second, we examine the embedding representations of implicit messages across different targets, gaining insight into how veiled cases are encoded. Third, we compare and link explicit and implicit hateful messages across these datasets through their targets, enforcing the relation between explicitness and implicitness and obtaining more meaningful embedding representations. Lastly, we show how these newer representation maintains high performance on HS labels, while improving classification in borderline cases.

pdf bib
Argument-based Detection and Classification of Fallacies in Political Debates
Pierpaolo Goffredo | Mariana Chaves | Serena Villata | Elena Cabrio
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Fallacies are arguments that employ faulty reasoning. Given their persuasive and seemingly valid nature, fallacious arguments are often used in political debates. Employing these misleading arguments in politics can have detrimental consequences for society, since they can lead to inaccurate conclusions and invalid inferences from the public opinion and the policymakers. Automatically detecting and classifying fallacious arguments represents therefore a crucial challenge to limit the spread of misleading or manipulative claims and promote a more informed and healthier political discourse. Our contribution to address this challenging task is twofold. First, we extend the ElecDeb60To16 dataset of U.S. presidential debates annotated with fallacious arguments, by incorporating the most recent Trump-Biden presidential debate. We include updated token-level annotations, incorporating argumentative components (i.e., claims and premises), the relations between these components (i.e., support and attack), and six categories of fallacious arguments (i.e., Ad Hominem, Appeal to Authority, Appeal to Emotion, False Cause, Slippery Slope, and Slogans). Second, we perform the twofold task of fallacious argument detection and classification by defining neural network architectures based on Transformers models, combining text, argumentative features, and engineered features. Our results show the advantages of complementing transformer-generated text representations with non-text features.

pdf bib
An In-depth Analysis of Implicit and Subtle Hate Speech Messages
Nicolás Benjamín Ocampo | Ekaterina Sviridova | Elena Cabrio | Serena Villata
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The research carried out so far in detecting abusive content in social media has primarily focused on overt forms of hate speech. While explicit hate speech (HS) is more easily identifiable by recognizing hateful words, messages containing linguistically subtle and implicit forms of HS (as circumlocution, metaphors and sarcasm) constitute a real challenge for automatic systems. While the sneaky and tricky nature of subtle messages might be perceived as less hurtful with respect to the same content expressed clearly, such abuse is at least as harmful as overt abuse. In this paper, we first provide an in-depth and systematic analysis of 7 standard benchmarks for HS detection, relying on a fine-grained and linguistically-grounded definition of implicit and subtle messages. Then, we experiment with state-of-the-art neural network architectures on two supervised tasks, namely implicit HS and subtle HS message classification. We show that while such models perform satisfactory on explicit messages, they fail to detect implicit and subtle content, highlighting the fact that HS detection is not a solved problem and deserves further investigation.

2022

pdf bib
Graph Embeddings for Argumentation Quality Assessment
Santiago Marro | Elena Cabrio | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2022

Argumentation is used by people both internally, by evaluating arguments and counterarguments to make sense of a situation and take a decision, and externally, e.g., in a debate, by exchanging arguments to reach an agreement or to promote an individual position. In this context, the assessment of the quality of the arguments is of extreme importance, as it strongly influences the evaluation of the overall argumentation, impacting on the decision making process. The automatic assessment of the quality of natural language arguments is recently attracting interest in the Argument Mining field. However, the issue of automatically assessing the quality of an argumentation largely remains a challenging unsolved task. Our contribution is twofold: first, we present a novel resource of 402 student persuasive essays, where three main quality dimensions (i.e., cogency, rhetoric, and reasonableness) have been annotated, leading to 1908 arguments tagged with quality facets; second, we address this novel task of argumentation quality assessment proposing a novel neural architecture based on graph embeddings, that combines both the textual features of the natural language arguments and the overall argument graph, i.e., considering also the support and attack relations holding among the arguments. Results on the persuasive essays dataset outperform state-of-the-art and standard baselines’ performance.

pdf bib
CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game
Anaïs Ollagnier | Elena Cabrio | Serena Villata | Catherine Blaya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Over the past decades, the number of episodes of cyber aggression occurring online has grown substantially, especially among teens. Most solutions investigated by the NLP community to curb such online abusive behaviors consist of supervised approaches relying on annotated data extracted from social media. However, recent studies have highlighted that private instant messaging platforms are major mediums of cyber aggression among teens. As such interactions remain invisible due to the app privacy policies, very few datasets collecting aggressive conversations are available for the computational analysis of language. In order to overcome this limitation, in this paper we present the CyberAgressionAdo-V1 dataset, containing aggressive multiparty chats in French collected through a role-playing game in high-schools, and annotated at different layers. We describe the data collection and annotation phases, carried out in the context of a EU and a national research projects, and provide insightful analysis on the different types of aggression and verbal abuse depending on the targeted victims (individuals or communities) emerging from the collected data.

2021

pdf bib
Sifting French Tweets to Investigate the Impact of Covid-19 in Triggering Intense Anxiety
Mohamed Amine Romdhane | Elena Cabrio | Serena Villata
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Sifting French Tweets to Investigate the Impact of Covid-19 in Triggering Intense Anxiety. Social media can be leveraged to understand public sentiment and feelings in real-time, and target public health messages based on user interests and emotions. In this paper, we investigate the impact of the COVID-19 pandemic in triggering intense anxiety, relying on messages exchanged on Twitter. More specifically, we provide : i) a quantitative and qualitative analysis of a corpus of tweets in French related to coronavirus, and ii) a pipeline approach (a filtering mechanism followed by Neural Network methods) to satisfactory classify messages expressing intense anxiety on social media, considering the role played by emotions.

pdf bib
Extraction d’arguments basée sur les transformateurs pour des applications dans le domaine de la santé (Transformer-based Argument Mining for Healthcare Applications)
Tobias Mayer | Elena Cabrio | Serena Villata
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Nous présentons des résumés en français et en anglais de l’article (Mayer et al., 2020) présenté à la conférence 24th European Conference on Artificial Intelligence (ECAI-2020) en 2020.

pdf bib
“Don’t discuss”: Investigating Semantic and Argumentative Features for Supervised Propagandist Message Detection and Classification
Vorakit Vorakitphan | Elena Cabrio | Serena Villata
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

One of the mechanisms through which disinformation is spreading online, in particular through social media, is by employing propaganda techniques. These include specific rhetorical and psychological strategies, ranging from leveraging on emotions to exploiting logical fallacies. In this paper, our goal is to push forward research on propaganda detection based on text analysis, given the crucial role these methods may play to address this main societal issue. More precisely, we propose a supervised approach to classify textual snippets both as propaganda messages and according to the precise applied propaganda technique, as well as a detailed linguistic analysis of the features characterising propaganda information in text (e.g., semantic, sentiment and argumentation features). Extensive experiments conducted on two available propagandist resources (i.e., NLP4IF’19 and SemEval’20-Task 11 datasets) show that the proposed approach, leveraging different language models and the investigated linguistic features, achieves very promising results on propaganda classification, both at sentence- and at fragment-level.

2020

pdf bib
Love Me, Love Me, Say (and Write!) that You Love Me: Enriching the WASABI Song Corpus with Lyrics Annotations
Michael Fell | Elena Cabrio | Elmahdi Korfed | Michel Buffa | Fabien Gandon
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present the WASABI Song Corpus, a large corpus of songs enriched with metadata extracted from music databases on the Web, and resulting from the processing of song lyrics and from audio analysis. More specifically, given that lyrics encode an important part of the semantics of a song, we focus here on the description of the methods we proposed to extract relevant information from the lyrics, as their structure segmentation, their topic, the explicitness of the lyrics content, the salient passages of a song and the emotions conveyed. The creation of the resource is still ongoing: so far, the corpus contains 1.73M songs with lyrics (1.41M unique lyrics) annotated at different levels with the output of the above mentioned methods. Such corpus labels and the provided methods can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, allowing an intelligent browsing, categorization and segmentation recommendation of songs.

pdf bib
Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection
Michele Corazza | Stefano Menini | Elena Cabrio | Sara Tonelli | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal to perform zero- shot abusive language detection. We compare the results obtained with the original MLM to the ones obtained by our method, showing improved performance on German, Italian and Spanish.

pdf bib
Proceedings of the 7th Workshop on Argument Mining
Elena Cabrio | Serena Villata
Proceedings of the 7th Workshop on Argument Mining

pdf bib
Regrexit or not Regrexit: Aspect-based Sentiment Analysis in Polarized Contexts
Vorakit Vorakitphan | Marco Guerini | Elena Cabrio | Serena Villata
Proceedings of the 28th International Conference on Computational Linguistics

Emotion analysis in polarized contexts represents a challenge for Natural Language Processing modeling. As a step in the aforementioned direction, we present a methodology to extend the task of Aspect-based Sentiment Analysis (ABSA) toward the affect and emotion representation in polarized settings. In particular, we adopt the three-dimensional model of affect based on Valence, Arousal, and Dominance (VAD). We then present a Brexit scenario that proves how affect varies toward the same aspect when politically polarized stances are presented. Our approach captures aspect-based polarization from newspapers regarding the Brexit scenario of 1.2m entities at sentence-level. We demonstrate how basic constituents of emotions can be mapped to the VAD model, along with their interactions respecting the polarized context in ABSA settings using biased key-concepts (e.g., “stop Brexit” vs. “support Brexit”). Quite intriguingly, the framework achieves to produce coherent aspect evidences of Brexit’s stance from key-concepts, showing that VAD influence the support and opposition aspects.

2019

pdf bib
Song Lyrics Summarization Inspired by Audio Thumbnailing
Michael Fell | Elena Cabrio | Fabien Gandon | Alain Giboin
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Given the peculiar structure of songs, applying generic text summarization methods to lyrics can lead to the generation of highly redundant and incoherent text. In this paper, we propose to enhance state-of-the-art text summarization approaches with a method inspired by audio thumbnailing. Instead of searching for the thumbnail clues in the audio of the song, we identify equivalent clues in the lyrics. We then show how these summaries that take into account the audio nature of the lyrics outperform the generic methods according to both an automatic evaluation and human judgments.

pdf bib
Comparing Automated Methods to Detect Explicit Content in Song Lyrics
Michael Fell | Elena Cabrio | Michele Corazza | Fabien Gandon
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The Parental Advisory Label (PAL) is a warning label that is placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents of material potentially unsuitable for children. Since 2015, digital providers – such as iTunes, Spotify, Amazon Music and Deezer – also follow PAL guidelines and tag such tracks as “explicit”. Nowadays, such labelling is carried out mainly manually on voluntary basis, with the drawbacks of being time consuming and therefore costly, error prone and partly a subjective task. In this paper, we compare automated methods ranging from dictionary-based lookup to state-of-the-art deep neural networks to automatically detect explicit contents in English lyrics. We show that more complex models perform only slightly better on this task, and relying on a qualitative analysis of the data, we discuss the inherent hardness and subjectivity of the task.

pdf bib
Yes, we can! Mining Arguments in 50 Years of US Presidential Campaign Debates
Shohreh Haddadan | Elena Cabrio | Serena Villata
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Political debates offer a rare opportunity for citizens to compare the candidates’ positions on the most controversial topics of the campaign. Thus they represent a natural application scenario for Argument Mining. As existing research lacks solid empirical investigation of the typology of argument components in political debates, we fill this gap by proposing an Argument Mining approach to political debates. We address this task in an empirical manner by annotating 39 political debates from the last 50 years of US presidential campaigns, creating a new corpus of 29k argument components, labeled as premises and claims. We then propose two tasks: (1) identifying the argumentative components in such debates, and (2) classifying them as premises and claims. We show that feature-rich SVM learners and Neural Network architectures outperform standard baselines in Argument Mining over such complex data. We release the new corpus USElecDeb60To16 and the accompanying software under free licenses to the research community.

pdf bib
A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Third Workshop on Abusive Language Online

Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, the issue of identifying cyberbullying phenomena at scale is still an unsolved problem. This is because of the need to couple abusive language detection on textual message with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built on Instagram messages, and we describe the cyberbullying monitoring user interface.

2018

pdf bib
Evidence Type Classification in Randomized Controlled Trials
Tobias Mayer | Elena Cabrio | Serena Villata
Proceedings of the 5th Workshop on Argument Mining

Randomized Controlled Trials (RCT) are a common type of experimental studies in the medical domain for evidence-based decision making. The ability to automatically extract the arguments proposed therein can be of valuable support for clinicians and practitioners in their daily evidence-based decision making activities. Given the peculiarity of the medical domain and the required level of detail, standard approaches to argument component detection in argument(ation) mining are not fine-grained enough to support such activities. In this paper, we introduce a new sub-task of the argument component identification task: evidence type classification. To address it, we propose a supervised approach and we test it on a set of RCT abstracts on different medical topics.

pdf bib
Lyrics Segmentation: Textual Macrostructure Detection using Convolutions
Michael Fell | Yaroslav Nechaev | Elena Cabrio | Fabien Gandon
Proceedings of the 27th International Conference on Computational Linguistics

Lyrics contain repeated patterns that are correlated with the repetitions found in the music they accompany. Repetitions in song texts have been shown to enable lyrics segmentation – a fundamental prerequisite of automatically detecting the building blocks (e.g. chorus, verse) of a song text. In this article we improve on the state-of-the-art in lyrics segmentation by applying a convolutional neural network to the task, and experiment with novel features as a step towards deeper macrostructure detection of lyrics.

pdf bib
Measuring Frame Instance Relatedness
Valerio Basile | Roque Lopez Condori | Elena Cabrio
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Frame semantics is a well-established framework to represent the meaning of natural language in computational terms. In this work, we aim to propose a quantitative measure of relatedness between pairs of frame instances. We test our method on a dataset of sentence pairs, highlighting the correlation between our metric and human judgments of semantic similarity. Furthermore, we propose an application of our measure for clustering frame instances to extract prototypical knowledge from natural language.

2017

pdf bib
Building timelines of soccer matches from Twitter
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This demo paper presents a system that builds a timeline with salient actions of a soccer game, based on the tweets posted by users. It combines information provided by external knowledge bases to enrich the content of tweets and applies graph theory to model relations between actions (e.g. goals, penalties) and participants of a game (e.g. players, teams). In the demo, a web application displays in nearly real-time the actions detected from tweets posted by users for a given match of Euro 2016. Our tools are freely available at https://bitbucket.org/eamosse/event_tracking.

pdf bib
You’ll Never Tweet Alone: Building Sports Match Timelines from Microblog Posts
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we propose an approach to build a timeline with actions in a sports game based on tweets. We combine information provided by external knowledge bases to enrich the content of the tweets, and apply graph theory to model relations between actions and participants in a game. We demonstrate the validity of our approach using tweets collected during the EURO 2016 Championship and evaluate the output against live summaries produced by sports channels.

pdf bib
Graph-based Event Extraction from Twitter
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Detecting which tweets describe a specific event and clustering them is one of the main challenging tasks related to Social Media currently addressed in the NLP community. Existing approaches have mainly focused on detecting spikes in clusters around specific keywords or Named Entities (NE). However, one of the main drawbacks of such approaches is the difficulty in understanding when the same keywords describe different events. In this paper, we propose a novel approach that exploits NE mentions in tweets and their entity context to create a temporal event graph. Then, using simple graph theory techniques and a PageRank-like algorithm, we process the event graphs to detect clusters of tweets describing the same events. Experiments on two gold standard datasets show that our approach achieves state-of-the-art results both in terms of evaluation performances and the quality of the detected events.

pdf bib
Argument Mining on Twitter: Arguments, Facts and Sources
Mihai Dusmanu | Elena Cabrio | Serena Villata
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Social media collect and spread on the Web personal opinions, facts, fake news and all kind of information users may be interested in. Applying argument mining methods to such heterogeneous data sources is a challenging open research issue, in particular considering the peculiarities of the language used to write textual messages on social media. In addition, new issues emerge when dealing with arguments posted on such platforms, such as the need to make a distinction between personal opinions and actual facts, and to detect the source disseminating information about such facts to allow for provenance verification. In this paper, we apply supervised classification to identify arguments on Twitter, and we present two new tasks for argument mining, namely facts recognition and source identification. We study the feasibility of the approaches proposed to address these tasks on a set of tweets related to the Grexit and Brexit news topics.

2016

pdf bib
DART: a Dataset of Arguments and their Relations on Twitter
Tom Bosc | Elena Cabrio | Serena Villata
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The problem of understanding the stream of messages exchanged on social media such as Facebook and Twitter is becoming a major challenge for automated systems. The tremendous amount of data exchanged on these platforms as well as the specific form of language adopted by social media users constitute a new challenging context for existing argument mining techniques. In this paper, we describe a resource of natural language arguments called DART (Dataset of Arguments and their Relations on Twitter) where the complete argument mining pipeline over Twitter messages is considered: (i) we identify which tweets can be considered as arguments and which cannot, and (ii) we identify what is the relation, i.e., support or attack, linking such tweets to each other.

2014

pdf bib
Classifying Inconsistencies in DBpedia Language Specific Chapters
Elena Cabrio | Serena Villata | Fabien Gandon
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper proposes a methodology to identify and classify the semantic relations holding among the possible different answers obtained for a certain query on DBpedia language specific chapters. The goal is to reconcile information provided by language specific DBpedia chapters to obtain a consistent results set. Starting from the identified semantic relations between two pieces of information, we further classify them as positive or negative, and we exploit bipolar abstract argumentation to represent the result set as a unique graph, where using argumentation semantics we are able to detect the (possible multiple) consistent sets of elements of the query result. We experimented with the proposed methodology over a sample of triples extracted from 10 DBpedia ontology properties. We define the LingRel ontology to represent how the extracted information from different chapters is related to each other, and we map the properties of the LingRel ontology to the properties of the SIOC-Argumentation ontology to built argumentation graphs. The result is a pilot resource that can be profitably used both to train and to evaluate NLP applications querying linked data in detecting the semantic relations among the extracted values, in order to output consistent information sets.

2013

pdf bib
Detecting Bipolar Semantic Relations among Natural Language Arguments with Textual Entailment: a Study.
Elena Cabrio | Serena Villata
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

2012

pdf bib
Hunting for Entailing Pairs in the Penn Discourse Treebank
Sara Tonelli | Elena Cabrio
Proceedings of COLING 2012

pdf bib
Key-concept extraction from French articles with KX
Sara Tonelli | Elena Cabrio | Emanuele Pianta
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

pdf bib
Extracting Context-Rich Entailment Rules from Wikipedia Revision History
Elena Cabrio | Bernardo Magnini | Angelina Ivanova
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

pdf bib
Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions
Elena Cabrio | Serena Villata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf bib
Towards Component-Based Textual Entailment
Elena Cabrio | Bernardo Magnini
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

pdf bib
Building Textual Entailment Specialized Data Sets: a Methodology for Isolating Linguistic Phenomena Relevant to Inference
Luisa Bentivogli | Elena Cabrio | Ido Dagan | Danilo Giampiccolo | Medea Lo Leggio | Bernardo Magnini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a methodology for the creation of specialized data sets for Textual Entailment, made of monothematic Text-Hypothesis pairs (i.e. pairs in which only one linguistic phenomenon relevant to the entailment relation is highlighted and isolated). The expected benefits derive from the intuition that investigating the linguistic phenomena separately, i.e. decomposing the complexity of the TE problem, would yield an improvement in the development of specific strategies to cope with them. The annotation procedure assumes that humans have knowledge about the linguistic phenomena relevant to inference, and a classification of such phenomena both into fine grained and macro categories is suggested. We experimented with the proposed methodology over a sample of pairs taken from the RTE-5 data set, and investigated critical issues arising when entailment, contradiction or unknown pairs are considered. The result is a new resource, which can be profitably used both to advance the comprehension of the linguistic phenomena relevant to entailment judgments and to make a first step towards the creation of large-scale specialized data sets.

pdf bib
Contradiction-focused qualitative evaluation of textual entailment
Bernardo Magnini | Elena Cabrio
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

pdf bib
Toward Qualitative Evaluation of Textual Entailment Systems
Elena Cabrio | Bernardo Magnini
Coling 2010: Posters

2008

pdf bib
The QALL-ME Benchmark: a Multilingual Resource of Annotated Spoken Requests for Question Answering
Elena Cabrio | Milen Kouylekov | Bernardo Magnini | Matteo Negri | Laura Hasler | Constantin Orasan | David Tomás | Jose Luis Vicedo | Guenter Neumann | Corinna Weber
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the QALL-ME benchmark, a multilingual resource of annotated spoken requests in the tourism domain, freely available for research purposes. The languages currently involved in the project are Italian, English, Spanish and German. It introduces a semantic annotation scheme for spoken information access requests, specifically derived from Question Answering (QA) research. In addition to pragmatic and semantic annotations, we propose three QA-based annotation levels: the Expected Answer Type, the Expected Answer Quantifier and the Question Topical Target of a request, to fully capture the content of a request and extract the sought-after information. The QALL-ME benchmark is developed under the EU-FP6 QALL-ME project which aims at the realization of a shared and distributed infrastructure for Question Answering (QA) systems on mobile devices (e.g. mobile phones). Questions are formulated by the users in free natural language input, and the system returns the actual sequence of words which constitutes the answer from a collection of information sources (e.g. documents, databases). Within this framework, the benchmark has the twofold purpose of training machine learning based applications for QA, and testing their actual performance with a rapid turnaround in controlled laboratory setting.