Elena Cabrio - ACL Anthology

Elena Cabrio

2026

Weakly-supervised Argument Mining with Boundary Refinement and Relation Denoising
Wei Sun | Mingxiao Li | Jesse Davis | Elena Cabrio | Serena Villata | Marie-Francine Moens
Proceedings of the Ninth Fact Extraction and VERification Workshop (FEVER)

Argument mining (AM) involves extracting argument components and predicting the relations between them to create argumentative graphs, which are essential for applications requiring argumentative comprehension. To automatically produce high-quality graphs, previous work requires large amounts of human-annotated training samples to train AM models. Instead, we leverage a large language model (LLM) to assign pseudo-labels to training samples, reducing reliance on human-annotated training data. However, the training data weakly labeled by the LLM are too noisy to develop an AM model with reliable performance. In this paper, to improve model performance, we propose a center-based component detector that refines the boundaries of the detected components, and a relation denoiser that handles the noise present in the pseudo-labels when classifying relations between detected components. Experimentally, our AM model improves on the LLM's boundary detection by up to 16% in terms of IoU75 and on the LLM's relation classification by up to 12% in terms of macro-F1 score. Our AM model achieves new state-of-the-art performance in weakly-supervised AM, showing up to a 6% improvement over the state-of-the-art component detector and up to a 7% improvement over the state-of-the-art relation classifier. Additionally, our model uses less than 20% of the human-annotated data to match the performance of state-of-the-art fully-supervised AM models.

Stakeholder Suite: A Unified AI Framework for Mapping Actors, Topics and Arguments in Public Debates
Mohamed Chenene | Jeanne Rouhier | Jean Daniélou | Mihir Sarkar | Elena Cabrio
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Public debates surrounding infrastructure and energy projects involve complex networks of stakeholders, arguments, and evolving narratives. Understanding these dynamics is crucial for anticipating controversies and informing engagement strategies, yet existing tools in media intelligence largely rely on descriptive analytics with limited transparency. This paper presents **Stakeholder Suite**, a framework deployed in operational contexts for mapping actors, topics, and arguments within public debates. The system combines actor detection, topic modeling, argument extraction and stance classification in a unified pipeline. Tested on multiple energy infrastructure projects as a case study, the approach delivers fine-grained, source-grounded insights while remaining adaptable to diverse domains. The framework achieves strong retrieval precision and stance accuracy, producing arguments judged relevant in 75% of pilot use cases. Beyond quantitative metrics, the tool has proven effective for operational use: helping project teams visualize networks of influence, identify emerging controversies, and support evidence-based decision-making.

2025

AM4DSP: Argumentation Mining in Structured Decentralized Discussion Platforms for Deliberative Democracy
Sofiane Elguendouze | Lucas Anastasiou | Erwan Hain | Elena Cabrio | Anna De Liddo | Serena Villata
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Argument(ation) mining (AM) is the automated identification and extraction of argumentative structures in natural language. This field has seen rapid advancements, offering powerful tools to analyze and interpret large and complex discourse in diverse domains (political debates, medical reports, etc.). In this paper, we introduce an AM-boosted version of BCause, a large-scale deliberation platform. The system enables the extraction and analysis of arguments from online discussions in the context of deliberative democracy, with the aim of enhancing the understanding and accessibility of structured argumentation in large-scale deliberation processes.

Overview of the Critical Questions Generation Shared Task
Blanca Calvo Figueras | Jaione Bengoetxea | Maite Heredia | Ekaterina Sviridova | Elena Cabrio | Serena Villata | Rodrigo Agerri
Proceedings of the 12th Argument mining Workshop

The proliferation of AI technologies has reinforced the importance of developing critical thinking skills. We propose leveraging Large Language Models (LLMs) to facilitate the generation of critical questions: inquiries designed to identify fallacious or inadequately constructed arguments. This paper presents an overview of the first shared task on Critical Questions Generation (CQs-Gen). Thirteen teams investigated various methodologies for generating questions that critically assess arguments within the provided texts. The highest accuracy achieved was 67.6, indicating substantial room for improvement in this task. Moreover, three of the four top-performing teams incorporated argumentation scheme annotations to enhance their systems. Finally, while most participants employed open-weight models, the two highest-ranking teams relied on proprietary LLMs.

DISPUTool 3.0: Fallacy Detection and Repairing in Argumentative Political Debates
Pierpaolo Goffredo | Deborah Dore | Elena Cabrio | Serena Villata
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

This paper introduces and evaluates a novel web-based application designed to identify and repair fallacious arguments in political debates. DISPUTool 3.0 offers a comprehensive tool for the argumentation analysis of political debates, integrating state-of-the-art natural language processing techniques to mine and classify argument components and relations. DISPUTool 3.0 builds on the ElecDeb60to20 dataset, covering US presidential debates from 1960 to 2020. In this paper, we introduce a novel task, integrated as a new module in DISPUTool: the automatic detection and classification of fallacious arguments, and the automatic repair of such misleading arguments. The goal is to provide users with a tool that not only identifies fallacies in political debates but also shows what the argument looks like once the veil of fallacy is lifted. The module is extensively evaluated using both automated metrics and human assessments. With the inclusion of this module, DISPUTool 3.0 further fosters users' critical thinking in the face of the growing spread of such harmful content in political debates and beyond. The tool is publicly available here: https://3ia-demos.inria.fr/disputool/

CyberAgressionAdo-Large: French Multiparty Chat Dataset to Address Online Hate
Anaïs Ollagnier | Elena Cabrio | Serena Villata | Valerio Basile
Traitement Automatique des Langues, Volume 65, Number 3: Discours de haine : ressources linguistiques, méthodes et applications [Abusive Language: Linguistic Resources, Methods and Applications]

2024

MedMT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain
Iker García-Ferrero | Rodrigo Agerri | Aitziber Atutxa Salazar | Elena Cabrio | Iker de la Iglesia | Alberto Lavelli | Bernardo Magnini | Benjamin Molinet | Johana Ramirez-Romero | German Rigau | Jose Maria Villa-Gonzalez | Serena Villata | Andrea Zaninello
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. As a result, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating human-AI interaction. While these LLMs display competitive performance on automated medical text benchmarks, they have been pre-trained and evaluated with a focus on a single language (mostly English). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models on the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.

Argument Quality Assessment in the Age of Instruction-Following Large Language Models
Henning Wachsmuth | Gabriella Lapesa | Elena Cabrio | Anne Lauscher | Joonsuk Park | Eva Maria Vecchi | Serena Villata | Timon Ziegenbein
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The computational treatment of arguments on controversial issues has been the subject of extensive NLP research, due to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is the assessment of an argument’s quality, but it is also particularly challenging. In this position paper, we start from a brief survey of argument quality research, in which we identify the diversity of quality notions and the subjectivity of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment. Rather than just fine-tuning LLMs towards leaderboard chasing on assessment tasks, they need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues that emerge from this.

Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering
Helena Bonaldi | Greta Damo | Nicolás Benjamín Ocampo | Elena Cabrio | Serena Villata | Marco Guerini
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The potential effectiveness of counterspeech as a hate speech mitigation strategy is attracting increasing interest in the NLG research community, particularly towards the task of automatically producing it. However, automatically generated responses often lack the argumentative richness that characterises expert-produced counterspeech. In this work, we focus on two aspects of counterspeech generation to produce more cogent responses. First, by investigating the tension between the helpfulness and harmlessness of LLMs, we test whether the presence of safety guardrails hinders the quality of the generations. Second, we assess whether attacking a specific component of the hate speech results in a more effective argumentative strategy to fight online hate. Through an extensive human and automatic evaluation, we show that safety guardrails can be detrimental even to a task that inherently aims at fostering positive social interactions. Moreover, our results show that attacking a specific component of the hate speech, in particular its implicit negative stereotype and its hateful parts, leads to higher-quality generations.

CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures
Ekaterina Sviridova | Anar Yeginbergen | Ainara Estarrona | Elena Cabrio | Serena Villata | Rodrigo Agerri
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Explaining Artificial Intelligence (AI) decisions is currently a major challenge in AI, in particular when AI is applied to sensitive scenarios such as medicine and law. However, the need to explain the rationale behind decisions is also a central issue in human deliberation, as it is important to justify why a certain decision has been taken. Resident medical doctors, for instance, are required not only to provide a (possibly correct) diagnosis, but also to explain how they reached a certain conclusion. Developing new tools to help residents train their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction and present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering in which correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. These explanations have been manually annotated with argument components (i.e., premise, claim) and argument relations (i.e., attack, support). The Multilingual CasiMedicos-arg dataset consists of 558 clinical cases (English, Spanish, French, Italian) with explanations, in which we annotated 5021 claims, 2313 premises, 2431 support relations, and 1106 attack relations. We conclude by showing how competitive baselines perform on this challenging dataset for the argument mining task.

2023

Unmasking the Hidden Meaning: Bridging Implicit and Explicit Hate Speech Embedding Representations
Nicolás Benjamín Ocampo | Elena Cabrio | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2023

Research on automatic hate speech (HS) detection has mainly focused on identifying explicit forms of hateful expressions in user-generated content. Recently, a few works have started to investigate methods to address more implicit and subtle abusive content. However, despite these efforts, automated systems still struggle to correctly recognize implicit and more veiled forms of HS. As these systems rely heavily on proper textual representations for classification, it is crucial to investigate the differences in embedding implicit and explicit messages. Our contribution to address this challenging task is fourfold. First, we present a comparative analysis of transformer-based models, evaluating their performance across five datasets containing implicit HS messages. Second, we examine the embedding representations of implicit messages across different targets, gaining insight into how veiled cases are encoded. Third, we compare and link explicit and implicit hateful messages across these datasets through their targets, reinforcing the relation between explicitness and implicitness and obtaining more meaningful embedding representations. Lastly, we show how these new representations maintain high performance on HS labels while improving classification in borderline cases.

Playing the Part of the Sharp Bully: Generating Adversarial Examples for Implicit Hate Speech Detection
Nicolás Benjamín Ocampo | Elena Cabrio | Serena Villata
Findings of the Association for Computational Linguistics: ACL 2023

Research on abusive content detection on social media has primarily focused on explicit forms of hate speech (HS), which are often identifiable by recognizing hateful words and expressions. Messages containing linguistically subtle and implicit forms of hate speech still constitute an open challenge for automatic hate speech detection. In this paper, we propose a new framework for generating adversarial implicit HS short-text messages using autoregressive language models. Moreover, we propose a strategy to group the generated implicit messages into complexity levels (EASY, MEDIUM, and HARD categories) characterizing how challenging these messages are for supervised classifiers. Finally, building on (Dinan et al., 2019; Vidgen et al., 2021), we propose a “build it, break it, fix it” training scheme using HARD messages, showing how iteratively retraining on HARD messages substantially improves SOTA models’ performance on implicit HS benchmarks.

Argument-based Detection and Classification of Fallacies in Political Debates
Pierpaolo Goffredo | Mariana Chaves | Serena Villata | Elena Cabrio
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Fallacies are arguments that employ faulty reasoning. Given their persuasive and seemingly valid nature, fallacious arguments are often used in political debates. Employing these misleading arguments in politics can have detrimental consequences for society, since they can lead the public and policymakers to inaccurate conclusions and invalid inferences. Automatically detecting and classifying fallacious arguments therefore represents a crucial challenge in limiting the spread of misleading or manipulative claims and promoting a more informed and healthier political discourse. Our contribution to address this challenging task is twofold. First, we extend the ElecDeb60To16 dataset of U.S. presidential debates annotated with fallacious arguments by incorporating the most recent Trump-Biden presidential debate. We include updated token-level annotations, incorporating argumentative components (i.e., claims and premises), the relations between these components (i.e., support and attack), and six categories of fallacious arguments (i.e., Ad Hominem, Appeal to Authority, Appeal to Emotion, False Cause, Slippery Slope, and Slogans). Second, we perform the twofold task of fallacious argument detection and classification by defining neural network architectures based on Transformer models, combining text, argumentative features, and engineered features. Our results show the advantages of complementing transformer-generated text representations with non-text features.

An In-depth Analysis of Implicit and Subtle Hate Speech Messages
Nicolás Benjamín Ocampo | Ekaterina Sviridova | Elena Cabrio | Serena Villata
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The research carried out so far on detecting abusive content in social media has primarily focused on overt forms of hate speech. While explicit hate speech (HS) is more easily identifiable by recognizing hateful words, messages containing linguistically subtle and implicit forms of HS (such as circumlocution, metaphors and sarcasm) constitute a real challenge for automatic systems. While the sneaky and tricky nature of subtle messages might make them seem less hurtful than the same content expressed overtly, such abuse is at least as harmful as overt abuse. In this paper, we first provide an in-depth and systematic analysis of 7 standard benchmarks for HS detection, relying on a fine-grained and linguistically grounded definition of implicit and subtle messages. Then, we experiment with state-of-the-art neural network architectures on two supervised tasks, namely implicit HS and subtle HS message classification. We show that while such models perform satisfactorily on explicit messages, they fail to detect implicit and subtle content, highlighting the fact that HS detection is not a solved problem and deserves further investigation.

2022

CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game
Anaïs Ollagnier | Elena Cabrio | Serena Villata | Catherine Blaya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Over the past decades, the number of episodes of cyber aggression occurring online has grown substantially, especially among teens. Most solutions investigated by the NLP community to curb such online abusive behaviors consist of supervised approaches relying on annotated data extracted from social media. However, recent studies have highlighted that private instant messaging platforms are major mediums of cyber aggression among teens. As such interactions remain invisible due to app privacy policies, very few datasets of aggressive conversations are available for the computational analysis of language. To overcome this limitation, in this paper we present the CyberAgressionAdo-V1 dataset, containing aggressive multiparty chats in French collected through a role-playing game in high schools and annotated at different layers. We describe the data collection and annotation phases, carried out in the context of an EU and a national research project, and provide an insightful analysis of the different types of aggression and verbal abuse emerging from the collected data, depending on the targeted victims (individuals or communities).

Graph Embeddings for Argumentation Quality Assessment
Santiago Marro | Elena Cabrio | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2022

Argumentation is used by people both internally, by evaluating arguments and counterarguments to make sense of a situation and take a decision, and externally, e.g., in a debate, by exchanging arguments to reach an agreement or to promote an individual position. In this context, assessing the quality of the arguments is extremely important, as it strongly influences the evaluation of the overall argumentation and thus the decision-making process. The automatic assessment of the quality of natural language arguments has recently been attracting interest in the Argument Mining field. However, automatically assessing the quality of an argumentation largely remains a challenging, unsolved task. Our contribution is twofold: first, we present a novel resource of 402 student persuasive essays, in which three main quality dimensions (i.e., cogency, rhetoric, and reasonableness) have been annotated, leading to 1908 arguments tagged with quality facets; second, we address this novel task of argumentation quality assessment by proposing a novel neural architecture based on graph embeddings that combines both the textual features of the natural language arguments and the overall argument graph, i.e., also considering the support and attack relations holding among the arguments. On the persuasive essays dataset, our approach outperforms state-of-the-art and standard baselines.

2021

“Don’t discuss”: Investigating Semantic and Argumentative Features for Supervised Propagandist Message Detection and Classification
Vorakit Vorakitphan | Elena Cabrio | Serena Villata
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

One of the mechanisms through which disinformation spreads online, in particular through social media, is the use of propaganda techniques. These include specific rhetorical and psychological strategies, ranging from leveraging emotions to exploiting logical fallacies. In this paper, our goal is to push forward research on propaganda detection based on text analysis, given the crucial role these methods may play in addressing this major societal issue. More precisely, we propose a supervised approach to classify textual snippets both as propaganda messages and according to the specific propaganda technique applied, as well as a detailed linguistic analysis of the features characterising propaganda information in text (e.g., semantic, sentiment and argumentation features). Extensive experiments conducted on two available propaganda resources (i.e., the NLP4IF’19 and SemEval’20-Task 11 datasets) show that the proposed approach, leveraging different language models and the investigated linguistic features, achieves very promising results on propaganda classification, both at sentence and at fragment level.

Extraction d’arguments basée sur les transformateurs pour des applications dans le domaine de la santé (Transformer-based Argument Mining for Healthcare Applications)
Tobias Mayer | Elena Cabrio | Serena Villata
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale [Proceedings of the 28th Conference on Natural Language Processing. Volume 1: Main Conference]

We present French and English summaries of the paper (Mayer et al., 2020) presented at the 24th European Conference on Artificial Intelligence (ECAI-2020) in 2020.

Sifting French Tweets to Investigate the Impact of Covid-19 in Triggering Intense Anxiety
Mohamed Amine Romdhane | Elena Cabrio | Serena Villata
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale [Proceedings of the 28th Conference on Natural Language Processing. Volume 1: Main Conference]

Social media can be leveraged to understand public sentiment and feelings in real time, and to target public health messages based on user interests and emotions. In this paper, we investigate the impact of the COVID-19 pandemic in triggering intense anxiety, relying on messages exchanged on Twitter. More specifically, we provide: (i) a quantitative and qualitative analysis of a corpus of French tweets related to coronavirus, and (ii) a pipeline approach (a filtering mechanism followed by neural network methods) to satisfactorily classify messages expressing intense anxiety on social media, considering the role played by emotions.

PROTECT - A Pipeline for Propaganda Detection and Classification
Vorakit Vorakitphan | Elena Cabrio | Serena Villata
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

Love Me, Love Me, Say (and Write!) that You Love Me: Enriching the WASABI Song Corpus with Lyrics Annotations
Michael Fell | Elena Cabrio | Elmahdi Korfed | Michel Buffa | Fabien Gandon
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present the WASABI Song Corpus, a large corpus of songs enriched with metadata extracted from music databases on the Web and resulting from the processing of song lyrics and from audio analysis. More specifically, given that lyrics encode an important part of the semantics of a song, we focus here on the description of the methods we proposed to extract relevant information from the lyrics, such as their structural segmentation, their topic, the explicitness of the lyrics content, the salient passages of a song and the emotions conveyed. The creation of the resource is still ongoing: so far, the corpus contains 1.73M songs with lyrics (1.41M unique lyrics) annotated at different levels with the output of the above-mentioned methods. The corpus labels and the provided methods can be exploited by music search engines and music professionals (e.g. journalists, radio presenters) to better handle large collections of lyrics, allowing intelligent browsing, categorization, segmentation and recommendation of songs.

Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection
Michele Corazza | Stefano Menini | Elena Cabrio | Sara Tonelli | Serena Villata
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal of performing zero-shot abusive language detection. We compare the results obtained with the original MLM to those obtained by our method, showing improved performance on German, Italian and Spanish.

Regrexit or not Regrexit: Aspect-based Sentiment Analysis in Polarized Contexts
Vorakit Vorakitphan | Marco Guerini | Elena Cabrio | Serena Villata
Proceedings of the 28th International Conference on Computational Linguistics

Emotion analysis in polarized contexts represents a challenge for Natural Language Processing modeling. As a step in this direction, we present a methodology to extend the task of Aspect-based Sentiment Analysis (ABSA) toward affect and emotion representation in polarized settings. In particular, we adopt the three-dimensional model of affect based on Valence, Arousal, and Dominance (VAD). We then present a Brexit scenario that shows how affect varies toward the same aspect when politically polarized stances are presented. Our approach captures aspect-based polarization at sentence level from newspapers covering the Brexit scenario, comprising 1.2M entities. We demonstrate how basic constituents of emotions can be mapped to the VAD model, along with their interactions respecting the polarized context in ABSA settings, using biased key-concepts (e.g., “stop Brexit” vs. “support Brexit”). Quite intriguingly, the framework succeeds in producing coherent aspect evidence of Brexit stance from key-concepts, showing that VAD influences the support and opposition aspects.

Proceedings of the 7th Workshop on Argument Mining
Elena Cabrio | Serena Villata
Proceedings of the 7th Workshop on Argument Mining

2019

A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Third Workshop on Abusive Language Online

Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, identifying cyberbullying phenomena at scale is still an unsolved problem. This is because abusive language detection on textual messages must be coupled with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built from Instagram messages, and we describe the cyberbullying monitoring user interface.

Comparing Automated Methods to Detect Explicit Content in Song Lyrics
Michael Fell | Elena Cabrio | Michele Corazza | Fabien Gandon
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The Parental Advisory Label (PAL) is a warning label placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents to material potentially unsuitable for children. Since 2015, digital providers such as iTunes, Spotify, Amazon Music and Deezer have also followed PAL guidelines and tagged such tracks as “explicit”. Nowadays, such labelling is carried out mainly manually and on a voluntary basis, which makes it time-consuming and therefore costly, error-prone and partly subjective. In this paper, we compare automated methods ranging from dictionary-based lookup to state-of-the-art deep neural networks to automatically detect explicit content in English lyrics. We show that more complex models perform only slightly better on this task and, relying on a qualitative analysis of the data, we discuss the inherent difficulty and subjectivity of the task.

Song Lyrics Summarization Inspired by Audio Thumbnailing
Michael Fell | Elena Cabrio | Fabien Gandon | Alain Giboin
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Given the peculiar structure of songs, applying generic text summarization methods to lyrics can lead to the generation of highly redundant and incoherent text. In this paper, we propose to enhance state-of-the-art text summarization approaches with a method inspired by audio thumbnailing. Instead of searching for the thumbnail clues in the audio of the song, we identify equivalent clues in the lyrics. We then show that these summaries, which take into account the audio nature of the lyrics, outperform the generic methods according to both an automatic evaluation and human judgments.

Yes, we can! Mining Arguments in 50 Years of US Presidential Campaign Debates
Shohreh Haddadan | Elena Cabrio | Serena Villata
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Political debates offer a rare opportunity for citizens to compare the candidates’ positions on the most controversial topics of the campaign. Thus, they represent a natural application scenario for Argument Mining. As existing research lacks a solid empirical investigation of the typology of argument components in political debates, we fill this gap by proposing an Argument Mining approach to political debates. We address this task in an empirical manner by annotating 39 political debates from the last 50 years of US presidential campaigns, creating a new corpus of 29k argument components, labeled as premises and claims. We then propose two tasks: (1) identifying the argumentative components in such debates, and (2) classifying them as premises and claims. We show that feature-rich SVM learners and Neural Network architectures outperform standard baselines in Argument Mining over such complex data. We release the new corpus USElecDeb60To16 and the accompanying software under free licenses to the research community.

Cross-Platform Evaluation for Italian Hate Speech Detection
Michele Corazza | Stefano Menini | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

2018

Evidence Type Classification in Randomized Controlled Trials
Tobias Mayer | Elena Cabrio | Serena Villata
Proceedings of the 5th Workshop on Argument Mining

Randomized Controlled Trials (RCTs) are a common type of experimental study in the medical domain for evidence-based decision making. The ability to automatically extract the arguments proposed therein can be a valuable support for clinicians and practitioners in their daily evidence-based decision making activities. Given the peculiarity of the medical domain and the required level of detail, standard approaches to argument component detection in argument(ation) mining are not fine-grained enough to support such activities. In this paper, we introduce a new sub-task of the argument component identification task: evidence type classification. To address it, we propose a supervised approach and we test it on a set of RCT abstracts on different medical topics.

Measuring Frame Instance Relatedness
Valerio Basile | Roque Lopez Condori | Elena Cabrio
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Frame semantics is a well-established framework to represent the meaning of natural language in computational terms. In this work, we propose a quantitative measure of relatedness between pairs of frame instances. We test our method on a dataset of sentence pairs, highlighting the correlation between our metric and human judgments of semantic similarity. Furthermore, we propose an application of our measure for clustering frame instances to extract prototypical knowledge from natural language.

Lyrics Segmentation: Textual Macrostructure Detection using Convolutions
Michael Fell | Yaroslav Nechaev | Elena Cabrio | Fabien Gandon
Proceedings of the 27th International Conference on Computational Linguistics

Lyrics contain repeated patterns that are correlated with the repetitions found in the music they accompany. Repetitions in song texts have been shown to enable lyrics segmentation – a fundamental prerequisite for automatically detecting the building blocks (e.g. chorus, verse) of a song text. In this article, we improve on the state of the art in lyrics segmentation by applying a convolutional neural network to the task, and experiment with novel features as a step towards deeper macrostructure detection of lyrics.

The SEEMPAD Dataset for Emphatic and Persuasive Argumentation
Elena Cabrio | Serena Villata
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Preface
Elena Cabrio | Alessandro Mazzei | Fabio Tamburini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)
Elena Cabrio | Alessandro Mazzei | Fabio Tamburini
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2017

Graph-based Event Extraction from Twitter
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Detecting which tweets describe a specific event and clustering them is one of the main challenges related to Social Media currently addressed in the NLP community. Existing approaches have mainly focused on detecting spikes in clusters around specific keywords or Named Entities (NE). However, one of the main drawbacks of such approaches is the difficulty in understanding when the same keywords describe different events. In this paper, we propose a novel approach that exploits NE mentions in tweets and their entity context to create a temporal event graph. Then, using simple graph theory techniques and a PageRank-like algorithm, we process the event graphs to detect clusters of tweets describing the same events. Experiments on two gold standard datasets show that our approach achieves state-of-the-art results both in terms of evaluation performance and the quality of the detected events.

You’ll Never Tweet Alone: Building Sports Match Timelines from Microblog Posts
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we propose an approach to build a timeline of the actions in a sports game based on tweets. We combine information provided by external knowledge bases to enrich the content of the tweets, and apply graph theory to model relations between actions and participants in a game. We demonstrate the validity of our approach using tweets collected during the EURO 2016 Championship and evaluate the output against live summaries produced by sports channels.

Building timelines of soccer matches from Twitter
Amosse Edouard | Elena Cabrio | Sara Tonelli | Nhan Le-Thanh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This demo paper presents a system that builds a timeline with salient actions of a soccer game, based on the tweets posted by users. It combines information provided by external knowledge bases to enrich the content of tweets and applies graph theory to model relations between actions (e.g. goals, penalties) and participants of a game (e.g. players, teams). In the demo, a web application displays, in near real time, the actions detected from tweets posted by users for a given match of Euro 2016. Our tools are freely available at https://bitbucket.org/eamosse/event_tracking.

Argument Mining on Twitter: Arguments, Facts and Sources
Mihai Dusmanu | Elena Cabrio | Serena Villata
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Social media collect and spread on the Web personal opinions, facts, fake news and all kinds of information users may be interested in. Applying argument mining methods to such heterogeneous data sources is a challenging open research issue, in particular considering the peculiarities of the language used to write textual messages on social media. In addition, new issues emerge when dealing with arguments posted on such platforms, such as the need to distinguish between personal opinions and actual facts, and to detect the source disseminating information about such facts to allow for provenance verification. In this paper, we apply supervised classification to identify arguments on Twitter, and we present two new tasks for argument mining, namely facts recognition and source identification. We study the feasibility of the approaches proposed to address these tasks on a set of tweets related to the Grexit and Brexit news topics.

2016

DART: a Dataset of Arguments and their Relations on Twitter
Tom Bosc | Elena Cabrio | Serena Villata
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The problem of understanding the stream of messages exchanged on social media such as Facebook and Twitter is becoming a major challenge for automated systems. The tremendous amount of data exchanged on these platforms, as well as the specific form of language adopted by social media users, constitute a new challenging context for existing argument mining techniques. In this paper, we describe a resource of natural language arguments called DART (Dataset of Arguments and their Relations on Twitter) where the complete argument mining pipeline over Twitter messages is considered: (i) we identify which tweets can be considered as arguments and which cannot, and (ii) we identify the relation, i.e., support or attack, linking such tweets to each other.

2014

Classifying Inconsistencies in DBpedia Language Specific Chapters
Elena Cabrio | Serena Villata | Fabien Gandon
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper proposes a methodology to identify and classify the semantic relations holding among the possible different answers obtained for a certain query on DBpedia language specific chapters. The goal is to reconcile information provided by language specific DBpedia chapters to obtain a consistent result set. Starting from the identified semantic relations between two pieces of information, we further classify them as positive or negative, and we exploit bipolar abstract argumentation to represent the result set as a unique graph, where using argumentation semantics we are able to detect the (possibly multiple) consistent sets of elements of the query result. We experimented with the proposed methodology over a sample of triples extracted from 10 DBpedia ontology properties. We define the LingRel ontology to represent how the extracted information from different chapters is related to each other, and we map the properties of the LingRel ontology to the properties of the SIOC-Argumentation ontology to build argumentation graphs. The result is a pilot resource that can be profitably used both to train and to evaluate NLP applications querying linked data in detecting the semantic relations among the extracted values, in order to output consistent information sets.

2013

Detecting Bipolar Semantic Relations among Natural Language Arguments with Textual Entailment: a Study.
Elena Cabrio | Serena Villata
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

2012

Extracting Context-Rich Entailment Rules from Wikipedia Revision History
Elena Cabrio | Bernardo Magnini | Angelina Ivanova
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

Key-concept extraction from French articles with KX
Sara Tonelli | Elena Cabrio | Emanuele Pianta
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions
Elena Cabrio | Serena Villata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Hunting for Entailing Pairs in the Penn Discourse Treebank
Sara Tonelli | Elena Cabrio
Proceedings of COLING 2012

2011

Towards Component-Based Textual Entailment
Elena Cabrio | Bernardo Magnini
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

Contradiction-focused qualitative evaluation of textual entailment
Bernardo Magnini | Elena Cabrio
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Building Textual Entailment Specialized Data Sets: a Methodology for Isolating Linguistic Phenomena Relevant to Inference
Luisa Bentivogli | Elena Cabrio | Ido Dagan | Danilo Giampiccolo | Medea Lo Leggio | Bernardo Magnini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper proposes a methodology for the creation of specialized data sets for Textual Entailment, made of monothematic Text-Hypothesis pairs (i.e. pairs in which only one linguistic phenomenon relevant to the entailment relation is highlighted and isolated). The expected benefits derive from the intuition that investigating the linguistic phenomena separately, i.e. decomposing the complexity of the TE problem, would yield an improvement in the development of specific strategies to cope with them. The annotation procedure assumes that humans have knowledge about the linguistic phenomena relevant to inference, and a classification of such phenomena into both fine-grained and macro categories is suggested. We experimented with the proposed methodology over a sample of pairs taken from the RTE-5 data set, and investigated critical issues arising when entailment, contradiction or unknown pairs are considered. The result is a new resource, which can be profitably used both to advance the comprehension of the linguistic phenomena relevant to entailment judgments and to take a first step towards the creation of large-scale specialized data sets.

Toward Qualitative Evaluation of Textual Entailment Systems
Elena Cabrio | Bernardo Magnini
Coling 2010: Posters

2008

The QALL-ME Benchmark: a Multilingual Resource of Annotated Spoken Requests for Question Answering
Elena Cabrio | Milen Kouylekov | Bernardo Magnini | Matteo Negri | Laura Hasler | Constantin Orasan | David Tomás | Jose Luis Vicedo | Guenter Neumann | Corinna Weber
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the QALL-ME benchmark, a multilingual resource of annotated spoken requests in the tourism domain, freely available for research purposes. The languages currently involved in the project are Italian, English, Spanish and German. The paper introduces a semantic annotation scheme for spoken information access requests, specifically derived from Question Answering (QA) research. In addition to pragmatic and semantic annotations, we propose three QA-based annotation levels: the Expected Answer Type, the Expected Answer Quantifier and the Question Topical Target of a request, to fully capture the content of a request and extract the sought-after information. The QALL-ME benchmark is developed under the EU-FP6 QALL-ME project, which aims at the realization of a shared and distributed infrastructure for Question Answering (QA) systems on mobile devices (e.g. mobile phones). Questions are formulated by the users as free natural language input, and the system returns the actual sequence of words which constitutes the answer from a collection of information sources (e.g. documents, databases). Within this framework, the benchmark has the twofold purpose of training machine-learning-based applications for QA, and testing their actual performance with a rapid turnaround in a controlled laboratory setting.

Co-authors

Nicolás Benjamín Ocampo 4

Rodrigo Agerri 3

Amosse Edouard 3

Nhan Le-Thanh 3

Stefano Menini 3

Ekaterina Sviridova 3

Vorakit Vorakitphan 3

Valerio Basile 2

Pierpaolo Goffredo 2

Marco Guerini 2

Alessandro Mazzei 2

Anaïs Ollagnier 2

Fabio Tamburini 2

Mohamed Amine Romdhane 1

Lucas Anastasiou 1

Aitziber Atutxa Salazar 1

Jaione Bengoetxea 1

Luisa Bentivogli 1

Catherine Blaya 1

Helena Bonaldi 1

Blanca Calvo Figueras 1

Mariana Chaves 1

Mohamed Chenene 1

Jean Daniélou 1

Iker De La Iglesia 1

Anna De Liddo 1

Mihai Dusmanu 1

Sofiane Elguendouze 1

Ainara Estarrona 1

Iker García-Ferrero 1

Danilo Giampiccolo 1

Shohreh Haddadan 1

Maite Heredia 1

Angelina Ivanova 1

Elmahdi Korfed 1

Milen Kouylekov 1

Gabriella Lapesa 1

Anne Lauscher 1

Alberto Lavelli 1

Medea Lo Leggio 1

Roque Lopez Condori 1

Santiago Marro 1

Marie Francine Moens 1

Benjamin Molinet 1

Giovanni Moretti 1

Yaroslav Nechaev 1

Günter Neumann 1

Constantin Orasan 1

Emanuele Pianta 1

Johana Ramirez-Romero 1

Jeanne Rouhier 1

Eva Maria Vecchi 1

José Luis Vicedo 1

Jose Maria Villa-Gonzalez 1

Henning Wachsmuth 1

Corinna Weber 1

Anar Yeginbergen 1

Andrea Zaninello 1

Timon Ziegenbein 1

Venues

JEP/TALN/RECITAL 3