Federico Ruggeri

2025

Overview of MM-ArgFallacy2025 on Multimodal Argumentative Fallacy Detection and Classification in Political Debates
Eleonora Mancini | Federico Ruggeri | Serena Villata | Paolo Torroni
Proceedings of the 12th Argument mining Workshop

We present an overview of the MM-ArgFallacy2025 shared task on Multimodal Argumentative Fallacy Detection and Classification in Political Debates, co-located with the 12th Workshop on Argument Mining at ACL 2025. The task focuses on identifying and classifying argumentative fallacies across three input modes: text-only, audio-only, and multimodal (text+audio), offering both binary detection (AFD) and multi-class classification (AFC) subtasks. The dataset comprises 18,925 instances for AFD and 3,388 instances for AFC, from the MM-USED-Fallacy corpus on U.S. presidential debates, annotated for six fallacy types: Ad Hominem, Appeal to Authority, Appeal to Emotion, False Cause, Slippery Slope, and Slogan. A total of 5 teams participated: 3 on classification and 2 on detection. Participants employed transformer-based models, particularly RoBERTa variants, with strategies including prompt-guided data augmentation, context integration, specialised loss functions, and various fusion techniques. Audio processing ranged from MFCC features to state-of-the-art speech models. Results demonstrated textual modality dominance, with best text-only performance reaching 0.4856 F1-score for classification and 0.34 for detection. Audio-only approaches underperformed relative to text but showed improvements over previous work, while multimodal fusion showed limited improvements. This task establishes important baselines for multimodal fallacy analysis in political discourse, contributing to computational argumentation and misinformation detection capabilities.

Interlocking-free Selective Rationalization Through Genetic-based Learning
Federico Ruggeri | Gaetano Signorelli
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A popular end-to-end architecture for selective rationalization is the select-then-predict pipeline, comprising a generator to extract highlights fed to a predictor. Such a cooperative system suffers from suboptimal equilibrium minima due to the dominance of one of the two modules, a phenomenon known as interlocking. While several contributions have aimed at addressing interlocking, they only mitigate its effect, often by introducing feature-based heuristics, sampling, and ad-hoc regularizations. We present GenSPP, the first interlocking-free architecture for selective rationalization that does not require any learning overhead such as the techniques mentioned above. GenSPP avoids interlocking by performing disjoint training of the generator and predictor via genetic global search. Experiments on a synthetic and a real-world benchmark show that our model outperforms several state-of-the-art competitors.

Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains
Katerina Korre | Arianna Muti | Federico Ruggeri | Alberto Barrón-Cedeño
Findings of the Association for Computational Linguistics: NAACL 2025

Hate speech relies heavily on cultural influences, leading to varying individual interpretations. For that reason, we propose a Semantic Componential Analysis (SCA) framework for a cross-cultural and cross-domain analysis of hate speech definitions. We create the first dataset of hate speech definitions encompassing 493 definitions from more than 100 cultures, drawn from five key domains: online dictionaries, academic research, Wikipedia, legal texts, and online platforms. By decomposing these definitions into semantic components, our analysis reveals significant variation across definitions, yet many domains borrow definitions from one another without taking into account the target culture. We conduct zero-shot model experiments using our proposed dataset, employing three popular open-sourced LLMs to understand the impact of different definitions on hate speech detection. Our findings indicate that LLMs are sensitive to definitions: responses for hate speech detection change according to the complexity of definitions used in the prompt.

Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification
John Dougrez-Lewis | Mahmud Elahi Akhter | Federico Ruggeri | Sebastian Löbbers | Yulan He | Maria Liakata
Findings of the Association for Computational Linguistics: ACL 2025

Although LLMs have shown great performance on Mathematics and Coding related reasoning tasks, the reasoning capabilities of LLMs regarding other forms of reasoning are still an open problem. Here, we examine the issue of reasoning from the perspective of claim verification. We propose a framework designed to break down any claim paired with evidence into atomic reasoning types that are necessary for verification. We use this framework to create RECV, the first claim verification benchmark, incorporating real-world claims, to assess the deductive and abductive reasoning capabilities of LLMs. The benchmark comprises three datasets, covering reasoning problems of increasing complexity. We evaluate three state-of-the-art proprietary LLMs under multiple prompt settings. Our results show that while LLMs can address deductive reasoning problems, they consistently fail in cases of abductive reasoning. Moreover, we observe that enhancing LLMs with rationale generation is not always beneficial. Nonetheless, we find that generated rationales are semantically similar to those provided by humans, especially in deductive reasoning cases.

2024

PejorativITy: Disambiguating Pejorative Epithets to Improve Misogyny Detection in Italian Tweets
Arianna Muti | Federico Ruggeri | Cagri Toraman | Alberto Barrón-Cedeño | Samuel Algherini | Lorenzo Musetti | Silvia Ronchi | Gianmarco Saretto | Caterina Zapparoli
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Misogyny is often expressed through figurative language. Some neutral words can assume a negative connotation when functioning as pejorative epithets. Disambiguating the meaning of such terms might help the detection of misogyny. To address this task, we present PejorativITy, a novel corpus of 1,200 manually annotated Italian tweets for pejorative language at the word level and misogyny at the sentence level. We evaluate the impact of injecting information about disambiguated words into a model targeting misogyny detection. In particular, we explore two different approaches for injection: concatenation of pejorative information and substitution of ambiguous words with univocal terms. Our experimental results, both on our corpus and on two popular benchmarks on Italian tweets, show that both approaches lead to a major classification improvement, indicating that word sense disambiguation is a promising preliminary step for misogyny detection. Furthermore, we investigate LLMs’ understanding of pejorative epithets by means of contextual word embeddings analysis and prompting.

Multimodal Fallacy Classification in Political Debates
Eleonora Mancini | Federico Ruggeri | Paolo Torroni
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Recent advances in NLP suggest that some tasks, such as argument detection and relation classification, are better framed in a multimodal perspective. We propose multimodal argument mining for argumentative fallacy classification in political debates. To this end, we release the first corpus for multimodal fallacy classification. Our experiments show that the integration of the audio modality leads to superior classification performance. Our findings confirm that framing fallacy classification as a multimodal task is essential to capture paralinguistic aspects of fallacious arguments.

A Corpus for Sentence-Level Subjectivity Detection on English News Articles
Francesco Antici | Federico Ruggeri | Andrea Galassi | Katerina Korre | Arianna Muti | Alessandra Bardi | Alice Fedotova | Alberto Barrón-Cedeño
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We develop novel annotation guidelines for sentence-level subjectivity detection, which are not limited to language-specific cues. We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English and across other languages without relying on language-specific tools, such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings. For this purpose, we re-annotate an existing Italian corpus. We observe that models trained in the multilingual setting achieve the best performance on the task.

A Grice-ful Examination of Offensive Language: Using NLP Methods to Assess the Co-operative Principle
Katerina Korre | Federico Ruggeri | Alberto Barrón-Cedeño
Proceedings of the 1st LUHME Workshop

Natural Language Processing (NLP) can provide tools for analyzing specific intricate language phenomena, such as offensiveness in language. In this study, we employ methods from pragmatics, more specifically Gricean theory, as well as NLP techniques, to analyze instances of online offensive language. We present a comparative analysis between offensive and non-offensive instances with regard to the degree to which the 4 Gricean Maxims (Quality, Quantity, Manner, and Relevance) are flouted or violated. To facilitate our analysis, we employ NLP tools to filter the instances and proceed to a more thorough qualitative analysis. Our findings reveal that offensive and non-offensive speech do not differ significantly when we evaluate with metrics that correspond to the Gricean Maxims, apart from some aspects of the Maxim of Quality and the Maxim of Manner. Through this paper, we advocate for a turn towards mixed approaches to linguistic topics by also paving the way for a modernization of discourse analysis and natural language understanding that encompasses computational methods. Warning: This paper contains offensive language that might be triggering for some individuals.

Multimodal Argument Mining (MAM) is a recent area of research aiming to extend argument analysis and improve discourse understanding by incorporating multiple modalities. Initial results confirm the importance of paralinguistic cues in this field. However, the research community still lacks a comprehensive platform where results can be easily reproduced, and methods and models can be stored, compared, and tested against a variety of benchmarks. To address these challenges, we propose MAMKit, an open, publicly available, PyTorch toolkit that consolidates datasets and models, providing a standardized platform for experimentation. MAMKit also includes some new baselines, designed to stimulate research on text and audio encoding and fusion for MAM tasks. Our initial results with MAMKit indicate that advancements in MAM require novel annotation processes to encompass auditory cues effectively.

Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts
Arianna Muti | Federico Ruggeri | Khalid Al Khatib | Alberto Barrón-Cedeño | Tommaso Caselli
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We propose misogyny detection as an Argumentative Reasoning task and we investigate the capacity of large language models (LLMs) to understand the implicit reasoning used to convey misogyny in both Italian and English. The central aim is to generate the missing reasoning link between a message and the implied meanings encoding the misogyny. Our study uses argumentation theory as a foundation to form a collection of prompts in both zero-shot and few-shot settings. These prompts integrate different techniques, including chain-of-thought reasoning and augmented knowledge. Our findings show that LLMs fall short on reasoning capabilities about misogynistic comments and that they mostly rely on their implicit knowledge derived from internalized common stereotypes about women to generate implied assumptions, rather than on inductive reasoning.

2023

A Dataset of Argumentative Dialogues on Scientific Papers
Federico Ruggeri | Mohsen Mesgar | Iryna Gurevych
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With recent advances in question-answering models, various datasets have been collected to improve and study the effectiveness of these models on scientific texts. Questions and answers in these datasets explore a scientific paper by seeking factual information from the paper’s content. However, these datasets do not tackle the argumentative content of scientific papers, which is of great importance to the persuasiveness of a scientific discussion. We introduce ArgSciChat, a dataset of 41 argumentative dialogues between scientists on 20 NLP papers. The unique property of our dataset is that it includes both exploratory and argumentative questions and answers in a dialogue discourse on a scientific paper. Moreover, the size of ArgSciChat demonstrates the difficulties in collecting dialogues for specialized domains. Thus, our dataset is a challenging resource to evaluate dialogue agents in low-resource domains, in which collecting training data is costly. We annotate all sentences of dialogues in ArgSciChat and analyze them extensively. The results confirm that dialogues in ArgSciChat include exploratory and argumentative interactions. Furthermore, we use our dataset to fine-tune and evaluate a pre-trained document-grounded dialogue agent. The agent achieves low performance on our dataset, motivating a need for dialogue agents with a capability to reason and argue about their answers. We publicly release ArgSciChat.

2022

Multimodal Argument Mining: A Case Study in Political Debates
Eleonora Mancini | Federico Ruggeri | Andrea Galassi | Paolo Torroni
Proceedings of the 9th Workshop on Argument Mining

We propose a study on multimodal argument mining in the domain of political debates. We collate and extend existing corpora and provide an initial empirical study on multimodal architectures, with a special emphasis on input encoding methods. Our results provide interesting indications about future directions in this important domain.

The successful application of argument mining in the legal domain can dramatically impact many disciplines related to law. For this purpose, we present Demosthenes, a novel corpus for argument mining in legal documents, composed of 40 decisions of the Court of Justice of the European Union on matters of fiscal state aid. The annotation specifies three hierarchical levels of information: the argumentative elements, their types, and their argument schemes. In our experimental evaluation, we address 4 different classification tasks, combining advanced language models and traditional classifiers.

Creating balanced labeled textual corpora for complex tasks, like legal analysis, is a challenging and expensive process that often requires the collaboration of domain experts. To address this problem, we propose a data augmentation method based on the combination of GloVe word embeddings and the WordNet ontology. We present an example of application in the legal domain, specifically on decisions of the Court of Justice of the European Union. Our evaluation with human experts confirms that our method is more robust than the alternatives.

A Sentiment and Emotion Annotated Dataset for Bitcoin Price Forecasting Based on Reddit Posts
Pavlo Seroyizhko | Zhanel Zhexenova | Muhammad Zohaib Shafiq | Fabio Merizzi | Andrea Galassi | Federico Ruggeri
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Cryptocurrencies have gained enormous momentum in finance and are nowadays commonly adopted as a medium of exchange for online payments. After recent events during which GameStop’s stocks were believed to be influenced by the WallStreetBets subReddit, Reddit has become a very hot topic on the cryptocurrency market. The influence of public opinions on cryptocurrency price trends has inspired researchers to explore solutions that integrate such information in crypto price change forecasting. A popular integration technique is to represent social media opinions via sentiment features. However, this research direction is still in its infancy, and only a limited number of publicly available datasets with sentiment annotations exist. We propose a novel Bitcoin Reddit Sentiment Dataset, a ready-to-use dataset annotated with state-of-the-art sentiment and emotion recognition. The dataset contains pre-processed Reddit posts and comments about Bitcoin from several domain-related subReddits along with Bitcoin’s financial data. We evaluate several widely adopted neural architectures for crypto price change forecasting. Our results show controversial benefits of sentiment and emotion features, advocating for more sophisticated social media integration techniques. We make our dataset publicly available for research.