Elisabetta Fersini - ACL Anthology

Elisabetta Fersini

2026

Steering Large Language Models for Machine Translation Personalization
Daniel Scalena | Gabriele Sarti | Arianna Bisazza | Elisabetta Fersini | Malvina Nissim
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models have simplified the production of personalized translations reflecting predefined stylistic constraints. However, these systems still struggle when stylistic requirements are implicitly represented by a set of examples, such as texts produced by a specific human translator. In this work, we explore various strategies for personalizing automatically generated translations when few examples are available, with a focus on the challenging domain of literary translation. We begin by determining the feasibility of the task and how style information is encoded within model representations. Then, we evaluate various prompting strategies and inference-time interventions for steering model generations towards a personalized style, with a particular focus on contrastive steering with sparse autoencoder (SAE) latents to identify salient personalization properties. We demonstrate that contrastive SAE steering yields robust style conditioning and translation quality, resulting in higher inference-time computational efficiency than prompting approaches. We further examine the impact of steering on model activations, finding that layers encoding personalization properties are impacted similarly by prompting and SAE steering, suggesting a similar mechanism at play.

2025

BeaverTails-IT: Towards a Safety Benchmark for Evaluating Italian Large Language Models
Giuseppe Magazzù | Alberto Sormani | Giulia Rizzi | Francesca Pulerà | Daniel Scalena | Stefano Cariddi | Edoardo Michielon | Marco Pasqualini | Claudio Stamile | Elisabetta Fersini
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Uncovering Unsafety Traits in Italian Language Models
Giulia Rizzi | Giuseppe Magazzù | Alberto Sormani | Francesca Pulerà | Daniel Scalena | Elisabetta Fersini
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Beyond Raw Text: Knowledge-Augmented Italian Relation Extraction with Large Language Models
Gianmaria Balducci | Elisabetta Fersini | Enza Messina
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Is a bunch of words enough to detect disagreement in hateful content?
Giulia Rizzi | Paolo Rosso | Elisabetta Fersini
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation

The complexity of the annotation process when adopting crowdsourcing platforms for labeling hateful content can be linked to the presence of textual constituents that can be ambiguous, misinterpreted, or characterized by a reduced surrounding context. In this paper, we address the problem of perspectivism in hateful speech by leveraging contextualized embedding representations of its constituents and weighted probability functions. The effectiveness of the proposed approach is assessed using four datasets provided for the SemEval 2023 Task 11 shared task. The results emphasize that a few elements can serve as a proxy to identify sentences that may be perceived differently by multiple readers, without necessarily resorting to complex Large Language Models.

LeWiDi-2025 at NLPerspectives: Third Edition of the Learning with Disagreements Shared Task
Elisa Leonardelli | Silvia Casola | Siyao Peng | Giulia Rizzi | Valerio Basile | Elisabetta Fersini | Diego Frassinelli | Hyewon Jang | Maja Pavlovic | Barbara Plank | Massimo Poesio
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP

Many researchers have reached the conclusion that AI models should be trained to be aware of the possibility of variation and disagreement in human judgments, and evaluated as per their ability to recognize such variation. The LeWiDi series of shared tasks on Learning With Disagreements was established to promote this approach to training and evaluating AI models, by making suitable datasets more accessible and by developing evaluation methods. The third edition of the task builds on this goal by extending the LeWiDi benchmark to four datasets spanning paraphrase identification, irony detection, sarcasm detection, and natural language inference, with labeling schemes that include not only categorical judgments as in previous editions, but ordinal judgments as well. Another novelty is that we adopt two complementary paradigms to evaluate disagreement-aware systems: the soft-label approach, in which models predict population-level distributions of judgments, and the perspectivist approach, in which models predict the interpretations of individual annotators. Crucially, we moved beyond standard metrics such as cross-entropy, and tested new evaluation metrics for the two paradigms. The task attracted diverse participation, and the results provide insights into the strengths and limitations of methods for modeling variation. Together, these contributions strengthen LeWiDi as a framework and provide new resources, benchmarks, and findings to support the development of disagreement-aware technologies.

C-SHAP: Collocation-Aware Explanations for Financial NLP
Martina Menzio | Elisabetta Fersini | Davide Paris
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Understanding the internal decision-making process of NLP models in high-stakes domains such as the financial sector is particularly challenging due to the complexity of domain-specific terminology and the need for transparency and accountability. Although SHAP is a widely used model-agnostic method for attributing model predictions to input features, its standard formulation treats input tokens as independent units, failing to capture the influence of collocations, which often carry non-compositional meaning that current language models do capture. We introduce C-SHAP, an extension of SHAP that incorporates collocational dependencies into the explanation process to account for word combinations in the financial sector. C-SHAP dynamically groups tokens into significant collocations using a financial glossary and computes Shapley values over these structured units. The proposed approach has been evaluated on explaining the sentiment classification of Federal Reserve Minutes, demonstrating improved alignment with human rationales and a closer association with model behaviour compared to the standard token-level approach.
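
The idea of grouping tokens into glossary collocations and attributing over the resulting units can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy longest-match grouping, the glossary entries, and the toy scoring function below are all assumptions made for illustration, and the exact Shapley computation by enumerating orderings is only feasible for a handful of units.

```python
from itertools import permutations

def group_collocations(tokens, glossary, max_len=3):
    """Greedy longest-match grouping of tokens into glossary collocations."""
    units, i = [], 0
    while i < len(tokens):
        # Try the longest span first; a single token always matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in glossary:
                units.append(candidate)
                i += n
                break
    return units

def shapley_values(units, value_fn):
    """Exact Shapley values over units, averaging marginal gains over all orderings."""
    totals = [0.0] * len(units)
    orderings = list(permutations(range(len(units))))
    for order in orderings:
        present = set()
        for idx in order:
            before = value_fn([units[j] for j in sorted(present)])
            present.add(idx)
            after = value_fn([units[j] for j in sorted(present)])
            totals[idx] += after - before
    return [total / len(orderings) for total in totals]

# Hypothetical glossary; a real one would come from a financial lexicon.
GLOSSARY = {("interest", "rate"), ("federal", "funds", "rate")}

def toy_value(present_units):
    """Hypothetical stand-in for a model score on the units that are kept."""
    score = 0.0
    if ("interest", "rate") in present_units:
        score += 1.0
    if ("rise",) in present_units:
        score += 0.5
    return score

tokens = ["interest", "rate", "rise"]
units = group_collocations(tokens, GLOSSARY)
values = shapley_values(units, toy_value)
for unit, value in zip(units, values):
    print(" ".join(unit), round(value, 3))
```

In this toy setting "interest rate" is treated as one unit, so it receives a single attribution instead of two separate token-level scores.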

Gender Violence in Numbers: Prompting Italian LLMs to Characterize Crimes against Women
Giulia Rizzi | Daniel Scalena | Elisabetta Fersini
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

Financial News as a Proxy of European Central Bank Interest Rate Adjustments
Davide Paris | Martina Menzio | Elisabetta Fersini
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This paper examines the relationship between news coverage and the European Central Bank’s (ECB) interest rate decisions. In particular, the hypothesis of a linear relationship between financial news and ECB indications regarding interest rate variations is investigated by leveraging state-of-the-art large language models combined with domain experts and automatically selected keywords. The analysis revealed two key findings related to how news contents can signal the ECB’s decisions to raise or lower interest rates: (1) Sentence Transformer models, when combined with domain-specific keywords, exhibit a higher correlation with ECB decisions than state-of-the-art financial BERT architectures; (2) employing a grid search strategy to select subsets of informative keywords strengthened the relationships between news contents and ECB’s decisions, highlighting how media narratives can anticipate or reflect central bank policy actions.

MAMITA: Benchmarking Misogyny in Italian Memes
Elisabetta Fersini | Francesca Gasparini | Giulia Rizzi | Aurora Saibene
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

2024

Soft metrics for evaluation with disagreements: an assessment
Giulia Rizzi | Elisa Leonardelli | Massimo Poesio | Alexandra Uma | Maja Pavlovic | Silviu Paun | Paolo Rosso | Elisabetta Fersini
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation such as Cross Entropy satisfy such properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real case scenario. Our results indicate that Cross Entropy can result in fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit a more intuitive behavior, at least for the case of binary classification.
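
A small numerical sketch can make the comparison between these measures concrete for the binary case. The label distributions below are invented for illustration, and the property shown (Cross Entropy is not zero for a perfect prediction of a non-unanimous item) is simply one that is easy to verify; the abstract does not say which cases the paper itself analyzes.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

def manhattan(target, predicted):
    """Sum of absolute differences between the two distributions."""
    return sum(abs(t - p) for t, p in zip(target, predicted))

def euclidean(target, predicted):
    """Euclidean distance between the two distributions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, predicted)))

# Illustrative cases: (soft target, model prediction).
cases = {
    "split item, perfect prediction": ([0.5, 0.5], [0.5, 0.5]),
    "unanimous item, imperfect prediction": ([1.0, 0.0], [0.6, 0.4]),
}

for name, (target, predicted) in cases.items():
    print(
        name,
        "CE=%.3f" % cross_entropy(target, predicted),
        "L1=%.3f" % manhattan(target, predicted),
        "L2=%.3f" % euclidean(target, predicted),
    )
```

Here the perfect prediction on the split item gets a Cross Entropy of ln 2, which is higher than the score of the imperfect prediction on the unanimous item, while both distances are exactly zero for the perfect prediction.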

Perspectives on Hate: General vs. Domain-Specific Models
Giulia Rizzi | Michele Fontana | Elisabetta Fersini
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

The rise of online hostility, combined with broad social media use, leads to the necessity of the comprehension of its human impact. However, the process of hate identification is challenging because, on the one hand, the line between healthy disagreement and poisonous speech is not well defined, and, on the other hand, multiple socio-cultural factors or prior beliefs shape people’s perceptions of potentially harmful text. To address disagreements in hate speech identification, Natural Language Processing (NLP) models must capture several perspectives. This paper introduces a strategy based on the Contrastive Learning paradigm for detecting disagreements in hate speech using pre-trained language models. Two approaches are proposed: the General Model, a comprehensive framework, and the Domain-Specific Model, which focuses on more specific hate-related tasks. The source code is available at ://anonymous.4open.science/r/Disagreement-530C.

Exploring Neural Topic Modeling on a Classical Latin Corpus
Ginevra Martinelli | Paola Impicciché | Elisabetta Fersini | Francesco Mambrini | Marco Passarotti
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The large availability of processable textual resources for Classical Latin has made it possible to study Latin literature through methods and tools that support distant reading. This paper describes a number of experiments carried out to test the possibility of investigating the thematic distribution of the Classical Latin corpus Opera Latina by means of topic modeling. For this purpose, we train, optimize and compare two neural models, Product-of-Experts LDA (ProdLDA) and Embedded Topic Model (ETM), opportunely revised to deal with the textual data from a Classical Latin corpus, to evaluate which one performs better both on the basis of topic diversity and topic coherence metrics, and from a human judgment point of view. Our results show that the topics extracted by neural models are coherent and interpretable and that they are significant from the perspective of a Latin scholar. The source code of the proposed model is available at https://github.com/MIND-Lab/LatinProdLDA.

Unveiling Currency Market Dynamics: Leveraging Federal Reserve Communications for Strategic Investment Insights
Martina Menzio | Davide Paris | Elisabetta Fersini
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

The purpose of this paper is to extract market signals for the major currencies (EUR, USD, GBP, JPY, CNY) by analyzing the Federal Reserve System (FED) minutes and speeches, and, consequently, to make suggestions to investors about going long/short or remaining neutral, thanks to the causal relationships between FED sentiment and currency exchange rates. To this end, we aim to verify the hypothesis that the currency market dynamics follow a trend that is subject to the sentiment of FED minutes and speeches related to specific relevant currencies. The paper highlights two main findings: (1) the sentiment expressed in the FED minutes has a strong influence on the predictability of major currency trends and (2) the sentiment over time Granger-causes the exchange rate of currencies not only immediately but also at increasing lags according to a monotonically decreasing impact.

A Gentle Push Funziona Benissimo: Making Instructed Models in Italian via Contrastive Activation Steering
Daniel Scalena | Elisabetta Fersini | Malvina Nissim
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Adapting models to a language that was only partially present in the pre-training data requires fine-tuning, which is expensive in terms of both data and computational resources. As an alternative to fine-tuning, we explore the potential of activation steering-based techniques to enhance model performance on Italian tasks. Through our experiments we show that Italian steering (i) can be successfully applied to different models, (ii) achieves performance comparable to, or even better than, fine-tuned models for Italian, and (iii) yields higher quality and consistency in Italian generations. We also discuss the utility of steering and fine-tuning in the contemporary LLM landscape, where models already achieve high performance in Italian even if not explicitly trained in this language.

From Explanation to Detection: Multimodal Insights into Disagreement in Misogynous Memes
Giulia Rizzi | Paolo Rosso | Elisabetta Fersini
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

This paper presents a probabilistic approach to identifying the disagreement-related elements in misogynistic memes by considering both modalities that compose a meme (i.e., visual and textual sources). Several methodologies to exploit such elements in the identification of disagreement among annotators have been investigated and evaluated on the Multimedia Automatic Misogyny Identification (MAMI) dataset. The proposed unsupervised approach achieves performance comparable to, and in some cases even better than, state-of-the-art approaches, but with a reduced number of parameters to be estimated.

2023

Bias Mitigation in Misogynous Meme Recognition: A Preliminary Study
Gianmaria Balducci | Giulia Rizzi | Elisabetta Fersini
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

On the Generalization of Projection-Based Gender Debiasing in Word Embedding
Elisabetta Fersini | Antonio Candelieri | Lorenzo Pastore
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Gender bias estimation and mitigation techniques in word embeddings lack an understanding of their generalization capabilities. In this work, we complement prior research by comparing in a systematic way four gender bias metrics (Word Embedding Association Test, Relative Negative Sentiment Bias, Embedding Coherence Test and Bias Analogy Test) and two types of projection-based gender mitigation strategies (hard- and soft-debiasing) on three well-known word embedding representations (Word2Vec, FastText and GloVe). The experiments have shown that the considered word embeddings are consistent between them but the debiasing techniques are inconsistent across the different metrics, also highlighting the potential risk of unintended bias after the mitigation strategies.
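
As a rough illustration of what "projection-based" means here, the sketch below shows the neutralize step commonly associated with hard-debiasing: the component of a word vector along an estimated gender direction is removed. The three-dimensional vectors are made up, the direction is estimated from a single word pair for simplicity, and soft-debiasing is not shown; none of this is taken from the paper's code.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def neutralize(vector, bias_direction):
    """Remove the component of `vector` that lies along `bias_direction`."""
    g = normalize(bias_direction)
    projection = dot(vector, g)
    return [a - projection * b for a, b in zip(vector, g)]

# Toy embeddings (hypothetical values, 3 dimensions for readability).
he = [0.8, 0.1, 0.3]
she = [-0.6, 0.1, 0.3]
nurse = [-0.4, 0.7, 0.2]

# Assumption: the gender direction is approximated by one difference vector.
gender_direction = [h - s for h, s in zip(he, she)]

debiased = neutralize(nurse, gender_direction)
print("before:", round(dot(nurse, normalize(gender_direction)), 3))
print("after: ", round(dot(debiased, normalize(gender_direction)), 3))
```

After the projection is removed, the debiased vector is orthogonal to the chosen direction, which is the geometric property that bias metrics then probe from different angles.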

Integrated Gradients as Proxy of Disagreement in Hateful Content
Alessandro Astorino | Giulia Rizzi | Elisabetta Fersini
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

MIND at SemEval-2023 Task 11: From Uncertain Predictions to Subjective Disagreement
Giulia Rizzi | Alessandro Astorino | Daniel Scalena | Paolo Rosso | Elisabetta Fersini
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes the participation of the research laboratory MIND, at the University of Milano-Bicocca, in the SemEval 2023 task related to Learning With Disagreements (Le-Wi-Di). The main goal is to identify the level of agreement/disagreement from a collection of textual datasets with different characteristics in terms of style, language and task. The proposed approach is grounded on the hypothesis that the disagreement between annotators could be grasped by the uncertainty that a model, based on several linguistic characteristics, could have on the prediction of a given gold label.

2022

SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification
Elisabetta Fersini | Francesca Gasparini | Giulia Rizzi | Aurora Saibene | Berta Chulvi | Paolo Rosso | Alyssa Lees | Jeffrey Sorensen
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The paper describes the SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification (MAMI), which explores the detection of misogynous memes on the web by taking advantage of available texts and images. The task has been organised in two related sub-tasks: the first one is focused on recognising whether a meme is misogynous or not (Sub-task A), while the second one is devoted to recognising types of misogyny (Sub-task B). MAMI has been one of the most popular tasks at SemEval-2022 with more than 400 participants, 65 teams involved in Sub-task A and 41 in Sub-task B from 13 countries. The MAMI challenge received 4214 submitted runs (of which 166 were uploaded to the leaderboard), denoting an enthusiastic participation for the proposed problem. The collection and annotation of the task dataset are described. The paper provides an overview of the systems proposed for the challenge, reports the results achieved in both sub-tasks and outlines a description of the main errors for a comprehension of the systems' capabilities and for detailing future research perspectives.

2021

Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)
Elisabetta Fersini | Marco Passarotti | Viviana Patti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

Preface
Elisabetta Fersini | Marco Passarotti | Viviana Patti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

OCTIS: Comparing and Optimizing Topic models is Simple!
Silvia Terragni | Elisabetta Fersini | Bruno Giovanni Galuzzi | Pietro Tropeano | Antonio Candelieri
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

In this paper, we present OCTIS, a framework for training, analyzing, and comparing Topic Models, whose optimal hyper-parameters are estimated using a Bayesian Optimization approach. The proposed solution integrates several state-of-the-art topic models and evaluation metrics. These metrics can be targeted as objective by the underlying optimization procedure to determine the best hyper-parameter configuration. OCTIS allows researchers and practitioners to have a fair comparison between topic models of interest, using several benchmark datasets and well-known evaluation metrics, to integrate novel algorithms, and to have an interactive visualization of the results for understanding the behavior of each model. The code is available at the following link: https://github.com/MIND-Lab/OCTIS.

Cross-lingual Contextualized Topic Models with Zero-shot Learning
Federico Bianchi | Silvia Terragni | Dirk Hovy | Debora Nozza | Elisabetta Fersini
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Many data sets (e.g., reviews, forums, news, etc.) exist parallelly in multiple languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-word-based topic models. Models have to be either single-language or suffer from a huge, but extremely sparse vocabulary. Both issues can be addressed by transfer learning. In this paper, we introduce a zero-shot cross-lingual topic model. Our model learns topics on one language (here, English), and predicts them for unseen documents in different languages (here, Italian, French, German, and Portuguese). We evaluate the quality of the topic predictions for the same document in different languages. Our results show that the transferred topics are coherent and stable across languages, which suggests exciting future research directions.

Deep Learning Representations in Automatic Misogyny Identification: What Do We Gain and What Do We Miss?
Elisabetta Fersini | Luca Rosato | Antonio Candelieri | Francesco Archetti | Enza Messina
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

OCTIS 2.0: Optimizing and Comparing Topic Models in Italian Is Even Simpler!
Silvia Terragni | Elisabetta Fersini
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

An Empirical Analysis of Topic Models: Uncovering the Relationships between Hyperparameters, Document Length and Performance Measures
Silvia Terragni | Elisabetta Fersini
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Neural Topic Models are recent neural models that aim at extracting the main themes from a collection of documents. The comparison of these models is usually limited because the hyperparameters are held fixed. In this paper, we present an empirical analysis and comparison of Neural Topic Models by finding the optimal hyperparameters of each model for four different performance measures adopting a single-objective Bayesian optimization. This allows us to determine the robustness of a topic model for several evaluation metrics. We also empirically show the effect of the length of the documents on different optimized metrics and discover which evaluation metrics are in conflict or agreement with each other.

2020

Profiling Italian Misogynist: An Empirical Study
Elisabetta Fersini | Debora Nozza | Giulia Boifava
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

Hate speech may take different forms in online social environments. In this paper, we address the problem of automatic detection of misogynous language in Italian tweets by focusing both on raw text and stylometric profiles. The proposed exploratory investigation into the adoption of stylometry for enhancing the recognition capabilities of machine learning models has demonstrated that profiling users can lead to good discrimination between misogynous and non-misogynous content.

Which Matters Most? Comparing the Impact of Concept and Document Relationships in Topic Models
Silvia Terragni | Debora Nozza | Elisabetta Fersini | Enza Messina
Proceedings of the First Workshop on Insights from Negative Results in NLP

Topic models have been widely used to discover hidden topics in a collection of documents. In this paper, we propose to investigate the role of two different types of relational information, i.e. document relationships and concept relationships. While exploiting the document network significantly improves topic coherence, the introduction of concepts and their relationships does not influence the results either quantitatively or qualitatively.

2019

SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter
Valerio Basile | Cristina Bosco | Elisabetta Fersini | Debora Nozza | Viviana Patti | Francisco Manuel Rangel Pardo | Paolo Rosso | Manuela Sanguinetti
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents such as the aggressive attitude and the target harassed, to distinguish if the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019 with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.

2017

TWINE: A real-time system for TWeet analysis via INformation Extraction
Debora Nozza | Fausto Ristagno | Matteo Palmonari | Elisabetta Fersini | Pikakshi Manchanda | Enza Messina
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

In recent years, the amount of user-generated content shared on the Web has significantly increased, especially in social media environments, e.g. Twitter, Facebook, Google+. This large quantity of data has generated the need for reactive and sophisticated systems for capturing and understanding the underlying information enclosed in it. In this paper we present TWINE, a real-time system for the big data analysis and exploration of information extracted from Twitter streams. The proposed system, based on a Named Entity Recognition and Linking pipeline and a multi-dimensional spatial geo-localization, is managed by a scalable and flexible architecture for an interactive visualization of micropost stream insights. The demo is available at http://twine-mind.cloudapp.net/streaming.

A Multi-View Sentiment Corpus
Debora Nozza | Elisabetta Fersini | Enza Messina
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Sentiment Analysis is a broad task that involves the analysis of various aspects of natural language text. However, most of the approaches in the state of the art usually investigate each aspect independently, i.e. Subjectivity Classification, Sentiment Polarity Classification, Emotion Recognition, Irony Detection. In this paper we present a Multi-View Sentiment Corpus (MVSC), which comprises 3000 English microblog posts related to the movie domain. Three independent annotators manually labelled MVSC, following a broad annotation schema about different aspects that can be grasped from natural language text coming from social networks. The contribution is therefore a corpus that comprises five different views for each message, i.e. subjective/objective, sentiment polarity, implicit/explicit, irony, emotion. In order to allow a more detailed investigation of the human labelling behaviour, we provide the annotations of each human annotator involved.

Co-authors

Antonio Candelieri 3

Martina Menzio 3

Marco Passarotti 3

Viviana Patti 3

Alessandro Astorino 2

Gianmaria Balducci 2

Valerio Basile 2

Francesca Gasparini 2

Elisa Leonardelli 2

Giuseppe Magazzù 2

Malvina Nissim 2

Maja Pavlovic 2

Massimo Poesio 2

Francesca Pulerà 2

Aurora Saibene 2

Alberto Sormani 2

Francesco Archetti 1

Federico Bianchi 1

Arianna Bisazza 1

Giulia Boifava 1

Cristina Bosco 1

Stefano Cariddi 1

Silvia Casola 1

Michele Fontana 1

Diego Frassinelli 1

Bruno Giovanni Galuzzi 1

Paola Impicciché 1

Francesco Mambrini 1

Pikakshi Manchanda 1

Ginevra Martinelli 1

Edoardo Michielon 1

Matteo Palmonari 1

Marco Pasqualini 1

Lorenzo Pastore 1

Barbara Plank 1

Francisco Manuel Rangel Pardo 1

Fausto Ristagno 1

Manuela Sanguinetti 1

Gabriele Sarti 1

Jeffrey Sorensen 1

Claudio Stamile 1

Pietro Tropeano 1

Alexandra Uma 1

Venues