Elena Sofia Ruzzetti

2025

Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
Elena Sofia Ruzzetti | Giancarlo A. Xompero | Davide Venditti | Fabio Massimo Zanzotto
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) memorize and, thus, among huge amounts of uncontrolled data, may memorize Personally Identifiable Information (PII), which should not be stored and, consequently, should not be leaked. In this paper, we introduce Private Memorization Editing (PME), an approach for preventing private data leakage that turns an apparent limitation, namely the LLMs’ memorization ability, into a powerful privacy defense strategy. While attacks against LLMs have exploited prior knowledge of their training data, our approach exploits the same kind of knowledge to make a model more robust. We detect memorized PII and then mitigate its memorization by editing the model’s knowledge of its training data. We verify that our procedure does not affect the underlying language model while making it more robust against privacy Training Data Extraction attacks. We demonstrate that PME can effectively reduce the number of leaked PII in a number of configurations, in some cases even reducing the accuracy of the privacy attacks to zero.

Protecting the Privacy in Velvet with Model Editing
Giancarlo A. Xompero | Elena Sofia Ruzzetti | Cristina Giannone | Andrea Favalli | Raniero Romagnoli | Fabio Massimo Zanzotto
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)

pdf bib abs

Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this position/theory paper, we propose a paradigm shift by designing an architecture to memorize text directly, bearing in mind the principle that memorization precedes learning. We introduce MeMo, a novel architecture for language modeling that explicitly memorizes sequences of tokens in layered associative memories. By design, MeMo offers transparency and the possibility of model editing, including forgetting texts. We experimented with the MeMo architecture, showing the memorization power of the one-layer and the multi-layer configurations.

2024

Measuring Bias in Instruction-Following Models with ItaP-AT for the Italian Language
Dario Onorati | Davide Venditti | Elena Sofia Ruzzetti | Federico Ranaldi | Leonardo Ranaldi | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Instruction-Following Language Models (IFLMs) are the state of the art for solving many downstream tasks. Given their widespread use, there is an urgent need to measure whether the sentences they generate contain toxic information or social biases. In this paper, we propose the Prompt Association Test for the Italian language (ItaP-AT): a new resource for testing the presence of social bias in different domains in IFLMs. This work also investigates whether the responses of these models can be made fairer through in-context learning, using “one-shot anti-stereotypical prompts”.

The limits of Italian in Reasoning Tasks
Leonardo Ranaldi | Giulia Pucci | Federico Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Previous studies have demonstrated the effectiveness of reasoning methods in eliciting multi-step reasoned answers from Large Language Models (LLMs) by leveraging in-context demonstrations. These methods, exemplified by Chain-of-Thought (CoT) and Program-Aided Language Models (PAL), have been shown to reason well in monolingual contexts, primarily in English. However, there has been limited exploration of their abilities in other languages, especially Italian. To gain a deeper understanding of the role of reasoning methods in in-context demonstrations, we propose a multidimensional analysis tailored to Italian, focusing on arithmetic and symbolic reasoning tasks. Our findings indicate that the effectiveness of reasoning methods varies significantly beyond English. Specifically, the effectiveness of CoT, which relies on natural language demonstrations, is limited to English. Conversely, the structured nature of PAL in-context demonstrations facilitates multilingual comprehension, enabling LLMs to generate programmatic answers in Italian as well. Finally, for a more comprehensive overview, we observe that additional alignment methods do not improve downstream performance; in some cases, they even limit the abilities of the original models. This leads to significant improvements in the accuracy and quality of the generated responses.

Assessing the Asymmetric Behaviour of Italian Large Language Models across Different Syntactic Structures
Elena Sofia Ruzzetti | Federico Ranaldi | Dario Onorati | Davide Venditti | Leonardo Ranaldi | Tommaso Caselli | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

While LLMs become more proficient at solving tasks and generating sentences, we aim to investigate the role that different syntactic structures play in models’ performance on a battery of Natural Language Understanding tasks. We analyze the performance of five LLMs on semantically equivalent sentences that are characterized by different syntactic structures. To correctly solve the tasks, a model is implicitly required to correctly parse the sentence. We found that LLMs struggle with more complex syntactic structures, with an average drop of 16.13 (±11.14) points in accuracy on the Q&A task. Additionally, we propose a method based on token attribution to spot which areas of the LLMs encode syntactic knowledge, by identifying the model heads and layers responsible for the generation of a correct answer.

Termite Italian Text-to-SQL: A CALAMITA Challenge
Federico Ranaldi | Elena Sofia Ruzzetti | Dario Onorati | Fabio Massimo Zanzotto | Leonardo Ranaldi
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

We introduce Termite, a definitely unseen resource for evaluating Text-to-SQL in Italian. Specifically, we transfer evaluation pipelines beyond English, proposing novel, definitely unseen resources that avoid data-contamination phenomena while assessing the ability of models to perform Text-to-SQL tasks when natural language queries are written in Italian. We establish an evaluation grid based on execution accuracy.

A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
Leonardo Ranaldi | Giulia Pucci | Federico Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: NAACL 2024

Reasoning methods, best exemplified by the well-known Chain-of-Thought (CoT), empower the reasoning abilities of Large Language Models (LLMs) by eliciting them to solve complex tasks in a step-by-step manner. Although these methods have achieved significant success, the ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data, which leaves other languages at a disadvantage. In this paper, we propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning cross-lingual CoT reasoning across languages. Through a self-consistent cross-lingual prompting mechanism inspired by the Tree-of-Thoughts approach, the proposed method provides multi-step reasoning paths in different languages that, step by step, lead to the final solution. Experimental evaluations show that our method significantly outperforms existing prompting methods, reducing the number of interactions and achieving state-of-the-art performance.

pdf bib abs

Understanding textual descriptions to generate code seems to be an established capability of instruction-following Large Language Models (LLMs) in zero-shot scenarios. However, there is a serious possibility that this translation ability is influenced by having seen the target textual descriptions and the related code. This effect is known as Data Contamination. In this study, we investigate the impact of Data Contamination on the performance of GPT-3.5 in Text-to-SQL code-generation tasks. To this end, we introduce a novel method to detect Data Contamination in GPTs and examine GPT-3.5’s Text-to-SQL performance using the well-known Spider dataset and our new, unfamiliar dataset Termite. Furthermore, we analyze GPT-3.5’s efficacy on databases with modified information via an adversarial table disconnection (ATD) approach, which complicates Text-to-SQL tasks by removing structural information from the database. Our results indicate a significant performance drop for GPT-3.5 on the unfamiliar Termite dataset, even with ATD modifications, highlighting the effect of Data Contamination on LLMs in Text-to-SQL translation tasks.

A Trip Towards Fairness: Bias and De-Biasing in Large Language Models
Leonardo Ranaldi | Elena Sofia Ruzzetti | Davide Venditti | Dario Onorati | Fabio Massimo Zanzotto
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)

Cheap-to-Build Very Large-Language Models (CtB-LLMs) with affordable training are emerging as the next big revolution in natural language processing and understanding. These CtB-LLMs are democratizing access to trainable Very Large-Language Models (VLLMs) and, thus, may represent the building blocks of many NLP systems solving downstream tasks. Hence, any bias in CtB-LLMs, small or large, may cause huge harm. In this paper, we performed an extensive investigation of the bias of three families of CtB-LLMs, and we showed that debiasing techniques are effective and usable. Indeed, according to current tests, the LLaMA and the OPT families have an important bias in gender, race, religion, and profession. In contrast to the analysis for other LLMs, we discovered that bias depends not on the number of parameters but on the perplexity. Finally, debiasing OPT using LoRA reduces bias by up to 4.12 points in the normalized stereotype score.

2023

Teasing LLMs Adapted to Italian
Leonardo Ranaldi | Giulia Pucci | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto | André Freitas
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Investigating Gender Bias in Large Language Models for the Italian Language
Elena Sofia Ruzzetti | Dario Onorati | Leonardo Ranaldi | Davide Venditti | Fabio Massimo Zanzotto
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Measuring bias in Instruction-Following models with P-AT
Dario Onorati | Elena Sofia Ruzzetti | Davide Venditti | Leonardo Ranaldi | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: EMNLP 2023

Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need for a shared resource to determine whether existing and new IFLMs are prone to produce biased language interactions. In this paper, we propose the Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Essentially, we cast WEAT word tests as promptized classification tasks and associate them with a metric, the bias score. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs, discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.

Exploring Linguistic Properties of Monolingual BERTs with Typological Classification among Languages
Elena Sofia Ruzzetti | Federico Ranaldi | Felicia Logozzo | Michele Mastromattei | Leonardo Ranaldi | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: EMNLP 2023

The impressive achievements of transformers force NLP researchers to delve into how these models represent the underlying structure of natural language. In this paper, we propose a novel standpoint to investigate this issue: using typological similarities among languages to observe how their respective monolingual models encode structural information. We compare transformers for typologically similar languages layer by layer to observe whether these similarities emerge in particular layers. For this investigation, we propose to use Centered Kernel Alignment to measure similarity among weight matrices. We found that syntactic typological similarity is consistent with the similarity between the weights in the middle layers, which are the layers of pre-trained BERT to which syntax encoding is generally attributed. Moreover, we observe that domain adaptation on semantically equivalent texts enhances this similarity among weight matrices.

pdf bib abs

Pre-trained Transformers are challenging human performance in many Natural Language Processing tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained natural language understanding models perform on definitely unseen sentences provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning. Only after what we call extreme domain adaptation, that is, retraining with the masked language model task on the entire novel corpus, do pre-trained Transformers reach their usual high results. This suggests that huge pre-training corpora may give Transformers unexpected help, since they are exposed to many of the possible sentences.

PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
Leonardo Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Large Language Models (LLMs) are impressive machines with the ability to memorize possibly generalized learning examples. We present here a small, focused contribution to the analysis of the interplay between memorization and performance of BERT in downstream tasks. We propose PreCog, a measure for evaluating memorization from pre-training, and we analyze its correlation with BERT’s performance. Our experiments show that highly memorized examples are better classified, suggesting that memorization is an essential key to BERT’s success.

2022

Lacking the Embedding of a Word? Look it up into a Traditional Dictionary
Elena Sofia Ruzzetti | Leonardo Ranaldi | Michele Mastromattei | Francesca Fallucchi | Noemi Scarpato | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: ACL 2022

Word embeddings are powerful dictionaries, which may easily capture language variations. However, these dictionaries fail to give sense to rare words, which are surprisingly often covered by traditional dictionaries. In this paper, we propose to use definitions retrieved from traditional dictionaries to produce word embeddings for rare words. For this purpose, we introduce two methods: Definition Neural Network (DefiNNet) and Define BERT (DefBERT). In our experiments, DefiNNet and DefBERT significantly outperform state-of-the-art as well as baseline methods devised for producing embeddings of unknown words. In fact, DefiNNet significantly outperforms FastText, which implements a method for the same task based on n-grams, and DefBERT significantly outperforms the BERT method for OOV words. Hence, definitions in traditional dictionaries are useful for building word embeddings for rare words.