Federico Ranaldi


2024

Measuring Bias in Instruction-Following Models with ItaP-AT for the Italian Language
Dario Onorati | Davide Venditti | Elena Sofia Ruzzetti | Federico Ranaldi | Leonardo Ranaldi | Fabio Massimo Zanzotto
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Instruction-Following Language Models (IFLMs) are the state of the art for solving many downstream tasks. Given their widespread use, there is an urgent need to measure whether the sentences they generate contain toxic information or social biases. In this paper, we propose the Prompt Association Test for the Italian language (ItaP-AT): a new resource for testing the presence of social bias across different domains in IFLMs. This work also aims to understand whether the responses of these models can be made fairer through in-context learning, using “one-shot anti-stereotypical prompts”.
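The mitigation strategy named in the abstract lends itself to a short illustration. Below is a minimal sketch of how a “one-shot anti-stereotypical prompt” might be assembled; the template, the demonstration text, and the function name are our own illustrative assumptions, not actual ItaP-AT material.

```python
# Minimal sketch of a "one-shot anti-stereotypical prompt": a single
# counter-stereotypical demonstration precedes the test item, nudging the
# model toward a fairer completion. The example below is hypothetical.

def build_one_shot_prompt(test_sentence: str) -> str:
    demonstration = (
        "Frase: L'ingegnere ha presentato il progetto. Lei ha risposto a tutte le domande.\n"
        "Completamento: con competenza e precisione."
    )
    return f"{demonstration}\n\nFrase: {test_sentence}\nCompletamento:"

print(build_one_shot_prompt("Il nuovo collega viene dal Sud Italia."))
```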

The limits of Italian in Reasoning Tasks
Leonardo Ranaldi | Giulia Pucci | Federico Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Previous studies have demonstrated the effectiveness of reasoning methods in eliciting multi-step reasoned answers from Large Language Models (LLMs) by leveraging in-context demonstrations. These methods, exemplified by Chain-of-Thought (CoT) and Program-Aided Language Models (PAL), have been shown to reason well in monolingual contexts, primarily in English. There has, however, been limited exploration of their abilities in other languages, especially Italian. To gain a deeper understanding of the role of reasoning methods in in-context demonstrations, we propose a multidimensional analysis tailored to Italian, focusing on arithmetic and symbolic reasoning tasks. Our findings indicate that the effectiveness of reasoning methods varies significantly beyond English. Specifically, CoT, which relies on natural language demonstrations, is limited to English. Conversely, the structured nature of PAL in-context demonstrations facilitates multilingual comprehension, enabling LLMs to generate programmatic answers in Italian as well, which leads to significant improvements in the accuracy and quality of the generated responses. Finally, for a more comprehensive overview, we observe that additional alignment methods do not improve downstream performance; on the contrary, in some cases they limit the abilities of the original models.
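To make the CoT/PAL contrast concrete, here is a hedged sketch of the two demonstration styles for an Italian arithmetic problem; the wording and the toy execution pipeline are our illustrative assumptions, not the paper's actual prompts.

```python
# Illustrative contrast between a CoT and a PAL in-context demonstration
# for an Italian arithmetic problem (hypothetical prompts).

ITALIAN_QUESTION = "Marco ha 5 mele e ne compra altre 3. Quante mele ha Marco?"

# CoT: the demonstration reasons in natural language, so it is language-bound.
COT_DEMO = (
    f"D: {ITALIAN_QUESTION}\n"
    "R: Marco parte con 5 mele. Ne compra altre 3, quindi 5 + 3 = 8. La risposta è 8."
)

# PAL: the demonstration answers with code; the program structure is
# language-independent, which is why PAL transfers better beyond English.
PAL_DEMO = (
    f"D: {ITALIAN_QUESTION}\n"
    "R:\n"
    "mele_iniziali = 5\n"
    "mele_comprate = 3\n"
    "risposta = mele_iniziali + mele_comprate"
)

# In a PAL pipeline the generated program is executed to obtain the answer.
scope: dict = {}
exec(PAL_DEMO.split("R:\n")[1], scope)
print(scope["risposta"])  # -> 8
```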

Assessing the Asymmetric Behaviour of Italian Large Language Models across Different Syntactic Structures
Elena Sofia Ruzzetti | Federico Ranaldi | Dario Onorati | Davide Venditti | Leonardo Ranaldi | Tommaso Caselli | Fabio Massimo Zanzotto
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

While LLMs get more proficient at solving tasks and generating sentences, we aim to investigate the role that different syntactic structures have on models’ performance on a battery of Natural Language Understanding tasks. We analyze the performance of five LLMs on semantically equivalent sentences that are characterized by different syntactic structures. To correctly solve the tasks, a model is implicitly required to correctly parse the sentence. We found that LLMs struggle with more complex syntactic structures, with an average drop of 16.13 (±11.14) points in accuracy on the Q&A task. Additionally, we propose a method based on token attribution to spot which areas of the LLMs encode syntactic knowledge, by identifying the model heads and layers responsible for the generation of a correct answer.
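As a rough illustration of head-level attribution (a simplified stand-in for the token-attribution method the abstract names, not a reproduction of it), the sketch below scores each (layer, head) of a multilingual BERT by the attention a chosen token receives; the model choice, target position, and scoring rule are assumptions.

```python
# Hedged sketch: rank attention heads by how much attention flows to one
# token of interest, as a crude proxy for where syntax-relevant behavior sits.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased", output_attentions=True)

sentence = "Il libro che Maria ha letto era lungo."
inputs = tok(sentence, return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions  # one (1, heads, seq, seq) tensor per layer

target = 4  # position of an illustrative token of interest
scores = torch.stack([a[0, :, :, target].mean(dim=-1) for a in attentions])  # (layers, heads)
layer, head = divmod(int(scores.argmax()), scores.shape[1])
print(f"Most attentive head: layer {layer}, head {head}")
```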

Termite Italian Text-to-SQL: A CALAMITA Challenge
Federico Ranaldi | Elena Sofia Ruzzetti | Dario Onorati | Fabio Massimo Zanzotto | Leonardo Ranaldi
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

We introduce Termite, a definitely unseen resource for evaluating Text-to-SQL in Italian. Specifically, we transfer evaluation pipelines beyond English, proposing novel, definitely unseen resources that avoid data-contamination phenomena while assessing the ability of models to perform Text-to-SQL tasks when natural language queries are written in Italian. We establish an evaluation grid based on execution accuracy.
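Execution accuracy, the metric the abstract names, can be sketched in a few lines: a predicted query counts as correct when it returns the same result set as the gold query on the target database. This is our simplified reading; the paper's evaluation grid may differ in its details.

```python
# Minimal execution-accuracy sketch for Text-to-SQL over SQLite databases.
import sqlite3

def execution_match(db_path: str, gold_sql: str, pred_sql: str) -> bool:
    """True when the predicted query returns the same rows as the gold query."""
    conn = sqlite3.connect(db_path)
    try:
        gold = conn.execute(gold_sql).fetchall()
        try:
            pred = conn.execute(pred_sql).fetchall()
        except sqlite3.Error:
            return False  # unexecutable predictions count as wrong
        # Compare as multisets: row order is ignored here.
        return sorted(gold, key=repr) == sorted(pred, key=repr)
    finally:
        conn.close()

def execution_accuracy(db_path: str, pairs: list[tuple[str, str]]) -> float:
    """Fraction of (gold, predicted) SQL pairs whose executions match."""
    return sum(execution_match(db_path, g, p) for g, p in pairs) / len(pairs)
```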

A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
Leonardo Ranaldi | Giulia Pucci | Federico Ranaldi | Elena Sofia Ruzzetti | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: NAACL 2024

Reasoning methods, best exemplified by the well-known Chain-of-Thought (CoT), empower the reasoning abilities of Large Language Models (LLMs) by eliciting them to solve complex tasks in a step-by-step manner. Although they are achieving significant success, the ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data, which leaves other languages at a disadvantage. In this paper, we propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning CoT reasoning across languages. The proposed method, through a self-consistent cross-lingual prompting mechanism inspired by the Tree-of-Thoughts approach, provides multi-step reasoning paths in different languages that, step by step, lead to the final solution. Experimental evaluations show that our method significantly outperforms existing prompting methods by reducing the number of interactions and achieving state-of-the-art performance.
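A minimal sketch of what a Cross-ToT style prompt could look like is given below; the template is our paraphrase of the idea described in the abstract, not the paper's exact wording.

```python
# Illustrative Cross-ToT style prompt builder: the model is asked to reason
# step by step in several languages in parallel and converge on one answer.
LANGS = ["English", "Italian", "Spanish"]

def cross_tot_prompt(question: str) -> str:
    langs = ", ".join(LANGS)
    return (
        f"Simulate the collaboration of {len(LANGS)} reasoners, one per language "
        f"({langs}). Each reasoner writes one step of the solution in their own "
        "language; after each step, they compare notes and keep the most "
        "consistent reasoning path. Conclude with a single final answer.\n\n"
        f"Question: {question}"
    )

print(cross_tot_prompt("If a train travels 60 km in 45 minutes, what is its speed?"))
```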

Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL translation
Federico Ranaldi | Elena Sofia Ruzzetti | Dario Onorati | Leonardo Ranaldi | Cristina Giannone | Andrea Favalli | Raniero Romagnoli | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: ACL 2024

Understanding a textual description to generate code seems to be an achieved capability of instruction-following Large Language Models (LLMs) in zero-shot scenarios. However, there is a serious possibility that this translation ability is influenced by having seen the target textual descriptions and the related code, an effect known as Data Contamination. In this study, we investigate the impact of Data Contamination on the performance of GPT-3.5 in Text-to-SQL code-generation tasks. Hence, we introduce a novel method to detect Data Contamination in GPTs and examine GPT-3.5’s Text-to-SQL performance using the well-known Spider dataset and our new, unfamiliar dataset Termite. Furthermore, we analyze GPT-3.5’s efficacy on databases with modified information via an adversarial table disconnection (ATD) approach, which complicates Text-to-SQL tasks by removing structural pieces of information from the database. Our results indicate a significant performance drop in GPT-3.5 on the unfamiliar Termite dataset, even with ATD modifications, highlighting the effect of Data Contamination on LLMs in Text-to-SQL translation tasks.
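The ATD idea can be sketched briefly as we read the abstract: keep tables and columns intact but drop the foreign-key links, so a model can no longer lean on memorized connections between tables. The schema format below is our own illustrative assumption, not the paper's representation.

```python
# Hedged sketch of adversarial table disconnection (ATD): remove structural
# links (foreign keys) from a schema while leaving its content untouched.
schema = {
    "tables": {
        "singer": ["singer_id", "name", "country"],
        "concert": ["concert_id", "singer_id", "year"],
    },
    "foreign_keys": [("concert.singer_id", "singer.singer_id")],
}

def disconnect_tables(schema: dict) -> dict:
    # Keep tables and columns intact; drop the links between them.
    return {"tables": dict(schema["tables"]), "foreign_keys": []}

print(disconnect_tables(schema))
```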

2023

Exploring Linguistic Properties of Monolingual BERTs with Typological Classification among Languages
Elena Sofia Ruzzetti | Federico Ranaldi | Felicia Logozzo | Michele Mastromattei | Leonardo Ranaldi | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: EMNLP 2023

The impressive achievements of transformers force NLP researchers to delve into how these models represent the underlying structure of natural language. In this paper, we propose a novel standpoint to investigate this issue: using typological similarities among languages to observe how their respective monolingual models encode structural information. We compare transformers layer-wise for typologically similar languages to observe whether these similarities emerge at particular layers. For this investigation, we propose using Centered Kernel Alignment to measure similarity among weight matrices. We found that syntactic typological similarity is consistent with the similarity between the weights in the middle layers, which are the pretrained BERT layers to which syntax encoding is generally attributed. Moreover, we observe that domain adaptation on semantically equivalent texts enhances this similarity among weight matrices.
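For reference, linear CKA (Kornblith et al., 2019) between two matrices can be computed as below; how the paper pairs and preprocesses the weight matrices of the different monolingual BERTs is not reproduced here.

```python
# Minimal linear-CKA sketch: CKA(X, Y) = ||X'Y||_F^2 / (||X'X||_F * ||Y'Y||_F)
# after column-centering, where ' denotes transpose and F the Frobenius norm.
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    dot = np.linalg.norm(x.T @ y, "fro") ** 2
    norm = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(dot / norm)

rng = np.random.default_rng(0)
w_it, w_es = rng.normal(size=(768, 64)), rng.normal(size=(768, 64))
print(linear_cka(w_it, w_es))  # near 0 for unrelated random matrices
```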