Dario Onorati

2024

Understanding textual description to generate code seems to be an achieved capability of instruction-following Large Language Models (LLMs) in zero-shot scenario. However, there is a severe possibility that this translation ability may be influenced by having seen target textual descriptions and the related code. This effect is known as Data Contamination.In this study, we investigate the impact of Data Contamination on the performance of GPT-3.5 in the Text-to-SQL code-generating tasks. Hence, we introduce a novel method to detect Data Contamination in GPTs and examine GPT-3.5’s Text-to-SQL performances using the known Spider Dataset and our new unfamiliar dataset Termite. Furthermore, we analyze GPT-3.5’s efficacy on databases with modified information via an adversarial table disconnection (ATD) approach, complicating Text-to-SQL tasks by removing structural pieces of information from the database. Our results indicate a significant performance drop in GPT-3.5 on the unfamiliar Termite dataset, even with ATD modifications, highlighting the effect of Data Contamination on LLMs in Text-to-SQL translation tasks.

pdf bib abs

Termite Italian Text-to-SQL: A CALAMITA Challenge
Federico Ranaldi | Elena Sofia Ruzzetti | Dario Onorati | Fabio Massimo Zanzotto | Leonardo Ranaldi
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

We introduce Termite, which is a definitely unseen resource for evaluating Text-to-SQL in Italian. Specifically,we transfer evaluation pipelines beyond English, proposing novel, definitely unseen resources that avoid data-contamination phenomena while assessing the ability of models to perform Text-to-SQL tasks when natural language queries are written in Italian. We establish an evaluation grid based on execution accuracy.

pdf bib abs

A Trip Towards Fairness: Bias and De-Biasing in Large Language Models
Leonardo Ranaldi | Elena Sofia Ruzzetti | Davide Venditti | Dario Onorati | Fabio Massimo Zanzotto
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)

Cheap-to-Build Very Large-Language Models (CtB-LLMs) with affordable training are emerging as the next big revolution in natural language processing and understanding. These CtB-LLMs are democratizing access to trainable Very Large-Language Models (VLLMs) and, thus, may represent the building blocks of many NLP systems solving downstream tasks. Hence, a little or a large bias in CtB-LLMs may cause huge harm. In this paper, we performed a large investigation of the bias of three families of CtB-LLMs, and we showed that debiasing techniques are effective and usable. Indeed, according to current tests, the LLaMA and the OPT families have an important bias in gender, race, religion, and profession. In contrast to the analysis for other LMMs, we discovered that bias depends not on the number of parameters but on the perplexity. Finally, the debiasing of OPT using LORA reduces bias up to 4.12 points in the normalized stereotype score.

pdf bib abs

Assessing the Asymmetric Behaviour of Italian Large Language Models across Different Syntactic Structures
Elena Sofia Ruzzetti | Federico Ranaldi | Dario Onorati | Davide Venditti | Leonardo Ranaldi | Tommaso Caselli | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

While LLMs get more proficient at solving tasks and generating sentences, we aim to investigate the role that differentsyntactic structures have on models’ performances on a battery of Natural Language Understanding tasks. We analyze theperformance of five LLMs on semantically equivalent sentences that are characterized by different syntactic structures. Tocorrectly solve the tasks, a model is implicitly required to correctly parse the sentence. We found out that LLMs strugglewhen there are more complex syntactic structures, with an average drop of 16.13(±11.14) points in accuracy on Q&A task.Additionally, we propose a method based on token attribution to spot which area of the LLMs encode syntactic knowledge,by identifying model heads and layers responsible for the generation of a correct answer

pdf bib abs

Measuring Bias in Instruction-Following Models with ItaP-AT for the Italian Language
Dario Onorati | Davide Venditti | Elena Sofia Ruzzetti | Federico Ranaldi | Leonardo Ranaldi | Fabio Massimo Zanzotto
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Instruction-Following Language Models (IFLMs) are the state-of-the-art for solving many downstream tasks. Given their widespread use, there is an urgent need to measure whether the sentences they generate contain toxic information or social biases. In this paper, we propose Prompt Association Test for the Italian language (ItaP-AT): a new resource for testing the presence of social bias in different domains in IFLMs. This work also aims to understand whether it is possible to make the responses of these models more fair by using context learning, using “one-shot anti-stereotypical prompts”.

2023

pdf bib

Investigating Gender Bias in Large Language Models for the Italian Language
Elena Sofia Ruzzetti | Dario Onorati | Leonardo Ranaldi | Davide Venditti | Fabio Massimo Zanzotto
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

pdf bib abs

Pre-trained Transformers are challenging human performances in many Natural Language Processing tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained natural language understanding models performs on definitely unseen sentences provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning. Only after what we call extreme domain adaptation, that is, retraining with the masked language model task on all the novel corpus, pre-trained Transformers reach their standard high results. This suggests that huge pre-training corpora may give Transformers unexpected help since they are exposed to many of the possible sentences.

pdf bib abs

Measuring bias in Instruction-Following models with P-AT
Dario Onorati | Elena Sofia Ruzzetti | Davide Venditti | Leonardo Ranaldi | Fabio Massimo Zanzotto
Findings of the Association for Computational Linguistics: EMNLP 2023

Instruction-Following Language Models (IFLMs) are promising and versatile tools for solving many downstream, information-seeking tasks. Given their success, there is an urgent need to have a shared resource to determine whether existing and new IFLMs are prone to produce biased language interactions. In this paper, we propose Prompt Association Test (P-AT): a new resource for testing the presence of social biases in IFLMs. P-AT stems from WEAT (Caliskan et al., 2017) and generalizes the notion of measuring social biases to IFLMs. Basically, we cast WEAT word tests in promptized classification tasks, and we associate a metric - the bias score. Our resource consists of 2310 prompts. We then experimented with several families of IFLMs discovering gender and race biases in all the analyzed models. We expect P-AT to be an important tool for quantifying bias across different dimensions and, therefore, for encouraging the creation of fairer IFLMs before their distortions have consequences in the real world.

2021

pdf bib

KERMIT for Sentiment Analysis in Italian Healthcare Reviews
Leonardo Ranaldi | Michele Mastromattei | Dario Onorati | Elena Sofia Ruzzetti | Francesca Fallucchi | Fabio Massimo Zanzotto
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

2020

pdf bib abs

KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations
Fabio Massimo Zanzotto | Andrea Santilli | Leonardo Ranaldi | Dario Onorati | Pierfrancesco Tommasino | Francesca Fallucchi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality in downstream tasks due to the success of large-scale textual representation learners. In this paper, we propose KERMIT (Kernel-inspired Encoder with Recursive Mechanism for Interpretable Trees) to embed symbolic syntactic parse trees into artificial neural networks and to visualize how syntax is used in inference. We experimented with KERMIT paired with two state-of-the-art transformer-based universal sentence encoders (BERT and XLNet) and we showed that KERMIT can indeed boost their performance by effectively embedding human-coded universal syntactic representations in neural networks