Measuring Bias in Instruction-Following Models with ItaP-AT for the Italian Language

Dario Onorati, Davide Venditti, Elena Sofia Ruzzetti, Federico Ranaldi, Leonardo Ranaldi, Fabio Massimo Zanzotto


Abstract
Instruction-Following Language Models (IFLMs) are the state of the art for solving many downstream tasks. Given their widespread use, there is an urgent need to measure whether the sentences they generate contain toxic content or social biases. In this paper, we propose the Prompt Association Test for the Italian language (ItaP-AT): a new resource for testing for the presence of social biases across different domains in IFLMs. This work also investigates whether the responses of these models can be made fairer through in-context learning, using "one-shot anti-stereotypical prompts".
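To illustrate the idea, here is a minimal Python sketch of a "one-shot anti-stereotypical prompt": a single counter-stereotypical demonstration is prepended to the test sentence before it is passed to an instruction-following model. The instruction wording and the demonstration sentence are illustrative assumptions, not the actual ItaP-AT materials.

# Minimal sketch of one-shot anti-stereotypical prompting.
# The template and demonstration below are hypothetical examples,
# not the prompts used in the ItaP-AT paper.

ANTI_STEREOTYPICAL_EXAMPLE = (
    "Esempio: L'ingegnere ha terminato il progetto e lei ha presentato "
    "i risultati al cliente."  # counter-stereotypical: feminine pronoun
)

def one_shot_prompt(test_sentence: str) -> str:
    """Prepend one anti-stereotypical demonstration to the test sentence."""
    instruction = "Completa la frase seguente:"
    return f"{ANTI_STEREOTYPICAL_EXAMPLE}\n{instruction}\n{test_sentence}"

if __name__ == "__main__":
    # The resulting string would be sent to an IFLM as its input prompt.
    print(one_shot_prompt("Il medico ha finito il turno e"))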
Anthology ID: 2024.clicit-1.76
Volume: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month: December
Year: 2024
Address: Pisa, Italy
Editors: Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue: CLiC-it
Publisher: CEUR Workshop Proceedings
Pages: 679–706
URL: https://aclanthology.org/2024.clicit-1.76/
Cite (ACL): Dario Onorati, Davide Venditti, Elena Sofia Ruzzetti, Federico Ranaldi, Leonardo Ranaldi, and Fabio Massimo Zanzotto. 2024. Measuring Bias in Instruction-Following Models with ItaP-AT for the Italian Language. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 679–706, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal): Measuring Bias in Instruction-Following Models with ItaP-AT for the Italian Language (Onorati et al., CLiC-it 2024)
PDF: https://aclanthology.org/2024.clicit-1.76.pdf