ItaEval: A CALAMITA Challenge

Giuseppe Attanasio, Moreno La Quatra, Andrea Santilli, Beatrice Savoldi


Abstract
In recent years, new language models for Italian have been released at a rapid pace. However, evaluation methodologies for these models have not kept up, remaining fragmented and often limited to the experimental sections of individual model releases. This paper introduces ItaEval, a multifaceted evaluation suite designed to address this gap. By reviewing recent literature on the evaluation of contemporary language models, we identify three overarching task categories that a contemporary model should be able to address: natural language understanding; commonsense and factual knowledge; and bias, fairness, and safety. We then collect a set of 18 tasks spanning existing and new datasets. The resulting ItaEval suite provides a standardized, multifaceted framework for evaluating Italian language models, enabling more rigorous and comparative assessment of model performance. We release code and data at https://rita-nlp.org/sprints/itaeval.
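The released code is linked above; as a non-authoritative sketch of how a suite of this kind is typically run, the example below uses EleutherAI's lm-evaluation-harness, a common backend for multi-task LM evaluation. The task names and the model identifier are illustrative placeholders, not confirmed ItaEval identifiers; consult the released code for the actual task registry.

    # Hypothetical sketch: running ItaEval-style tasks through
    # EleutherAI's lm-evaluation-harness. Task names and model id
    # below are placeholders, not confirmed ItaEval identifiers.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                               # Hugging Face transformers backend
        model_args="pretrained=<your-italian-model>",  # placeholder model id
        tasks=["itaeval_nlu", "itaeval_knowledge"],    # placeholder task names
        num_fewshot=0,
        batch_size=8,
    )

    # Per-task metrics (e.g., accuracy) live under results["results"].
    for task, metrics in results["results"].items():
        print(task, metrics)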
Anthology ID:
2024.clicit-1.117
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
1064–1073
URL:
https://aclanthology.org/2024.clicit-1.117/
Cite (ACL):
Giuseppe Attanasio, Moreno La Quatra, Andrea Santilli, and Beatrice Savoldi. 2024. ItaEval: A CALAMITA Challenge. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 1064–1073, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
ItaEval: A CALAMITA Challenge (Attanasio et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.117.pdf