Matteo Rinaldi


2024

CALAMITA: Challenge the Abilities of LAnguage Models in ITAlian
Giuseppe Attanasio | Pierpaolo Basile | Federico Borazio | Danilo Croce | Maria Francis | Jacopo Gili | Elio Musacchio | Malvina Nissim | Viviana Patti | Matteo Rinaldi | Daniel Scalena
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

The rapid development of Large Language Models (LLMs) has called for robust benchmarks to assess their abilities, track progress, and compare iterations. While existing benchmarks provide extensive evaluations across diverse tasks, they predominantly focus on English, leaving other languages underserved. For Italian, the EVALITA campaigns have provided a long-standing tradition of classification-focused shared tasks. However, their scope does not fully align with the nuanced evaluation required for modern LLMs. To address this gap, we introduce “Challenge the Abilities of LAnguage Models in ITAlian” (CALAMITA), a collaborative effort to create a dynamic and growing benchmark tailored to Italian. CALAMITA emphasizes diversity in task design to test a wide range of LLM capabilities through resources natively developed in Italian by the community. This initiative includes a shared platform, live leaderboard, and centralized evaluation framework. This paper outlines the collaborative process, initial challenges, and evaluation framework of CALAMITA.

GATTINA - GenerAtion of TiTles for Italian News Articles: A CALAMITA Challenge
Maria Francis | Matteo Rinaldi | Jacopo Gili | Leonardo De Cosmo | Sandro Iannaccone | Malvina Nissim | Viviana Patti
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

We introduce a new benchmark designed to evaluate the ability of Large Language Models (LLMs) to generate Italian-language headlines for science news articles. The benchmark is based on a large dataset of science news articles obtained from Ansa Scienza and Galileo, two important Italian media outlets. Effective headline generation requires more than summarizing article content; headlines must also be informative, engaging, and suitable for the topic and target audience, making automatic evaluation particularly challenging. To address this, we propose two novel transformer-based metrics to assess headline quality. We aim for this benchmark to support the evaluation of Italian LLMs and to foster the development of tools to assist in editorial workflows.
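As an illustration of what a transformer-based relevance check of this kind might look like, the sketch below scores a headline against the article lead via embedding cosine similarity. This is a generic baseline of the same family, not the two metrics proposed in the paper, and the model name is just one common multilingual choice.

from sentence_transformers import SentenceTransformer, util

# Multilingual sentence encoder; any Italian-capable model would do.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def headline_relevance(headline: str, article_lead: str) -> float:
    # Embed both texts and return their cosine similarity in [-1, 1].
    embeddings = model.encode([headline, article_lead], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

print(headline_relevance(
    "Scoperto un nuovo esopianeta simile alla Terra",
    "Gli astronomi hanno individuato un pianeta roccioso nella zona abitabile di una stella vicina.",
))

A higher score indicates the headline stays close to the article's content; informativeness and engagement, as the abstract notes, need dedicated metrics beyond this.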

Mult-IT: Multiple Choice Questions on Multiple Topics in Italian: A CALAMITA Challenge
Matteo Rinaldi | Jacopo Gili | Maria Francis | Mattia Goffetti | Viviana Patti | Malvina Nissim
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

Multiple-choice question answering (MCQA) is a powerful tool for evaluating the factual knowledge and reasoning capacities of Large Language Models (LLMs). However, there is a lack of large-scale MCQA datasets originally written in Italian. Existing Italian MCQA benchmarks are often automatically translated from English, an approach with two key drawbacks: first, automatic translations may sound unnatural, contain errors, or use linguistic constructions that do not align with the target language; second, they may introduce topical and ideological biases reflecting Anglo-centric perspectives. To address this gap, we present Mult-IT, an MCQA dataset comprising over 110,000 manually written questions across a wide range of topics. All questions are sourced directly from preparation quizzes for Italian university entrance exams or for public sector employment exams in Italy. We hope that this contribution enables a more comprehensive evaluation of LLMs’ proficiency, not only in the Italian language, but also in their grasp of Italian cultural and contextual knowledge.
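For concreteness, here is a minimal sketch of how accuracy is typically computed over an MCQA dataset of this kind. The item fields and the answer_fn interface are hypothetical illustrations, not the Mult-IT release format.

from typing import Callable

def format_prompt(item: dict) -> str:
    # Render one multiple-choice item as an Italian prompt with lettered options.
    options = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", item["options"]))
    return f"Domanda: {item['question']}\n{options}\nRisposta:"

def mcqa_accuracy(items: list[dict], answer_fn: Callable[[str], str]) -> float:
    # answer_fn maps a prompt to the model's chosen option letter, e.g. "B".
    hits = sum(answer_fn(format_prompt(it)).strip().upper()[:1] == it["gold"] for it in items)
    return hits / len(items)

# Toy example with a trivial "model" that always answers "A".
items = [{"question": "Qual è la capitale d'Italia?",
          "options": ["Roma", "Milano", "Napoli", "Torino"], "gold": "A"}]
print(mcqa_accuracy(items, lambda prompt: "A"))  # 1.0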