@inproceedings{rinaldi-etal-2024-mult,
title = "Mult-{IT} Multiple Choice Questions on Multiple Topics in {I}talian: A {CALAMITA} Challenge",
author = "Rinaldi, Matteo and
Gili, Jacopo and
Francis, Maria and
Goffetti, Mattia and
Patti, Viviana and
Nissim, Malvina",
editor = "Dell'Orletta, Felice and
Lenci, Alessandro and
Montemagni, Simonetta and
Sprugnoli, Rachele",
booktitle = "Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)",
month = dec,
year = "2024",
address = "Pisa, Italy",
publisher = "CEUR Workshop Proceedings",
url = "https://aclanthology.org/2024.clicit-1.131/",
pages = "1184--1201",
ISBN = "979-12-210-7060-6",
abstract = "Multi-choice question answering (MCQA) is a powerful tool for evaluating the factual knowledge and reasoning capacities of Large Language Models (LLMs). However, there is a lack of large-scale MCQA datasets originally written in Italian. Existing Italian MCQA benchmarks are often automatically translated from English, an approach with two key drawbacks: Firstly, automatic translations may sound unnatural, contain errors, or use linguistics constructions that do not align with the target language. Secondly, they may introduce topical and ideological biases reflecting Anglo-centric perspectives. To addressthis gap, we present Mult-IT, an MCQA dataset comprising over 110,000 manually written questions across a wide range of topics. All questions are sourced directly from preparation quizzes for Italian university entrance exams, or for exams for public sector employment in Italy. We are hopeful that this contribution enables a more comprehensive evaluation of LLMs' proficiency, not only in the Italian language, but also in their grasp of Italian cultural and contextual knowledge."
}
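A minimal sketch of reading the BibTeX record above programmatically, assuming the third-party bibtexparser package (1.x API) and a hypothetical file rinaldi-etal-2024-mult.bib that holds the entry:

# Sketch only: parse the BibTeX entry above with bibtexparser 1.x (assumed installed).
import bibtexparser

with open("rinaldi-etal-2024-mult.bib") as f:  # hypothetical filename
    db = bibtexparser.load(f)

entry = db.entries[0]       # entries are plain dicts in the 1.x API
print(entry["ID"])          # rinaldi-etal-2024-mult
print(entry["title"])       # Mult-{IT} Multiple Choice Questions ...
print(entry["pages"])       # 1184--1201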
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="rinaldi-etal-2024-mult">
<titleInfo>
<title>Mult-IT Multiple Choice Questions on Multiple Topics in Italian: A CALAMITA Challenge</title>
</titleInfo>
<name type="personal">
<namePart type="given">Matteo</namePart>
<namePart type="family">Rinaldi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jacopo</namePart>
<namePart type="family">Gili</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Francis</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mattia</namePart>
<namePart type="family">Goffetti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Viviana</namePart>
<namePart type="family">Patti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Malvina</namePart>
<namePart type="family">Nissim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Felice</namePart>
<namePart type="family">Dell’Orletta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alessandro</namePart>
<namePart type="family">Lenci</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Simonetta</namePart>
<namePart type="family">Montemagni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rachele</namePart>
<namePart type="family">Sprugnoli</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>CEUR Workshop Proceedings</publisher>
<place>
<placeTerm type="text">Pisa, Italy</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-12-210-7060-6</identifier>
</relatedItem>
<abstract>Multi-choice question answering (MCQA) is a powerful tool for evaluating the factual knowledge and reasoning capacities of Large Language Models (LLMs). However, there is a lack of large-scale MCQA datasets originally written in Italian. Existing Italian MCQA benchmarks are often automatically translated from English, an approach with two key drawbacks: Firstly, automatic translations may sound unnatural, contain errors, or use linguistic constructions that do not align with the target language. Secondly, they may introduce topical and ideological biases reflecting Anglo-centric perspectives. To address this gap, we present Mult-IT, an MCQA dataset comprising over 110,000 manually written questions across a wide range of topics. All questions are sourced directly from preparation quizzes for Italian university entrance exams, or for exams for public sector employment in Italy. We are hopeful that this contribution enables a more comprehensive evaluation of LLMs’ proficiency, not only in the Italian language, but also in their grasp of Italian cultural and contextual knowledge.</abstract>
<identifier type="citekey">rinaldi-etal-2024-mult</identifier>
<location>
<url>https://aclanthology.org/2024.clicit-1.131/</url>
</location>
<part>
<date>2024-12</date>
<extent unit="page">
<start>1184</start>
<end>1201</end>
</extent>
</part>
</mods>
</modsCollection>
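A short sketch of pulling the basic fields out of the MODS record above using only Python's standard library; the filename is hypothetical and assumes the XML is saved as-is:

# Sketch only: extract title, authors, and URL from the MODS XML above (stdlib only).
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/mods/v3"}
root = ET.parse("rinaldi-etal-2024-mult.xml").getroot()  # hypothetical filename
mods = root.find("m:mods", NS)

title = mods.findtext("m:titleInfo/m:title", namespaces=NS)
authors = [
    " ".join(part.text for part in name.findall("m:namePart", NS))
    for name in mods.findall("m:name", NS)
    if name.findtext("m:role/m:roleTerm", namespaces=NS) == "author"
]
url = mods.findtext("m:location/m:url", namespaces=NS)

print(title)    # Mult-IT Multiple Choice Questions on Multiple Topics in Italian: A CALAMITA Challenge
print(authors)  # ['Matteo Rinaldi', 'Jacopo Gili', ...]
print(url)      # https://aclanthology.org/2024.clicit-1.131/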
%0 Conference Proceedings
%T Mult-IT Multiple Choice Questions on Multiple Topics in Italian: A CALAMITA Challenge
%A Rinaldi, Matteo
%A Gili, Jacopo
%A Francis, Maria
%A Goffetti, Mattia
%A Patti, Viviana
%A Nissim, Malvina
%Y Dell’Orletta, Felice
%Y Lenci, Alessandro
%Y Montemagni, Simonetta
%Y Sprugnoli, Rachele
%S Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
%D 2024
%8 December
%I CEUR Workshop Proceedings
%C Pisa, Italy
%@ 979-12-210-7060-6
%F rinaldi-etal-2024-mult
%X Multi-choice question answering (MCQA) is a powerful tool for evaluating the factual knowledge and reasoning capacities of Large Language Models (LLMs). However, there is a lack of large-scale MCQA datasets originally written in Italian. Existing Italian MCQA benchmarks are often automatically translated from English, an approach with two key drawbacks: Firstly, automatic translations may sound unnatural, contain errors, or use linguistic constructions that do not align with the target language. Secondly, they may introduce topical and ideological biases reflecting Anglo-centric perspectives. To address this gap, we present Mult-IT, an MCQA dataset comprising over 110,000 manually written questions across a wide range of topics. All questions are sourced directly from preparation quizzes for Italian university entrance exams, or for exams for public sector employment in Italy. We are hopeful that this contribution enables a more comprehensive evaluation of LLMs’ proficiency, not only in the Italian language, but also in their grasp of Italian cultural and contextual knowledge.
%U https://aclanthology.org/2024.clicit-1.131/
%P 1184-1201
Markdown (Informal)
[Mult-IT Multiple Choice Questions on Multiple Topics in Italian: A CALAMITA Challenge](https://aclanthology.org/2024.clicit-1.131/) (Rinaldi et al., CLiC-it 2024)
ACL
Matteo Rinaldi, Jacopo Gili, Maria Francis, Mattia Goffetti, Viviana Patti, and Malvina Nissim. 2024. Mult-IT Multiple Choice Questions on Multiple Topics in Italian: A CALAMITA Challenge. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 1184–1201, Pisa, Italy. CEUR Workshop Proceedings.