ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian

Giuseppe Attanasio; Pieter Delobelle; Moreno La Quatra; Andrea Santilli; Beatrice Savoldi

Abstract

Current development and benchmarking efforts for modern, large-scale Italian language models (LMs) are scattered. This paper situates such efforts by introducing two new resources: ItaEval, a comprehensive evaluation suite, and TweetyIta, an efficiency-first language model for Italian. Through ItaEval, we standardize evaluation across language understanding, commonsense and factual knowledge, and social bias-related tasks. In our attempt at language modeling, we experiment with efficient, tokenization-based adaption techniques. Our TweetyIta shows encouraging results after training on as little as 5G tokens from natural Italian corpora. We benchmark an extensive list of models against ItaEval and find several interesting insights. Surprisingly, i) models trained predominantly on English data dominate the leaderboard; ii) TweetyIta is competitive against other forms of adaptation or inherently monolingual models; iii) natural language understanding tasks are challenging for current models. We release code and data at https://github.com/RiTA-nlp/ita-eval and host a live leaderboard at https://huggingface.co/spaces/RiTA-nlp/ita-eval.

Anthology ID: 2024.clicit-1.6
Volume: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month: December
Year: 2024
Address: Pisa, Italy
Editors: Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue: CLiC-it
Publisher: CEUR Workshop Proceedings
Pages: 39–51
URL: https://aclanthology.org/2024.clicit-1.6/
Cite (ACL): Giuseppe Attanasio, Pieter Delobelle, Moreno La Quatra, Andrea Santilli, and Beatrice Savoldi. 2024. ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 39–51, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal): ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian (Attanasio et al., CLiC-it 2024)
PDF: https://aclanthology.org/2024.clicit-1.6.pdf
