A Study on the Soundness of Closed-ended Evaluation of Large Language Models Adapted to the Italian Language
Elio Musacchio | Lucia Siciliani | Pierpaolo Basile | Edoardo Michielon | Marco Pasqualini | Asia Beatrice Uboldi | Giovanni Semeraro
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
With the rising interest in Large Language Models, deep architectures capable of solving a wide range of Natural Language Generation tasks, an increasing number of open-weights architectures have been developed and released online. In contrast with older architectures, which were aimed at solving specific linguistic assignments, Large Language Models have shown outstanding capabilities in solving several tasks at once, raising the question of whether they can truly comprehend natural language. Nevertheless, evaluating this kind of capability is far from easy. One of the solutions proposed so far is using benchmarks that combine various types of tasks. This approach rests on the premise that good performance on each of these individual tasks implies a model capable of understanding language. However, while this assumption is not incorrect, it is clearly not sufficient, and the evaluation of Large Language Models remains an open challenge. In this paper, we conduct a study aimed at highlighting the potential and limitations of current datasets, and at showing how a new evaluation setting applied to language-adapted Large Language Models may provide more insight than traditional approaches.