Giuseppe Gallipoli
2024
Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning
Daniele Rege Cambrin
|
Giuseppe Gallipoli
|
Irene Benedetto
|
Luca Cagliero
|
Paolo Garza
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or Lovász, achieve a mean improvement of +36% on exact match without requiring additional data or human feedback. These findings suggest a promising pathway for more efficient and accessible training processes.
Keyword-based Annotation of Visually-Rich Document Content for Trend and Risk Analysis Using Large Language Models
Giuseppe Gallipoli
|
Simone Papicchio
|
Lorenzo Vaiani
|
Luca Cagliero
|
Arianna Miola
|
Daniele Borghi
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
In the banking and finance sectors, members of the business units focused on Trend and Risk Analysis daily process internal and external visually-rich documents including text, images, and tables. Given a facet (i.e., topic) of interest, they are particularly interested in retrieving the top trending keywords related to it and then use them to annotate the most relevant document elements (e.g., text paragraphs, images or tables). In this paper, we explore the use of both open-source and proprietary Large Language Models to automatically generate lists of facet-relevant keywords, automatically produce free-text descriptions of both keywords and multimedia document content, and then annotate documents by leveraging textual similarity approaches. The preliminary results, achieved on English and Italian documents, show that OpenAI GPT-4 achieves superior performance in keyword description generation and multimedia content annotation, while the open-source Meta AI Llama2 model turns out to be highly competitive in generating additional keywords.
Search
Fix data
Co-authors
- Luca Cagliero 2
- Irene Benedetto 1
- Daniele Borghi 1
- Paolo Garza 1
- Arianna Miola 1
- show all...