@inproceedings{rege-cambrin-etal-2024-beyond,
title = "Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning",
author = "Rege Cambrin, Daniele and
Gallipoli, Giuseppe and
Benedetto, Irene and
Cagliero, Luca and
Garza, Paolo",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.704",
pages = "12060--12079",
abstract = "Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or Lov{\'a}sz, achieve a mean improvement of +36{\%} on exact match without requiring additional data or human feedback. These findings suggest a promising pathway for more efficient and accessible training processes.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="rege-cambrin-etal-2024-beyond">
<titleInfo>
<title>Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning</title>
</titleInfo>
<name type="personal">
<namePart type="given">Daniele</namePart>
<namePart type="family">Rege Cambrin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Giuseppe</namePart>
<namePart type="family">Gallipoli</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Irene</namePart>
<namePart type="family">Benedetto</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Luca</namePart>
<namePart type="family">Cagliero</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Paolo</namePart>
<namePart type="family">Garza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2024</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yaser</namePart>
<namePart type="family">Al-Onaizan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohit</namePart>
<namePart type="family">Bansal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yun-Nung</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Miami, Florida, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or Lovász, achieve a mean improvement of +36% on exact match without requiring additional data or human feedback. These findings suggest a promising pathway for more efficient and accessible training processes.</abstract>
<identifier type="citekey">rege-cambrin-etal-2024-beyond</identifier>
<location>
<url>https://aclanthology.org/2024.findings-emnlp.704</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>12060</start>
<end>12079</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning
%A Rege Cambrin, Daniele
%A Gallipoli, Giuseppe
%A Benedetto, Irene
%A Cagliero, Luca
%A Garza, Paolo
%Y Al-Onaizan, Yaser
%Y Bansal, Mohit
%Y Chen, Yun-Nung
%S Findings of the Association for Computational Linguistics: EMNLP 2024
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, USA
%F rege-cambrin-etal-2024-beyond
%X Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or Lovász, achieve a mean improvement of +36% on exact match without requiring additional data or human feedback. These findings suggest a promising pathway for more efficient and accessible training processes.
%U https://aclanthology.org/2024.findings-emnlp.704
%P 12060-12079
Markdown (Informal)
[Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning](https://aclanthology.org/2024.findings-emnlp.704) (Rege Cambrin et al., Findings 2024)
ACL
Daniele Rege Cambrin, Giuseppe Gallipoli, Irene Benedetto, Luca Cagliero, and Paolo Garza. 2024. Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 12060–12079, Miami, Florida, USA. Association for Computational Linguistics.
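The abstract above describes replacing the standard cross-entropy objective with segmentation-style losses such as Focal or Lovász when fine-tuning LLMs. As a rough illustration of that idea only (not the authors' implementation), the sketch below shows a token-level focal loss in PyTorch; the function name, the gamma value, and the padding convention (ignore_index = -100, labels already shifted for next-token prediction) are assumptions made for this example.

```python
# Illustrative sketch only: a token-level focal loss that could stand in for
# cross-entropy when fine-tuning a causal LM. Hyperparameters are assumptions.
import torch
import torch.nn.functional as F


def focal_loss(logits, labels, gamma=2.0, ignore_index=-100):
    """Focal loss over next-token predictions.

    logits: (batch, seq_len, vocab_size) raw model outputs
    labels: (batch, seq_len) gold token ids, already shifted, padded with ignore_index
    """
    vocab_size = logits.size(-1)
    logits = logits.reshape(-1, vocab_size)  # (batch * seq_len, vocab_size)
    labels = labels.reshape(-1)              # (batch * seq_len,)

    # Per-token cross-entropy; ignored positions contribute zero loss here.
    ce = F.cross_entropy(logits, labels, reduction="none", ignore_index=ignore_index)

    # Focal weighting: down-weight easy (high-probability) tokens by (1 - p_t)^gamma.
    pt = torch.exp(-ce)
    loss = ((1.0 - pt) ** gamma) * ce

    # Average only over non-padded positions.
    mask = (labels != ignore_index).float()
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```

In practice such a function would simply replace the cross-entropy call in an existing fine-tuning loop (e.g. by overriding the loss computation of a standard trainer), which is why the paper frames these losses as a drop-in, data-free alternative to more expensive accuracy-boosting pipelines.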