Finetuning LLMs for Comparative Assessment Tasks

Vatsal Raina, Adian Liusie, Mark Gales


Abstract
Automated assessment of natural language generation is a challenging task. Instruction-tuned large language models (LLMs) have shown promise in reference-free evaluation, particularly through comparative assessment. However, the quadratic computational complexity of exhaustive pairwise comparisons limits the scalability of this approach. To address this, efficient comparative assessment has been explored by applying comparative strategies to zero-shot LLM probabilities. We propose a framework for finetuning LLMs for comparative assessment that aligns the model's output with the target distribution of comparative probabilities. By training on soft probabilities, our approach improves on state-of-the-art performance while remaining effective when only an efficient subset of comparisons is used.
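The central idea described in the abstract, training the model to match soft comparative probabilities rather than hard win/lose labels, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the backbone (a small causal LM stand-in), the prompt format, the verdict tokens "A"/"B", and the example soft target value are not taken from the paper, and the authors' actual implementation may differ.

```python
# Minimal sketch (not the authors' implementation): finetune a causal LM so that its
# probability of answering "A" to a pairwise comparison prompt matches a soft target
# probability, via a soft-label cross-entropy over the two verdict tokens.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in backbone; the paper finetunes an instruction-tuned LLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def soft_comparative_loss(prompt: str, p_target_a: float) -> torch.Tensor:
    """Cross-entropy between the model's P(A is better) and a soft target probability.

    prompt      -- comparison instruction ending just before the verdict token ("A"/"B")
    p_target_a  -- soft target probability, in [0, 1], that candidate A is better
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = model(**inputs).logits[0, -1]  # next-token logits at the decision position
    tok_a = tokenizer(" A", add_special_tokens=False).input_ids[-1]
    tok_b = tokenizer(" B", add_special_tokens=False).input_ids[-1]
    log_p = F.log_softmax(logits[[tok_a, tok_b]], dim=-1)  # renormalise over {A, B}
    target = torch.tensor([p_target_a, 1.0 - p_target_a])
    return -(target * log_p).sum()  # equals KL(target || model) up to a constant

# Hypothetical usage: one optimisation step on a single comparison.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
prompt = ("Summary A: The report covers Q3 earnings.\n"
          "Summary B: Earnings report.\n"
          "Which summary is better, A or B? Answer:")
loss = soft_comparative_loss(prompt, p_target_a=0.7)
loss.backward()
optimizer.step()
```

How the soft target distribution is actually constructed for each candidate pair is specified in the paper and not reproduced here; the value 0.7 above is purely illustrative.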
Anthology ID:
2025.coling-main.225
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3345–3352
URL:
https://aclanthology.org/2025.coling-main.225/
Cite (ACL):
Vatsal Raina, Adian Liusie, and Mark Gales. 2025. Finetuning LLMs for Comparative Assessment Tasks. In Proceedings of the 31st International Conference on Computational Linguistics, pages 3345–3352, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Finetuning LLMs for Comparative Assessment Tasks (Raina et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.225.pdf