@inproceedings{aponte-etal-2025-framework,
title = "A Framework for Fine-Tuning {LLM}s Using Heterogeneous Feedback",
author = "Aponte, Ryan and
Rossi, Ryan A. and
Guo, Shunan and
Dernoncourt, Franck and
Yu, Tong and
Chen, Xiang and
Mitra, Subrata and
Lipka, Nedim",
editor = "Angelova, Galia and
Kunilovskaya, Maria and
Escribe, Marie and
Mitkov, Ruslan",
booktitle = "Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era",
month = sep,
year = "2025",
address = "Varna, Bulgaria",
publisher = "INCOMA Ltd., Shoumen, Bulgaria",
url = "https://aclanthology.org/2025.ranlp-1.13/",
pages = "111--117",
abstract = "Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary as well as multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding the full dataset. We conduct extensive experiments to understand the effectiveness of these techniques for incorporating heterogeneous feedback, and demonstrate improvements from using a high-quality and diverse subset of the data. We find that our framework is able to improve models in multiple areas simultaneously, such as in instruction following and bias reduction."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="aponte-etal-2025-framework">
<titleInfo>
<title>A Framework for Fine-Tuning LLMs Using Heterogeneous Feedback</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ryan</namePart>
<namePart type="family">Aponte</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ryan</namePart>
<namePart type="given">A</namePart>
<namePart type="family">Rossi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shunan</namePart>
<namePart type="family">Guo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Franck</namePart>
<namePart type="family">Dernoncourt</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tong</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiang</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Subrata</namePart>
<namePart type="family">Mitra</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nedim</namePart>
<namePart type="family">Lipka</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era</title>
</titleInfo>
<name type="personal">
<namePart type="given">Galia</namePart>
<namePart type="family">Angelova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Kunilovskaya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marie</namePart>
<namePart type="family">Escribe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ruslan</namePart>
<namePart type="family">Mitkov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>INCOMA Ltd., Shoumen, Bulgaria</publisher>
<place>
<placeTerm type="text">Varna, Bulgaria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary as well as multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding the full dataset. We conduct extensive experiments to understand the effectiveness of these techniques for incorporating heterogeneous feedback, and demonstrate improvements from using a high-quality and diverse subset of the data. We find that our framework is able to improve models in multiple areas simultaneously, such as in instruction following and bias reduction.</abstract>
<identifier type="citekey">aponte-etal-2025-framework</identifier>
<location>
<url>https://aclanthology.org/2025.ranlp-1.13/</url>
</location>
<part>
<date>2025-09</date>
<extent unit="page">
<start>111</start>
<end>117</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T A Framework for Fine-Tuning LLMs Using Heterogeneous Feedback
%A Aponte, Ryan
%A Rossi, Ryan A.
%A Guo, Shunan
%A Dernoncourt, Franck
%A Yu, Tong
%A Chen, Xiang
%A Mitra, Subrata
%A Lipka, Nedim
%Y Angelova, Galia
%Y Kunilovskaya, Maria
%Y Escribe, Marie
%Y Mitkov, Ruslan
%S Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
%D 2025
%8 September
%I INCOMA Ltd., Shoumen, Bulgaria
%C Varna, Bulgaria
%F aponte-etal-2025-framework
%X Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary as well as multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding the full dataset. We conduct extensive experiments to understand the effectiveness of these techniques for incorporating heterogeneous feedback, and demonstrate improvements from using a high-quality and diverse subset of the data. We find that our framework is able to improve models in multiple areas simultaneously, such as in instruction following and bias reduction.
%U https://aclanthology.org/2025.ranlp-1.13/
%P 111-117

Markdown (Informal)
[A Framework for Fine-Tuning LLMs Using Heterogeneous Feedback](https://aclanthology.org/2025.ranlp-1.13/) (Aponte et al., RANLP 2025)

ACL
Ryan Aponte, Ryan A. Rossi, Shunan Guo, Franck Dernoncourt, Tong Yu, Xiang Chen, Subrata Mitra, and Nedim Lipka. 2025. A Framework for Fine-Tuning LLMs Using Heterogeneous Feedback. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 111–117, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.