Teaching Language Models to Self-Improve by Learning from Language Feedback

Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, JingBo Zhu


Abstract
Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotations. SRT uses a base language model (e.g., Tulu2) to generate initial responses, which are critiqued and refined by a more advanced model (e.g., GPT-4-Turbo). This process enables the base model to self-evaluate and improve its outputs, facilitating continuous learning. SRT further optimizes the model by learning from its self-generated feedback and refinements, creating a feedback loop that promotes model improvement. Our empirical evaluations demonstrate that SRT significantly outperforms strong baselines across diverse tasks and model sizes. When applied to a 70B parameter model, SRT increases the win rate from 9.6% to 25.8% on the AlpacaEval 2.0 benchmark, surpassing well-established systems such as GPT-4-0314, Claude 2, and Gemini. Our analysis highlights the crucial role of language feedback in the success of SRT, suggesting potential for further exploration in this direction.
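The data flow described in the abstract (the base model drafts a response, a stronger model critiques and refines it, and the base model is then trained on the result) can be illustrated with a minimal sketch. Everything named here is hypothetical: `RefinementExample`, `collect_refinement_data`, `to_training_pair`, the prompt layout, and the toy stand-in models are assumptions for illustration, not the authors' implementation or training format.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RefinementExample:
    """One record: an instruction, the base model's draft, and the critic's output."""
    instruction: str
    initial_response: str
    feedback: str
    refinement: str


def collect_refinement_data(
    instructions: List[str],
    base_generate: Callable[[str], str],
    critic_refine: Callable[[str, str], Tuple[str, str]],
) -> List[RefinementExample]:
    """Draft with the base model, then ask the stronger model to critique and refine."""
    examples = []
    for instruction in instructions:
        initial = base_generate(instruction)
        feedback, refinement = critic_refine(instruction, initial)
        examples.append(RefinementExample(instruction, initial, feedback, refinement))
    return examples


def to_training_pair(example: RefinementExample) -> Tuple[str, str]:
    """Assumed format: the model sees its own draft and learns to emit feedback plus a refinement."""
    prompt = (
        f"Instruction: {example.instruction}\n"
        f"Response: {example.initial_response}\n"
    )
    target = f"Feedback: {example.feedback}\nRefinement: {example.refinement}"
    return prompt, target


# Toy stand-ins for the base model and the stronger critic (hypothetical, no real model calls).
def toy_base(instruction: str) -> str:
    return f"draft answer to: {instruction}"


def toy_critic(instruction: str, response: str) -> Tuple[str, str]:
    return ("Too terse; add detail.", f"A fuller answer to: {instruction}")


if __name__ == "__main__":
    data = collect_refinement_data(["Explain overfitting."], toy_base, toy_critic)
    prompt, target = to_training_pair(data[0])
    print(prompt)
    print(target)
```

In a real setup the two callables would wrap model inference calls, and the resulting pairs would feed a fine-tuning step; the abstract's later stage, in which the model learns from its own self-generated feedback and refinements, is not covered by this sketch.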
Anthology ID:
2024.findings-acl.364
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6090–6101
URL:
https://aclanthology.org/2024.findings-acl.364
Cite (ACL):
Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, and JingBo Zhu. 2024. Teaching Language Models to Self-Improve by Learning from Language Feedback. In Findings of the Association for Computational Linguistics ACL 2024, pages 6090–6101, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Teaching Language Models to Self-Improve by Learning from Language Feedback (Hu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.364.pdf