Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models

Leonardo Ranaldi, Andre Freitas


Abstract
The alignment of reasoning abilities between smaller and larger Language Models is largely conducted via supervised fine-tuning using demonstrations generated from robust Large Language Models (LLMs). Although these approaches deliver more performant models, they do not show sufficiently strong generalization ability, as the training relies only on the provided demonstrations. In this paper, we propose the Self-refine Instruction-tuning method, which elicits Smaller Language Models to self-improve their abilities. Our approach is based on a two-stage process, where reasoning abilities are first transferred between LLMs and Small Language Models (SLMs) via Instruction-tuning on synthetic demonstrations provided by LLMs, and then the instructed models self-improve their abilities through preference optimization strategies. In particular, the second phase operates refinement heuristics based on Direct Preference Optimization, where the SLMs are elicited to deliver a series of reasoning paths by automatically sampling the generated responses and providing rewards using ground truths from the LLMs. Results obtained on commonsense and math reasoning tasks show that this approach consistently outperforms Instruction-tuning in both in-domain and out-of-domain scenarios, aligning the reasoning abilities of Smaller and Larger language models.
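
The sketch below illustrates the second (self-refinement) stage as described in the abstract: the instruction-tuned SLM samples several reasoning paths per question, paths whose final answer matches the LLM-provided ground truth are treated as preferred, and a standard Direct Preference Optimization loss is applied. This is a minimal illustration assuming that setup; the function and variable names (build_preference_pairs, dpo_loss, extract_answer) are illustrative and not taken from the paper or its released software.

```python
# Illustrative sketch of the DPO-based self-refinement stage (assumed setup,
# not the authors' released implementation).
import torch
import torch.nn.functional as F

def build_preference_pairs(question, sampled_paths, extract_answer, gold_answer):
    """Split the SLM's sampled reasoning paths into chosen/rejected pairs,
    using the LLM-provided ground-truth answer as the reward signal."""
    chosen = [p for p in sampled_paths if extract_answer(p) == gold_answer]
    rejected = [p for p in sampled_paths if extract_answer(p) != gold_answer]
    return [(question, c, r) for c in chosen for r in rejected]

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss:
    -log sigmoid(beta * ((log pi/pi_ref)_chosen - (log pi/pi_ref)_rejected)),
    where pi is the policy (the SLM being refined) and pi_ref the frozen
    instruction-tuned reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```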
Anthology ID:
2024.emnlp-main.139
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2325–2347
URL:
https://aclanthology.org/2024.emnlp-main.139
DOI:
10.18653/v1/2024.emnlp-main.139
Cite (ACL):
Leonardo Ranaldi and Andre Freitas. 2024. Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2325–2347, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models (Ranaldi & Freitas, EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.139.pdf
Software:
2024.emnlp-main.139.software.zip
Data:
2024.emnlp-main.139.data.zip