@inproceedings{do-etal-2025-sample,
title = "Sample Efficient Alignment Learning With Episodic Control",
author = "Do, Van Dai and
Tran, Quan Hung and
Kirmani, Ahmed and
Zhang, Lu and
Le, Hung",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.560/",
pages = "10601--10618",
ISBN = "979-8-89176-335-7",
abstract = "Aligning large language models (LLMs) with specific task objectives is challenging, especially when access to feedback signals for guiding the model is limited. While existing parametric methods perform reasonably, they rely heavily on large datasets and frequent feedback, making them impractical in scenarios with limited human feedback. We introduce Alignment Learning with Episodic Control (ALEC), a non-parametric framework that aligns LLM outputs during inference without fine-tuning. ALEC employs a key-value memory to store the associations between generated text and its corresponding values. It leverages a novel confidence-based writing scheme to update these stored values, maximizing the use of available data. During inference, ALEC utilizes a nearest-neighbor mechanism to estimate the values of generated texts, enabling the selection of the optimal text for decoding. Our method outperforms state-of-the-art baselines on harmless, helpful, and summarization tasks, demonstrating improved alignment with minimal interactions with the true reward model."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="do-etal-2025-sample">
<titleInfo>
<title>Sample Efficient Alignment Learning With Episodic Control</title>
</titleInfo>
<name type="personal">
<namePart type="given">Van</namePart>
<namePart type="given">Dai</namePart>
<namePart type="family">Do</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Quan</namePart>
<namePart type="given">Hung</namePart>
<namePart type="family">Tran</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ahmed</namePart>
<namePart type="family">Kirmani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lu</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hung</namePart>
<namePart type="family">Le</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-335-7</identifier>
</relatedItem>
<abstract>Aligning large language models (LLMs) with specific task objectives is challenging, especially when access to feedback signals for guiding the model is limited. While existing parametric methods perform reasonably, they rely heavily on large datasets and frequent feedback, making them impractical in scenarios with limited human feedback. We introduce Alignment Learning with Episodic Control (ALEC), a non-parametric framework that aligns LLM outputs during inference without fine-tuning. ALEC employs a key-value memory to store the associations between generated text and its corresponding values. It leverages a novel confidence-based writing scheme to update these stored values, maximizing the use of available data. During inference, ALEC utilizes a nearest-neighbor mechanism to estimate the values of generated texts, enabling the selection of the optimal text for decoding. Our method outperforms state-of-the-art baselines on harmless, helpful, and summarization tasks, demonstrating improved alignment with minimal interactions with the true reward model.</abstract>
<identifier type="citekey">do-etal-2025-sample</identifier>
<location>
<url>https://aclanthology.org/2025.findings-emnlp.560/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>10601</start>
<end>10618</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Sample Efficient Alignment Learning With Episodic Control
%A Do, Van Dai
%A Tran, Quan Hung
%A Kirmani, Ahmed
%A Zhang, Lu
%A Le, Hung
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Findings of the Association for Computational Linguistics: EMNLP 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-335-7
%F do-etal-2025-sample
%X Aligning large language models (LLMs) with specific task objectives is challenging, especially when access to feedback signals for guiding the model is limited. While existing parametric methods perform reasonably, they rely heavily on large datasets and frequent feedback, making them impractical in scenarios with limited human feedback. We introduce Alignment Learning with Episodic Control (ALEC), a non-parametric framework that aligns LLM outputs during inference without fine-tuning. ALEC employs a key-value memory to store the associations between generated text and its corresponding values. It leverages a novel confidence-based writing scheme to update these stored values, maximizing the use of available data. During inference, ALEC utilizes a nearest-neighbor mechanism to estimate the values of generated texts, enabling the selection of the optimal text for decoding. Our method outperforms state-of-the-art baselines on harmless, helpful, and summarization tasks, demonstrating improved alignment with minimal interactions with the true reward model.
%U https://aclanthology.org/2025.findings-emnlp.560/
%P 10601-10618
Markdown (Informal)
[Sample Efficient Alignment Learning With Episodic Control](https://aclanthology.org/2025.findings-emnlp.560/) (Do et al., Findings 2025)
ACL
Van Dai Do, Quan Hung Tran, Ahmed Kirmani, Lu Zhang, and Hung Le. 2025. Sample Efficient Alignment Learning With Episodic Control. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10601–10618, Suzhou, China. Association for Computational Linguistics.
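
For readers skimming the abstract above, the following is a minimal, purely illustrative sketch of the kind of key-value episodic memory with nearest-neighbor value estimation it describes: stored (embedding, value) pairs are queried at decoding time to score candidate texts, and the highest-valued candidate is selected. The embedding function, the choice of k, the L2 distance, and the averaging rule here are all assumptions for the sketch, not ALEC's actual implementation (which additionally uses a confidence-based writing scheme).

```python
# Illustrative sketch only: a toy key-value episodic memory with
# nearest-neighbor value estimation, in the spirit of the mechanism
# the ALEC abstract describes. All design choices here (embedding,
# k, distance, averaging) are hypothetical, not the paper's method.
import numpy as np

class EpisodicValueMemory:
    def __init__(self, k: int = 3):
        self.keys = []    # embeddings of previously scored texts
        self.values = []  # stored value estimates for those texts
        self.k = k

    def write(self, key: np.ndarray, value: float) -> None:
        # Store one (embedding, value) association.
        self.keys.append(key)
        self.values.append(value)

    def estimate(self, query: np.ndarray) -> float:
        # Average the values of the k nearest stored keys (L2 distance).
        dists = np.linalg.norm(np.array(self.keys) - query, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.array(self.values)[nearest]))

def embed(text: str, dim: int = 16) -> np.ndarray:
    # Placeholder embedding (hash-seeded random vector); a real system
    # would use a learned text encoder here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

if __name__ == "__main__":
    memory = EpisodicValueMemory(k=2)
    # Pretend these texts were scored by a reward model earlier.
    for text, reward in [("helpful answer", 0.9),
                         ("rude answer", 0.1),
                         ("polite answer", 0.8)]:
        memory.write(embed(text), reward)
    # At decoding time, pick the candidate with the highest estimated value.
    candidates = ["another helpful answer", "another rude answer"]
    best = max(candidates, key=lambda c: memory.estimate(embed(c)))
    print("selected:", best)
```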