When Evolution Strategy Meets Language Models Tuning

Bo Huang, Yuxin Jiang, Mingyang Chen, Yi Wang, Hongyang Chen, Wei Wang


Abstract
Supervised fine-tuning has been pivotal in training autoregressive language models, yet it introduces exposure bias. To mitigate this, post fine-tuning, including on-policy and off-policy methods, has emerged as a way to further enhance models. However, each approach has limitations in terms of performance gains and susceptibility to overfitting. In this paper, we introduce a novel on-policy approach called Evolution Strategy Optimization (ESO), which is designed around the biological principle of survival of the fittest. Specifically, we cast model tuning as an evolutionary process in which each output sentence generated by the model provides a perturbation signal to the model's parameter space. The fitness of each perturbation signal is then quantified as the difference between its reward score and the average score given by a reward function, and this fitness guides the optimization process. Empirically, the proposed method achieves superior performance on various tasks and comparable performance on the human alignment task.
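The abstract's description (perturbation signals scored by a reward function, with fitness taken as the deviation from the mean reward) resembles a standard evolution-strategy update. Below is a minimal, generic sketch of that idea, not the authors' ESO implementation; the reward function, population size, noise scale, and learning rate are illustrative assumptions.

```python
# Generic evolution-strategy step: perturb parameters, score each perturbation
# with a reward function, and weight the update by reward minus the mean reward.
# Hypothetical sketch inspired by the abstract, not the paper's algorithm.
import numpy as np

def es_step(theta, reward_fn, rng, pop_size=8, sigma=0.02, lr=0.01):
    """One evolution-strategy update of a parameter vector `theta`."""
    noise = rng.standard_normal((pop_size, theta.size))            # perturbation signals
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    fitness = rewards - rewards.mean()                             # deviation from the average reward
    grad_est = (fitness[:, None] * noise).sum(axis=0) / (pop_size * sigma)
    return theta + lr * grad_est                                   # ascend the estimated reward gradient

# Toy usage: maximize a simple quadratic "reward" with optimum at [1, 1, 1, 1].
rng = np.random.default_rng(0)
theta = np.zeros(4)
for _ in range(200):
    theta = es_step(theta, lambda w: -np.sum((w - 1.0) ** 2), rng)
print(theta)  # moves toward the optimum [1, 1, 1, 1]
```

In this sketch the Gaussian perturbations stand in for the signal each sampled output sentence would provide, and centering the rewards around their mean plays the role of the fitness measure described in the abstract.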
Anthology ID:
2025.coling-main.357
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
5333–5344
URL:
https://aclanthology.org/2025.coling-main.357/
Cite (ACL):
Bo Huang, Yuxin Jiang, Mingyang Chen, Yi Wang, Hongyang Chen, and Wei Wang. 2025. When Evolution Strategy Meets Language Models Tuning. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5333–5344, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
When Evolution Strategy Meets Language Models Tuning (Huang et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.357.pdf