Learning Temporally-Aware Sample Weights for Preference Optimization

Mengyang Li, Xudong Zhou, Pinlong Zhao


Abstract
Preference optimization is fundamental for aligning large language models. While existing methods use sample weighting, they typically rely on static functions of instantaneous model states and ignore temporal learning dynamics. We contend that a sample’s value evolves throughout training, characterized by patterns such as stable convergence or noisy oscillation. We propose MetaPO, a framework that meta-learns adaptive weights using three temporal features: reward margin evolution, learning volatility, and reference deviation. Through bilevel optimization on validation data, MetaPO automatically discovers weighting strategies tailored to specific datasets. Experiments on models ranging from 7B to 70B parameters demonstrate statistically significant improvements over strong baselines, achieving gains of up to 2.4 points on AlpacaEval 2.0 and Arena-Hard. Interpretability analysis confirms that temporal features drive over 70% of the weighting decisions and that the learned weights correlate strongly with sample quality.
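The abstract names three temporal features but gives no formulas for them. The following is a minimal sketch of how such features might be computed from a per-sample history of reward margins and mapped to a weight. Every definition here is an assumption made for illustration: `temporal_features`, `sample_weight`, the feature formulas, and the hand-picked `theta` are hypothetical and are not taken from the paper. In particular, MetaPO is described as learning its weighting through bilevel optimization on validation data, which this sketch does not reproduce.

```python
import math
from statistics import pstdev


def temporal_features(margins, ref_deviation):
    """Summarize one preference pair's training history (illustrative definitions).

    margins: reward margins logged at successive checkpoints (at least two values).
    ref_deviation: an assumed scalar measuring drift from the reference model.
    """
    # Checkpoint-to-checkpoint changes in the reward margin.
    deltas = [later - earlier for earlier, later in zip(margins, margins[1:])]
    evolution = sum(deltas) / len(deltas)  # average change per checkpoint
    volatility = pstdev(deltas)            # spread of those changes
    return [evolution, volatility, ref_deviation]


def sample_weight(features, theta, bias=0.0):
    """Map features to a weight in (0, 1) with a logistic function."""
    score = sum(t * f for t, f in zip(theta, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hand-picked parameters for illustration only; a meta-learner would fit these.
theta = [1.0, -2.0, -0.5]

stable = temporal_features([0.1, 0.3, 0.5, 0.7], ref_deviation=0.2)
noisy = temporal_features([0.1, 0.9, -0.4, 0.7], ref_deviation=0.2)

w_stable = sample_weight(stable, theta)
w_noisy = sample_weight(noisy, theta)
```

Both histories have the same average margin change, so only the volatility term separates them. With this choice of `theta`, the steadily converging sample receives the larger weight.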
Anthology ID:
2026.findings-acl.601
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12361–12377
URL:
https://aclanthology.org/2026.findings-acl.601/
Cite (ACL):
Mengyang Li, Xudong Zhou, and Pinlong Zhao. 2026. Learning Temporally-Aware Sample Weights for Preference Optimization. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12361–12377, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Learning Temporally-Aware Sample Weights for Preference Optimization (Li et al., Findings 2026)
PDF:
https://aclanthology.org/2026.findings-acl.601.pdf
Checklist:
 2026.findings-acl.601.checklist.pdf