Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning

Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, Wai Lam


Abstract
Iterative preference learning, though it yields superior performance, requires online-annotated preference labels. In this work, we study strategies that reduce the annotation budget while achieving competitive or even better performance for iterative preference learning. Building on intuitions from active learning, we empirically show that annotating response pairs with small margins is generally better than annotating pairs with large or random margins. In addition, experiments in the multi-iteration scenario suggest allocating more of the annotation budget to earlier iterations rather than later ones.
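To illustrate the small-margin recipe described in the abstract, the sketch below shows one plausible selection routine; it is not taken from the paper. It assumes a proxy reward model (reward_fn) that scores each (prompt, response) pair, defines a pair's margin as the absolute difference of the two proxy rewards, and spends the annotation budget on the pairs with the smallest margins. The names select_small_margin_pairs, reward_fn, and budget are hypothetical.

import numpy as np

def select_small_margin_pairs(pairs, reward_fn, budget):
    """Pick the `budget` response pairs whose proxy reward margin is smallest.

    pairs: list of (prompt, response_a, response_b) tuples.
    reward_fn: hypothetical proxy that returns a scalar reward for (prompt, response).
    budget: number of pairs to send out for preference annotation.
    """
    # Margin of a pair = |r(x, y_a) - r(x, y_b)| under the proxy reward model.
    margins = np.array([
        abs(reward_fn(prompt, resp_a) - reward_fn(prompt, resp_b))
        for prompt, resp_a, resp_b in pairs
    ])
    # Small-margin pairs are the ones the proxy is least certain about; the paper's
    # finding is that annotating these is generally the better use of the budget.
    chosen = np.argsort(margins)[:budget]
    return [pairs[i] for i in chosen]

In a multi-iteration setup, the same routine would be called once per iteration, with a larger budget in the earlier iterations than in the later ones, in line with the budget-allocation finding above.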
Anthology ID: 2024.findings-emnlp.382
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6549–6561
URL: https://aclanthology.org/2024.findings-emnlp.382
Cite (ACL): Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, and Wai Lam. 2024. Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6549–6561, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning (Yang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.382.pdf