Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing

Baihe Huang, Hiteshi Sharma, Yi Mao


Abstract
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains. Within the training pipeline of LLMs, the Reinforcement Learning with Human Feedback (RLHF) phase is crucial for aligning LLMs with human preferences and values. Label smoothing, a technique that replaces hard labels with soft labels, emerges as a promising technique to enhance RLHF training. Despite these benefits, the choice of label smoothing parameters often relies on heuristics and lacks theoretical grounding. This paper addresses the challenge of selecting the label smoothing parameter in a principled manner. We introduce Confidence Aware Label Smoothing (CALS), a method that iteratively updates the label smoothing parameter based on preference labels and model forecasts. Our theoretical analysis characterizes the optimal label smoothing parameter, demonstrates its dependence on the confidence level, and reveals its influence on training dynamics and equilibrium. Empirical evaluations on state-of-the-art alignment tasks show that CALS achieves competitive performance, highlighting its potential for improving alignment.
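To make the label-smoothing idea in the abstract concrete, below is a minimal sketch of a label-smoothed Bradley–Terry preference loss, where the smoothing parameter `eps` mixes the hard preference label with its complement. This is an illustration of generic label smoothing for pairwise preferences, not the paper's CALS update rule; the function names and the choice of a sigmoid link are assumptions for exposition.

```python
import math

def smoothed_preference_loss(margin, eps):
    """Label-smoothed pairwise preference loss (illustrative sketch).

    margin: reward difference r(chosen) - r(rejected).
    eps: label smoothing parameter in [0, 0.5); eps = 0 recovers the
         standard hard-label cross-entropy on the preference label.
    """
    # Probability the model assigns to the observed preference
    # under a Bradley-Terry (sigmoid) link.
    p = 1.0 / (1.0 + math.exp(-margin))
    # Soft label: (1 - eps) weight on the observed label, eps on its flip.
    return -(1.0 - eps) * math.log(p) - eps * math.log(1.0 - p)

# With eps = 0 the loss reduces to -log sigmoid(margin); a positive eps
# penalizes overconfident margins, which is the effect CALS tunes adaptively.
hard = smoothed_preference_loss(1.0, 0.0)
soft = smoothed_preference_loss(1.0, 0.1)
```

In this sketch a larger `eps` pulls the optimum away from an infinite margin, which is one intuition for why the right amount of smoothing should depend on how confident (i.e., how noisy) the preference labels are.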
Anthology ID:
2024.emnlp-main.1189
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
21341–21352
URL:
https://aclanthology.org/2024.emnlp-main.1189/
DOI:
10.18653/v1/2024.emnlp-main.1189
Cite (ACL):
Baihe Huang, Hiteshi Sharma, and Yi Mao. 2024. Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21341–21352, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing (Huang et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1189.pdf