Hiteshi Sharma

2024

pdf bib abs
Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing
Baihe Huang | Hiteshi Sharma | Yi Mao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains. Within the training pipeline of LLMs, the Reinforcement Learning with Human Feedback (RLHF) phase is crucial for aligning LLMs with human preferences and values. Label smoothing, a technique that replaces hard labels with soft labels, emerges as promising techniques to enhance RLHF training. Despite the benefits, the choice of label smoothing parameters often relies on heuristic approaches and lack theoretical understanding. This paper addresses the challenge of selecting the label smoothing parameter in a principled manner. We introduce Confidence Aware Label Smoothing (CALS), a method that iteratively updates the label smoothing parameter based on preference labels and model forecasts. Our theoretical analysis characterizes the optimal label smoothing parameter, demonstrates its dependence on the confidence level, and reveals its influence on training dynamics and equilibrium. Empirical evaluations on state-of-the-art alignment tasks show that CALS achieves competitive performance, highlighting its potential for improving alignment.

Logical reasoning is a fundamental aspect of human intelligence and a key component of tasks like problem-solving and decision-making. Recent advancements have enabled Large Language Models (LLMs) to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. The state-of-the-art, solver-augmented language models, use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of external logical solvers and no answer to the logical questions. In this paper, we introduce LoGiPT, a novel language model that directly internalizes and emulates the reasoning processes of logical solvers and avoids parsing errors by learning strict adherence to solver syntax and grammar. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers. Experimental results on two public deductive reasoning benchmarks show that LoGiPT outperforms state-of-the-art solver-augmented LMs and few-shot prompting methods on competitive LLMs like GPT-4. This project is available in https://github.com/Cyril-JZ/LoGiPT.

Co-authors

Yelong Shen 1

Ruochen Xu 1

Dongyan Zhao 1

Venues

emnlp1
findings1

Fix author