Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang


Abstract
Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., follow open-ended instructions) and faithfulness (i.e., ground responses in given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergic outcome. We propose a simple yet effective method that relies on Reject-sampling by Self-instruct with Continued Fine-tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more, as training ReSet with high-quality, yet substantially smaller data (three-fold less) yields superior results. Our findings offer a better understanding of objective discrepancies in alignment training of LMs.
Anthology ID:
2024.emnlp-main.229
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3942–3965
Language:
URL:
https://aclanthology.org/2024.emnlp-main.229
DOI:
10.18653/v1/2024.emnlp-main.229
Bibkey:
Cite (ACL):
Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, and Zhiheng Huang. 2024. Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3942–3965, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models (Wu et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.229.pdf