@inproceedings{yuan-etal-2025-beyond,
title = "Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models",
author = "Yuan, Hongbang and
Chen, Yubo and
Cao, Pengfei and
Jin, Zhuoran and
Liu, Kang",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.354/",
doi = "10.18653/v1/2025.findings-naacl.354",
pages = "6310--6323",
ISBN = "979-8-89176-195-7",
abstract = "Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets and the factuality on out-of-domain (OOD) datasets remains underexplored. In this paper, we conduct a comprehensive evaluation of the factuality of different models tuned by various preference learning algorithms and demonstrate that their performance on OOD datasets either increases minimally or decreases. Subsequently, we reveal that the main cause of model{'}s failure to uphold factuality under a distribution shift is \textbf{under-alignment}, rather than \textbf{over-alignment}, by analyzing the token distribution shift of the models before and after tuning. Finally, we propose \textbf{APEFT} (\textbf{A}tomic \textbf{P}reference \textbf{E}nhanced \textbf{F}actuality \textbf{T}uning), a framework that enhances model{'}s awareness of factuality at the granularity of individual facts. Extensive experiments demonstrate that APEFT improves model performance by an average of on both ID and OOD datasets, which is highly effective."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="yuan-etal-2025-beyond">
<titleInfo>
<title>Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hongbang</namePart>
<namePart type="family">Yuan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yubo</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pengfei</namePart>
<namePart type="family">Cao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhuoran</namePart>
<namePart type="family">Jin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kang</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: NAACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="family">Chiruzzo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alan</namePart>
<namePart type="family">Ritter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lu</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, New Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-195-7</identifier>
</relatedItem>
<abstract>Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets and the factuality on out-of-domain (OOD) datasets remains underexplored. In this paper, we conduct a comprehensive evaluation of the factuality of different models tuned by various preference learning algorithms and demonstrate that their performance on OOD datasets either increases minimally or decreases. Subsequently, we reveal that the main cause of model’s failure to uphold factuality under a distribution shift is under-alignment, rather than over-alignment, by analyzing the token distribution shift of the models before and after tuning. Finally, we propose APEFT (Atomic Preference Enhanced Factuality Tuning), a framework that enhances model’s awareness of factuality at the granularity of individual facts. Extensive experiments demonstrate that APEFT improves model performance by an average of on both ID and OOD datasets, which is highly effective.</abstract>
<identifier type="citekey">yuan-etal-2025-beyond</identifier>
<identifier type="doi">10.18653/v1/2025.findings-naacl.354</identifier>
<location>
<url>https://aclanthology.org/2025.findings-naacl.354/</url>
</location>
<part>
<date>2025-04</date>
<extent unit="page">
<start>6310</start>
<end>6323</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
%A Yuan, Hongbang
%A Chen, Yubo
%A Cao, Pengfei
%A Jin, Zhuoran
%A Liu, Kang
%Y Chiruzzo, Luis
%Y Ritter, Alan
%Y Wang, Lu
%S Findings of the Association for Computational Linguistics: NAACL 2025
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-195-7
%F yuan-etal-2025-beyond
%X Large language models (LLMs) have achieved remarkable success but still tend to generate factually erroneous responses, a phenomenon known as hallucination. A recent trend is to use preference learning to fine-tune models to align with factuality. However, existing work primarily evaluates fine-tuned models on in-domain (ID) datasets, and their factuality on out-of-domain (OOD) datasets remains underexplored. In this paper, we conduct a comprehensive evaluation of the factuality of different models tuned with various preference learning algorithms and demonstrate that their performance on OOD datasets either increases minimally or decreases. Subsequently, by analyzing the token distribution shift of the models before and after tuning, we reveal that the main cause of a model’s failure to uphold factuality under a distribution shift is under-alignment, rather than over-alignment. Finally, we propose APEFT (Atomic Preference Enhanced Factuality Tuning), a framework that enhances a model’s awareness of factuality at the granularity of individual facts. Extensive experiments demonstrate that APEFT improves model performance on average across both ID and OOD datasets and is highly effective.
%R 10.18653/v1/2025.findings-naacl.354
%U https://aclanthology.org/2025.findings-naacl.354/
%U https://doi.org/10.18653/v1/2025.findings-naacl.354
%P 6310-6323
Markdown (Informal)
[Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models](https://aclanthology.org/2025.findings-naacl.354/) (Yuan et al., Findings 2025)