Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang


Abstract
We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs. We observe a robust self-awareness of internal knowledge state in LLMs, evidenced by over 85% accuracy in knowledge state probing. However, LLMs often fail to faithfully express their internal knowledge during generation, leading to factual hallucinations. We develop an automated hallucination annotation tool, DreamCatcher, which merges knowledge probing and consistency checking methods to rank factual preference data. Using knowledge preference as the reward, we propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs. Our experiments across multiple models show that RLKF training effectively enhances the ability of models to utilize their internal knowledge state, boosting performance in a variety of knowledge-based and honesty-related tasks.
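The abstract describes a pipeline of probing the model's knowledge state, checking answer consistency, and turning the result into preference data used as a reward signal. The Python sketch below illustrates that general idea under loose assumptions: it samples several answers, uses a naive substring match as the consistency check, and ranks a correct answer, an honest refusal, and a wrong answer into a preference pair. All function names, the threshold, and the toy generator are illustrative assumptions and are not taken from DreamCatcher or the paper's RLKF implementation.

from typing import Callable, List, Tuple

def probe_knowledge_state(
    question: str,
    reference: str,
    generate_fn: Callable[[str], str],
    n_samples: int = 8,
    threshold: float = 0.5,
) -> Tuple[bool, List[str]]:
    """Sample several answers and decide whether the model 'knows' the fact.
    The model is treated as knowing the answer when at least `threshold` of
    the samples agree with the reference (naive substring match here)."""
    samples = [generate_fn(question) for _ in range(n_samples)]
    hits = sum(reference.lower() in s.lower() for s in samples)
    return hits / n_samples >= threshold, samples

def build_preference_pair(question: str, reference: str, knows: bool,
                          samples: List[str], refusal: str = "I don't know."):
    """Rank responses for reward training: prefer a correct answer when the
    fact is known, and prefer an honest refusal over a wrong answer otherwise."""
    wrong = next((s for s in samples if reference.lower() not in s.lower()), refusal)
    if knows:
        return {"prompt": question, "chosen": reference, "rejected": refusal}
    return {"prompt": question, "chosen": refusal, "rejected": wrong}

if __name__ == "__main__":
    # Toy generator standing in for an actual LLM call.
    def toy_generate(q: str) -> str:
        return "The capital of France is Paris."

    q, ref = "What is the capital of France?", "Paris"
    knows, samples = probe_knowledge_state(q, ref, toy_generate)
    print(build_preference_pair(q, ref, knows, samples))

Preference pairs of this form could then feed a preference-based reward model or RL objective; the actual reward construction in the paper is richer than this sketch.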
Anthology ID: 2024.knowledgenlp-1.4
Volume: Proceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Wenhao Yu, Weijia Shi, Michihiro Yasunaga, Meng Jiang, Chenguang Zhu, Hannaneh Hajishirzi, Luke Zettlemoyer, Zhihan Zhang
Venues: KnowledgeNLP | WS
Publisher: Association for Computational Linguistics
Pages: 44–58
URL: https://aclanthology.org/2024.knowledgenlp-1.4
DOI: 10.18653/v1/2024.knowledgenlp-1.4
Cite (ACL): Yuxin Liang, Zhuoyang Song, Hao Wang, and Jiaxing Zhang. 2024. Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation. In Proceedings of the 3rd Workshop on Knowledge Augmented Methods for NLP, pages 44–58, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation (Liang et al., KnowledgeNLP-WS 2024)
PDF: https://aclanthology.org/2024.knowledgenlp-1.4.pdf