LLM Factoscope: Uncovering LLMs’ Factual Discernment through Measuring Inner States

Jinwen He, Yujia Gong, Zijin Lin, Cheng’an Wei, Yue Zhao, Kai Chen


Abstract
Large Language Models (LLMs) have revolutionized various domains with their extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. Inspired by lie detectors for humans, which rely on physiological responses, we introduce the LLM Factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs’ inner states when generating factual versus non-factual content. We demonstrate the effectiveness of the LLM Factoscope across various architectures, achieving over 96% accuracy on our custom-collected factual detection dataset. Our work opens a new avenue for utilizing LLMs’ inner states for factual detection and encourages further exploration into LLMs’ inner workings for enhanced reliability and transparency.
Anthology ID:
2024.findings-acl.608
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10218–10230
URL:
https://aclanthology.org/2024.findings-acl.608
DOI:
10.18653/v1/2024.findings-acl.608
Cite (ACL):
Jinwen He, Yujia Gong, Zijin Lin, Cheng’an Wei, Yue Zhao, and Kai Chen. 2024. LLM Factoscope: Uncovering LLMs’ Factual Discernment through Measuring Inner States. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10218–10230, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
LLM Factoscope: Uncovering LLMs’ Factual Discernment through Measuring Inner States (He et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.608.pdf