Kevin Liu, Stephen Casper, Dylan Hadfield-Menell, and Jacob Andreas. 2023. Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, edited by Houda Bouamor, Juan Pino, and Kalika Bali, pages 4791-4797, Singapore, December 2023. Association for Computational Linguistics. Conference publication. Citation key: liu-etal-2023-cognitive. DOI: 10.18653/v1/2023.emnlp-main.291. URL: https://aclanthology.org/2023.emnlp-main.291/