How Interpretable are Reasoning Explanations from Prompting Large Language Models?

Yeo Wei Jie, Ranjan Satapathy, Rick Goh, Erik Cambria


Abstract
Prompt Engineering has garnered significant attention for enhancing the performance of large language models across a multitude of tasks. Techniques such as Chain-of-Thought not only bolster task performance but also delineate a clear trajectory of reasoning steps, offering a tangible form of explanation for the audience. Prior works on interpretability assess the reasoning chains yielded by Chain-of-Thought along a single axis, namely faithfulness. We present a comprehensive and multifaceted evaluation of interpretability, examining not only faithfulness but also robustness and utility across multiple commonsense reasoning benchmarks. Likewise, our investigation is not confined to a single prompting technique; it covers a multitude of prevalent prompting techniques employed in large language models, thereby ensuring a wide-ranging evaluation. In addition, we introduce a simple interpretability alignment technique, termed Self-Entailment-Alignment Chain-of-thought, that yields improvements of more than 70% across multiple dimensions of interpretability. Code is available at https://github.com/SenticNet/CoT_interpretability
Anthology ID:
2024.findings-naacl.138
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2148–2164
URL:
https://aclanthology.org/2024.findings-naacl.138
DOI:
10.18653/v1/2024.findings-naacl.138
Cite (ACL):
Yeo Wei Jie, Ranjan Satapathy, Rick Goh, and Erik Cambria. 2024. How Interpretable are Reasoning Explanations from Prompting Large Language Models?. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 2148–2164, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
How Interpretable are Reasoning Explanations from Prompting Large Language Models? (Wei Jie et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.138.pdf