Crystal: Introspective Reasoners Reinforced with Self-Feedback

Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz


Abstract
Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the reasoning process is explicitly verbalized and utilized. However, existing implementations, including “chain-of-thought” and its variants, fall short in capturing the *introspective* nature of knowledge required in commonsense reasoning, and in accounting for the mutual adaptation between the generation and utilization of knowledge. We propose a novel method to develop an introspective commonsense reasoner, **Crystal**. To tackle commonsense problems, it first introspects for knowledge statements related to the given question, and subsequently makes an informed prediction that is grounded in the previously introspected knowledge. The knowledge introspection and knowledge-grounded reasoning modes of the model are tuned via reinforcement learning to mutually adapt, where the reward derives from the feedback given by the model itself. Experiments show that Crystal significantly outperforms both the standard supervised finetuning and chain-of-thought distilled methods, and enhances the transparency of the commonsense reasoning process. Our work ultimately validates the feasibility and potential of reinforcing a neural model with self-feedback.
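The abstract describes a two-mode pipeline: the model first verbalizes a knowledge statement, then answers conditioned on that statement. Below is a minimal sketch of what such two-stage inference might look like, assuming a T5-style seq2seq model via Hugging Face Transformers; the checkpoint name and prompt formats are illustrative placeholders, not the paper's released model or prompts.

```python
# Minimal sketch (not the authors' code) of Crystal-style two-stage inference.
# Assumptions: a T5-style seq2seq model; "t5-large" and the prompt templates
# below are placeholders for illustration only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-large")      # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-large")  # placeholder checkpoint

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a single continuation for the given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def answer(question: str, choices: list[str]) -> tuple[str, str]:
    # Mode 1: knowledge introspection -- verbalize a knowledge statement
    # relevant to the question.
    knowledge = generate(f"Generate a knowledge statement for the question: {question}")
    # Mode 2: knowledge-grounded reasoning -- predict an answer conditioned
    # on both the question and the introspected knowledge.
    prediction = generate(
        f"Question: {question} Choices: {', '.join(choices)} "
        f"Knowledge: {knowledge} Answer:"
    )
    return knowledge, prediction

knowledge, prediction = answer(
    "Where would you put uncooked crab meat?",
    ["refrigerator", "stove", "beach"],
)
print(f"Introspected knowledge: {knowledge}\nPrediction: {prediction}")
```

In the paper, these two modes live in a single model and are jointly tuned with reinforcement learning, where the reward comes from the model's own feedback on how much the introspected knowledge helps the prediction; this sketch only illustrates the inference-time flow.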
Anthology ID:
2023.emnlp-main.708
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11557–11572
URL:
https://aclanthology.org/2023.emnlp-main.708
DOI:
10.18653/v1/2023.emnlp-main.708
Cite (ACL):
Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, and Asli Celikyilmaz. 2023. Crystal: Introspective Reasoners Reinforced with Self-Feedback. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11557–11572, Singapore. Association for Computational Linguistics.
Cite (Informal):
Crystal: Introspective Reasoners Reinforced with Self-Feedback (Liu et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.708.pdf
Video:
https://aclanthology.org/2023.emnlp-main.708.mp4