Preemptive Answer “Attacks” on Chain-of-Thought Reasoning

Rongwu Xu, Zehan Qi, Wei Xu


Abstract
Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or induced by malicious users by prompt injection attacks. Experiments reveal that preemptive answers significantly impair the model’s reasoning capability across various CoT methods and a broad spectrum of datasets. To bolster the robustness of reasoning, we propose two measures aimed at mitigating this issue to some extent.
Anthology ID:
2024.findings-acl.876
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
14708–14726
Language:
URL:
https://aclanthology.org/2024.findings-acl.876
DOI:
Bibkey:
Cite (ACL):
Rongwu Xu, Zehan Qi, and Wei Xu. 2024. Preemptive Answer “Attacks” on Chain-of-Thought Reasoning. In Findings of the Association for Computational Linguistics ACL 2024, pages 14708–14726, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Preemptive Answer “Attacks” on Chain-of-Thought Reasoning (Xu et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-acl.876.pdf