LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference

Mutsumi Nakamura, Santosh Mashetty, Mihir Parmar, Neeraj Varshney, Chitta Baral


Abstract
Recently, Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress on Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average ~53% Attack Success Rate (ASR) across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks. Data and source code are available at https://github.com/msantoshmadhav/LogicAttack.
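To illustrate the idea, the minimal sketch below shows how a single propositional inference rule (Modus Tollens: from P → H and ¬H, infer ¬P) could turn an existing entailment pair into a new NLI pair whose gold label is fixed by logic. This is not the authors' released implementation (see the repository linked above for that); the helper names `negate` and `AttackSample`, and the naive surface negation, are hypothetical simplifications.

```python
# Hedged sketch: deriving a Modus Tollens attack sample from an entailment
# pair (P entails H). From (P -> H) and not-H, logic forces not-P, so the
# derived pair must also be labeled "entailment". Helpers are hypothetical.

from dataclasses import dataclass


@dataclass
class AttackSample:
    premise: str
    hypothesis: str
    label: str  # gold label forced by the inference rule


def negate(sentence: str) -> str:
    """Naive surface negation; a real system would use a careful rewrite."""
    return f"It is not the case that {sentence.rstrip('.').lower()}."


def modus_tollens(premise: str, hypothesis: str) -> AttackSample:
    """Build ((P -> H) and not-H) => not-P from an entailment pair (P, H)."""
    new_premise = (
        f"If {premise.rstrip('.').lower()}, "
        f"then {hypothesis.rstrip('.').lower()}. "
        f"{negate(hypothesis)}"
    )
    return AttackSample(new_premise, negate(premise), label="entailment")


# A model that answers the original pair correctly but flips its prediction
# on this logically equivalent variant is logically inconsistent.
sample = modus_tollens(
    "A man is playing a guitar.", "A man is playing an instrument."
)
print(sample.premise)
print(sample.hypothesis, "->", sample.label)
```

Under this framing, Attack Success Rate would be the fraction of derived pairs on which the model's prediction deviates from the logically required label.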
Anthology ID:
2023.findings-emnlp.889
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13322–13334
URL:
https://aclanthology.org/2023.findings-emnlp.889
DOI:
10.18653/v1/2023.findings-emnlp.889
Cite (ACL):
Mutsumi Nakamura, Santosh Mashetty, Mihir Parmar, Neeraj Varshney, and Chitta Baral. 2023. LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13322–13334, Singapore. Association for Computational Linguistics.
Cite (Informal):
LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference (Nakamura et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.889.pdf