Santosh Mashetty

2023

pdf bib abs
LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference
Mutsumi Nakamura | Santosh Mashetty | Mihir Parmar | Neeraj Varshney | Chitta Baral
Findings of the Association for Computational Linguistics: EMNLP 2023

Recently Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress in Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose a LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average ~53% Attack Success Rate (ASR) across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks. Data and source code are available at https://github.com/msantoshmadhav/LogicAttack.

Co-authors

Venues

findings1