Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs

Houman Mehrafarin, Arash Eshghi, Ioannis Konstas


Abstract
Evaluating Large Language Models (LLMs) on reasoning benchmarks demonstrates their ability to solve compositional questions. However, little is known about whether these models engage in genuine logical reasoning or simply rely on implicit cues to generate answers. In this paper, we investigate the transitive reasoning capabilities of two distinct LLM architectures, LLaMA 2 and Flan-T5, by manipulating facts within two compositional datasets: QASC and Bamboogle. We controlled for potential cues that might influence the models’ performance, including (a) word/phrase overlaps across sections of the test input; (b) models’ inherent knowledge acquired during pre-training or fine-tuning; and (c) Named Entities. Our findings reveal that while both models leverage (a), Flan-T5 shows greater resilience in experiments (b) and (c), exhibiting less variance than LLaMA 2. This suggests that models may develop an understanding of transitivity through fine-tuning on datasets known to be relevant, a hypothesis we leave to future work.
Anthology ID:
2024.emnlp-main.650
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11647–11662
URL:
https://aclanthology.org/2024.emnlp-main.650
Cite (ACL):
Houman Mehrafarin, Arash Eshghi, and Ioannis Konstas. 2024. Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11647–11662, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs (Mehrafarin et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.650.pdf
Data:
2024.emnlp-main.650.data.zip