Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models

Kai Xiong; Xiao Ding; Yixin Cao; Yuxiong Yan; Li Du; Yufei Zhang; Jinglong Gao; Jiaqian Liu; Bing Qin (秦兵); Ting Liu

doi:10.18653/v1/2025.acl-long.785

Abstract

Large language models (LLMs) have mastered abundant simple and explicit commonsense knowledge through pre-training, enabling them to achieve human-like performance in simple commonsense reasoning. Nevertheless, LLMs struggle to reason with complex and implicit commonsense knowledge that is derived from simple ones (such as understanding the long-term effects of certain events), an aspect humans tend to focus on more. Existing works focus on complex tasks like math and code, while complex commonsense reasoning remains underexplored due to its uncertainty and lack of structure. To fill this gap and align with real-world concerns, we propose a benchmark Com² focusing on complex commonsense reasoning. We first incorporate causal event graphs to serve as structured complex commonsense. Then we adopt causal theory (e.g., intervention) to modify the causal event graphs and obtain different scenarios that meet human concerns. Finally, an LLM is employed to synthesize examples with slow thinking, which is guided by the logical relationships in the modified causal graphs. Furthermore, we use detective stories to construct a more challenging subset. Experiments show that LLMs struggle in reasoning depth and breadth, while post-training and slow thinking can alleviate this. The code and data are available at https://github.com/Waste-Wood/Com2.
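The graph-modification step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy graph, the event names, and the helper functions `intervene` and `downstream_effects` are all hypothetical, and the intervention is simplified to cutting the edges that lead into one event.

```python
from collections import deque

# Hypothetical toy causal event graph: each key causes the events in its list.
graph = {
    "heavy rain": ["flooded road"],
    "flooded road": ["traffic jam"],
    "traffic jam": ["late for work"],
    "late for work": [],
}


def intervene(graph, target):
    """Return a copy of the graph with every edge into `target` removed.

    Assumption: this mimics a do-style intervention, where `target` is set
    independently of its usual causes.
    """
    return {
        cause: [effect for effect in effects if effect != target]
        for cause, effects in graph.items()
    }


def downstream_effects(graph, start):
    """Collect every event reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in seen:
                seen.add(effect)
                order.append(effect)
                queue.append(effect)
    return order


# Long-term effects in the original scenario.
original_effects = downstream_effects(graph, "heavy rain")

# Modified scenario: the traffic jam no longer depends on the flooded road.
modified = intervene(graph, "traffic jam")
modified_effects = downstream_effects(modified, "heavy rain")
```

Comparing `original_effects` with `modified_effects` shows how one structural change to the graph yields a different scenario to reason about, which is the kind of contrast a synthesized question could be built around.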

Anthology ID: 2025.acl-long.785
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 16119–16140
URL: https://aclanthology.org/2025.acl-long.785/
DOI: 10.18653/v1/2025.acl-long.785
Cite (ACL): Kai Xiong, Xiao Ding, Yixin Cao, Yuxiong Yan, Li Du, Yufei Zhang, Jinglong Gao, Jiaqian Liu, Bing Qin, and Ting Liu. 2025. Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16119–16140, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models (Xiong et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.785.pdf
