Guo Tang

2025

Despite the remarkable reasoning capabilities demonstrated by large language models (LLMs), their substantial computational overhead limits their practical deployment. Prior efforts have sought to distill multi-step reasoning capabilities into smaller models through chain-of-thought (CoT). While CoT facilitates multi-step reasoning, the dependencies between reasoning steps are not always clearly discernible, which may lead to inconsistent reasoning. In this paper, we introduce fine-grained attribution reasoning distillation (FARD), which incorporates grounded citations to consolidate the relationships between reasoning steps. Specifically, FARD distills attribution reasoning rationales from LLMs in place of CoT rationales, making the dependencies among reasoning steps explicit. In addition, we regularize the model's attention pattern by leveraging the causal dependencies between reasoning steps, thereby enhancing the consistency of reasoning. Grounded attribution reasoning also enhances interpretability and verifiability, facilitating faithful reasoning. We evaluate FARD on mathematical and general reasoning benchmarks. The experimental results indicate that FARD outperforms CoT distillation methods in mathematical reasoning, demonstrating its effectiveness. Furthermore, small models trained with FARD show strong performance in out-of-distribution reasoning, indicating strong generalization capabilities.

2024

Towards Benchmarking Situational Awareness of Large Language Models: Comprehensive Benchmark, Evaluation and Analysis
Guo Tang | Zheng Chu | Wenxiang Zheng | Ming Liu | Bing Qin
Findings of the Association for Computational Linguistics: EMNLP 2024

Situational awareness refers to the capacity to perceive and comprehend the present context and to anticipate forthcoming events; it plays a critical role in aiding decision-making, anticipating potential issues, and adapting to dynamic circumstances. Nevertheless, the situational awareness capabilities of large language models (LLMs) have not yet been comprehensively assessed. To address this, we propose SA-Bench, a benchmark that covers three tiers of situational awareness capabilities: environment perception, situation comprehension, and future projection. SA-Bench provides a comprehensive evaluation of the situational awareness capabilities of LLMs. We conduct extensive experiments on advanced LLMs such as GPT-4, LLaMA3, and Qwen1.5. Our experimental results indicate that even state-of-the-art (SOTA) LLMs still exhibit substantial capability gaps compared to humans. In addition, we thoroughly analyze the challenges LLMs encounter across various tasks and highlight the deficiencies they face. We hope SA-Bench will foster research in the field of situational awareness.

Co-authors

Zhongjie Wang (王钟杰): 1

Wenxiang Zheng: 1

Venues

COLING: 1
Findings: 1
