Yoichi Aoki
2026
LLMs Faithfully and Iteratively Compute Answers During CoT: A Systematic Analysis With Multi-step Arithmetics
Keito Kudo | Yoichi Aoki | Tatsuki Kuribayashi | Shusaku Sone | Masaya Taniguchi | Ana Brassard | Keisuke Sakaguchi | Kentaro Inui
Findings of the Association for Computational Linguistics: EACL 2026
This study investigates the internal information flow of large language models (LLMs) while they perform chain-of-thought (CoT) style reasoning. Specifically, with a particular interest in the faithfulness of the CoT explanation to the LLMs’ final answer, we explore (i) when the LLMs’ answer is (pre)determined, in particular whether before or after the CoT begins, and (ii) how strongly the information from the CoT causally affects the final answer. Our experiments with controlled arithmetic tasks reveal a systematic internal reasoning mechanism of LLMs. They have not derived an answer at the moment the input is fed into the model. Instead, they compute (sub-)answers on the fly while generating the reasoning chain. Therefore, the generated reasoning chains can be regarded as faithful reflections of the model’s internal computation.
2024
First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning
Yoichi Aoki | Keito Kudo | Tatsuki Kuribayashi | Shusaku Sone | Masaya Taniguchi | Keisuke Sakaguchi | Kentaro Inui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Explicit multi-step reasoning, such as chain-of-thought, is widely adopted in the community to obtain better performance from language models (LMs). We report on the systematic strategy that LMs use in this process. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning, when more steps are required to reach an answer. Conversely, their reliance on heuristics decreases as LMs progress closer to the final answer. This suggests that LMs track only a limited number of future steps and dynamically combine heuristic strategies with rational ones in solving tasks involving multi-step reasoning.
2023
Do Deep Neural Networks Capture Compositionality in Arithmetic Reasoning?
Keito Kudo | Yoichi Aoki | Tatsuki Kuribayashi | Ana Brassard | Masashi Yoshikawa | Keisuke Sakaguchi | Kentaro Inui
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Compositionality is a pivotal property of symbolic reasoning. However, how well recent neural models capture compositionality in symbolic reasoning tasks remains underexplored. This study empirically addresses this question by systematically examining recently published pre-trained seq2seq models with a carefully controlled dataset of multi-hop arithmetic symbolic reasoning. We introduce a skill tree on compositionality in arithmetic symbolic reasoning that defines the hierarchical levels of complexity along with three compositionality dimensions: systematicity, productivity, and substitutivity. Our experiments revealed that among the three types of composition, the models struggled most with systematicity, performing poorly even with relatively simple compositions. That difficulty was not resolved even after training the models with intermediate reasoning steps.
Empirical Investigation of Neural Symbolic Reasoning Strategies
Yoichi Aoki | Keito Kudo | Tatsuki Kuribayashi | Ana Brassard | Masashi Yoshikawa | Keisuke Sakaguchi | Kentaro Inui
Findings of the Association for Computational Linguistics: EACL 2023
Neural reasoning accuracy improves when models generate intermediate reasoning steps. However, the source of this improvement remains unclear. Here, we investigate and factorize the benefit of generating intermediate steps for symbolic reasoning. Specifically, we decompose the reasoning strategy w.r.t. step granularity and chaining strategy. With a purely symbolic numerical reasoning dataset (e.g., A=1, B=3, C=A+3, C?), we found that the choice of reasoning strategy significantly affects performance, with the gap becoming even larger as the extrapolation length increases. Surprisingly, we also found that certain configurations lead to nearly perfect performance, even in the case of length extrapolation. Our results indicate the importance of further exploring effective strategies for neural reasoning models.
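To make the example problem in the abstract concrete, the following is a minimal Python sketch of how an instance such as "A=1, B=3, C=A+3, C?" can be resolved while recording intermediate steps. It is an illustration only, not the authors' dataset generator or any of their reasoning strategies: the function name `solve`, the comma-separated input format, and the restriction to addition are all assumptions made for this sketch.

```python
def solve(problem: str):
    """Resolve a problem such as 'A=1, B=3, C=A+3, C?'.

    Returns the answer and the list of intermediate steps that were needed.
    Assumption: right-hand sides contain only non-negative integers and
    variable names joined by '+'.
    """
    # Split into equations and the final query (the part ending in '?').
    *equations, query = [part.strip() for part in problem.split(",")]
    target = query.rstrip("?")

    # Map each variable to its unevaluated right-hand side.
    definitions = dict(eq.split("=", 1) for eq in equations)
    values = {}  # variables already resolved
    steps = []   # intermediate results, in the order they were derived

    def evaluate(name: str) -> int:
        if name in values:
            return values[name]
        total = 0
        for term in definitions[name].split("+"):
            term = term.strip()
            # A term is either a literal number or another variable.
            total += int(term) if term.isdigit() else evaluate(term)
        values[name] = total
        steps.append(f"{name}={total}")
        return total

    return evaluate(target), steps


answer, steps = solve("A=1, B=3, C=A+3, C?")
print(answer)  # 4
print(steps)   # ['A=1', 'C=4']
```

In this sketch only the variables the query depends on are resolved, so B never appears in the recorded steps. Emitting every variable instead, or only the final answer, would be different output formats for the same problem.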