@inproceedings{peixu-etal-2025-autoregressive,
title = "How do autoregressive transformers solve full addition?",
author = "Peixu, Wang and
Yu, Chen and
Ming, Yu and
Xiang, Cheng",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.643/",
pages = "12743--12767",
ISBN = "979-8-89176-332-6",
abstract = "Large pre-trained language models have demonstrated impressive capabilities, but there is still much to learn about how they operate. In this study, we conduct an investigation of the autoregressive transformer{'}s ability to perform basic addition operations. Specifically, by using causal analysis we found that a few different attention heads in the middle layers control the addition carry, with each head processing carries of different lengths. Due to the lack of global focus on the sequence within these attention heads, the model struggles to handle long-sequence addition tasks. By performing inference intervention on mistral-7B, partial task performance can be restored, with the accuracy on 20-digit long-sequence additions from 2{\%} to 38{\%}. Through fine-tuning, a new mechanism branches out for handling complex cases, yet it still faces challenges with length generalization. Our research reveals how the models perform basic arithmetic task, and further provides insights into the debate on whether these models are merely statistical."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="peixu-etal-2025-autoregressive">
<titleInfo>
<title>How do autoregressive transformers solve full addition?</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wang</namePart>
<namePart type="family">Peixu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chen</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yu</namePart>
<namePart type="family">Ming</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Cheng</namePart>
<namePart type="family">Xiang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Large pre-trained language models have demonstrated impressive capabilities, but there is still much to learn about how they operate. In this study, we investigate the autoregressive transformer’s ability to perform basic addition operations. Specifically, using causal analysis we found that a few different attention heads in the middle layers control the addition carry, with each head processing carries of different lengths. Because these attention heads lack global focus on the sequence, the model struggles to handle long-sequence addition tasks. By performing inference intervention on Mistral-7B, partial task performance can be restored, with the accuracy on 20-digit long-sequence additions rising from 2% to 38%. Through fine-tuning, a new mechanism branches out for handling complex cases, yet it still faces challenges with length generalization. Our research reveals how these models perform a basic arithmetic task, and further provides insights into the debate on whether these models are merely statistical.</abstract>
<identifier type="citekey">peixu-etal-2025-autoregressive</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.643/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>12743</start>
<end>12767</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T How do autoregressive transformers solve full addition?
%A Peixu, Wang
%A Yu, Chen
%A Ming, Yu
%A Xiang, Cheng
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F peixu-etal-2025-autoregressive
%X Large pre-trained language models have demonstrated impressive capabilities, but there is still much to learn about how they operate. In this study, we investigate the autoregressive transformer’s ability to perform basic addition operations. Specifically, using causal analysis we found that a few different attention heads in the middle layers control the addition carry, with each head processing carries of different lengths. Because these attention heads lack global focus on the sequence, the model struggles to handle long-sequence addition tasks. By performing inference intervention on Mistral-7B, partial task performance can be restored, with the accuracy on 20-digit long-sequence additions rising from 2% to 38%. Through fine-tuning, a new mechanism branches out for handling complex cases, yet it still faces challenges with length generalization. Our research reveals how these models perform a basic arithmetic task, and further provides insights into the debate on whether these models are merely statistical.
%U https://aclanthology.org/2025.emnlp-main.643/
%P 12743-12767
Markdown (Informal)
[How do autoregressive transformers solve full addition?](https://aclanthology.org/2025.emnlp-main.643/) (Peixu et al., EMNLP 2025)
ACL
Wang Peixu, Chen Yu, Yu Ming, and Cheng Xiang. 2025. How do autoregressive transformers solve full addition?. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12743–12767, Suzhou, China. Association for Computational Linguistics.