FBS: Modeling Native Parallel Reading inside a Transformer

Tongxi Wang

Abstract

Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core human-reading ingredients: content-adaptive foresight, chunk-structure-aware compute allocation, and train–test consistency for preview/skimming. We propose the Fovea–Block–Skip Transformer (FBS), which injects a causal, trainable loop into Transformers via Parafovea-Attention Window (PAW), Chunk-Head (CH), and Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality-efficiency trade-off without increasing parameters, and ablations show the three modules are complementary.

Anthology ID: 2026.findings-acl.200
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4106–4137
URL: https://aclanthology.org/2026.findings-acl.200/
Cite (ACL): Tongxi Wang. 2026. FBS: Modeling Native Parallel Reading inside a Transformer. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4106–4137, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): FBS: Modeling Native Parallel Reading inside a Transformer (Wang, Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.200.pdf
Checklist: 2026.findings-acl.200.checklist.pdf