Polymorphic Universal Transformer

YiLong Chen; Zitian Gao; Yihao Xiao; Jason Klein Liu; Xinyu Yang; Yifan Luo; Haoming Luo; Zhengmao Ye; Tingwen Liu; Ran Tao; Bryan Dai

Abstract

Although the Universal Transformer (UT) mitigates the diminishing returns of standard LLM scaling by decoupling parameter count from depth, it remains constrained by a computational cost that still grows linearly with depth and by rigid weight sharing. These limitations lead to severe functional homogeneity, which in turn induces over-smoothing, representation rank collapse, and degraded reasoning performance. In this work, we present the first systematic study of Compute Distribution Skew, a pathological phenomenon in ultra-deep recurrent Transformers in which contributions are distributed disproportionately across recurrent steps, so that the model occupies distinct functional states during the prefix and suffix processing phases. We identify this skew as the primary driver of extrapolation failure. To address it, we propose the Polymorphic Transformer, which aims to achieve functional polymorphism and depth sparsity within a shared-parameter framework. By integrating conditional sparse subspaces, SiLU Attention, and an uncertainty-aware depth scheduler, our architecture mitigates power-method collapse and effectively decouples logical depth from computational cost. Experiments demonstrate that our model significantly enhances representation rank and robustness, achieving complex reasoning performance comparable to the baseline while reducing computation by 64.7%.
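
The term "power-method collapse" can be made concrete with a toy example. The sketch below is a simplified illustration under stated assumptions, not the paper's model: it freezes a single softmax attention matrix and applies it repeatedly to a set of token vectors, with no residual connection, MLP, or normalization. Reusing the same map at every step is exactly power iteration. A strictly positive row-stochastic matrix has dominant eigenvalue 1 with the all-ones eigenvector, so the rows of A^k X converge to one shared vector (rank 1), which is the over-smoothing described above. The sizes, the random logits, and the helper names (`shared_attention_steps`, `smoothness_residual`) are hypothetical.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of attention logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    # (n x k) @ (k x d) -> (n x d), plain Python lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smoothness_residual(x):
    """Relative Frobenius norm of X minus its per-dimension token mean.

    0 means every token row is identical (fully over-smoothed, rank 1).
    """
    n, d = len(x), len(x[0])
    mean = [sum(x[i][j] for i in range(n)) / n for j in range(d)]
    resid = math.sqrt(sum((x[i][j] - mean[j]) ** 2 for i in range(n) for j in range(d)))
    total = math.sqrt(sum(v * v for row in x for v in row))
    return resid / total


def shared_attention_steps(n_tokens=6, dim=4, steps=60, seed=0):
    # Toy assumption: one attention pattern reused at every recurrent step
    # (shared weights, frozen pattern), applied without a residual path.
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    logits = [[rng.gauss(0, 1) for _ in range(n_tokens)] for _ in range(n_tokens)]
    attn = [softmax(row) for row in logits]  # strictly positive, rows sum to 1
    history = [smoothness_residual(x)]
    for _ in range(steps):
        x = matmul(attn, x)  # one power-iteration step: X <- A X
        history.append(smoothness_residual(x))
    return history


if __name__ == "__main__":
    hist = shared_attention_steps()
    print(f"residual at step 0: {hist[0]:.3f}, at step {len(hist) - 1}: {hist[-1]:.2e}")
```

The residual shrinks roughly geometrically at the rate of A's second-largest eigenvalue magnitude. Real Transformer layers include residual connections and MLPs that counteract this, so the toy exaggerates the effect. Note also that an elementwise score function such as SiLU does not produce row-stochastic mixing weights, so the fixed-point argument above (A applied to the all-ones vector returns it unchanged) does not carry over to it directly.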

Anthology ID: 2026.acl-long.1809
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 39001–39013
URL: https://aclanthology.org/2026.acl-long.1809/
Cite (ACL): Yilong Chen, Zitian Gao, Yihao Xiao, Jason Klein Liu, Xinyu Yang, Yifan Luo, Haoming Luo, Zhengmao Ye, Tingwen Liu, Ran Tao, and Bryan Dai. 2026. Polymorphic Universal Transformer. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 39001–39013, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Polymorphic Universal Transformer (Chen et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1809.pdf
Checklist: 2026.acl-long.1809.checklist.pdf