SRA: Span Representation Alignment for Large Language Model Distillation

Quoc Phong Dao; Hoang Son Nguyen; Pham Khanh Chi; Tung Nguyen; Linh Ngo Van; Nguyen Thi Ngoc Diep; Trung Le

Abstract

Cross-Tokenizer Knowledge Distillation (CTKD) enables knowledge transfer between a large teacher language model and a smaller student, even when they employ different tokenizers. While existing approaches mainly focus on token-level alignment strategies, which are often brittle and sensitive to discrepancies between tokenizers, we argue that how tokens are aggregated into more robust representations before distillation is of equal importance. In this paper, we introduce SRA (Span Representation Alignment for Large Language Model Distillation), a novel framework that reframes CTKD through the physical lens of Multi-Particle Dynamical Systems. SRA shifts the fundamental unit of alignment from tokens to robust, tokenizer-agnostic spans. We model each span as a cluster of particles and represent its state by its Center of Mass (CoM), an attention-weighted average that captures rich semantic information. We leverage the concept of span centers of mass with attention-derived weighting to prioritize the most salient spans. In addition, we employ a geometric regularizer to preserve the structural integrity of the representation space and introduce aligned span logit distillation to enhance knowledge transfer across models. In challenging cross-architecture distillation experiments, SRA consistently and significantly outperforms state-of-the-art CTKD baselines, validating our physically grounded approach.
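
The attention-weighted Center of Mass mentioned in the abstract can be illustrated with a minimal sketch. The function name `span_center_of_mass`, the use of NumPy, the half-open `(start, end)` span convention, and the choice to normalize attention weights within each span are all assumptions made for illustration; the abstract does not give the exact formulation, so this should not be read as the authors' implementation.

```python
import numpy as np

def span_center_of_mass(hidden, attn, spans):
    """Hypothetical helper: attention-weighted mean of token states per span.

    hidden: (T, d) array of token hidden states (the "particles").
    attn:   (T,) array of non-negative attention scores (the "masses").
    spans:  list of half-open (start, end) token index pairs.
    Returns an (S, d) array with one center of mass per span.
    """
    centers = []
    for start, end in spans:
        w = attn[start:end]
        w = w / w.sum()                        # assumption: normalize masses within the span
        centers.append(w @ hidden[start:end])  # weighted average of the span's token states
    return np.stack(centers)

# Toy example: three tokens in a 2-d space, grouped into two spans.
hidden = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
attn = np.array([0.25, 0.75, 1.0])
spans = [(0, 2), (2, 3)]
com = span_center_of_mass(hidden, attn, spans)
print(com)
```

Under these assumptions, the first span's center is pulled toward the second token because it carries three times the attention mass, while a single-token span simply returns that token's state.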

Anthology ID: 2026.acl-long.1522
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 32961–32975
URL: https://aclanthology.org/2026.acl-long.1522/
Cite (ACL): Quoc Phong Dao, Hoang Son Nguyen, Pham Khanh Chi, Tung Nguyen, Linh Ngo Van, Nguyen Thi Ngoc Diep, and Trung Le. 2026. SRA: Span Representation Alignment for Large Language Model Distillation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32961–32975, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SRA: Span Representation Alignment for Large Language Model Distillation (Dao et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1522.pdf
Checklist: 2026.acl-long.1522.checklist.pdf
