Routing in Sparsely-gated Language Models responds to Context

Stefan Arnold, Marian Fietta, Dilara Yesilbas


Abstract
Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
Anthology ID: 2024.blackboxnlp-1.2
Volume: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Yonatan Belinkov, Najoung Kim, Jaap Jumelet, Hosein Mohebbi, Aaron Mueller, Hanjie Chen
Venue: BlackboxNLP
Publisher: Association for Computational Linguistics
Pages: 15–22
URL: https://aclanthology.org/2024.blackboxnlp-1.2
Cite (ACL): Stefan Arnold, Marian Fietta, and Dilara Yesilbas. 2024. Routing in Sparsely-gated Language Models responds to Context. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 15–22, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): Routing in Sparsely-gated Language Models responds to Context (Arnold et al., BlackboxNLP 2024)
PDF: https://aclanthology.org/2024.blackboxnlp-1.2.pdf