Routing in Sparsely-gated Language Models responds to Context

Stefan Arnold; Marian Fietta; Dilara Yesilbas

doi:10.18653/v1/2024.blackboxnlp-1.2

Abstract

Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
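
The abstract refers to a router that assigns tokens to experts and to comparing those assignments across contexts. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a linear top-1 router and a simple position-wise agreement score. The names route_top1 and routing_agreement, the toy dimensions, and the random values are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(hidden, router_weights):
    """Return (expert index, gate probability) for one token representation.

    router_weights holds one weight vector per expert (hypothetical toy router).
    """
    logits = [sum(w * h for w, h in zip(weights, hidden)) for weights in router_weights]
    probs = softmax(logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]


def routing_agreement(assignments_a, assignments_b):
    """Fraction of positions at which two equally long assignment lists agree."""
    if len(assignments_a) != len(assignments_b):
        raise ValueError("assignment lists must have equal length")
    if not assignments_a:
        return 0.0
    matches = sum(a == b for a, b in zip(assignments_a, assignments_b))
    return matches / len(assignments_a)


# Toy data: 4 experts, 8-dimensional hidden states (all values are made up).
rng = random.Random(0)
router = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
word_in_context_a = [rng.gauss(0, 1) for _ in range(8)]
# The same word in another context, modelled here as a small perturbation.
word_in_context_b = [h + rng.gauss(0, 0.1) for h in word_in_context_a]

expert_a, gate_a = route_top1(word_in_context_a, router)
expert_b, gate_b = route_top1(word_in_context_b, router)
print(expert_a, expert_b, routing_agreement([expert_a], [expert_b]))
```

In this toy setting, an agreement close to 1 across many word pairs would suggest that routing follows the token itself, while lower agreement would suggest that the surrounding context shifts the assignment. The paper's actual models, data, and metrics may differ.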

Anthology ID: 2024.blackboxnlp-1.2
Volume: Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Yonatan Belinkov, Najoung Kim, Jaap Jumelet, Hosein Mohebbi, Aaron Mueller, Hanjie Chen
Venues: BlackboxNLP | WS
Publisher: Association for Computational Linguistics
Pages: 15–22
URL: https://aclanthology.org/2024.blackboxnlp-1.2/
DOI: 10.18653/v1/2024.blackboxnlp-1.2
Bibkey: arnold-etal-2024-routing
Cite (ACL): Stefan Arnold, Marian Fietta, and Dilara Yesilbas. 2024. Routing in Sparsely-gated Language Models responds to Context. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 15–22, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): Routing in Sparsely-gated Language Models responds to Context (Arnold et al., BlackboxNLP 2024)
PDF: https://aclanthology.org/2024.blackboxnlp-1.2.pdf

BibTeX

@inproceedings{arnold-etal-2024-routing,
    title = "Routing in Sparsely-gated Language Models responds to Context",
    author = "Arnold, Stefan  and
      Fietta, Marian  and
      Yesilbas, Dilara",
    editor = "Belinkov, Yonatan  and
      Kim, Najoung  and
      Jumelet, Jaap  and
      Mohebbi, Hosein  and
      Mueller, Aaron  and
      Chen, Hanjie",
    booktitle = "Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP",
    month = nov,
    year = "2024",
    address = "Miami, Florida, US",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.blackboxnlp-1.2/",
    doi = "10.18653/v1/2024.blackboxnlp-1.2",
    pages = "15--22",
    abstract = "Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="arnold-etal-2024-routing">
    <titleInfo>
        <title>Routing in Sparsely-gated Language Models responds to Context</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Stefan</namePart>
        <namePart type="family">Arnold</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Marian</namePart>
        <namePart type="family">Fietta</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dilara</namePart>
        <namePart type="family">Yesilbas</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2024-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Yonatan</namePart>
            <namePart type="family">Belinkov</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Najoung</namePart>
            <namePart type="family">Kim</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jaap</namePart>
            <namePart type="family">Jumelet</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Hosein</namePart>
            <namePart type="family">Mohebbi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Aaron</namePart>
            <namePart type="family">Mueller</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Hanjie</namePart>
            <namePart type="family">Chen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Miami, Florida, US</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.</abstract>
    <identifier type="citekey">arnold-etal-2024-routing</identifier>
    <identifier type="doi">10.18653/v1/2024.blackboxnlp-1.2</identifier>
    <location>
        <url>https://aclanthology.org/2024.blackboxnlp-1.2/</url>
    </location>
    <part>
        <date>2024-11</date>
        <extent unit="page">
            <start>15</start>
            <end>22</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Routing in Sparsely-gated Language Models responds to Context
%A Arnold, Stefan
%A Fietta, Marian
%A Yesilbas, Dilara
%Y Belinkov, Yonatan
%Y Kim, Najoung
%Y Jumelet, Jaap
%Y Mohebbi, Hosein
%Y Mueller, Aaron
%Y Chen, Hanjie
%S Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
%D 2024
%8 November
%I Association for Computational Linguistics
%C Miami, Florida, US
%F arnold-etal-2024-routing
%X Language Models (LMs) recently incorporate mixture-of-experts layers consisting of a router and a collection of experts to scale up their parameter count given a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
%R 10.18653/v1/2024.blackboxnlp-1.2
%U https://aclanthology.org/2024.blackboxnlp-1.2/
%U https://doi.org/10.18653/v1/2024.blackboxnlp-1.2
%P 15-22

Markdown (Informal)

[Routing in Sparsely-gated Language Models responds to Context](https://aclanthology.org/2024.blackboxnlp-1.2/) (Arnold et al., BlackboxNLP 2024)