SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks

Mohammadreza Salehi; Sachin Mehta; Aditya Kusupati; Ali Farhadi; Hannaneh Hajishirzi

doi:10.18653/v1/2023.findings-emnlp.706

Abstract

We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can be even applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2 times inference speed up at an insignificant drop in accuracy.

Anthology ID: 2023.findings-emnlp.706
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10519–10532
URL: https://aclanthology.org/2023.findings-emnlp.706/
DOI: 10.18653/v1/2023.findings-emnlp.706
Bibkey: salehi-etal-2023-sharcs
Cite (ACL): Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, and Hannaneh Hajishirzi. 2023. SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10519–10532, Singapore. Association for Computational Linguistics.
Cite (Informal): SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks (Salehi et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.706.pdf

Export citation

BibTeX

@inproceedings{salehi-etal-2023-sharcs,
    title = "{SHARCS}: Efficient Transformers Through Routing with Dynamic Width Sub-networks",
    author = "Salehi, Mohammadreza  and
      Mehta, Sachin  and
      Kusupati, Aditya  and
      Farhadi, Ali  and
      Hajishirzi, Hannaneh",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.706/",
    doi = "10.18653/v1/2023.findings-emnlp.706",
    pages = "10519--10532",
    abstract = "We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can be even applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2 times inference speed up at an insignificant drop in accuracy."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="salehi-etal-2023-sharcs">
    <titleInfo>
        <title>SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Mohammadreza</namePart>
        <namePart type="family">Salehi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sachin</namePart>
        <namePart type="family">Mehta</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Aditya</namePart>
        <namePart type="family">Kusupati</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ali</namePart>
        <namePart type="family">Farhadi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Hannaneh</namePart>
        <namePart type="family">Hajishirzi</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2023-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Findings of the Association for Computational Linguistics: EMNLP 2023</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Houda</namePart>
            <namePart type="family">Bouamor</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Juan</namePart>
            <namePart type="family">Pino</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kalika</namePart>
            <namePart type="family">Bali</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Singapore</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can be even applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2 times inference speed up at an insignificant drop in accuracy.</abstract>
    <identifier type="citekey">salehi-etal-2023-sharcs</identifier>
    <identifier type="doi">10.18653/v1/2023.findings-emnlp.706</identifier>
    <location>
        <url>https://aclanthology.org/2023.findings-emnlp.706/</url>
    </location>
    <part>
        <date>2023-12</date>
        <extent unit="page">
            <start>10519</start>
            <end>10532</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks
%A Salehi, Mohammadreza
%A Mehta, Sachin
%A Kusupati, Aditya
%A Farhadi, Ali
%A Hajishirzi, Hannaneh
%Y Bouamor, Houda
%Y Pino, Juan
%Y Bali, Kalika
%S Findings of the Association for Computational Linguistics: EMNLP 2023
%D 2023
%8 December
%I Association for Computational Linguistics
%C Singapore
%F salehi-etal-2023-sharcs
%X We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can be even applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2 times inference speed up at an insignificant drop in accuracy.
%R 10.18653/v1/2023.findings-emnlp.706
%U https://aclanthology.org/2023.findings-emnlp.706/
%U https://doi.org/10.18653/v1/2023.findings-emnlp.706
%P 10519-10532

Markdown (Informal)

[SHARCS: Efficient Transformers Through Routing with Dynamic Width Sub-networks](https://aclanthology.org/2023.findings-emnlp.706/) (Salehi et al., Findings 2023)
