CoSMoEs: Compact Sparse Mixture of Experts

Patrick Huber; Akshat Shrivastava; Ernie Chang; Chinnadhurai Sankar; Ahmed A Aly; Adithya Sagar

Abstract

Sparse Mixture of Experts (MoE) models are widely used foundation architectures at large scale, yet remain under-explored at smaller sizes. In this work, we introduce Compact Sparse Mixture of Experts (CoSMoEs) for on-device inference, addressing three key challenges: Quality, Memory, and Latency. On the quality front, we conduct a fair evaluation (removing confounding factors) and show that MoE architectures outperform dense models at on-device scale. We further propose weight-decomposed experts, which improve MoE performance beyond the standard formulation. On the memory and latency front, we address the prohibitively large parameter count of MoE models by improving expert offloading efficiency through a novel training-time loss, reducing inference latency for on-device deployment.

Anthology ID: 2026.alvr-main.4
Volume: Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Qianqi Yan, Syrielle Montariol, Yue Fan, Jing Gu, Jiayi Pan, Manling Li, Parisa Kordjamshidi, Alane Suhr, Xin Eric Wang
Venues: ALVR | WS
Publisher: Association for Computational Linguistics
Pages: 46–56
URL: https://aclanthology.org/2026.alvr-main.4/
Bibkey: huber-etal-2026-cosmoes
Cite (ACL): Patrick Huber, Akshat Shrivastava, Ernie Chang, Chinnadhurai Sankar, Ahmed A Aly, and Adithya Sagar. 2026. CoSMoEs: Compact Sparse Mixture of Experts. In Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR), pages 46–56, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): CoSMoEs: Compact Sparse Mixture of Experts (Huber et al., ALVR 2026)
PDF: https://aclanthology.org/2026.alvr-main.4.pdf

Export citation

BibTeX

@inproceedings{huber-etal-2026-cosmoes,
    title = "{C}o{SM}o{E}s: Compact Sparse Mixture of Experts",
    author = "Huber, Patrick  and
      Shrivastava, Akshat  and
      Chang, Ernie  and
      Sankar, Chinnadhurai  and
      Aly, Ahmed A  and
      Sagar, Adithya",
    editor = "Yan, Qianqi  and
      Montariol, Syrielle  and
      Fan, Yue  and
      Gu, Jing  and
      Pan, Jiayi  and
      Li, Manling  and
      Kordjamshidi, Parisa  and
      Suhr, Alane  and
      Wang, Xin Eric",
    booktitle = "Proceedings of the 4th Workshop on Advances in Language and Vision Research ({ALVR})",
    month = jul,
    year = "2026",
    address = "San Diego, California, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.alvr-main.4/",
    pages = "46--56",
    ISBN = "979-8-89176-398-2",
    abstract = "Sparse Mixture of Expert (MoE) models are widely used foundation architectures at large scale, yet remain under-explored at smaller sizes. In this work, we introduce Compact Sparse Mixture of Experts (CoSMoEs) for on-device inference, addressing three key challenges: Quality, Memory, and Latency. On the quality front, we conduct a fair evaluation (removing confounding factors) and show that MoE architectures outperform dense models at on-device scale. We further propose weight-decomposed experts, which improve MoE performance beyond the standard formulation. On the memory and latency front, we address the prohibitively large parameter count of MoE models by improving expert offloading efficiency through a novel training-time loss, reducing inference latency for on-device deployment"
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="huber-etal-2026-cosmoes">
    <titleInfo>
        <title>CoSMoEs: Compact Sparse Mixture of Experts</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Patrick</namePart>
        <namePart type="family">Huber</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Akshat</namePart>
        <namePart type="family">Shrivastava</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ernie</namePart>
        <namePart type="family">Chang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chinnadhurai</namePart>
        <namePart type="family">Sankar</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Ahmed</namePart>
        <namePart type="given">A</namePart>
        <namePart type="family">Aly</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Adithya</namePart>
        <namePart type="family">Sagar</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2026-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Qianqi</namePart>
            <namePart type="family">Yan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Syrielle</namePart>
            <namePart type="family">Montariol</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yue</namePart>
            <namePart type="family">Fan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jing</namePart>
            <namePart type="family">Gu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Jiayi</namePart>
            <namePart type="family">Pan</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Manling</namePart>
            <namePart type="family">Li</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Parisa</namePart>
            <namePart type="family">Kordjamshidi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alane</namePart>
            <namePart type="family">Suhr</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Xin</namePart>
            <namePart type="given">Eric</namePart>
            <namePart type="family">Wang</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">San Diego, California, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-398-2</identifier>
    </relatedItem>
    <abstract>Sparse Mixture of Expert (MoE) models are widely used foundation architectures at large scale, yet remain under-explored at smaller sizes. In this work, we introduce Compact Sparse Mixture of Experts (CoSMoEs) for on-device inference, addressing three key challenges: Quality, Memory, and Latency. On the quality front, we conduct a fair evaluation (removing confounding factors) and show that MoE architectures outperform dense models at on-device scale. We further propose weight-decomposed experts, which improve MoE performance beyond the standard formulation. On the memory and latency front, we address the prohibitively large parameter count of MoE models by improving expert offloading efficiency through a novel training-time loss, reducing inference latency for on-device deployment</abstract>
    <identifier type="citekey">huber-etal-2026-cosmoes</identifier>
    <location>
        <url>https://aclanthology.org/2026.alvr-main.4/</url>
    </location>
    <part>
        <date>2026-07</date>
        <extent unit="page">
            <start>46</start>
            <end>56</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T CoSMoEs: Compact Sparse Mixture of Experts
%A Huber, Patrick
%A Shrivastava, Akshat
%A Chang, Ernie
%A Sankar, Chinnadhurai
%A Aly, Ahmed A.
%A Sagar, Adithya
%Y Yan, Qianqi
%Y Montariol, Syrielle
%Y Fan, Yue
%Y Gu, Jing
%Y Pan, Jiayi
%Y Li, Manling
%Y Kordjamshidi, Parisa
%Y Suhr, Alane
%Y Wang, Xin Eric
%S Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
%D 2026
%8 July
%I Association for Computational Linguistics
%C San Diego, California, USA
%@ 979-8-89176-398-2
%F huber-etal-2026-cosmoes
%X Sparse Mixture of Expert (MoE) models are widely used foundation architectures at large scale, yet remain under-explored at smaller sizes. In this work, we introduce Compact Sparse Mixture of Experts (CoSMoEs) for on-device inference, addressing three key challenges: Quality, Memory, and Latency. On the quality front, we conduct a fair evaluation (removing confounding factors) and show that MoE architectures outperform dense models at on-device scale. We further propose weight-decomposed experts, which improve MoE performance beyond the standard formulation. On the memory and latency front, we address the prohibitively large parameter count of MoE models by improving expert offloading efficiency through a novel training-time loss, reducing inference latency for on-device deployment
%U https://aclanthology.org/2026.alvr-main.4/
%P 46-56

Markdown (Informal)

[CoSMoEs: Compact Sparse Mixture of Experts](https://aclanthology.org/2026.alvr-main.4/) (Huber et al., ALVR 2026)
