Meta-Learning Fast Weight Language Models

Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi


Abstract
Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as linear attention. A key improvement over dynamic evaluation is that FWLs can also be applied at training time, so the model learns to make good use of gradient updates. FWLs can easily be added on top of existing transformer models, require relatively little extra compute or memory to run, and significantly improve language modeling perplexity.
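The abstract's central idea, expressing per-token gradient updates as linear attention, follows the general fast-weight pattern: a small weight matrix is updated after every token and immediately used when predicting the next one. The sketch below is a minimal illustration of that generic unnormalized fast-weight / linear-attention recurrence, not the paper's actual FWL component; the NumPy implementation, dimension names, and rank-1 update rule are assumptions made for clarity.

```python
# Minimal sketch (illustrative, not the authors' implementation) of a generic
# fast-weight / linear-attention recurrence: fast weights accumulate an
# outer-product update per token and are read out with the current query.
import numpy as np

def fast_weight_layer(queries, keys, values):
    """Per-token fast-weight update and read-out.

    queries, keys: (seq_len, d_k); values: (seq_len, d_v).
    Returns outputs of shape (seq_len, d_v).
    """
    seq_len, d_k = keys.shape
    d_v = values.shape[1]
    W = np.zeros((d_v, d_k))                 # fast weights, start at zero
    outputs = np.zeros((seq_len, d_v))
    for t in range(seq_len):
        W += np.outer(values[t], keys[t])    # rank-1 update from this token
        outputs[t] = W @ queries[t]          # read out with updated weights
    return outputs

# Tiny usage example with random features
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 4))
print(fast_weight_layer(q, k, v).shape)      # (5, 4)
```

Because each step is a rank-1 update followed by a matrix-vector product, the extra cost per token stays small, which is consistent with the abstract's claim that FWLs need relatively little additional compute compared with full dynamic evaluation.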
Anthology ID: 2022.emnlp-main.661
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 9751–9757
URL: https://aclanthology.org/2022.emnlp-main.661
DOI: 10.18653/v1/2022.emnlp-main.661
Cite (ACL): Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, and Mohammad Norouzi. 2022. Meta-Learning Fast Weight Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9751–9757, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Meta-Learning Fast Weight Language Models (Clark et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.661.pdf