Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

Maxim Kodryan; Artem Grachev; Dmitry Ignatov; Dmitry Vetrov

doi:10.18653/v1/W19-4306

Abstract

Reducing the number of parameters is one of the most important goals in deep learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, which yields even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both the encoder and decoder layers can be removed with minimal quality loss.
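
As an illustration of the kind of sparsification the abstract describes, the sketch below computes the per-weight KL term that arises in ARD-style variational inference when the prior variance is set to its closed-form optimum, and then prunes weights with a low signal-to-noise ratio. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example values, and the pruning threshold of 1.0 are hypothetical, and a factorized Gaussian posterior N(mu, sigma^2) per weight is assumed.

```python
import math


def ard_kl(mu, sigma):
    """KL(q || p) for a single weight, with q = N(mu, sigma^2) and a
    zero-mean Gaussian prior N(0, tau^2).

    Minimizing the KL over tau^2 gives tau^2 = mu^2 + sigma^2, which
    leaves 0.5 * log(1 + mu^2 / sigma^2).
    """
    return 0.5 * math.log1p((mu / sigma) ** 2)


def prune_mask(mus, sigmas, threshold=1.0):
    """Return True for weights to keep.

    Assumption: a weight is kept when its squared signal-to-noise ratio
    mu^2 / sigma^2 exceeds `threshold` (a hypothetical cut-off).
    """
    return [(m / s) ** 2 > threshold for m, s in zip(mus, sigmas)]


def sparsity(mask):
    """Fraction of weights removed by the mask."""
    return 1.0 - sum(mask) / len(mask)


# Toy posterior parameters (made up for illustration only).
mus = [0.9, 0.01, -0.02, 0.0, 1.5]
sigmas = [0.1, 0.5, 0.4, 0.3, 0.2]

mask = prune_mask(mus, sigmas)
total_kl = sum(ard_kl(m, s) for m, s in zip(mus, sigmas))
print(mask, sparsity(mask), total_kl)
```

With these toy values, three of the five weights fall below the threshold. The 90% figure in the abstract refers to the paper's own experiments and is not reproduced by this example.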

Anthology ID: W19-4306
Volume: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month: August
Year: 2019
Address: Florence, Italy
Editors: Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
Venue: RepL4NLP
SIG: SIGREP
Publisher: Association for Computational Linguistics
Pages: 40–48
URL: https://aclanthology.org/W19-4306/
DOI: 10.18653/v1/W19-4306
Bibkey: kodryan-etal-2019-efficient
Cite (ACL): Maxim Kodryan, Artem Grachev, Dmitry Ignatov, and Dmitry Vetrov. 2019. Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 40–48, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks (Kodryan et al., RepL4NLP 2019)
PDF: https://aclanthology.org/W19-4306.pdf

Export citation

BibTeX

@inproceedings{kodryan-etal-2019-efficient,
    title = "Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks",
    author = "Kodryan, Maxim  and
      Grachev, Artem  and
      Ignatov, Dmitry  and
      Vetrov, Dmitry",
    editor = "Augenstein, Isabelle  and
      Gella, Spandana  and
      Ruder, Sebastian  and
      Kann, Katharina  and
      Can, Burcu  and
      Welbl, Johannes  and
      Conneau, Alexis  and
      Ren, Xiang  and
      Rei, Marek",
    booktitle = "Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-4306/",
    doi = "10.18653/v1/W19-4306",
    pages = "40--48",
    abstract = "Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural networks compression. We find this method to be especially useful in language modeling tasks, where large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying allowing to achieve even better sparsity and performance. Our experiments demonstrate that more than 90{\%} of the weights in both encoder and decoder layers can be removed with a minimal quality loss."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kodryan-etal-2019-efficient">
    <titleInfo>
        <title>Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Maxim</namePart>
        <namePart type="family">Kodryan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Artem</namePart>
        <namePart type="family">Grachev</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dmitry</namePart>
        <namePart type="family">Ignatov</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dmitry</namePart>
        <namePart type="family">Vetrov</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Isabelle</namePart>
            <namePart type="family">Augenstein</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Spandana</namePart>
            <namePart type="family">Gella</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sebastian</namePart>
            <namePart type="family">Ruder</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Katharina</namePart>
            <namePart type="family">Kann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Burcu</namePart>
            <namePart type="family">Can</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Johannes</namePart>
            <namePart type="family">Welbl</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Alexis</namePart>
            <namePart type="family">Conneau</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Xiang</namePart>
            <namePart type="family">Ren</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marek</namePart>
            <namePart type="family">Rei</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural networks compression. We find this method to be especially useful in language modeling tasks, where large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying allowing to achieve even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both encoder and decoder layers can be removed with a minimal quality loss.</abstract>
    <identifier type="citekey">kodryan-etal-2019-efficient</identifier>
    <identifier type="doi">10.18653/v1/W19-4306</identifier>
    <location>
        <url>https://aclanthology.org/W19-4306/</url>
    </location>
    <part>
        <date>2019-08</date>
        <extent unit="page">
            <start>40</start>
            <end>48</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks
%A Kodryan, Maxim
%A Grachev, Artem
%A Ignatov, Dmitry
%A Vetrov, Dmitry
%Y Augenstein, Isabelle
%Y Gella, Spandana
%Y Ruder, Sebastian
%Y Kann, Katharina
%Y Can, Burcu
%Y Welbl, Johannes
%Y Conneau, Alexis
%Y Ren, Xiang
%Y Rei, Marek
%S Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
%D 2019
%8 August
%I Association for Computational Linguistics
%C Florence, Italy
%F kodryan-etal-2019-efficient
%X Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural networks compression. We find this method to be especially useful in language modeling tasks, where large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying allowing to achieve even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both encoder and decoder layers can be removed with a minimal quality loss.
%R 10.18653/v1/W19-4306
%U https://aclanthology.org/W19-4306/
%U https://doi.org/10.18653/v1/W19-4306
%P 40-48

Markdown (Informal)

[Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks](https://aclanthology.org/W19-4306/) (Kodryan et al., RepL4NLP 2019)
