Parallelizable Stack Long Short-Term Memory

Shuoyang Ding; Philipp Koehn

doi:10.18653/v1/W19-1501

Abstract

Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation.
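The abstract does not spell out how computations are homogenized across discrete operations. As an illustration only, the sketch below shows one way the idea could look: every batch element runs the same compute-and-write step, and the operation (push, hold, or pop) only changes a stack pointer. The names `BatchedStack`, `toy_cell`, and the `PUSH`/`HOLD`/`POP` encoding are hypothetical, the cell is a simple stand-in for an LSTM, and none of this is taken from the authors' implementation.

```python
from typing import List

# Assumption: each discrete operation is encoded as a pointer offset.
PUSH, HOLD, POP = 1, 0, -1


def toy_cell(prev_state: float, x: float) -> float:
    """Stand-in for an LSTM cell: any function of (previous state, input)."""
    return 0.5 * prev_state + x


class BatchedStack:
    """One fixed-size state buffer and one top-of-stack pointer per batch element."""

    def __init__(self, batch_size: int, max_depth: int) -> None:
        self.max_depth = max_depth
        # Slot 0 holds the initial (empty-stack) state.
        self.buffer = [[0.0] * (max_depth + 1) for _ in range(batch_size)]
        self.pos = [0] * batch_size

    def top(self) -> List[float]:
        """Read the state at each element's current pointer."""
        return [self.buffer[b][p] for b, p in enumerate(self.pos)]

    def step(self, inputs: List[float], ops: List[int]) -> List[float]:
        # The loop body never branches on the operation itself, so in a tensor
        # library it could become a batched gather, cell call, and scatter.
        for b, (x, op) in enumerate(zip(inputs, ops)):
            top = self.pos[b]
            # Always compute a candidate state from the current top.
            candidate = toy_cell(self.buffer[b][top], x)
            # Always write it one slot above the top when there is room.
            if top < self.max_depth:
                self.buffer[b][top + 1] = candidate
            # The operation only moves the pointer, clamped to the buffer.
            self.pos[b] = min(self.max_depth, max(0, top + op))
        return self.top()


if __name__ == "__main__":
    stack = BatchedStack(batch_size=2, max_depth=4)
    print(stack.step([1.0, 2.0], [PUSH, PUSH]))
    print(stack.step([3.0, 4.0], [PUSH, POP]))
    print(stack.step([1.0, 1.0], [HOLD, PUSH]))
```

With this layout a pop discards nothing explicitly: the pointer moves down and the stale slot is overwritten by the next write above the new top. A hold writes a candidate that is never read unless a later push lands on it, which is the price of keeping the per-step work identical for all operations.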

Anthology ID: W19-1501
Volume: Proceedings of the Third Workshop on Structured Prediction for NLP
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Andre Martins, Andreas Vlachos, Zornitsa Kozareva, Sujith Ravi, Gerasimos Lampouras, Vlad Niculae, Julia Kreutzer
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1–6
URL: https://aclanthology.org/W19-1501/
DOI: 10.18653/v1/W19-1501
Bibkey: ding-koehn-2019-parallelizable
Cite (ACL): Shuoyang Ding and Philipp Koehn. 2019. Parallelizable Stack Long Short-Term Memory. In Proceedings of the Third Workshop on Structured Prediction for NLP, pages 1–6, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): Parallelizable Stack Long Short-Term Memory (Ding & Koehn, NAACL 2019)
PDF: https://aclanthology.org/W19-1501.pdf
Presentation: W19-1501.Presentation.pdf

BibTeX

@inproceedings{ding-koehn-2019-parallelizable,
    title = "Parallelizable Stack Long Short-Term Memory",
    author = "Ding, Shuoyang  and
      Koehn, Philipp",
    editor = "Martins, Andre  and
      Vlachos, Andreas  and
      Kozareva, Zornitsa  and
      Ravi, Sujith  and
      Lampouras, Gerasimos  and
      Niculae, Vlad  and
      Kreutzer, Julia",
    booktitle = "Proceedings of the Third Workshop on Structured Prediction for {NLP}",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-1501/",
    doi = "10.18653/v1/W19-1501",
    pages = "1--6",
    abstract = "Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ding-koehn-2019-parallelizable">
    <titleInfo>
        <title>Parallelizable Stack Long Short-Term Memory</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Shuoyang</namePart>
        <namePart type="family">Ding</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Philipp</namePart>
        <namePart type="family">Koehn</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Third Workshop on Structured Prediction for NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Andre</namePart>
            <namePart type="family">Martins</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Andreas</namePart>
            <namePart type="family">Vlachos</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Zornitsa</namePart>
            <namePart type="family">Kozareva</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sujith</namePart>
            <namePart type="family">Ravi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Gerasimos</namePart>
            <namePart type="family">Lampouras</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Vlad</namePart>
            <namePart type="family">Niculae</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Julia</namePart>
            <namePart type="family">Kreutzer</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Minneapolis, Minnesota</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation.</abstract>
    <identifier type="citekey">ding-koehn-2019-parallelizable</identifier>
    <identifier type="doi">10.18653/v1/W19-1501</identifier>
    <location>
        <url>https://aclanthology.org/W19-1501/</url>
    </location>
    <part>
        <date>2019-06</date>
        <extent unit="page">
            <start>1</start>
            <end>6</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Parallelizable Stack Long Short-Term Memory
%A Ding, Shuoyang
%A Koehn, Philipp
%Y Martins, Andre
%Y Vlachos, Andreas
%Y Kozareva, Zornitsa
%Y Ravi, Sujith
%Y Lampouras, Gerasimos
%Y Niculae, Vlad
%Y Kreutzer, Julia
%S Proceedings of the Third Workshop on Structured Prediction for NLP
%D 2019
%8 June
%I Association for Computational Linguistics
%C Minneapolis, Minnesota
%F ding-koehn-2019-parallelizable
%X Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation.
%R 10.18653/v1/W19-1501
%U https://aclanthology.org/W19-1501/
%U https://doi.org/10.18653/v1/W19-1501
%P 1-6

Markdown (Informal)

[Parallelizable Stack Long Short-Term Memory](https://aclanthology.org/W19-1501/) (Ding & Koehn, NAACL 2019)
