SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy


Abstract
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0 respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even gains on GLUE.
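The abstract's first idea, masking contiguous random spans instead of individual tokens, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the geometric length distribution with parameter p, and the clipping at max_len are assumptions made here for the sketch.

```python
import random

def mask_contiguous_spans(tokens, mask_rate=0.15, p=0.2, max_len=10, seed=0):
    """Sketch of span masking: repeatedly pick a random start position and a
    short span length, masking whole contiguous runs of tokens until roughly
    mask_rate of the sequence is covered. Span lengths are drawn from a
    clipped geometric-style distribution (an assumption for this sketch).
    Returns the masked token list and the sorted masked positions."""
    rng = random.Random(seed)
    budget = max(1, round(len(tokens) * mask_rate))  # total tokens to mask
    masked = set()
    while len(masked) < budget:
        # Sample a span length: keep extending with probability (1 - p),
        # so shorter spans are more likely, and clip at max_len.
        length = 1
        while length < max_len and rng.random() < (1 - p):
            length += 1
        length = min(length, budget - len(masked))  # never exceed the budget
        start = rng.randrange(0, len(tokens) - length + 1)
        masked.update(range(start, start + length))
    out = ["[MASK]" if i in masked else t for i, t in enumerate(tokens)]
    return out, sorted(masked)
```

With a 100-token input and the defaults above, exactly 15 positions come back masked, and because spans are sampled as runs, the masked positions cluster into contiguous stretches rather than scattering uniformly as in BERT's token-level masking.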
Anthology ID: 2020.tacl-1.5
Volume: Transactions of the Association for Computational Linguistics, Volume 8
Year: 2020
Address: Cambridge, MA
Editors: Mark Johnson, Brian Roark, Ani Nenkova
Venue: TACL
Publisher: MIT Press
Pages: 64–77
URL: https://aclanthology.org/2020.tacl-1.5
DOI: 10.1162/tacl_a_00300
Cite (ACL): Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Transactions of the Association for Computational Linguistics, 8:64–77.
Cite (Informal): SpanBERT: Improving Pre-training by Representing and Predicting Spans (Joshi et al., TACL 2020)
PDF: https://aclanthology.org/2020.tacl-1.5.pdf
Code: facebookresearch/SpanBERT (plus additional community code)
Data: CoLA, CoNLL-2012, GLUE, HotpotQA, MRPC, MRQA, MultiNLI, Natural Questions, NewsQA, OntoNotes 5.0, QNLI, Quora Question Pairs, RTE, Re-TACRED, SQuAD, SST, SST-2, STS Benchmark, SearchQA, TACRED, TriviaQA