Pre-trained Language Model Based Active Learning for Sentence Matching

Guirong Bai; Shizhu He; Kang Liu; Jun Zhao; Zaiqing Nie

doi:10.18653/v1/2020.coling-main.130

Abstract

Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria from the pre-trained language model to measure instances and help select more effective instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.
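As a rough illustration of the entropy-based uncertainty criterion that the abstract names as the common baseline, the sketch below ranks unlabeled sentence pairs by the entropy of a classifier's predicted label distribution and picks the most uncertain ones for annotation. It is not the authors' proposed method. The function names and the probability values are hypothetical and chosen only for this example.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool instances with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled sentence pairs,
# each given as [P(match), P(no match)].
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.80, 0.20], [0.50, 0.50]]

# Indices of the two pairs the model is least sure about.
chosen = select_most_uncertain(pool_probs, k=2)
print(chosen)
```

In an active learning loop, the selected pairs would be sent to annotators, added to the labeled set, and the model retrained before the next selection round.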

Anthology ID: 2020.coling-main.130
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 1495–1504
URL: https://aclanthology.org/2020.coling-main.130/
DOI: 10.18653/v1/2020.coling-main.130
Bibkey: bai-etal-2020-pre
Cite (ACL): Guirong Bai, Shizhu He, Kang Liu, Jun Zhao, and Zaiqing Nie. 2020. Pre-trained Language Model Based Active Learning for Sentence Matching. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1495–1504, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Pre-trained Language Model Based Active Learning for Sentence Matching (Bai et al., COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.130.pdf

@inproceedings{bai-etal-2020-pre,
    title = "Pre-trained Language Model Based Active Learning for Sentence Matching",
    author = "Bai, Guirong  and
      He, Shizhu  and
      Liu, Kang  and
      Zhao, Jun  and
      Nie, Zaiqing",
    editor = "Scott, Donia  and
      Bel, Nuria  and
      Zong, Chengqing",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2020.coling-main.130/",
    doi = "10.18653/v1/2020.coling-main.130",
    pages = "1495--1504",
    abstract = "Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria from the pre-trained language model to measure instances and help select more effective instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="bai-etal-2020-pre">
    <titleInfo>
        <title>Pre-trained Language Model Based Active Learning for Sentence Matching</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Guirong</namePart>
        <namePart type="family">Bai</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Shizhu</namePart>
        <namePart type="family">He</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Kang</namePart>
        <namePart type="family">Liu</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jun</namePart>
        <namePart type="family">Zhao</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Zaiqing</namePart>
        <namePart type="family">Nie</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 28th International Conference on Computational Linguistics</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Donia</namePart>
            <namePart type="family">Scott</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nuria</namePart>
            <namePart type="family">Bel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Chengqing</namePart>
            <namePart type="family">Zong</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>International Committee on Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Barcelona, Spain (Online)</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria from the pre-trained language model to measure instances and help select more effective instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.</abstract>
    <identifier type="citekey">bai-etal-2020-pre</identifier>
    <identifier type="doi">10.18653/v1/2020.coling-main.130</identifier>
    <location>
        <url>https://aclanthology.org/2020.coling-main.130/</url>
    </location>
    <part>
        <date>2020-12</date>
        <extent unit="page">
            <start>1495</start>
            <end>1504</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Pre-trained Language Model Based Active Learning for Sentence Matching
%A Bai, Guirong
%A He, Shizhu
%A Liu, Kang
%A Zhao, Jun
%A Nie, Zaiqing
%Y Scott, Donia
%Y Bel, Nuria
%Y Zong, Chengqing
%S Proceedings of the 28th International Conference on Computational Linguistics
%D 2020
%8 December
%I International Committee on Computational Linguistics
%C Barcelona, Spain (Online)
%F bai-etal-2020-pre
%X Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria from the pre-trained language model to measure instances and help select more effective instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.
%R 10.18653/v1/2020.coling-main.130
%U https://aclanthology.org/2020.coling-main.130/
%U https://doi.org/10.18653/v1/2020.coling-main.130
%P 1495-1504

Markdown (Informal)

[Pre-trained Language Model Based Active Learning for Sentence Matching](https://aclanthology.org/2020.coling-main.130/) (Bai et al., COLING 2020)