Syntax-driven Data Augmentation for Named Entity Recognition

Arie Sutiono; Gus Hahn-Powell

Abstract

In low resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve the performance of named entity recognition in low-resource settings with the aim of preserving linguistic cohesion of the augmented sentences.
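
To make the token-level setting concrete, the following is a minimal sketch of label-preserving replacement; it is an illustration, not the authors' implementation. It assumes BIO-tagged tokens. The names `augment_non_entity_tokens`, `toy_propose`, and `SUBSTITUTES` are hypothetical, and `toy_propose` is a plain dictionary lookup standing in for a masked language model's prediction. The constituency tree mutation method is not shown here.

```python
import random
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, BIO label)


def augment_non_entity_tokens(
    sentence: List[Token],
    propose: Callable[[List[str], int], str],
    rate: float = 0.3,
    seed: int = 0,
) -> List[Token]:
    """Replace a fraction of "O"-labelled words; entity words and all labels stay fixed."""
    rng = random.Random(seed)
    words = [word for word, _ in sentence]
    augmented = []
    for i, (word, label) in enumerate(sentence):
        # Only non-entity tokens are candidates, so entity spans are never altered.
        if label == "O" and rng.random() < rate:
            word = propose(words, i)
        augmented.append((word, label))
    return augmented


# Hypothetical substitution table; a real system would query a masked language model.
SUBSTITUTES = {"visited": "toured", "yesterday": "recently"}


def toy_propose(words: List[str], index: int) -> str:
    """Return a substitute for words[index], or the word itself if none is known."""
    return SUBSTITUTES.get(words[index], words[index])


example = [("Alice", "B-PER"), ("visited", "O"), ("Paris", "B-LOC"), ("yesterday", "O")]
result = augment_non_entity_tokens(example, toy_propose, rate=1.0)
```

With `rate=1.0` every non-entity token is passed to `toy_propose`, so the label sequence is unchanged and only "visited" and "yesterday" are rewritten. Keeping the labels aligned with the tokens is what distinguishes this setting from document-level augmentation, where a single label covers the whole text.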

Anthology ID: 2022.pandl-1.7
Volume: Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Laura Chiticariu, Yoav Goldberg, Gus Hahn-Powell, Clayton T. Morrison, Aakanksha Naik, Rebecca Sharp, Mihai Surdeanu, Marco Valenzuela-Escárcega, Enrique Noriega-Atala
Venue: PANDL
Publisher: International Conference on Computational Linguistics
Pages: 56–60
URL: https://aclanthology.org/2022.pandl-1.7/
Cite (ACL): Arie Sutiono and Gus Hahn-Powell. 2022. Syntax-driven Data Augmentation for Named Entity Recognition. In Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning, pages 56–60, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal): Syntax-driven Data Augmentation for Named Entity Recognition (Sutiono & Hahn-Powell, PANDL 2022)
PDF: https://aclanthology.org/2022.pandl-1.7.pdf

BibTeX

@inproceedings{sutiono-hahn-powell-2022-syntax,
    title = "Syntax-driven Data Augmentation for Named Entity Recognition",
    author = "Sutiono, Arie  and
      Hahn-Powell, Gus",
    editor = "Chiticariu, Laura  and
      Goldberg, Yoav  and
      Hahn-Powell, Gus  and
      Morrison, Clayton T.  and
      Naik, Aakanksha  and
      Sharp, Rebecca  and
      Surdeanu, Mihai  and
      Valenzuela-Esc{\'a}rcega, Marco  and
      Noriega-Atala, Enrique",
    booktitle = "Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Conference on Computational Linguistics",
    url = "https://aclanthology.org/2022.pandl-1.7/",
    pages = "56--60",
    abstract = "In low resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve the performance of named entity recognition in low-resource settings with the aim of preserving linguistic cohesion of the augmented sentences."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="sutiono-hahn-powell-2022-syntax">
    <titleInfo>
        <title>Syntax-driven Data Augmentation for Named Entity Recognition</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Arie</namePart>
        <namePart type="family">Sutiono</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Gus</namePart>
        <namePart type="family">Hahn-Powell</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2022-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Laura</namePart>
            <namePart type="family">Chiticariu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yoav</namePart>
            <namePart type="family">Goldberg</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Gus</namePart>
            <namePart type="family">Hahn-Powell</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Clayton</namePart>
            <namePart type="given">T</namePart>
            <namePart type="family">Morrison</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Aakanksha</namePart>
            <namePart type="family">Naik</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rebecca</namePart>
            <namePart type="family">Sharp</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mihai</namePart>
            <namePart type="family">Surdeanu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marco</namePart>
            <namePart type="family">Valenzuela-Escárcega</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Enrique</namePart>
            <namePart type="family">Noriega-Atala</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>International Conference on Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Gyeongju, Republic of Korea</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In low resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve the performance of named entity recognition in low-resource settings with the aim of preserving linguistic cohesion of the augmented sentences.</abstract>
    <identifier type="citekey">sutiono-hahn-powell-2022-syntax</identifier>
    <location>
        <url>https://aclanthology.org/2022.pandl-1.7/</url>
    </location>
    <part>
        <date>2022-10</date>
        <extent unit="page">
            <start>56</start>
            <end>60</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Syntax-driven Data Augmentation for Named Entity Recognition
%A Sutiono, Arie
%A Hahn-Powell, Gus
%Y Chiticariu, Laura
%Y Goldberg, Yoav
%Y Hahn-Powell, Gus
%Y Morrison, Clayton T.
%Y Naik, Aakanksha
%Y Sharp, Rebecca
%Y Surdeanu, Mihai
%Y Valenzuela-Escárcega, Marco
%Y Noriega-Atala, Enrique
%S Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
%D 2022
%8 October
%I International Conference on Computational Linguistics
%C Gyeongju, Republic of Korea
%F sutiono-hahn-powell-2022-syntax
%X In low resource settings, data augmentation strategies are commonly leveraged to improve performance. Numerous approaches have attempted document-level augmentation (e.g., text classification), but few studies have explored token-level augmentation. Performed naively, data augmentation can produce semantically incongruent and ungrammatical examples. In this work, we compare simple masked language model replacement and an augmentation method using constituency tree mutations to improve the performance of named entity recognition in low-resource settings with the aim of preserving linguistic cohesion of the augmented sentences.
%U https://aclanthology.org/2022.pandl-1.7/
%P 56-60
