Dataset Mention Extraction and Classification

Animesh Prasad; Chenglei Si; Min-Yen Kan

doi:10.18653/v1/W19-2604

Abstract

Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, they can be difficult to recognize, and even when identified, they are often referred to inconsistently across and within publications. We report our approach to the Coleridge Initiative’s Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating each extracted mention with the dataset it refers to (dataset classification). In this work, we propose various neural baselines and evaluate these models on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches, examining the synergy between the tasks, and report the issues with such techniques.

Anthology ID: W19-2604
Volume: Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Vivi Nastase, Benjamin Roth, Laura Dietz, Andrew McCallum
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 31–36
URL: https://aclanthology.org/W19-2604/
DOI: 10.18653/v1/W19-2604
Bibkey: prasad-etal-2019-dataset
Cite (ACL): Animesh Prasad, Chenglei Si, and Min-Yen Kan. 2019. Dataset Mention Extraction and Classification. In Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications, pages 31–36, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): Dataset Mention Extraction and Classification (Prasad et al., NAACL 2019)
PDF: https://aclanthology.org/W19-2604.pdf

Export citation

BibTeX

@inproceedings{prasad-etal-2019-dataset,
    title = "Dataset Mention Extraction and Classification",
    author = "Prasad, Animesh  and
      Si, Chenglei  and
      Kan, Min-Yen",
    editor = "Nastase, Vivi  and
      Roth, Benjamin  and
      Dietz, Laura  and
      McCallum, Andrew",
    booktitle = "Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-2604/",
    doi = "10.18653/v1/W19-2604",
    pages = "31--36",
    abstract = "Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, they can be difficult to recognize, and even when identified, they are often referred to inconsistently across and within publications. We report our approach to the Coleridge Initiative{'}s Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating each extracted mention with the dataset it refers to (dataset classification). In this work, we propose various neural baselines and evaluate these models on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches, examining the synergy between the tasks, and report the issues with such techniques."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="prasad-etal-2019-dataset">
    <titleInfo>
        <title>Dataset Mention Extraction and Classification</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Animesh</namePart>
        <namePart type="family">Prasad</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Chenglei</namePart>
        <namePart type="family">Si</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Min-Yen</namePart>
        <namePart type="family">Kan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Vivi</namePart>
            <namePart type="family">Nastase</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Benjamin</namePart>
            <namePart type="family">Roth</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Laura</namePart>
            <namePart type="family">Dietz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Andrew</namePart>
            <namePart type="family">McCallum</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Minneapolis, Minnesota</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, they can be difficult to recognize, and even when identified, they are often referred to inconsistently across and within publications. We report our approach to the Coleridge Initiative’s Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating each extracted mention with the dataset it refers to (dataset classification). In this work, we propose various neural baselines and evaluate these models on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches, examining the synergy between the tasks, and report the issues with such techniques.</abstract>
    <identifier type="citekey">prasad-etal-2019-dataset</identifier>
    <identifier type="doi">10.18653/v1/W19-2604</identifier>
    <location>
        <url>https://aclanthology.org/W19-2604/</url>
    </location>
    <part>
        <date>2019-06</date>
        <extent unit="page">
            <start>31</start>
            <end>36</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Dataset Mention Extraction and Classification
%A Prasad, Animesh
%A Si, Chenglei
%A Kan, Min-Yen
%Y Nastase, Vivi
%Y Roth, Benjamin
%Y Dietz, Laura
%Y McCallum, Andrew
%S Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
%D 2019
%8 June
%I Association for Computational Linguistics
%C Minneapolis, Minnesota
%F prasad-etal-2019-dataset
%X Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, they can be difficult to recognize, and even when identified, they are often referred to inconsistently across and within publications. We report our approach to the Coleridge Initiative’s Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating each extracted mention with the dataset it refers to (dataset classification). In this work, we propose various neural baselines and evaluate these models on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches, examining the synergy between the tasks, and report the issues with such techniques.
%R 10.18653/v1/W19-2604
%U https://aclanthology.org/W19-2604/
%U https://doi.org/10.18653/v1/W19-2604
%P 31-36

Preformatted

Markdown (Informal)

[Dataset Mention Extraction and Classification](https://aclanthology.org/W19-2604/) (Prasad et al., NAACL 2019)

ACL

Animesh Prasad, Chenglei Si, and Min-Yen Kan. 2019. Dataset Mention Extraction and Classification. In Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications, pages 31–36, Minneapolis, Minnesota. Association for Computational Linguistics.