From Dataset Recycling to Multi-Property Extraction and Beyond

Tomasz Dwojak; Michał Pietruszka; Łukasz Borchmann; Jakub Chłędowski; Filip Graliński

doi:10.18653/v1/2020.conll-1.52

Abstract

This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled - a newly developed public dataset, and the task of multiple-property extraction. It uses the same data as WikiReading but does not inherit its predecessor’s identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.

Anthology ID: 2020.conll-1.52
Volume: Proceedings of the 24th Conference on Computational Natural Language Learning
Month: November
Year: 2020
Address: Online
Editors: Raquel Fernández, Tal Linzen
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 641–651
URL: https://aclanthology.org/2020.conll-1.52/
DOI: 10.18653/v1/2020.conll-1.52
Bibkey: dwojak-etal-2020-dataset
Cite (ACL): Tomasz Dwojak, Michał Pietruszka, Łukasz Borchmann, Jakub Chłędowski, and Filip Graliński. 2020. From Dataset Recycling to Multi-Property Extraction and Beyond. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 641–651, Online. Association for Computational Linguistics.
Cite (Informal): From Dataset Recycling to Multi-Property Extraction and Beyond (Dwojak et al., CoNLL 2020)
PDF: https://aclanthology.org/2020.conll-1.52.pdf

BibTeX

@inproceedings{dwojak-etal-2020-dataset,
    title = "From Dataset Recycling to Multi-Property Extraction and Beyond",
    author = "Dwojak, Tomasz  and
      Pietruszka, Micha{\l}  and
      Borchmann, {\L}ukasz  and
      Ch{\l}{\k{e}}dowski, Jakub  and
      Grali{\'n}ski, Filip",
    editor = "Fern{\'a}ndez, Raquel  and
      Linzen, Tal",
    booktitle = "Proceedings of the 24th Conference on Computational Natural Language Learning",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.conll-1.52/",
    doi = "10.18653/v1/2020.conll-1.52",
    pages = "641--651",
    abstract = "This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled - a newly developed public dataset, and the task of multiple-property extraction. It uses the same data as WikiReading but does not inherit its predecessor{'}s identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="dwojak-etal-2020-dataset">
    <titleInfo>
        <title>From Dataset Recycling to Multi-Property Extraction and Beyond</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Tomasz</namePart>
        <namePart type="family">Dwojak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Michał</namePart>
        <namePart type="family">Pietruszka</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Łukasz</namePart>
        <namePart type="family">Borchmann</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jakub</namePart>
        <namePart type="family">Chłędowski</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Filip</namePart>
        <namePart type="family">Graliński</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2020-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 24th Conference on Computational Natural Language Learning</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Raquel</namePart>
            <namePart type="family">Fernández</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Tal</namePart>
            <namePart type="family">Linzen</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled - a newly developed public dataset, and the task of multiple-property extraction. It uses the same data as WikiReading but does not inherit its predecessor’s identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.</abstract>
    <identifier type="citekey">dwojak-etal-2020-dataset</identifier>
    <identifier type="doi">10.18653/v1/2020.conll-1.52</identifier>
    <location>
        <url>https://aclanthology.org/2020.conll-1.52/</url>
    </location>
    <part>
        <date>2020-11</date>
        <extent unit="page">
            <start>641</start>
            <end>651</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T From Dataset Recycling to Multi-Property Extraction and Beyond
%A Dwojak, Tomasz
%A Pietruszka, Michał
%A Borchmann, Łukasz
%A Chłędowski, Jakub
%A Graliński, Filip
%Y Fernández, Raquel
%Y Linzen, Tal
%S Proceedings of the 24th Conference on Computational Natural Language Learning
%D 2020
%8 November
%I Association for Computational Linguistics
%C Online
%F dwojak-etal-2020-dataset
%X This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled - a newly developed public dataset, and the task of multiple-property extraction. It uses the same data as WikiReading but does not inherit its predecessor’s identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.
%R 10.18653/v1/2020.conll-1.52
%U https://aclanthology.org/2020.conll-1.52/
%U https://doi.org/10.18653/v1/2020.conll-1.52
%P 641-651

Markdown (Informal)

[From Dataset Recycling to Multi-Property Extraction and Beyond](https://aclanthology.org/2020.conll-1.52/) (Dwojak et al., CoNLL 2020)
