A Closer Look at Data Bias in Neural Extractive Summarization Models

Ming Zhong; Danqing Wang; Pengfei Liu; Xipeng Qiu (邱锡鹏); Xuan-Jing Huang (黄萱菁)

doi:10.18653/v1/D19-5410

Abstract

In this paper, we take stock of the current state of summarization datasets and explore how different factors of datasets influence the generalization behaviour of neural extractive summarization models. Specifically, we first propose several properties of datasets, which matter for the generalization of summarization models. Then we build the connection between priors residing in datasets and model designs, analyzing how different properties of datasets influence the choices of model structure design and training methods. Finally, by taking a typical dataset as an example, we rethink the process of the model design based on the experience of the above analysis. We demonstrate that when we have a deep understanding of the characteristics of datasets, a simple approach can bring significant improvements to the existing state-of-the-art model.

Anthology ID: D19-5410
Volume: Proceedings of the 2nd Workshop on New Frontiers in Summarization
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Lu Wang, Jackie Chi Kit Cheung, Giuseppe Carenini, Fei Liu
Venues: NewSum | WS
SIG: SIGSUMM
Publisher: Association for Computational Linguistics
Pages: 80–89
URL: https://aclanthology.org/D19-5410/
DOI: 10.18653/v1/D19-5410
Cite (ACL): Ming Zhong, Danqing Wang, Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2019. A Closer Look at Data Bias in Neural Extractive Summarization Models. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, pages 80–89, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): A Closer Look at Data Bias in Neural Extractive Summarization Models (Zhong et al., NewSum 2019)
PDF: https://aclanthology.org/D19-5410.pdf

@inproceedings{zhong-etal-2019-closer,
    title = "A Closer Look at Data Bias in Neural Extractive Summarization Models",
    author = "Zhong, Ming  and
      Wang, Danqing  and
      Liu, Pengfei  and
      Qiu, Xipeng  and
      Huang, Xuanjing",
    editor = "Wang, Lu  and
      Cheung, Jackie Chi Kit  and
      Carenini, Giuseppe  and
      Liu, Fei",
    booktitle = "Proceedings of the 2nd Workshop on New Frontiers in Summarization",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-5410/",
    doi = "10.18653/v1/D19-5410",
    pages = "80--89",
    abstract = "In this paper, we take stock of the current state of summarization datasets and explore how different factors of datasets influence the generalization behaviour of neural extractive summarization models. Specifically, we first propose several properties of datasets, which matter for the generalization of summarization models. Then we build the connection between priors residing in datasets and model designs, analyzing how different properties of datasets influence the choices of model structure design and training methods. Finally, by taking a typical dataset as an example, we rethink the process of the model design based on the experience of the above analysis. We demonstrate that when we have a deep understanding of the characteristics of datasets, a simple approach can bring significant improvements to the existing state-of-the-art model."
}
