A Closer Look at Data Bias in Neural Extractive Summarization Models

Ming Zhong, Danqing Wang, Pengfei Liu, Xipeng Qiu, Xuanjing Huang


Abstract
In this paper, we take stock of the current state of summarization datasets and explore how different factors of datasets influence the generalization behaviour of neural extractive summarization models. Specifically, we first propose several dataset properties that matter for the generalization of summarization models. We then connect the priors residing in datasets to model design, analyzing how different dataset properties influence the choice of model structure and training method. Finally, taking a typical dataset as an example, we revisit the model design process in light of this analysis. We demonstrate that, given a deep understanding of a dataset's characteristics, a simple approach can bring significant improvements over the existing state-of-the-art model.
Anthology ID:
D19-5410
Volume:
Proceedings of the 2nd Workshop on New Frontiers in Summarization
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Lu Wang, Jackie Chi Kit Cheung, Giuseppe Carenini, Fei Liu
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
80–89
URL:
https://aclanthology.org/D19-5410/
DOI:
10.18653/v1/D19-5410
Cite (ACL):
Ming Zhong, Danqing Wang, Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2019. A Closer Look at Data Bias in Neural Extractive Summarization Models. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, pages 80–89, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
A Closer Look at Data Bias in Neural Extractive Summarization Models (Zhong et al., 2019)
PDF:
https://aclanthology.org/D19-5410.pdf
Data:
CNN/Daily Mail, NEWSROOM, New York Times Annotated Corpus