Detection and Mitigation of the Negative Impact of Dataset Extractivity on Abstractive Summarization

Yubin Ge, Sullam Jeoung, Ly Dinh, Jana Diesner


Abstract
In text summarization, extractivity measures the degree of overlap between a source document and its summary. Previous research has shown that the extractivity level of training data can influence both the extractivity of outputs and the amount of factual information (i.e., faithfulness) in outputs of abstractive summarization models. However, it remains unclear whether and how extractivity impacts the performance of abstractive models. In this work, we investigate the relationship between dataset extractivity and model performance by comparing models trained on data with different degrees of extractivity. We find that while low levels of extractivity can improve performance, performance degrades as extractivity increases further. Furthermore, through an analysis of the model’s copy continuity, we find that higher extractivity leads to a greater tendency for the model to copy text continuously from the source document rather than identifying and summarizing the important content that the target summary should cover. To address these issues, we propose a simple and effective method that designs copy labels to correct the model’s copying behavior and trains the model with a copy mechanism. Experimental results demonstrate that our strategy alleviates the negative impact of high dataset extractivity on model performance and outperforms several competitive baselines.
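The abstract defines extractivity as a measure of source–summary overlap but does not spell out the metric on this page. A widely used family of such measures is extractive fragment coverage and density (Grusky et al., 2018), computed from the greedily matched shared fragments between source and summary. The sketch below is illustrative only, not the authors' implementation; the function names and whitespace tokenization are assumptions.

```python
def extractive_fragments(source_tokens, summary_tokens):
    """Greedily match shared token fragments between source and summary.

    For each position in the summary, find the longest contiguous token
    sequence that also appears in the source, then skip past it.
    """
    fragments = []
    i = 0
    while i < len(summary_tokens):
        best = []
        j = 0
        while j < len(source_tokens):
            if summary_tokens[i] == source_tokens[j]:
                # Extend the match as far as both sequences agree.
                i2, j2 = i, j
                while (i2 < len(summary_tokens) and j2 < len(source_tokens)
                       and summary_tokens[i2] == source_tokens[j2]):
                    i2 += 1
                    j2 += 1
                if i2 - i > len(best):
                    best = summary_tokens[i:i2]
                j = j2
            else:
                j += 1
        if best:
            fragments.append(best)
            i += len(best)
        else:
            i += 1  # token never appears in the source
    return fragments


def coverage(source, summary):
    """Fraction of summary tokens that lie inside a copied fragment."""
    frags = extractive_fragments(source.split(), summary.split())
    return sum(len(f) for f in frags) / len(summary.split())


def density(source, summary):
    """Average squared fragment length: rewards long contiguous copies."""
    frags = extractive_fragments(source.split(), summary.split())
    return sum(len(f) ** 2 for f in frags) / len(summary.split())
```

Under these definitions, a fully extractive summary of a six-word source such as `coverage("the cat sat on the mat", "the cat sat")` yields coverage 1.0, while density additionally distinguishes one long copied span from several short ones, which connects to the paper's observation about continuous copying.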
Anthology ID:
2023.findings-acl.877
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13963–13976
URL:
https://aclanthology.org/2023.findings-acl.877
DOI:
10.18653/v1/2023.findings-acl.877
Cite (ACL):
Yubin Ge, Sullam Jeoung, Ly Dinh, and Jana Diesner. 2023. Detection and Mitigation of the Negative Impact of Dataset Extractivity on Abstractive Summarization. In Findings of the Association for Computational Linguistics: ACL 2023, pages 13963–13976, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Detection and Mitigation of the Negative Impact of Dataset Extractivity on Abstractive Summarization (Ge et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.877.pdf