Quantifying Appropriateness of Summarization Data for Curriculum Learning

Ryuji Kano, Takumi Takahashi, Toru Nishino, Motoki Taniguchi, Tomoki Taniguchi, Tomoko Ohkuma


Abstract
Much research has reported that the training data of summarization models are noisy: summaries often do not reflect what is written in the source texts. We propose an effective curriculum learning method for training summarization models on such noisy data. Curriculum learning has been used to train sequence-to-sequence models with noisy data. In translation tasks, previous research quantified the noise of training data using two models trained on noisy and clean corpora, respectively. Because such paired corpora do not exist in the summarization field, we propose a model that quantifies noise from a single noisy corpus. We conduct experiments on three summarization models, one pretrained model and two non-pretrained models, and verify that our method improves their performance. Furthermore, we analyze how different curricula affect the performance of pretrained and non-pretrained summarization models. Our results on human evaluation also show that our method improves the performance of summarization models.
Anthology ID:
2021.eacl-main.119
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1395–1405
URL:
https://aclanthology.org/2021.eacl-main.119
DOI:
10.18653/v1/2021.eacl-main.119
Cite (ACL):
Ryuji Kano, Takumi Takahashi, Toru Nishino, Motoki Taniguchi, Tomoki Taniguchi, and Tomoko Ohkuma. 2021. Quantifying Appropriateness of Summarization Data for Curriculum Learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1395–1405, Online. Association for Computational Linguistics.
Cite (Informal):
Quantifying Appropriateness of Summarization Data for Curriculum Learning (Kano et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.119.pdf
Data
Reddit TIFU