Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization

Ahmed Magooda, Diane Litman


Abstract
This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
Anthology ID:
2021.findings-emnlp.175
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2043–2052
URL:
https://aclanthology.org/2021.findings-emnlp.175
DOI:
10.18653/v1/2021.findings-emnlp.175
Cite (ACL):
Ahmed Magooda and Diane Litman. 2021. Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2043–2052, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization (Magooda & Litman, Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.175.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.175.mp4