LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization

Fajri Koto, Timothy Baldwin, Jey Han Lau


Abstract
Summaries, keyphrases, and titles are different ways of concisely capturing the content of a document. While most previous work has released keyphrase and summarization datasets separately, we introduce LipKey, the largest news corpus with human-written abstractive summaries, absent keyphrases, and titles. We use the three elements jointly in the context of document summarization, both through multi-task training and by training with them as joint structured inputs. We find that including absent keyphrases and titles as additional context to the source document improves transformer-based summarization models.
Anthology ID:
2022.coling-1.303
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3427–3437
URL:
https://aclanthology.org/2022.coling-1.303
Cite (ACL):
Fajri Koto, Timothy Baldwin, and Jey Han Lau. 2022. LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3427–3437, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization (Koto et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.303.pdf
Data:
IndoSum, KPTimes, Liputan6