Gold Corpus for Telegraphic Summarization

Chanakya Malireddy, Srivenkata N M Somisetty, Manish Shrivastava


Abstract
Most extractive summarization techniques operate by ranking all the source sentences and then selecting the top-ranked sentences as the summary. Such methods are known to produce good summaries, especially when applied to news articles and scientific texts. However, they don't fare as well when applied to texts such as fictional narratives, which don't have a single central or recurrent theme. This is because the information or plot of a story is usually spread across several sentences. In this paper, we discuss a different summarization technique called Telegraphic Summarization. Here, rather than selecting whole sentences, we pick short segments of text spread across sentences as the summary. We have tailored a set of guidelines to create such summaries and, using these guidelines, have annotated a gold corpus of 200 English short stories.
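The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It assumes that sentence scores are already produced by some ranking model, and it encodes a telegraphic summary as hypothetical per-sentence word spans (start inclusive, end exclusive). This encoding, the toy story, and the helper names `sentence_extract` and `telegraphic_extract` are illustrative assumptions, not the format of the released corpus or the authors' annotation tooling.

```python
from typing import List, Tuple


def sentence_extract(sentences: List[str], scores: List[float], k: int) -> List[str]:
    """Conventional extractive baseline: keep the k highest-scoring sentences,
    returned in their original document order. Scores are assumed to be given."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def telegraphic_extract(sentences: List[str], spans: List[List[Tuple[int, int]]]) -> str:
    """Segment-level extraction: keep only the marked word spans from each sentence.

    spans[i] is a (hypothetical) list of (start, end) word indices for sentence i,
    with start inclusive and end exclusive. Words come from a plain whitespace
    split, so punctuation stays attached to its word.
    """
    pieces = []
    for sentence, sentence_spans in zip(sentences, spans):
        words = sentence.split()
        for start, end in sentence_spans:
            pieces.append(" ".join(words[start:end]))
    return " ".join(pieces)


if __name__ == "__main__":
    # Hypothetical toy story whose "plot" is spread across all three sentences.
    story = [
        "The old fisherman sailed out before dawn.",
        "He waited for hours but caught nothing at all.",
        "At dusk a huge marlin finally took the bait.",
    ]
    # Sentence-level: only the single top-ranked sentence survives.
    print(sentence_extract(story, [0.4, 0.2, 0.9], 1))
    # Segment-level: a few words are kept from every sentence.
    print(telegraphic_extract(story, [[(1, 5)], [(5, 7)], [(3, 5), (6, 9)]]))
    # -> old fisherman sailed out caught nothing huge marlin took the bait.
```

With a one-sentence budget, the sentence-level baseline keeps only the last sentence and loses the setup of the story, whereas the span-based summary retains a fragment of each event. Representing spans as word offsets keeps the summary traceable to the source text, but it is only one of several plausible encodings.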
Anthology ID: W18-3810
Volume: Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Month: August
Year: 2018
Address: Santa Fe, New Mexico, USA
Editors: Peter Machonis, Anabela Barreiro, Kristina Kocijan, Max Silberztein
Venue: LR4NLP
Publisher: Association for Computational Linguistics
Pages: 71–77
URL: https://aclanthology.org/W18-3810
Cite (ACL): Chanakya Malireddy, Srivenkata N M Somisetty, and Manish Shrivastava. 2018. Gold Corpus for Telegraphic Summarization. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, pages 71–77, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Gold Corpus for Telegraphic Summarization (Malireddy et al., LR4NLP 2018)
PDF: https://aclanthology.org/W18-3810.pdf
Code: m-chanakya/shortstories
Data: Telegraphic Summaries