Linguistic Compression in Single-Sentence Human-Written Summaries

Fangcong Yin, Marten van Schijndel


Abstract
Summarizing texts involves significant cognitive effort to compress information. While advances in automatic summarization systems have drawn attention from the NLP and linguistics communities to this topic, there is a lack of computational studies of linguistic patterns in human-written summaries. This work presents a large-scale corpus study of human-written single-sentence summaries. We analyzed the linguistic compression patterns from source documents to summaries at different granularities, and we found that, across different genres, summaries are generally written with morphological expansion, increased lexical diversity, and similar positional arrangements of specific words compared to the source. We also studied, through a human study, how linguistic compression of different factors affects reader judgments of quality. The results show that the use of morphological and syntactic changes by summary writers matches reader preferences, while lexical diversity and word specificity preferences are not aligned between summary writers and readers.
Anthology ID:
2023.findings-emnlp.532
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7922–7935
URL:
https://aclanthology.org/2023.findings-emnlp.532
DOI:
10.18653/v1/2023.findings-emnlp.532
Cite (ACL):
Fangcong Yin and Marten van Schijndel. 2023. Linguistic Compression in Single-Sentence Human-Written Summaries. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7922–7935, Singapore. Association for Computational Linguistics.
Cite (Informal):
Linguistic Compression in Single-Sentence Human-Written Summaries (Yin & van Schijndel, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.532.pdf