How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature

Simeng Sun; Ori Shapira; Ido Dagan; Ani Nenkova

doi:10.18653/v1/W19-2303

Abstract

We show that plain ROUGE F1 scores are not ideal for comparing current neural systems, which on average produce summaries of different lengths. This is due to a non-linear relationship between ROUGE F1 and summary length. To alleviate the effect of length during evaluation, we propose a new method that normalizes the ROUGE F1 score of a system by that of a random system with the same average output length. A pilot human evaluation shows that humans prefer short summaries in terms of verbosity but overall consider longer summaries to be of higher quality. While human evaluations are more expensive in time and resources, it is clear that normalization, such as the one we propose for automatic evaluation, will make human evaluations more meaningful.
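
The normalization idea in the abstract can be illustrated with a minimal sketch. It assumes that "normalizes by" means dividing a system's ROUGE F1 by the ROUGE F1 of a random baseline with the same average output length; the abstract does not spell out the exact formula, so this is an assumption. The function name `normalized_rouge_f1` and all scores below are hypothetical and are not results from the paper.

```python
def normalized_rouge_f1(system_f1: float, random_baseline_f1: float) -> float:
    """Divide a system's ROUGE F1 by the ROUGE F1 of a random baseline.

    Assumption: the baseline produces summaries with the same average
    length as the system, and normalization is a simple ratio.
    """
    if random_baseline_f1 <= 0:
        raise ValueError("random baseline score must be positive")
    return system_f1 / random_baseline_f1


# Hypothetical scores for illustration only (not taken from the paper).
# The longer system has the higher raw F1, but its length-matched random
# baseline also scores higher, so its normalized score ends up lower.
short_system = normalized_rouge_f1(system_f1=0.30, random_baseline_f1=0.20)
long_system = normalized_rouge_f1(system_f1=0.33, random_baseline_f1=0.25)
```

Under these made-up numbers the ranking of the two systems flips after normalization, which is the kind of length effect the abstract warns about when comparing raw ROUGE F1.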

Anthology ID: W19-2303
Volume: Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Antoine Bosselut, Asli Celikyilmaz, Marjan Ghazvininejad, Srinivasan Iyer, Urvashi Khandelwal, Hannah Rashkin, Thomas Wolf
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 21–29
URL: https://aclanthology.org/W19-2303/
DOI: 10.18653/v1/W19-2303
Cite (ACL): Simeng Sun, Ori Shapira, Ido Dagan, and Ani Nenkova. 2019. How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature. In Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation, pages 21–29, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature (Sun et al., NAACL 2019)
PDF: https://aclanthology.org/W19-2303.pdf
Data: NEWSROOM