Revisiting Summarization Evaluation for Scientific Articles

Arman Cohan; Nazli Goharian

Abstract

Evaluation of text summarization approaches has mostly been based on metrics that measure the similarity of system-generated summaries to a set of human-written gold-standard summaries. The most widely used metric in summarization evaluation has been the ROUGE family. ROUGE relies solely on lexical overlap between the terms and phrases in the sentences; therefore, in cases of terminology variation and paraphrasing, ROUGE is less effective. Scientific article summarization is one such case, and it differs from general-domain summarization (e.g., newswire data). We provide an extensive analysis of ROUGE’s effectiveness as an evaluation metric for scientific summarization; we show that, contrary to common belief, ROUGE is not very reliable in evaluating scientific summaries. We furthermore show that different variants of ROUGE yield very different correlations with manual Pyramid scores. Finally, we propose an alternative metric for summarization evaluation that is based on the content relevance between a system-generated summary and the corresponding human-written summaries. We call our metric SERA (Summarization Evaluation by Relevance Analysis). Unlike ROUGE, SERA consistently achieves high correlations with manual scores, which shows its effectiveness in evaluating scientific article summarization.
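
To illustrate the abstract's point about lexical overlap, the following is a minimal sketch of a unigram-recall score in the spirit of ROUGE-1. It is not the official ROUGE toolkit (no stemming, stopword handling, or multi-reference support), the function name `rouge1_recall` is hypothetical, and the example sentences are invented for illustration rather than taken from the paper.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified unigram recall: share of reference tokens matched in the candidate.

    Assumption: whitespace tokenization and lowercasing only; real ROUGE
    implementations apply more preprocessing.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Clip each token's match count by how often it appears in the candidate.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


# Invented example: the paraphrase conveys similar content with different terms.
reference = "the drug reduces tumor growth in mice"
verbatim = "the drug reduces tumor growth in mice"
paraphrase = "this compound inhibits cancer progression in rodents"

score_verbatim = rouge1_recall(verbatim, reference)
score_paraphrase = rouge1_recall(paraphrase, reference)
print(score_verbatim, score_paraphrase)
```

Under these assumptions the paraphrase shares only the token "in" with the reference, so its score is far lower than that of the verbatim copy even though the meaning is close. This is the kind of terminology variation the abstract identifies as a weakness of overlap-based metrics.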

Anthology ID: L16-1130
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 806–813
URL: https://aclanthology.org/L16-1130/
Cite (ACL): Arman Cohan and Nazli Goharian. 2016. Revisiting Summarization Evaluation for Scientific Articles. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 806–813, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Revisiting Summarization Evaluation for Scientific Articles (Cohan & Goharian, LREC 2016)
PDF: https://aclanthology.org/L16-1130.pdf
