Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

Ori Shapira; David Gabay; Yang Gao (扬 高); Hadar Ronen; Ramakanth Pasunuru; Mohit Bansal; Yael Amsterdamer; Ido Dagan

doi:10.18653/v1/N19-1072

Abstract

Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

Anthology ID: N19-1072
Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Jill Burstein, Christy Doran, Thamar Solorio
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 682–687
URL: https://aclanthology.org/N19-1072/
DOI: 10.18653/v1/N19-1072
Cite (ACL): Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, and Ido Dagan. 2019. Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 682–687, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation (Shapira et al., NAACL 2019)
PDF: https://aclanthology.org/N19-1072.pdf
Video: https://aclanthology.org/N19-1072.mp4
