Question Answering as an Automatic Evaluation Metric for News Article Summarization

Matan Eyal, Tal Baumel, Michael Elhadad


Abstract
Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic evaluation metric for this task: Answering Performance for Evaluation of Summaries (APES). APES utilizes recent progress in the field of reading comprehension to quantify the ability of a summary to answer a set of manually created questions regarding central entities in the source article. We first analyze the strength of this metric by comparing it to known manual evaluation metrics. We then present an end-to-end neural abstractive model that maximizes APES while achieving competitive ROUGE scores.
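For illustration, here is a minimal sketch of how an APES-style score might be computed: answer each question about the source article using only the summary as context, and report the fraction answered correctly. This is not the paper's implementation; it substitutes an off-the-shelf extractive QA model from Hugging Face Transformers for the entity-centric reader used in the paper, and the function, lenient matching rule, and example data below are all hypothetical.

```python
# A minimal, assumption-laden sketch of an APES-style metric.
# Assumes an off-the-shelf QA model (Hugging Face Transformers),
# NOT the reading-comprehension model used in the original paper.
from transformers import pipeline


def apes_style_score(summary, qa_pairs, qa_model=None):
    """Fraction of questions about the source article that a QA
    model answers correctly when given only the summary as context.

    qa_pairs: list of (question, gold_entity) tuples, where each
    question targets a central entity in the source article.
    """
    # Default QA pipeline downloads a SQuAD-tuned extractive model;
    # this stand-in choice is an assumption, not the paper's setup.
    qa = qa_model or pipeline("question-answering")
    correct = 0
    for question, gold_entity in qa_pairs:
        pred = qa(question=question, context=summary)["answer"]
        # Lenient matching heuristic (an assumption): count the answer
        # as correct if the gold entity appears in the predicted span.
        if gold_entity.lower() in pred.lower():
            correct += 1
    return correct / len(qa_pairs) if qa_pairs else 0.0


# Hypothetical usage with toy data:
summary = "Apple opened its first retail store in Mumbai on Tuesday."
questions = [("Where did Apple open its first retail store?", "Mumbai")]
print(apes_style_score(summary, questions))  # 1.0 if answered correctly
```

In this framing, a summary that omits central entities will leave the QA model unable to answer, so the score degrades in a way that reflects missing content rather than surface n-gram overlap.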
Anthology ID:
N19-1395
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3938–3948
URL:
https://aclanthology.org/N19-1395
DOI:
10.18653/v1/N19-1395
Cite (ACL):
Matan Eyal, Tal Baumel, and Michael Elhadad. 2019. Question Answering as an Automatic Evaluation Metric for News Article Summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3938–3948, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Question Answering as an Automatic Evaluation Metric for News Article Summarization (Eyal et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1395.pdf
Code:
mataney/APES-optimizer (plus additional community code)
Data:
CNN/Daily Mail