FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Esin Durmus; He He; Mona Diab

doi:10.18653/v1/2020.acl-main.454

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Abstract

Neural abstractive summarization models are prone to generate content inconsistent with the source document, i.e. unfaithful. Existing automatic metrics do not capture such mistakes effectively. We tackle the problem of evaluating faithfulness of a generated summary given its source document. We first collected human annotations of faithfulness for outputs from numerous models on two datasets. We find that current models exhibit a trade-off between abstractiveness and faithfulness: outputs with less word overlap with the source document are more likely to be unfaithful. Next, we propose an automatic question answering (QA) based metric for faithfulness, FEQA, which leverages recent advances in reading comprehension. Given question-answer pairs generated from the summary, a QA model extracts answers from the document; non-matched answers indicate unfaithful information in the summary. Among metrics based on word overlap, embedding similarity, and learned language understanding models, our QA-based metric has significantly higher correlation with human faithfulness scores, especially on highly abstractive summaries.

Anthology ID:: 2020.acl-main.454
Volume:: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:: July
Year:: 2020
Address:: Online
Editors:: Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 5055–5070
Language:
URL:: https://aclanthology.org/2020.acl-main.454
DOI:: 10.18653/v1/2020.acl-main.454
Bibkey:
Cite (ACL):: Esin Durmus, He He, and Mona Diab. 2020. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5055–5070, Online. Association for Computational Linguistics.
Cite (Informal):: FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization (Durmus et al., ACL 2020)
Copy Citation:
PDF:: https://aclanthology.org/2020.acl-main.454.pdf
Video:: http://slideslive.com/38929353
Code: esdurmus/summary-faithfulness + additional community code
Data: SQuAD

PDF Cite Search Code Video