Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization

Yiran Chen, Pengfei Liu, Xipeng Qiu


Abstract
With the continuous improvement of summarization systems driven by deep neural networks, researchers have placed higher requirements on the quality of generated summaries, which should be not only fluent and informative but also factually correct. As a result, the field of factuality evaluation has developed rapidly. Despite this initial progress in evaluating generated summaries, the meta-evaluation methodologies for factuality metrics remain opaque, leading to an insufficient understanding of the metrics’ relative advantages and applicability. In this paper, we present an adversarial meta-evaluation methodology that allows us to (i) diagnose the fine-grained strengths and weaknesses of 6 existing top-performing metrics over 24 diagnostic test datasets, and (ii) search for directions for further improvement through data augmentation. Our observations from this work motivate us to propose several calls for future research. We make all code, diagnostic test datasets, and trained factuality models available: https://github.com/zide05/AdvFact.
Anthology ID:
2021.findings-emnlp.179
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2082–2095
URL:
https://aclanthology.org/2021.findings-emnlp.179
DOI:
10.18653/v1/2021.findings-emnlp.179
Cite (ACL):
Yiran Chen, Pengfei Liu, and Xipeng Qiu. 2021. Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2082–2095, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization (Chen et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.179.pdf
Video:
 https://aclanthology.org/2021.findings-emnlp.179.mp4
Code
 zide05/advfact
Data
CNN/Daily Mail, MultiNLI