ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings

Oleg Vasilyev, John Bohannon


Abstract
We propose a new reference-free summary quality evaluation measure, with emphasis on faithfulness. The measure is based on finding and counting all probable potential inconsistencies of a summary with respect to the source document. The proposed ESTIME, Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings, correlates with expert scores on the summary-level SummEval dataset more strongly than other common evaluation measures, not only in Consistency but also in Fluency. We also introduce a method for generating subtle factual errors in human summaries, and show that ESTIME is more sensitive to such subtle errors than other common evaluation measures.
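To illustrate the idea described in the abstract, the following is a minimal sketch of counting "mismatched embeddings": each summary token is matched to the document token with the most similar contextual embedding, and a mismatch between the two tokens is counted as a potential inconsistency. The encoder choice (bert-base-uncased), use of the last hidden layer, cosine similarity, and the function names are assumptions for illustration only; the paper's exact procedure (e.g., masking and layer selection) may differ.

```python
# Sketch of an ESTIME-like inconsistency count (illustrative, not the authors' code).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def token_embeddings(text):
    """Return token ids and their contextual embeddings from the last hidden layer."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    return enc["input_ids"][0].tolist(), hidden


def inconsistency_count(summary, document):
    """Count summary tokens whose nearest document embedding belongs to a different token."""
    sum_ids, sum_emb = token_embeddings(summary)
    doc_ids, doc_emb = token_embeddings(document)
    # Cosine similarity between every summary token and every document token.
    sim = torch.nn.functional.normalize(sum_emb, dim=-1) @ \
          torch.nn.functional.normalize(doc_emb, dim=-1).T
    mismatches = 0
    for i, tok_id in enumerate(sum_ids):
        if tok_id in tokenizer.all_special_ids:  # skip [CLS]/[SEP]
            continue
        nearest_doc_pos = int(sim[i].argmax())
        if doc_ids[nearest_doc_pos] != tok_id:
            mismatches += 1
    return mismatches
```

A higher count suggests more places where the summary's wording is not supported by the closest-matching context in the document, which is the intuition behind using the score as an inconsistency signal.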
Anthology ID:
2021.eval4nlp-1.10
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
Venue:
Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
94–103
URL:
https://aclanthology.org/2021.eval4nlp-1.10
DOI:
10.18653/v1/2021.eval4nlp-1.10
Cite (ACL):
Oleg Vasilyev and John Bohannon. 2021. ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 94–103, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings (Vasilyev & Bohannon, Eval4NLP 2021)
PDF:
https://aclanthology.org/2021.eval4nlp-1.10.pdf
Video:
https://aclanthology.org/2021.eval4nlp-1.10.mp4