Length Does Matter: Summary Length can Bias Summarization Metrics

Xiaobo Guo, Soroush Vosoughi


Abstract
Establishing the characteristics of an effective summary is a complicated and often subjective endeavor. Consequently, the development of metrics for the summarization task has become a dynamic area of research within natural language processing. In this paper, we reveal that existing summarization metrics exhibit a bias toward the length of generated summaries. Our thorough experiments, conducted on a variety of datasets, metrics, and models, substantiate these findings. The results indicate that most metrics tend to favor longer summaries, even after accounting for other factors. To address this issue, we introduce a Bayesian normalization technique that effectively diminishes this bias. We demonstrate that our approach significantly improves the concordance between human annotators and the majority of metrics in terms of summary coherence.
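The following is a minimal, hedged sketch (not the paper's experimental setup) of how one might probe a summarization metric for the length bias described above: score candidate summaries of varying lengths against the same reference and check whether the metric score tracks length. It assumes the third-party `rouge-score` and `scipy` packages; the example texts are toy data, not the datasets used in the paper.

```python
# A minimal sketch of probing a metric for length bias: score candidates of
# different lengths against one reference and correlate score with length.
# Assumes the `rouge-score` and `scipy` packages are installed.
from rouge_score import rouge_scorer
from scipy.stats import pearsonr

reference = (
    "The committee approved the new budget after a long debate, "
    "allocating most funds to infrastructure and education."
)

# Toy candidates of increasing length; in practice these would come from models.
candidates = [
    "The committee approved the budget.",
    "The committee approved the new budget after a long debate.",
    "The committee approved the new budget after a long debate, "
    "allocating most funds to infrastructure and education programs nationwide.",
]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
lengths, scores = [], []
for cand in candidates:
    lengths.append(len(cand.split()))
    scores.append(scorer.score(reference, cand)["rougeL"].fmeasure)

# A strong positive correlation between length and score, beyond what content
# overlap explains, is the kind of bias the paper investigates.
r, p = pearsonr(lengths, scores)
print(f"Pearson r between candidate length and ROUGE-L F1: {r:.3f} (p={p:.3f})")
```

On a larger, controlled sample this kind of length-score correlation is what the paper's analysis examines across metrics; the Bayesian normalization it proposes is described in the full text, not reproduced here.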
Anthology ID:
2023.emnlp-main.984
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15869–15879
URL:
https://aclanthology.org/2023.emnlp-main.984
DOI:
10.18653/v1/2023.emnlp-main.984
Cite (ACL):
Xiaobo Guo and Soroush Vosoughi. 2023. Length Does Matter: Summary Length can Bias Summarization Metrics. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15869–15879, Singapore. Association for Computational Linguistics.
Cite (Informal):
Length Does Matter: Summary Length can Bias Summarization Metrics (Guo & Vosoughi, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.984.pdf
Video:
https://aclanthology.org/2023.emnlp-main.984.mp4