The Accuracy Evaluation Shared Task as a Retrospective Reproduction Study

Craig Thomson, Ehud Reiter


Abstract
We investigate the data collected for the Accuracy Evaluation Shared Task as a retrospective reproduction study. The shared task was based upon errors found by human annotation of computer-generated summaries of basketball games. Annotation was performed in three separate stages, with texts taken from the same three systems and checked for errors by the same three annotators. We show that the mean count of errors was consistent at the highest level for each experiment, with increased variance when looking at per-system and/or per-error-type breakdowns.
Anthology ID:
2022.inlg-genchal.11
Volume:
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
Month:
July
Year:
2022
Address:
Waterville, Maine, USA and virtual meeting
Editors:
Samira Shaikh, Thiago Ferreira, Amanda Stent
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
71–79
URL:
https://aclanthology.org/2022.inlg-genchal.11
Cite (ACL):
Craig Thomson and Ehud Reiter. 2022. The Accuracy Evaluation Shared Task as a Retrospective Reproduction Study. In Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pages 71–79, Waterville, Maine, USA and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
The Accuracy Evaluation Shared Task as a Retrospective Reproduction Study (Thomson & Reiter, INLG 2022)
PDF:
https://aclanthology.org/2022.inlg-genchal.11.pdf