A reproduction study of methods for evaluating dialogue system output: Replicating Santhanam and Shaikh (2019)

Anouck Braggaar, Frédéric Tomas, Peter Blomsma, Saar Hommes, Nadine Braun, Emiel van Miltenburg, Chris van der Lee, Martijn Goudbeek, Emiel Krahmer

Abstract

In this paper, we describe our reproduction effort of the paper: Towards Best Experiment Design for Evaluating Dialogue System Output by Santhanam and Shaikh (2019) for the 2022 ReproGen shared task. We aim to produce the same results, using different human evaluators, and a different implementation of the automatic metrics used in the original paper. Although overall the study posed some challenges to reproduce (e.g. difficulties with reproduction of automatic metrics and statistics), in the end we did find that the results generally replicate the findings of Santhanam and Shaikh (2019) and seem to follow similar trends.

Anthology ID: 2022.inlg-genchal.13
Volume: Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
Month: July
Year: 2022
Address: Waterville, Maine, USA and virtual meeting
Editors: Samira Shaikh, Thiago Ferreira, Amanda Stent
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 86–93
URL: https://aclanthology.org/2022.inlg-genchal.13/
Cite (ACL): Anouck Braggaar, Frédéric Tomas, Peter Blomsma, Saar Hommes, Nadine Braun, Emiel van Miltenburg, Chris van der Lee, Martijn Goudbeek, and Emiel Krahmer. 2022. A reproduction study of methods for evaluating dialogue system output: Replicating Santhanam and Shaikh (2019). In Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pages 86–93, Waterville, Maine, USA and virtual meeting. Association for Computational Linguistics.
Cite (Informal): A reproduction study of methods for evaluating dialogue system output: Replicating Santhanam and Shaikh (2019) (Braggaar et al., INLG 2022)
PDF: https://aclanthology.org/2022.inlg-genchal.13.pdf