Reproduction and Replication: A Case Study with Automatic Essay Scoring

Eva Huber, Çağrı Çöltekin


Abstract
As in many experimental sciences, reproducibility of experiments has gained ever more attention in the NLP community. This paper presents our efforts to reproduce an earlier study of automatic essay scoring (AES) for determining the proficiency of second language learners in a multilingual setting. We present three sets of experiments with different objectives. First, as prescribed by the LREC 2020 REPROLANG shared task, we rerun the original AES system on the same dataset using the code published by the original authors. Second, we repeat the same experiments on the same data with a different implementation. Third, we test the original system on a different dataset and a different language. Most of our findings are in line with those of the original paper. Nevertheless, there are some discrepancies between our results and the results presented in the original paper, which we report and discuss in detail. We further discuss several points related to the confirmation of research findings through reproduction, including the choice of dataset, reporting and accounting for variability, the use of appropriate evaluation metrics, and making code and data available. We also discuss the varying uses of, and differences between, the terms reproduction and replication, and we argue that reproduction, the confirmation of conclusions through independent experiments in varied settings, is more valuable than exact replication of the published values.
Anthology ID:
2020.lrec-1.688
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5603–5613
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.688
Cite (ACL):
Eva Huber and Çağrı Çöltekin. 2020. Reproduction and Replication: A Case Study with Automatic Essay Scoring. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5603–5613, Marseille, France. European Language Resources Association.
Cite (Informal):
Reproduction and Replication: A Case Study with Automatic Essay Scoring (Huber & Çöltekin, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.688.pdf