The 2024 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results

Anya Belz, Craig Thomson


Abstract
This paper presents an overview of, and the results from, the 2024 Shared Task on Reproducibility of Evaluations in NLP (ReproNLP’24), following on from three previous shared tasks on reproducibility of evaluations in NLP, ReproNLP’23, ReproGen’22 and ReproGen’21. This shared task series forms part of an ongoing research programme designed to develop theory and practice of reproducibility assessment in NLP and machine learning, against a backdrop of increasing recognition of the importance of reproducibility across the two fields. We describe the ReproNLP’24 shared task, summarise results from the reproduction studies submitted, and provide additional comparative analysis of their results.
Anthology ID: 2024.humeval-1.9
Volume: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues: HumEval | WS
Publisher: ELRA and ICCL
Pages: 91–105
URL: https://aclanthology.org/2024.humeval-1.9
Cite (ACL): Anya Belz and Craig Thomson. 2024. The 2024 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 91–105, Torino, Italia. ELRA and ICCL.
Cite (Informal): The 2024 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results (Belz & Thomson, HumEval-WS 2024)
PDF: https://aclanthology.org/2024.humeval-1.9.pdf