ReproHum #0712-01: Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation

Lewis N. Watson; Dimitra Gkatzia

Abstract

Reproducibility is a cornerstone of scientific research, ensuring the reliability and generalisability of findings. The ReproNLP Shared Task on Reproducibility of Evaluations in NLP aims to assess the reproducibility of human evaluation studies. This paper presents a reproduction study of the human evaluation experiment presented in “Hierarchical Sketch Induction for Paraphrase Generation” by Hosking et al. (2022). The original study employed a human evaluation on Amazon Mechanical Turk, assessing the quality of paraphrases generated by their proposed model using three criteria: meaning preservation, fluency, and dissimilarity. In our reproduction study, we focus on the meaning preservation criterion and utilise the Prolific platform for participant recruitment, following the ReproNLP challenge’s common approach to reproduction. We discuss the methodology, results, and implications of our reproduction study, comparing them to the original findings. Our findings contribute to the understanding of reproducibility in NLP research and highlight the potential impact of platform changes and evaluation criteria on the reproducibility of human evaluation studies.

Anthology ID: 2024.humeval-1.19
Volume: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues: HumEval | WS
Publisher: ELRA and ICCL
Pages: 221–228
URL: https://aclanthology.org/2024.humeval-1.19/
Cite (ACL): Lewis N. Watson and Dimitra Gkatzia. 2024. ReproHum #0712-01: Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 221–228, Torino, Italia. ELRA and ICCL.
Cite (Informal): ReproHum #0712-01: Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation (Watson & Gkatzia, HumEval 2024)
PDF: https://aclanthology.org/2024.humeval-1.19.pdf
Optional supplementary material: 2024.humeval-1.19.OptionalSupplementaryMaterial.zip
