ReproHum #0927-3: Reproducing The Human Evaluation Of The DExperts Controlled Text Generation Method

Javier González Corbelle, Ainhoa Vivel Couso, Jose Maria Alonso-Moral, Alberto Bugarín-Diz


Abstract
This paper presents a reproduction study aimed at validating the human evaluation originally performed for the DExperts controlled text generation method. The original study introduces DExperts and evaluates it on non-toxic prompts from the RealToxicityPrompts dataset. Our study reproduces the human evaluation of the continuations generated by DExperts and by four baseline methods, in terms of toxicity, topicality, and fluency. We first describe the reproduction approach agreed within the ReproHum project and detail the configuration of the original evaluation, including the adaptations needed for reproduction. We then compare our reproduction results with those reported in the original paper. Interestingly, the human evaluators in our experiment perceive higher quality in the texts generated by DExperts, rating them as less toxic and more fluent. Overall, the new scores are higher, also for the baseline methods. This study contributes to ongoing efforts to ensure the reproducibility and reliability of findings in NLP evaluation and emphasizes the critical role of robust methodologies in advancing the field.
Anthology ID:
2024.humeval-1.15
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Pages:
153–162
URL:
https://aclanthology.org/2024.humeval-1.15
Cite (ACL):
Javier González Corbelle, Ainhoa Vivel Couso, Jose Maria Alonso-Moral, and Alberto Bugarín-Diz. 2024. ReproHum #0927-3: Reproducing The Human Evaluation Of The DExperts Controlled Text Generation Method. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 153–162, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ReproHum #0927-3: Reproducing The Human Evaluation Of The DExperts Controlled Text Generation Method (González Corbelle et al., HumEval-WS 2024)
PDF:
https://aclanthology.org/2024.humeval-1.15.pdf
Optional supplementary material:
 2024.humeval-1.15.OptionalSupplementaryMaterial.zip