Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques

Michela Lorandi, Anya Belz


Abstract
Rerunning a metric-based evaluation should be more straightforward, and should yield results closer to the original, than rerunning a human-based evaluation, especially where code and model checkpoints are made available by the original authors. However, as this brief report of our efforts to rerun a metric-based evaluation of a set of multi-aspect controllable text generation (CTG) techniques shows, such reruns do not always produce results that match the original, and they can reveal errors in the original work.
Anthology ID:
2024.humeval-1.12
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Note:
Pages:
125–131
URL:
https://aclanthology.org/2024.humeval-1.12
Cite (ACL):
Michela Lorandi and Anya Belz. 2024. Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 125–131, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques (Lorandi & Belz, HumEval-WS 2024)
PDF:
https://aclanthology.org/2024.humeval-1.12.pdf