ReproHum#0043: Human Evaluation Reproducing Language Model as an Annotator: Exploring Dialogue Summarization on AMI Dataset

Vivian Fresen, Mei-Shin Wu-Urbanek, Steffen Eger

Abstract
This study, conducted as part of the ReproHum project, aimed to replicate and evaluate the experiment presented in “Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization” by Feng et al. (2021). Employing the DialoGPT, BART, and PGN models, the study assessed the informativeness of dialogue summaries. Following the ReproHum project’s baselines, we conducted a human evaluation on the AMI dataset, comparing the results of the original study with those of our own experiments. Our objective is to contribute to research on human evaluation and to assess the reproducibility of the original study’s findings in the field of Natural Language Processing (NLP). Through this endeavor, we seek to enhance understanding and establish reliable benchmarks for human evaluation methodologies within the NLP domain.
Anthology ID:
2024.humeval-1.17
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Note:
Pages:
199–209
URL:
https://aclanthology.org/2024.humeval-1.17
Cite (ACL):
Vivian Fresen, Mei-Shin Wu-Urbanek, and Steffen Eger. 2024. ReproHum#0043: Human Evaluation Reproducing Language Model as an Annotator: Exploring Dialogue Summarization on AMI Dataset. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 199–209, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ReproHum#0043: Human Evaluation Reproducing Language Model as an Annotator: Exploring Dialogue Summarization on AMI Dataset (Fresen et al., HumEval-WS 2024)
PDF:
https://aclanthology.org/2024.humeval-1.17.pdf