Kyle Seakgwa


2024

pdf bib
ReproHum #0866-04: Another Evaluation of Readers’ Reactions to News Headlines
Zola Mahlaza | Toky Hajatiana Raboanary | Kyle Seakgwa | C. Maria Keet
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024

The reproduction of Natural Language Processing (NLP) studies is important in establishing their reliability. Nonetheless, many papers in NLP have never been reproduced. This paper presents a reproduction of Gabriel et al. (2022)’s work to establish the extent to which their findings, pertaining to the utility of large language models (T5 and GPT2) to automatically generate writer’s intents when given headlines to curb misinformation, can be confirmed. Our results show no evidence to support two of their four findings and they partially support the rest of the original findings. Specifically, while we confirmed that all the models are judged to be capable of influencing readers’ trust or distrust, there was a difference in T5’s capability to reduce trust. Our results show that its generations are more likely to have greater influence in reducing trust while Gabriel et al. (2022) found more cases where they had no impact at all. In addition, most of the model generations are considered socially acceptable only if we relax the criteria for determining a majority to mean more than chance rather than the apparent > 70% of the original study. Overall, while they found that “machine-generated MRF implications alongside news headlines to readers can increase their trust in real news while decreasing their trust in misinformation”, we found that they are more likely to decrease trust in both cases vs. having no impact at all.