Craig Thomson, Ehud Reiter, and Anya Belz. June 2024. Common Flaws in Running Human Evaluation Experiments in NLP. Computational Linguistics, 50(2):795–805. MIT Press, Cambridge, MA. Journal article. DOI: 10.1162/coli_a_00508. https://aclanthology.org/2024.cl-2.9/ (citation key: thomson-etal-2024-common)