Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation

Aleksandar Savkov, Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Anya Belz, Ehud Reiter


Abstract
Evaluating automatically generated text is generally hard due to the inherently subjective nature of many aspects of the output quality. This difficulty is compounded in automatic consultation note generation by differing opinions between medical experts both about which patient statements should be included in generated notes and about their respective importance in arriving at a diagnosis. Previous real-world evaluations of note-generation systems saw substantial disagreement between expert evaluators. In this paper we propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists, which are created in a preliminary step and then used as a common point of reference during quality assessment. We observed good levels of inter-annotator agreement in a first evaluation study using the protocol; further, using Consultation Checklists produced in the study as reference for automatic metrics such as ROUGE or BERTScore improves their correlation with human judgements compared to using the original human note.
Anthology ID:
2022.emnlp-industry.10
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
111–120
Language:
URL:
https://aclanthology.org/2022.emnlp-industry.10
DOI:
10.18653/v1/2022.emnlp-industry.10
Bibkey:
Cite (ACL):
Aleksandar Savkov, Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Anya Belz, and Ehud Reiter. 2022. Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 111–120, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation (Savkov et al., EMNLP 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.emnlp-industry.10.pdf