Promoting Fairness in Classification of Quality of Medical Evidence

Simon Suster, Timothy Baldwin, Karin Verspoor


Abstract
Automatically rating the quality of published research is a critical step in medical evidence synthesis. While several methods have been proposed, their algorithmic fairness has been largely overlooked, even though significant risks may follow when such systems are deployed in biomedical contexts. In this work, we study the fairness of two quality-assessment systems with respect to two sensitive attributes: participant sex and medical area. In several cases we find notable disparities, leading us to apply a range of debiasing methods. Examining the interplay between predictive performance, fairness, and the medically critical capabilities of selective classification and calibration, we find that debiasing can sometimes improve fairness, but at a cost to other performance measures.
Anthology ID:
2023.bionlp-1.39
Volume:
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kevin Cohen
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
413–426
URL:
https://aclanthology.org/2023.bionlp-1.39
DOI:
10.18653/v1/2023.bionlp-1.39
Cite (ACL):
Simon Suster, Timothy Baldwin, and Karin Verspoor. 2023. Promoting Fairness in Classification of Quality of Medical Evidence. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 413–426, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Promoting Fairness in Classification of Quality of Medical Evidence (Suster et al., BioNLP 2023)
PDF:
https://aclanthology.org/2023.bionlp-1.39.pdf
Video:
https://aclanthology.org/2023.bionlp-1.39.mp4