Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations

Russell Richie, Sachin Grover, Fuchiang (Rich) Tsui


Abstract
It is commonly claimed that inter-annotator agreement (IAA) is the ceiling of machine learning (ML) performance, i.e., that the agreement between an ML system’s predictions and an annotator cannot be higher than the agreement between two annotators. Although Boguslav & Cohen (2017) showed that this claim is falsified by many real-world ML systems, the claim has persisted. As a complement to this real-world evidence, we conducted a comprehensive set of simulations and show that an ML model can beat IAA even if (and especially if) annotators are noisy and differ in their underlying classification functions, as long as the ML model is reasonably well-specified. Although the latter condition has long been elusive, leading ML models to underperform IAA, we anticipate that this condition will be increasingly met in the era of big data and deep learning. Our work has implications for (1) maximizing the value of machine learning, (2) adherence to ethical standards in computing, and (3) economical use of annotated resources, which is paramount in settings where annotation is especially expensive, like biomedical natural language processing.
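The core intuition of the abstract can be illustrated with a toy simulation (a hedged sketch, not the authors' actual experiments): if two annotators each independently flip a true binary label with probability p, their expected agreement is (1-p)^2 + p^2, while an idealized well-specified model that recovers the true labels agrees with any one annotator at rate 1-p, which is strictly higher for 0 < p < 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 0.2  # number of items, per-annotator label-flip probability

# True binary labels from some underlying classification function (here, random)
truth = rng.integers(0, 2, size=n)

# Two annotators who independently flip each true label with probability p
ann1 = np.where(rng.random(n) < p, 1 - truth, truth)
ann2 = np.where(rng.random(n) < p, 1 - truth, truth)

# An idealized, well-specified model that recovers the truth exactly
model = truth

iaa = (ann1 == ann2).mean()              # ≈ (1-p)^2 + p^2 = 0.68 in expectation
model_agreement = (model == ann1).mean() # ≈ 1-p = 0.80 in expectation

print(f"IAA: {iaa:.3f}, model-annotator agreement: {model_agreement:.3f}")
```

With p = 0.2 the model-annotator agreement (about 0.80) comfortably exceeds IAA (about 0.68), matching the paper's claim that IAA is not a performance ceiling when annotator noise is independent and the model is well-specified.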
Anthology ID:
2022.bionlp-1.26
Volume:
Proceedings of the 21st Workshop on Biomedical Language Processing
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
275–284
URL:
https://aclanthology.org/2022.bionlp-1.26
DOI:
10.18653/v1/2022.bionlp-1.26
Bibkey:
Cite (ACL):
Russell Richie, Sachin Grover, and Fuchiang (Rich) Tsui. 2022. Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 275–284, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations (Richie et al., BioNLP 2022)
PDF:
https://aclanthology.org/2022.bionlp-1.26.pdf
Video:
https://aclanthology.org/2022.bionlp-1.26.mp4