@inproceedings{feng-etal-2019-misleading,
    title = "Misleading Failures of Partial-input Baselines",
    author = "Feng, Shi  and
      Wallace, Eric  and
      Boyd-Graber, Jordan",
    editor = "Korhonen, Anna  and
      Traum, David  and
      Màrquez, Lluís",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1554",
    doi = "10.18653/v1/P19-1554",
    pages = "5533--5538",
    abstract = "Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). A successful partial-input baseline indicates that the dataset is cheatable. But the converse is not necessarily true: failures of partial-input baselines do not mean the dataset is free of artifacts. We first design artificial datasets to illustrate how trivial patterns that are only visible in the full input can evade any partial-input baseline. Next, we identify such artifacts in the SNLI dataset—a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of previously-thought “hard” examples. Our work provides a caveat for the use and creation of partial-input baselines for datasets.",
}