Misleading Failures of Partial-input Baselines

Shi Feng, Eric Wallace, Jordan Boyd-Graber


Abstract
Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., a hypothesis-only model for SNLI or a question-only model for VQA). A successful partial-input baseline indicates that the dataset is cheatable. But the converse is not necessarily true: the failure of a partial-input baseline does not mean the dataset is free of artifacts. We first design artificial datasets to illustrate how trivial patterns that are only visible in the full input can evade any partial-input baseline. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of previously-thought “hard” examples. Our work provides a caveat for the use and creation of partial-input baselines for datasets.
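The claim that full-input patterns can evade every partial-input baseline can be illustrated with a minimal sketch. This is not the authors' construction: the XOR labeling rule, the single-bit "premise" and "hypothesis", and the helper `best_lookup_accuracy` are all assumptions made for illustration. The sketch computes the best accuracy any lookup-table classifier can reach when it sees only one part of the input versus the whole input.

```python
from collections import Counter, defaultdict
from itertools import product


def best_lookup_accuracy(examples, view):
    """Accuracy of the best lookup-table classifier that sees only view(premise, hypothesis).

    Hypothetical helper: for each visible key it predicts the majority label.
    """
    counts = defaultdict(Counter)
    for premise, hypothesis, label in examples:
        counts[view(premise, hypothesis)][label] += 1
    correct = sum(max(label_counts.values()) for label_counts in counts.values())
    return correct / len(examples)


# Assumed toy dataset: the label is the XOR of one premise bit and one
# hypothesis bit, so the pattern exists only in the full input.
examples = [(p, h, p ^ h) for p, h in product([0, 1], repeat=2)]

hypothesis_only = best_lookup_accuracy(examples, lambda p, h: h)
premise_only = best_lookup_accuracy(examples, lambda p, h: p)
full_input = best_lookup_accuracy(examples, lambda p, h: (p, h))

print(hypothesis_only, premise_only, full_input)
```

In this toy setting both partial-input baselines are at chance while the full-input classifier is perfect, so a failed partial-input baseline says nothing about whether the dataset contains a trivial pattern.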
Anthology ID:
P19-1554
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5533–5538
URL:
https://aclanthology.org/P19-1554
DOI:
10.18653/v1/P19-1554
Cite (ACL):
Shi Feng, Eric Wallace, and Jordan Boyd-Graber. 2019. Misleading Failures of Partial-input Baselines. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5533–5538, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Misleading Failures of Partial-input Baselines (Feng et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1554.pdf
Data
SNLI, SWAG