We Need To Talk About Random Splits

Anders Søgaard, Sebastian Ebert, Jasmijn Bastings, Katja Filippova


Abstract
Gorman and Bedrick (2019) argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. We can also split data in biased or adversarial ways, e.g., training on short sentences and evaluating on long ones. Biased sampling has been used in domain adaptation to simulate real-world drift; this is known as the covariate shift assumption. In NLP, however, even worst-case splits, maximizing bias, often under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This invalidates the covariate shift assumption. Instead of using multiple random splits, future benchmarks should ideally include multiple, independent test sets; if that is infeasible, we argue that multiple biased splits lead to more realistic performance estimates than multiple random splits.
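The biased split mentioned in the abstract (train on short sentences, evaluate on long ones) can be sketched in a few lines. The snippet below is a minimal, illustrative sketch of that idea only; the function name, the train fraction, and the toy corpus are assumptions and are not taken from the paper's released code.

```python
# Minimal sketch of a length-based biased split: shortest sentences go to
# the training set, longest to the test set. Names and the default ratio
# are illustrative assumptions, not the paper's actual implementation.
from typing import List, Sequence, Tuple


def biased_length_split(
    sentences: Sequence[Sequence[str]],
    train_fraction: float = 0.8,
) -> Tuple[List[Sequence[str]], List[Sequence[str]]]:
    """Sort sentences by token count; put the shortest in train, the longest in test."""
    ordered = sorted(sentences, key=len)
    cutoff = int(len(ordered) * train_fraction)
    return list(ordered[:cutoff]), list(ordered[cutoff:])


if __name__ == "__main__":
    corpus = [
        "a short one".split(),
        "slightly longer sentence here".split(),
        "this is a noticeably longer example sentence for the test side".split(),
        "tiny".split(),
        "another medium length training sentence".split(),
    ]
    train, test = biased_length_split(corpus, train_fraction=0.6)
    print(len(train), "train /", len(test), "test")
```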
Anthology ID:
2021.eacl-main.156
Original:
2021.eacl-main.156v1
Version 2:
2021.eacl-main.156v2
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Editors:
Paola Merlo, Jörg Tiedemann, Reut Tsarfaty
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1823–1832
URL:
https://aclanthology.org/2021.eacl-main.156
DOI:
10.18653/v1/2021.eacl-main.156
Award:
 Honorable Mention for Best Short Paper
Cite (ACL):
Anders Søgaard, Sebastian Ebert, Jasmijn Bastings, and Katja Filippova. 2021. We Need To Talk About Random Splits. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1823–1832, Online. Association for Computational Linguistics.
Cite (Informal):
We Need To Talk About Random Splits (Søgaard et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.156.pdf
Code
 google-research/google-research
Data
Penn Treebank