BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance

Authors: R. Thomas McCoy, Junghyun Min, Tal Linzen
Date: 2020-11
Venue: Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Editors: Afra Alishahi, Yonatan Belinkov, Grzegorz Chrupała, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad
Publisher: Association for Computational Linguistics
Location: Online
Type: conference publication
Citation key: mccoy-etal-2020-berts
DOI: 10.18653/v1/2020.blackboxnlp-1.21
URL: https://aclanthology.org/2020.blackboxnlp-1.21/
Pages: 217-227