%0 Conference Proceedings
%T Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
%A Bhargava, Prajjwal
%A Drozd, Aleksandr
%A Rogers, Anna
%Y Sedoc, João
%Y Rogers, Anna
%Y Rumshisky, Anna
%Y Tafreshi, Shabnam
%S Proceedings of the Second Workshop on Insights from Negative Results in NLP
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F bhargava-etal-2021-generalization
%X Much of recent progress in NLU was shown to be due to models’ learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
%R 10.18653/v1/2021.insights-1.18
%U https://aclanthology.org/2021.insights-1.18
%U https://doi.org/10.18653/v1/2021.insights-1.18
%P 125-135