Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness

Ashim Gupta, Rishanth Rajendhran, Nathan Stringham, Vivek Srikumar, Ana Marasovic


Abstract
*Do larger and more performant models resolve NLP’s longstanding robustness issues?* We investigate this question using over 20 models of different sizes spanning different architectural choices and pretraining objectives. We conduct evaluations using (a) out-of-domain and challenge test sets, (b) behavioral testing with CheckLists, (c) contrast sets, and (d) adversarial inputs. Our analysis reveals that not all out-of-domain tests provide insight into robustness. Evaluating with CheckLists and contrast sets shows significant gaps in model performance; merely scaling models does not make them adequately robust. Finally, we point out that current approaches for adversarial evaluations of models are themselves problematic: they can be easily thwarted, and in their current forms, do not represent a sufficiently deep probe of model robustness. We conclude that not only is the question of robustness in NLP as yet unresolved, but even some of the approaches to measure robustness need to be reassessed.
Anthology ID:
2024.naacl-long.310
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5533–5590
Language:
URL:
https://aclanthology.org/2024.naacl-long.310
DOI:
10.18653/v1/2024.naacl-long.310
Bibkey:
Cite (ACL):
Ashim Gupta, Rishanth Rajendhran, Nathan Stringham, Vivek Srikumar, and Ana Marasovic. 2024. Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5533–5590, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness (Gupta et al., NAACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.naacl-long.310.pdf