Zachary Taschdjian


2021

Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel | Nazneen Fatema Rajani | Jesse Vig | Zachary Taschdjian | Mohit Bansal | Christopher Ré
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

Despite impressive performance on standard benchmarks, natural language processing (NLP) models are often brittle when deployed in real-world systems. In this work, we identify challenges with evaluating NLP systems and propose a solution in the form of Robustness Gym (RG), a simple and extensible evaluation toolkit that unifies four standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks. By providing a common platform for evaluation, RG enables practitioners to compare results from disparate evaluation paradigms with a single click, and to easily develop and share novel evaluation methods using a built-in set of abstractions. RG is under active development and we welcome feedback and contributions from the community.
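The abstract names four evaluation paradigms that RG unifies. The minimal Python sketch below is purely illustrative (it does not use the Robustness Gym API; the model, data, and helper names are hypothetical) and shows how two of those paradigms, subpopulations and transformations, can be framed as slices of a dataset evaluated under one common metric.

```python
# Illustrative sketch only -- not the Robustness Gym API.
# A toy sentiment "model" is evaluated on the full data, on a
# subpopulation (a filtered slice), and on a transformation
# (a perturbed copy of every example).

def toy_model(text: str) -> str:
    # Hypothetical stand-in for an NLP model under evaluation.
    return "positive" if "good" in text.lower() else "negative"

examples = [
    {"text": "A good movie.", "label": "positive"},
    {"text": "Not good at all!", "label": "negative"},
    {"text": "Terrible pacing.", "label": "negative"},
]

def accuracy(data):
    # Common metric applied to every slice.
    return sum(toy_model(d["text"]) == d["label"] for d in data) / len(data)

# Subpopulation: a slice defined by a predicate over examples.
short_inputs = [d for d in examples if len(d["text"].split()) <= 3]

# Transformation: a perturbation applied to every example.
uppercased = [{**d, "text": d["text"].upper()} for d in examples]

for name, data in [("all", examples), ("short", short_inputs), ("upper", uppercased)]:
    print(f"{name}: accuracy = {accuracy(data):.2f}")
```

Evaluation sets and adversarial attacks fit the same pattern: each is just another named slice of examples scored with the same metric, which is what lets results from the different paradigms be compared side by side.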