Zachary Taschdjian
2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel | Nazneen Fatema Rajani | Jesse Vig | Zachary Taschdjian | Mohit Bansal | Christopher Ré
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations
Despite impressive performance on standard benchmarks, natural language processing (NLP) models are often brittle when deployed in real-world systems. In this work, we identify challenges with evaluating NLP systems and propose a solution in the form of Robustness Gym (RG), a simple and extensible evaluation toolkit that unifies four standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks. By providing a common platform for evaluation, RG enables practitioners to compare results from disparate evaluation paradigms with a single click, and to easily develop and share novel evaluation methods using a built-in set of abstractions. RG is under active development, and we welcome feedback and contributions from the community.