Adaptive Testing and Debugging of NLP Models

Marco Tulio Ribeiro, Scott Lundberg


Abstract
Current approaches to testing and debugging NLP models rely on highly variable human creativity and extensive labor, or only work for a very restrictive class of bugs. We present AdaTest, a process that uses large-scale language models (LMs) in partnership with human feedback to automatically write unit tests highlighting bugs in a target model. Such bugs are then addressed through an iterative test-fix-retest loop, inspired by traditional software development. In experiments with expert and non-expert users and commercial and research models for 8 different tasks, AdaTest makes users 5-10x more effective at finding bugs than current approaches, and helps users effectively fix bugs without adding new bugs.
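The loop described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the AdaTest implementation or its API: the toy keyword classifier, the fixed pool standing in for LM-generated test proposals, the always-accepting human reviewer, and the vocabulary-based "fix" step are all hypothetical stand-ins chosen only to show the propose, filter, test, fix, retest cycle.

```python
from typing import List, Tuple

Test = Tuple[str, str]  # (input text, expected label)


class KeywordSentimentModel:
    """Hypothetical toy target model: 'negative' if a known negative word appears."""

    def __init__(self) -> None:
        self.negative_words = {"bad", "terrible"}

    def predict(self, text: str) -> str:
        words = set(text.lower().split())
        return "negative" if words & self.negative_words else "positive"

    def fix(self, failures: List[Test], suite: List[Test]) -> None:
        # Stand-in for fine-tuning on failing tests: learn words from failing
        # negative tests, skipping any word that also occurs in a positive
        # test so that the fix does not break tests that already pass.
        positive_vocab = {
            w for text, label in suite if label == "positive"
            for w in text.lower().split()
        }
        for text, label in failures:
            if label == "negative":
                self.negative_words |= set(text.lower().split()) - positive_vocab


# Stand-in for LM proposals: a fixed pool, one batch per round (assumption).
PROPOSAL_POOL: List[List[Test]] = [
    [("this movie was awful", "negative"), ("this movie was great", "positive")],
    [("this movie was dreadful", "negative"), ("this movie was fine", "positive")],
]


def propose_tests(round_index: int) -> List[Test]:
    return PROPOSAL_POOL[round_index % len(PROPOSAL_POOL)]


def human_accepts(test: Test) -> bool:
    # Stand-in for human feedback; a real reviewer would reject invalid tests.
    return True


def adaptive_testing_loop(model: KeywordSentimentModel, rounds: int):
    suite: List[Test] = []
    failures_per_round: List[int] = []
    for r in range(rounds):
        # Propose candidate tests and keep those the reviewer accepts.
        suite.extend(t for t in propose_tests(r) if human_accepts(t))
        # Run the whole suite, so earlier tests are retested after each fix.
        failures = [t for t in suite if model.predict(t[0]) != t[1]]
        failures_per_round.append(len(failures))
        if failures:
            model.fix(failures, suite)
    remaining = [t for t in suite if model.predict(t[0]) != t[1]]
    return suite, failures_per_round, remaining


model = KeywordSentimentModel()
suite, failures_per_round, remaining = adaptive_testing_loop(model, rounds=2)
print(failures_per_round, remaining)
```

Each round reruns the full accumulated suite rather than only the new tests, which is how the sketch checks that a fix has not reintroduced earlier failures.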
Anthology ID:
2022.acl-long.230
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3253–3267
URL:
https://aclanthology.org/2022.acl-long.230
DOI:
10.18653/v1/2022.acl-long.230
Cite (ACL):
Marco Tulio Ribeiro and Scott Lundberg. 2022. Adaptive Testing and Debugging of NLP Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3253–3267, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Adaptive Testing and Debugging of NLP Models (Ribeiro & Lundberg, ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.230.pdf
Data
PAWS