TestAug: A Framework for Augmenting Capability-based NLP Tests

Guanqun Yang, Mirazul Haque, Qiaochu Song, Wei Yang, Xueqing Liu


Abstract
Recently proposed capability-based NLP testing allows model developers to test the functional capabilities of NLP models, revealing functional failures in models that achieve good held-out evaluation scores. However, existing work on capability-based testing requires the developer to compose each individual test template from scratch. Such an approach requires extensive manual effort and does not scale well. In this paper, we investigate a different approach that requires the developer to annotate only a few test templates and leverages the GPT-3 engine to generate the majority of test cases. Our approach saves manual effort by design, and a validity checker guarantees the correctness of the generated suites. Moreover, our experimental results show that the test suites generated by GPT-3 are more diverse than manually created ones and can be used to detect more errors than their manually created counterparts. Our test suites can be downloaded at https://anonymous-researcher-nlp.github.io/testaug/.
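The generation loop the abstract describes (a few annotated seed cases, GPT-3 generation, then filtering by a validity checker) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation in guanqun-yang/testaug: the prompt format, the `TestCase` tuple, and the `generate` and `is_valid` callables are all hypothetical stand-ins for the GPT-3 call (plus output parsing) and for the validity checker.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical representation of one labeled test case: (sentence, expected label).
TestCase = Tuple[str, str]


def build_prompt(capability: str, seeds: Sequence[TestCase]) -> str:
    """Format a few developer-annotated seed cases as a few-shot prompt."""
    parts = [f"Capability: {capability}"]
    for sentence, label in seeds:
        parts.append(f"Sentence: {sentence}\nLabel: {label}")
    # End with an open slot so a language model continues the pattern.
    parts.append("Sentence:")
    return "\n\n".join(parts)


def augment(
    capability: str,
    seeds: Sequence[TestCase],
    generate: Callable[[str], Iterable[TestCase]],
    is_valid: Callable[[str, str], bool],
    n_rounds: int = 3,
) -> List[TestCase]:
    """Repeatedly sample candidates and keep the new ones the checker accepts.

    `generate` is a hypothetical placeholder for querying a text-generation
    model (e.g. GPT-3) and parsing its output; `is_valid` is a hypothetical
    placeholder for a validity checker. Neither is a real library API.
    """
    prompt = build_prompt(capability, seeds)
    seen = {sentence for sentence, _ in seeds}
    accepted: List[TestCase] = []
    for _ in range(n_rounds):
        for sentence, label in generate(prompt):
            if sentence in seen:
                continue  # skip seed cases and previously accepted generations
            if is_valid(sentence, label):
                accepted.append((sentence, label))
                seen.add(sentence)
    return accepted


# Usage with toy stand-ins (hypothetical, for illustration only).
seeds = [("I didn't love the movie.", "negative"),
         ("The food was not bad at all.", "positive")]


def fake_generate(prompt: str) -> List[TestCase]:
    # Always returns the same candidates instead of calling a real model.
    return [("The plot was not boring.", "positive"),
            ("I didn't love the movie.", "negative"),
            ("Great.", "positive")]


def fake_checker(sentence: str, label: str) -> bool:
    # Toy rule for a "negation" capability: require an explicit negation.
    return "not" in sentence.split() or "n't" in sentence


suite = augment("negation", seeds, fake_generate, fake_checker)
# suite == [("The plot was not boring.", "positive")]
```

Passing the generator and checker in as callables keeps the sketch independent of any particular model API; in practice, `is_valid` could be, for example, a classifier trained on a small set of human-labeled generations, though that choice is an assumption here.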
Anthology ID:
2022.coling-1.307
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3480–3495
URL:
https://aclanthology.org/2022.coling-1.307
Cite (ACL):
Guanqun Yang, Mirazul Haque, Qiaochu Song, Wei Yang, and Xueqing Liu. 2022. TestAug: A Framework for Augmenting Capability-based NLP Tests. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3480–3495, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
TestAug: A Framework for Augmenting Capability-based NLP Tests (Yang et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.307.pdf
Code:
guanqun-yang/testaug
Data:
HELP, SST