Towards Stronger Adversarial Baselines Through Human-AI Collaboration

Wencong You; Daniel Lowd

doi:10.18653/v1/2022.nlppower-1.2

Abstract

Natural language processing (NLP) systems are often used for adversarial tasks such as detecting spam, abuse, hate speech, and fake news. Properly evaluating such systems requires dynamic evaluation that searches for weaknesses in the model, rather than a static test set. Prior work has evaluated such models on both manually and automatically generated examples, but both approaches have limitations: manually constructed examples are time-consuming to create and are limited by the imagination and intuition of the creators, while automatically constructed examples are often ungrammatical or labeled inconsistently. We propose to combine human and AI expertise in generating adversarial examples, benefiting from humans’ expertise in language and automated attacks’ ability to probe the target system more quickly and thoroughly. We present a system that facilitates attack construction, combining human judgment with automated attacks to create better attacks more efficiently. Preliminary results from our own experimentation suggest that human-AI hybrid attacks are more effective than either human-only or AI-only attacks. A complete user study to validate these hypotheses is still pending.

Anthology ID: 2022.nlppower-1.2
Volume: Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Tatiana Shavrina, Vladislav Mikhailov, Valentin Malykh, Ekaterina Artemova, Oleg Serikov, Vitaly Protasov
Venue: nlppower
Publisher: Association for Computational Linguistics
Pages: 11–21
URL: https://aclanthology.org/2022.nlppower-1.2/
DOI: 10.18653/v1/2022.nlppower-1.2
Cite (ACL): Wencong You and Daniel Lowd. 2022. Towards Stronger Adversarial Baselines Through Human-AI Collaboration. In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, pages 11–21, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Towards Stronger Adversarial Baselines Through Human-AI Collaboration (You & Lowd, nlppower 2022)
PDF: https://aclanthology.org/2022.nlppower-1.2.pdf
Video: https://aclanthology.org/2022.nlppower-1.2.mp4
