Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers

Shanshan Xu, Irina Broda, Rashid Haddad, Marco Negrini, Matthias Grabmair


Abstract
Recent work has demonstrated that natural language processing techniques can support consumer protection by automatically detecting unfair clauses in Terms of Service (ToS) agreements. This work demonstrates that transformer-based ToS analysis systems are vulnerable to adversarial attacks. We conduct experiments attacking an unfair-clause detector with universal adversarial triggers and show that a minor perturbation of the text can considerably reduce detection performance. Moreover, to measure the detectability of the triggers, we conduct a detailed human evaluation study, collecting both answer accuracy and response time from participants. The results show that the naturalness of the triggers remains key to tricking readers.
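To make the attack setting concrete, the sketch below illustrates the core idea of a universal adversarial trigger: a single input-agnostic phrase, prepended to every clause, chosen so that it degrades the detector's recall on unfair clauses. The keyword-based stand-in detector, the example clauses, and the hand-written candidate triggers are illustrative assumptions for this sketch only; the paper attacks a fine-tuned transformer and searches triggers with gradient-guided methods rather than the brute-force candidate scan shown here.

```python
# Minimal black-box sketch of a universal adversarial trigger attack on an
# unfair-clause detector. Everything below (detector rule, clauses, candidate
# triggers) is an illustrative assumption, not material from the paper.

from sklearn.metrics import recall_score

def toy_detector(clause: str) -> int:
    """Stand-in classifier: 1 = predicted unfair, 0 = predicted fair."""
    text = clause.lower()
    unfair_markers = ("sole discretion", "without notice", "not liable")
    fair_markers = ("consumer law", "statutory rights")
    if any(m in text for m in fair_markers):  # the trigger exploits this bias
        return 0
    return int(any(m in text for m in unfair_markers))

dev_clauses = [
    "We may terminate your account at our sole discretion without notice.",
    "We are not liable for any damages arising from use of the service.",
    "We may change these terms at any time without notice to you.",
]
dev_labels = [1, 1, 1]  # all clauses are labeled unfair

candidate_triggers = [
    "please read carefully",
    "subject to consumer law and your statutory rights",
    "this section is important",
]

def attacked_recall(trigger: str) -> float:
    """Recall on unfair clauses when the same trigger is prepended to every input."""
    preds = [toy_detector(f"{trigger} {c}") for c in dev_clauses]
    return recall_score(dev_labels, preds)

# A universal trigger is a single input-agnostic phrase; pick the candidate
# that causes the largest drop in recall on the development clauses.
best = min(candidate_triggers, key=attacked_recall)
print("clean recall   :", recall_score(dev_labels, [toy_detector(c) for c in dev_clauses]))
print("best trigger   :", best)
print("attacked recall:", attacked_recall(best))
```

Running the sketch, the "consumer law" trigger drives recall from 1.0 to 0.0 on the toy clauses, mirroring (in miniature) the kind of performance drop the paper reports; it also hints at why natural-sounding triggers matter, since a legal-sounding prefix is harder for human readers to flag.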
Anthology ID:
2022.nllp-1.21
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
238–245
URL:
https://aclanthology.org/2022.nllp-1.21
DOI:
10.18653/v1/2022.nllp-1.21
Cite (ACL):
Shanshan Xu, Irina Broda, Rashid Haddad, Marco Negrini, and Matthias Grabmair. 2022. Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers. In Proceedings of the Natural Legal Language Processing Workshop 2022, pages 238–245, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers (Xu et al., NLLP 2022)
PDF:
https://aclanthology.org/2022.nllp-1.21.pdf