Evaluating adversarial attacks against multiple fact verification systems

James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal


Abstract
Automated fact verification has been progressing owing to advances in modeling and the availability of large datasets. Due to the nature of the task, it is critical to understand the vulnerabilities of these systems against adversarial instances designed to make them predict incorrectly. We introduce two novel scoring metrics, attack potency and system resilience, which take into account the correctness of the adversarial instances, an aspect often ignored in adversarial evaluations. We consider six fact verification systems from the recent Fact Extraction and VERification (FEVER) challenge: the four best-scoring ones and two baselines. We evaluate adversarial instances generated by a recently proposed state-of-the-art method, a paraphrasing method, and rule-based attacks devised for fact verification. We find that our rule-based attacks have higher potency, and that while the rankings among the top systems changed, they exhibited higher resilience than the baselines.
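The abstract does not spell out how correctness enters the two metrics; the following is a minimal sketch of one plausible reading, in which an attack's potency is its correctness rate times the average error it induces across systems, and a system's resilience is its accuracy across attacks weighted by each attack's correctness rate. All function names and formulas here are illustrative assumptions, not the paper's exact definitions.

# Illustrative sketch only: the formulas below are assumptions inferred from
# the abstract, not the definitions given in the paper itself.

def potency(correct_rate, system_accuracies):
    """Correctness-weighted error an attack induces, averaged over systems.

    correct_rate: fraction of the attack's adversarial instances judged
        correct (well-formed and correctly labelled) by annotators.
    system_accuracies: each verification system's accuracy on the attack's
        correct adversarial instances.
    """
    mean_error = 1.0 - sum(system_accuracies) / len(system_accuracies)
    return correct_rate * mean_error


def resilience(accuracy_per_attack, correct_rate_per_attack):
    """A system's accuracy across attacks, weighted by attack correctness."""
    weighted = sum(a * c for a, c in zip(accuracy_per_attack,
                                         correct_rate_per_attack))
    return weighted / sum(correct_rate_per_attack)


# Example: one attack evaluated against three systems, and one system
# evaluated against two attacks (all numbers invented for illustration).
print(potency(correct_rate=0.8, system_accuracies=[0.55, 0.40, 0.60]))
print(resilience(accuracy_per_attack=[0.55, 0.70],
                 correct_rate_per_attack=[0.8, 0.6]))

Under this reading, an attack whose instances are mostly malformed scores low potency even if it fools every system, and a system is rewarded only for withstanding attacks that are actually valid.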
Anthology ID:
D19-1292
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2944–2953
URL:
https://aclanthology.org/D19-1292
DOI:
10.18653/v1/D19-1292
Cite (ACL):
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2019. Evaluating adversarial attacks against multiple fact verification systems. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2944–2953, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Evaluating adversarial attacks against multiple fact verification systems (Thorne et al., EMNLP-IJCNLP 2019)
PDF:
https://aclanthology.org/D19-1292.pdf