Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts

Samuel Carton, Qiaozhu Mei, Paul Resnick


Abstract
We introduce an adversarial method for producing high-recall explanations of neural text classifier decisions. Building on an existing architecture for extractive explanations via hard attention, we add an adversarial layer which scans the residual of the attention for remaining predictive signal. Motivated by the important domain of detecting personal attacks in social media comments, we additionally demonstrate the importance of manually setting a semantically appropriate “default” behavior for the model by explicitly manipulating its bias term. We develop a validation set of human-annotated personal attacks to evaluate the impact of these changes.
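The abstract's central idea can be illustrated with a toy sketch. This is not the authors' implementation: the neural networks are replaced by a hand-written logistic scorer, and every name below (`split_by_mask`, `attack_probability`, `TOY_WEIGHTS`, `TOY_BIAS`) is hypothetical. It assumes the hard attention produces a binary mask over tokens, that the adversary sees only the unselected tokens (the residual), and that a negative bias is one way to make "no attack" the default output when no evidence is present.

```python
import math

def split_by_mask(tokens, mask):
    """Split tokens into the extracted rationale (mask == 1) and the residual (mask == 0)."""
    rationale = [t for t, m in zip(tokens, mask) if m == 1]
    residual = [t for t, m in zip(tokens, mask) if m == 0]
    return rationale, residual

def attack_probability(tokens, weights, bias):
    """Toy logistic scorer: sigmoid(bias + sum of per-token weights)."""
    logit = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy parameters, chosen only for illustration.
TOY_WEIGHTS = {"idiot": 3.0, "stupid": 2.5}
TOY_BIAS = -2.0  # assumption: a negative bias makes "no attack" the default

def residual_score(tokens, mask):
    """Score the residual the way an adversary would: using only unselected tokens."""
    _, residual = split_by_mask(tokens, mask)
    return attack_probability(residual, TOY_WEIGHTS, TOY_BIAS)

tokens = ["you", "idiot", "that", "is", "stupid"]
low_recall_mask = [0, 1, 0, 0, 0]   # explanation misses "stupid"
high_recall_mask = [0, 1, 0, 0, 1]  # explanation covers both attack words

# If the adversary can still predict an attack from the residual,
# the explanation has left predictive signal behind (low recall).
print(residual_score(tokens, low_recall_mask) > 0.5)   # residual still looks like an attack
print(residual_score(tokens, high_recall_mask) > 0.5)  # residual falls back to the default
```

In this toy setting, a mask is high-recall when the residual score drops back to the default determined by the bias, which is why the sign of the bias matters for how an empty or fully masked input is interpreted.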
Anthology ID:
D18-1386
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3497–3507
URL:
https://aclanthology.org/D18-1386/
DOI:
10.18653/v1/D18-1386
Cite (ACL):
Samuel Carton, Qiaozhu Mei, and Paul Resnick. 2018. Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3497–3507, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts (Carton et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1386.pdf