What Models Know About Their Attackers: Deriving Attacker Information From Latent Representations

Authors: Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Zayd Hammoudeh, Daniel Lowd, Sameer Singh
Date: 2021-11
Venue: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Editors: Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad
Publisher: Association for Computational Linguistics
Location: Punta Cana, Dominican Republic
Type: conference publication (text)
Identifier: xie-etal-2021-models
DOI: 10.18653/v1/2021.blackboxnlp-1.6
URL: https://aclanthology.org/2021.blackboxnlp-1.6/
Pages: 69-78