Generating Realistic Natural Language Counterfactuals

Marcel Robeer, Floris Bex, Ad Feelders


Abstract
Counterfactuals are a valuable means for understanding decisions made by ML systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.
Anthology ID:
2021.findings-emnlp.306
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3611–3625
URL:
https://aclanthology.org/2021.findings-emnlp.306
DOI:
10.18653/v1/2021.findings-emnlp.306
Cite (ACL):
Marcel Robeer, Floris Bex, and Ad Feelders. 2021. Generating Realistic Natural Language Counterfactuals. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3611–3625, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Generating Realistic Natural Language Counterfactuals (Robeer et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.306.pdf
Software:
 2021.findings-emnlp.306.Software.zip
Video:
 https://aclanthology.org/2021.findings-emnlp.306.mp4
Data:
SNLI, SST, SST-2