HILDIF: Interactive Debugging of NLI Models Using Influence Functions

Hugo Zylberajch, Piyawat Lertvittayakumjorn, Francesca Toni


Abstract
Biases and artifacts in training data can cause unwelcome behavior in text classifiers (such as shallow pattern matching), leading to a lack of generalizability. One solution to this problem is to include users in the loop and leverage their feedback to improve models. We propose a novel explanatory debugging pipeline called HILDIF, enabling humans to improve deep text classifiers using influence functions as an explanation method. We experiment on the Natural Language Inference (NLI) task, showing that HILDIF can effectively alleviate artifact problems in fine-tuned BERT models and increase model generalizability.
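To make "influence functions as an explanation method" concrete, the sketch below computes the standard influence score of each training example on a test example, I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the training objective. A positive score means upweighting z is estimated to increase the test loss (a harmful example); a negative score means it is estimated to help. This is a minimal, hypothetical illustration on a one-feature logistic regression with an explicit 2×2 Hessian and an assumed L2 penalty `lam`. It is not the HILDIF implementation: the function names, toy data, and hyperparameters are invented for illustration, and influence on a fine-tuned BERT model would require approximating H⁻¹ rather than inverting it directly.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_loss(theta, x, y):
    # Gradient of the log loss w.r.t. (w, b) for one example, y in {0, 1}.
    w, b = theta
    p = sigmoid(w * x + b)
    return [(p - y) * x, p - y]

def train(data, lam, lr=0.5, steps=2000):
    # Gradient descent on mean log loss + (lam / 2) * ||theta||^2.
    theta = [0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        g = [lam * theta[0], lam * theta[1]]
        for x, y in data:
            gi = grad_loss(theta, x, y)
            g[0] += gi[0] / n
            g[1] += gi[1] / n
        theta = [theta[0] - lr * g[0], theta[1] - lr * g[1]]
    return theta

def hessian(theta, data, lam):
    # Hessian of the same regularized objective: lam * I + mean of p(1-p) [x,1][x,1]^T.
    w, b = theta
    h = [[lam, 0.0], [0.0, lam]]
    n = len(data)
    for x, _ in data:
        p = sigmoid(w * x + b)
        s = p * (1.0 - p) / n
        h[0][0] += s * x * x
        h[0][1] += s * x
        h[1][0] += s * x
        h[1][1] += s
    return h

def solve2(h, v):
    # Solve the 2x2 linear system h @ u = v in closed form.
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * v[0] - h[0][1] * v[1]) / det,
            (-h[1][0] * v[0] + h[0][0] * v[1]) / det]

def influence_scores(theta, data, test_point, lam):
    # I(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i); H is symmetric,
    # so H^{-1} grad L(z_test) is computed once and reused for every z_i.
    h = hessian(theta, data, lam)
    s_test = solve2(h, grad_loss(theta, *test_point))
    scores = []
    for x, y in data:
        g = grad_loss(theta, x, y)
        scores.append(-(s_test[0] * g[0] + s_test[1] * g[1]))
    return scores

# Toy data: label is 1 when x > 0, except the last point, which is deliberately mislabeled.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (1.5, 0)]
lam = 0.1
theta = train(data, lam)
scores = influence_scores(theta, data, (3.0, 1), lam)
for (x, y), s in zip(data, scores):
    print(f"x={x:+.1f} y={y} influence={s:+.3f}")
```

On this toy data the mislabeled point (x=1.5, y=0) receives the largest positive score for the test example (x=3.0, y=1), while the correctly labeled positive examples receive negative scores. Surfacing such high-influence training examples to a human is the kind of explanation the abstract describes users giving feedback on.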
Anthology ID:
2021.internlp-1.1
Volume:
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing
Month:
August
Year:
2021
Address:
Online
Editors:
Kianté Brantley, Soham Dan, Iryna Gurevych, Ji-Ung Lee, Filip Radlinski, Hinrich Schütze, Edwin Simpson, Lili Yu
Venue:
InterNLP
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/2021.internlp-1.1
DOI:
10.18653/v1/2021.internlp-1.1
Cite (ACL):
Hugo Zylberajch, Piyawat Lertvittayakumjorn, and Francesca Toni. 2021. HILDIF: Interactive Debugging of NLI Models Using Influence Functions. In Proceedings of the First Workshop on Interactive Learning for Natural Language Processing, pages 1–6, Online. Association for Computational Linguistics.
Cite (Informal):
HILDIF: Interactive Debugging of NLI Models Using Influence Functions (Zylberajch et al., InterNLP 2021)
PDF:
https://aclanthology.org/2021.internlp-1.1.pdf
Data
MultiNLI