Jean-Noël Vittaut
Also published as: Jean-noel Vittaut
2024
Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations
Milan Bhan | Jean-Noël Vittaut | Nicolas Chesneau | Marie-Jeanne Lesot
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Incorporating natural language rationales in the prompt and In-Context Learning (ICL) have led to a significant improvement in the performance of Large Language Models (LLMs). However, generating high-quality rationales requires human annotation or the use of auxiliary proxy models. In this work, we propose Self-AMPLIFY, which automatically generates rationales from post hoc explanation methods applied to Small Language Models (SLMs) to improve their own performance. Self-AMPLIFY is a 3-step method that targets samples, generates rationales and builds a final prompt to leverage ICL. Self-AMPLIFY's performance is evaluated on four SLMs and five datasets requiring strong reasoning abilities. Self-AMPLIFY achieves good results against competitors, leading to strong accuracy improvements. Self-AMPLIFY is the first method to apply post hoc explanation methods to autoregressive language models to generate rationales that improve their own performance in a fully automated manner.
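The three steps named in the abstract (target samples, generate rationales, build the final prompt) can be illustrated with a minimal sketch. It assumes that per-token attribution scores have already been computed by some post hoc explainer, that samples are selected because the SLM already answers them correctly, and that the rationale is a top-k keyword sentence. The `Example` class, the selection criterion and the rationale template are all hypothetical and not necessarily the authors' exact choices.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: str
    tokens: list[str]
    # Hypothetical per-token attribution scores from a post hoc explainer
    scores: list[float]


def select_samples(examples, predictions, n):
    # Step 1 (assumed criterion): keep samples the SLM already predicts correctly
    correct = [ex for ex, pred in zip(examples, predictions) if pred == ex.label]
    return correct[:n]


def topk_rationale(ex, k):
    # Step 2: turn attribution scores into a keyword rationale
    ranked = sorted(range(len(ex.tokens)), key=lambda i: ex.scores[i], reverse=True)[:k]
    keywords = [ex.tokens[i] for i in sorted(ranked)]  # restore original token order
    # Hypothetical rationale template
    return f"The {k} keywords to predict the answer are: {', '.join(keywords)}."


def build_prompt(demos, query, k=3):
    # Step 3: assemble an ICL prompt of (input, rationale, answer) demonstrations
    blocks = [
        f"Input: {d.text}\nRationale: {topk_rationale(d, k)}\nAnswer: {d.label}"
        for d in demos
    ]
    # The SLM is asked to produce a rationale, then an answer, for the query
    blocks.append(f"Input: {query}\nRationale:")
    return "\n\n".join(blocks)
```

Because the demonstrations end with an answer and the query ends with an open `Rationale:` field, the model is nudged to imitate the "rationale, then answer" pattern; any prompt format with that property would serve the same purpose in this sketch.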
2023
Enhancing textual counterfactual explanation intelligibility through Counterfactual Feature Importance
Milan Bhan | Jean-noel Vittaut | Nicolas Chesneau | Marie-jeanne Lesot
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Textual counterfactual examples explain a prediction by modifying the tokens of an initial instance in order to flip the outcome of a classifier. Even under sparsity constraints, counterfactual generation can lead to numerous changes from the initial text, making the explanation hard to understand. We propose Counterfactual Feature Importance, a method to make non-sparse counterfactual explanations more intelligible. Counterfactual Feature Importance assesses the importance of each token change between an instance to explain and its counterfactual example. We develop two ways of computing Counterfactual Feature Importance, based respectively on classifier gradient computation and on the evolution of the counterfactual generator's loss during counterfactual search. We then design a global version of Counterfactual Feature Importance, which provides rich information about the semantic fields that globally impact classifier predictions. Counterfactual Feature Importance makes it possible to focus on the impactful parts of counterfactual explanations, making counterfactual explanations that involve numerous changes more understandable.
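The gradient-based variant can be illustrated with a minimal sketch that scores each token change by the classifier gradient (taken at the counterfactual) times the embedding difference of the change. Several assumptions apply. The toy classifier is a logistic regression on mean-pooled embeddings, chosen so that its gradient has a closed form. Counterfactuals are assumed to be aligned, substitution-only edits. The scoring rule is one plausible gradient-times-difference formulation, not necessarily the authors' exact formula. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(embs, w):
    # Toy classifier: logistic regression on mean-pooled token embeddings
    n = len(embs)
    pooled = [sum(e[d] for e in embs) / n for d in range(len(w))]
    return sigmoid(sum(wd * pd for wd, pd in zip(w, pooled)))


def grad_wrt_token(embs, w):
    # d score / d e_i = s * (1 - s) * w / n, identical for every position here
    s = score(embs, w)
    n = len(embs)
    return [s * (1 - s) * wd / n for wd in w]


def counterfactual_feature_importance(orig, cf, emb, w):
    # orig, cf: aligned token lists of equal length (substitution-only assumption)
    g = grad_wrt_token([emb[t] for t in cf], w)
    importances = {}
    for i, (a, b) in enumerate(zip(orig, cf)):
        if a != b:
            # Embedding difference introduced by this token change
            delta = [eb - ea for ea, eb in zip(emb[a], emb[b])]
            importances[(i, a, b)] = abs(sum(gd * dd for gd, dd in zip(g, delta)))
    return importances
```

Sorting the returned dictionary by value ranks the token changes, so a reader can inspect the most influential substitutions first. A corpus-level view in the spirit of the global version could, for instance, average these scores per `(a, b)` substitution across many instances, though that aggregation is also an assumption of this sketch.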