On the Interplay between Fairness and Explainability

Stephanie Brandl, Emanuele Bugliarello, Ilias Chalkidis


Abstract
To build reliable and trustworthy NLP applications, models need to be both fair across different demographics and explainable. Usually, these two objectives, fairness and explainability, are optimized and/or examined independently of each other. Instead, we argue that forthcoming, trustworthy NLP systems should consider both. In this work, we perform a first study to understand how they influence each other: do fair(er) models rely on more plausible explanations, and vice versa? To this end, we conduct experiments on two English multi-class text classification datasets, BIOS and ECtHR, that provide information on gender and nationality, respectively, as well as human-annotated rationales. We fine-tune pre-trained language models with several methods for (i) bias mitigation, which aims to improve fairness; and (ii) rationale extraction, which aims to produce plausible explanations. We find that bias mitigation algorithms do not always lead to fairer models. Moreover, in our analysis, we see that empirical fairness and explainability are orthogonal.
Anthology ID:
2024.trustnlp-1.10
Volume:
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Anaelia Ovalle, Kai-Wei Chang, Yang Trista Cao, Ninareh Mehrabi, Jieyu Zhao, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
Venues:
TrustNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
94–108
URL:
https://aclanthology.org/2024.trustnlp-1.10
DOI:
10.18653/v1/2024.trustnlp-1.10
Cite (ACL):
Stephanie Brandl, Emanuele Bugliarello, and Ilias Chalkidis. 2024. On the Interplay between Fairness and Explainability. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 94–108, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
On the Interplay between Fairness and Explainability (Brandl et al., TrustNLP-WS 2024)
PDF:
https://aclanthology.org/2024.trustnlp-1.10.pdf