Towards Fair Dataset Distillation for Text Classification

Xudong Han, Aili Shen, Yitong Li, Lea Frermann, Timothy Baldwin, Trevor Cohn


Abstract
With the growing prevalence of large-scale language models, their energy footprint and potential to learn and amplify historical biases are two pressing challenges. Dataset distillation (DD), which reduces a dataset to a small number of synthetic samples that encode the information in the original data, lowers the cost of model training, but its impact on fairness has not been studied. We investigate how DD affects group bias, with experiments over two language classification tasks, concluding that vanilla DD preserves the bias of the dataset. We then show how existing debiasing methods can be combined with DD to produce models that are fair and accurate, at reduced training cost.
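To make the idea of dataset distillation concrete, here is a minimal sketch of vanilla DD in the spirit of Wang et al. (2018), not the authors' exact method: a handful of synthetic examples, soft labels, and an inner learning rate are learned so that one gradient step on the synthetic data yields a model with low loss on the real data. The toy "real" dataset, the linear model, and all sizes and hyperparameters below are illustrative assumptions.

```python
# Minimal dataset-distillation sketch (illustrative, not the paper's method).
# Requires PyTorch >= 1.10 (for soft-label targets in F.cross_entropy).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, C, N_REAL, N_SYN = 20, 2, 256, 4  # feature dim, classes, real/synthetic sizes

# Toy stand-in for the real training set (random, for illustration only).
x_real = torch.randn(N_REAL, D)
y_real = (x_real[:, 0] > 0).long()

# Learnable synthetic data: a few examples, soft labels, and an inner LR.
x_syn = torch.randn(N_SYN, D, requires_grad=True)
y_syn = torch.zeros(N_SYN, C, requires_grad=True)
lr_inner = torch.tensor(0.1, requires_grad=True)

opt = torch.optim.Adam([x_syn, y_syn, lr_inner], lr=1e-2)

for step in range(500):
    # Fresh randomly initialised linear classifier each outer step.
    w = (torch.randn(D, C) * 0.01).requires_grad_(True)
    b = torch.zeros(C, requires_grad=True)

    # Inner step: train the model on the synthetic data only,
    # keeping the graph so gradients flow back to the synthetic data.
    loss_syn = F.cross_entropy(x_syn @ w + b, y_syn.softmax(-1))
    gw, gb = torch.autograd.grad(loss_syn, (w, b), create_graph=True)
    w2, b2 = w - lr_inner * gw, b - lr_inner * gb

    # Outer step: the updated model should do well on the real data;
    # backprop through the inner update to improve the synthetic data.
    loss_real = F.cross_entropy(x_real @ w2 + b2, y_real)
    opt.zero_grad()
    loss_real.backward()
    opt.step()

print(f"real-data loss after one synthetic-data step: {loss_real.item():.3f}")
```

A fairness-aware variant, as the abstract suggests, would combine such distillation with an existing debiasing objective; the sketch above covers only the vanilla case the paper shows preserves dataset bias.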
Anthology ID:
2022.sustainlp-1.13
Volume:
Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Angela Fan, Iryna Gurevych, Yufang Hou, Zornitsa Kozareva, Sasha Luccioni, Nafise Sadat Moosavi, Sujith Ravi, Gyuwan Kim, Roy Schwartz, Andreas Rücklé
Venue:
sustainlp
Publisher:
Association for Computational Linguistics
Pages:
65–72
URL:
https://aclanthology.org/2022.sustainlp-1.13
DOI:
10.18653/v1/2022.sustainlp-1.13
Cite (ACL):
Xudong Han, Aili Shen, Yitong Li, Lea Frermann, Timothy Baldwin, and Trevor Cohn. 2022. Towards Fair Dataset Distillation for Text Classification. In Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 65–72, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Towards Fair Dataset Distillation for Text Classification (Han et al., sustainlp 2022)
PDF:
https://aclanthology.org/2022.sustainlp-1.13.pdf
Video:
https://aclanthology.org/2022.sustainlp-1.13.mp4