Rethinking Evaluation Methods for Machine Unlearning

Leon Wichert, Sandipan Sikdar


Abstract
Machine *unlearning* refers to methods for deleting information about specific training instances from a trained machine learning model. This enables models to delete user information and comply with privacy regulations. While retraining the model from scratch on the training set excluding the instances to be “*forgotten*” would yield the desired unlearned model, this is infeasible owing to the size of datasets and models. Hence, unlearning algorithms have been developed, where the goal is to obtain an unlearned model that behaves as closely as possible to the retrained model. Consequently, evaluating an unlearning method involves: (i) randomly selecting a *forget* set (i.e., the training instances to be unlearned), (ii) obtaining an unlearned and a retrained model, and (iii) comparing the performance of the unlearned and the retrained model on the test and forget sets. However, when the forget set is randomly selected, the unlearned model is almost always similar to the original (i.e., prior to unlearning) model. Hence, it is unclear whether the model really unlearned or simply retained the weights of the original model. For a more robust evaluation, we instead propose to consider training instances with significant influence on the trained model. When such influential instances are included in the forget set, we observe that the unlearned model deviates significantly from the retrained model. Such deviations are also observed when the size of the forget set is increased. Lastly, the choice of evaluation dataset can also lead to misleading interpretations of results.
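
The evaluation protocol described in the abstract can be summarized with a minimal sketch; all names here (`train_model`, `unlearn`, `accuracy`) are hypothetical placeholders and not the authors' implementation:

```python
import random

def evaluate_unlearning(train_set, test_set, forget_size, train_model, unlearn, accuracy):
    # (i) select a forget set (here: uniformly at random; the paper argues that
    #     highly influential training instances should also be considered)
    forget_idx = set(random.sample(range(len(train_set)), forget_size))
    forget_set = [train_set[i] for i in forget_idx]
    retain_set = [x for i, x in enumerate(train_set) if i not in forget_idx]

    # (ii) obtain the unlearned model and the retrained "gold standard" model
    original_model = train_model(train_set)
    unlearned_model = unlearn(original_model, forget_set)
    retrained_model = train_model(retain_set)

    # (iii) compare the two models on the test set and the forget set;
    #       small gaps indicate the unlearned model behaves like the retrained one
    return {
        "test_gap": abs(accuracy(unlearned_model, test_set) - accuracy(retrained_model, test_set)),
        "forget_gap": abs(accuracy(unlearned_model, forget_set) - accuracy(retrained_model, forget_set)),
    }
```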
Anthology ID:
2024.findings-emnlp.271
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4727–4739
URL:
https://aclanthology.org/2024.findings-emnlp.271
Cite (ACL):
Leon Wichert and Sandipan Sikdar. 2024. Rethinking Evaluation Methods for Machine Unlearning. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4727–4739, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Rethinking Evaluation Methods for Machine Unlearning (Wichert & Sikdar, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.271.pdf