Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

Kerem Zaman, Leshem Choshen, Shashank Srivastava


Abstract
Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem, investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
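The abstract describes model fusion as combining the weights of multiple fine-tuned models. Below is a minimal sketch of the simplest form of weight-space fusion (parameter averaging), assuming the models share an architecture; the toy model, fusion coefficients, and function names are illustrative and not taken from the paper's exact setup.

```python
# Minimal sketch of weight-space model fusion via parameter averaging.
# Assumption: all models share the same architecture and parameter names.
import torch
import torch.nn as nn


def fuse_state_dicts(state_dicts, weights=None):
    """Average matching parameters across models; `weights` are fusion coefficients."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    fused = {}
    for key in state_dicts[0]:
        fused[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return fused


if __name__ == "__main__":
    # Two toy "fine-tuned" models with identical architecture but different weights.
    model_a, model_b = nn.Linear(4, 2), nn.Linear(4, 2)
    fused_model = nn.Linear(4, 2)
    fused_model.load_state_dict(
        fuse_state_dicts([model_a.state_dict(), model_b.state_dict()])
    )
    # Intuition from the paper's finding: knowledge shared across models survives
    # averaging, while knowledge unique to one model (e.g. a shortcut or a
    # memorized training example) is diluted and tends to be forgotten.
    print(fused_model.weight)
```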
Anthology ID:
2024.emnlp-main.1045
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
18763–18783
URL:
https://aclanthology.org/2024.emnlp-main.1045
Cite (ACL):
Kerem Zaman, Leshem Choshen, and Shashank Srivastava. 2024. Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18763–18783, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion (Zaman et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1045.pdf
Software:
2024.emnlp-main.1045.software.zip