Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic Rishabh Bhardwaj author Duc Anh Do author Soujanya Poria author 2024-08 text Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Lun-Wei Ku editor Andre Martins editor Vivek Srikumar editor Association for Computational Linguistics Bangkok, Thailand conference publication bhardwaj-etal-2024-language 10.18653/v1/2024.acl-long.762 https://aclanthology.org/2024.acl-long.762/ 2024-08 14138 14149