Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack

Authors: Yu Fu, Yufei Li, Wen Xiao, Cong Liu, Yue Dong
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Pages: 8483-8502
Type: conference publication
Citation key: fu-etal-2024-safety
DOI: 10.18653/v1/2024.acl-long.461
URL: https://aclanthology.org/2024.acl-long.461/