Chenkun Tan

Also published as: ChenKun Tan, 臣坤


2024

InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
Pengyu Wang | Dong Zhang | Linyang Li | Chenkun Tan | Xinghao Wang | Mozhi Zhang | Ke Ren | Botian Jiang | Xipeng Qiu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

As large language models (LLMs) rapidly evolve, they are increasingly customized through fine-tuning to suit the specific needs of various applications. A critical aspect of this advancement is the alignment process, which ensures that these models perform tasks in ways that align with human values and expectations. Current alignment methods, such as direct preference optimization (DPO) and reinforcement learning from human feedback (RLHF), focus primarily on alignment during the training phase. However, these methods often involve complex and resource-intensive training processes, which makes them difficult to implement. Therefore, we propose InferAligner, a simple yet effective method for harmlessness alignment during the inference phase. InferAligner decouples harmlessness from helpfulness. During the training phase, it focuses solely on enhancing the target model’s capabilities on downstream tasks. In the inference phase, it uses safety steering vectors extracted from the aligned model to guide the target model towards harmlessness alignment. Experimental results show that our method can be applied effectively to domain-specific models in finance, medicine, and mathematics, as well as to multimodal large language models (MLLMs) such as LLaVA. It significantly reduces the attack success rate (ASR) of both harmful instructions and jailbreak instructions, while leaving performance on downstream tasks almost unchanged.
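
The abstract does not say how the safety steering vectors are computed or applied, so the following is only a minimal sketch of one common activation-steering construction (difference of mean activations), not the authors' implementation. The function names, the `strength` parameter, its sign, the 8-dimensional hidden size, and the synthetic activations are all assumptions made for illustration.

```python
import numpy as np


def extract_steering_vector(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean activations (rows are prompts, columns are hidden units)."""
    direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return direction / norm


def apply_steering(hidden: np.ndarray, vector: np.ndarray, strength: float) -> np.ndarray:
    """Shift a hidden state along the steering direction; `strength` is a hypothetical knob."""
    return hidden + strength * vector


rng = np.random.default_rng(0)

# Toy stand-ins for activations recorded from an aligned model (assumed hidden size: 8).
harmless = rng.normal(size=(16, 8))
harmful = rng.normal(size=(16, 8)) + np.array([3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
safety_vector = extract_steering_vector(harmful, harmless)

# Toy stand-in for one hidden state of the target model at inference time.
target_hidden = rng.normal(size=8)
steered = apply_steering(target_hidden, safety_vector, strength=2.0)
```

In this sketch the vector comes from one set of activations (standing in for the aligned model) and is applied to another (standing in for the target model), which mirrors the cross-model idea in the abstract only at the level of data flow.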

2023

CCL23-Eval任务4系统报告:基于深度学习的空间语义理解 (System Report for CCL23-Eval Task 4: Spatial Semantic Understanding Based on Deep Learning)
ChenKun Tan (谭臣坤) | XianNian Hu (胡先念) | XinPeng Qiu (邱锡鹏)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

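This paper describes the technical approach our system adopted in the Third Chinese Spatial Semantic Understanding Evaluation (SpaCE2023). For the spatial semantic anomaly recognition task, we propose an extraction method. Combined with a generator, this method also handles the spatial semantic role labeling task. The task of judging whether two spatial scenes are the same or different is handled by generation with a large language model. We further explore the use of large language models on the evaluation dataset and find that instruction design is the main focus and the main difficulty for future work. The code and models of the system are available at https://github.com/ShacklesLay/Space2023.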