Jia-Chen Zhang

2025

Parameter-Efficient Fine-Tuning of Large Language Models via Deconvolution in Subspace
Jia-Chen Zhang | Yu-Jie Xiong | Chun-Ming Xia | Dong-Hai Zhu | Xi-He Qiu
Proceedings of the 31st International Conference on Computational Linguistics

This paper proposes a novel parameter-efficient fine-tuning method that combines the knowledge-completion capability of deconvolution with subspace learning, reducing the number of parameters required for fine-tuning by a factor of 8. Experimental results demonstrate that our method achieves better training efficiency and performance than existing methods.

Sugar-Coated Poison: Benign Generation Unlocks Jailbreaking
Yuhang Wu | Yu-Jie Xiong | Hao Zhang | Jia-Chen Zhang | Zheng Zhou
Findings of the Association for Computational Linguistics: EMNLP 2025

As large language models (LLMs) become increasingly integrated across diverse domains, the effectiveness of their safety mechanisms faces severe challenges. Jailbreak attacks based on prompt engineering, which induce models to generate potentially harmful content, have become a major security threat. However, existing methods rely primarily on black-box manipulation of prompt templates, resulting in high costs and poor generalizability. To overcome this bottleneck, this study is the first to reveal how an LLM's own generation affects its safety, a phenomenon we term Defense Threshold Decay (DTD): as benign content generation increases, the model’s attention to input instructions progressively diminishes. Building on this insight, we propose the Sugar-Coated Poison (SCP) attack paradigm, which uses a “semantic reversal” strategy: benign inputs that are opposite in meaning to the malicious intent are crafted to induce the model into a safety response mode. Once the defense threshold has decayed, an adversarial reasoning mechanism easily bypasses the safety mechanisms. Experiments show that SCP outperforms existing baselines. As a defense, we propose Part-of-Speech Defense (POSD), which leverages verb-noun dependencies for syntactic analysis to enhance the robustness and security of LLMs. Our code is available at https://anonymous.4open.science/r/SCP-9092.

Co-authors

Zheng Zhou 1

Dong-Hai Zhu 1

Venues

coling 1
findings 1
