Jinghan Jia
2026
BLUR: A Bi-Level Optimization Approach for LLM Unlearning
Hadi Reisizadeh | Jinghan Jia | Zhiqi Bu | Bhanukiran Vinzamuri | Anil Ramakrishna | Kai-Wei Chang | Volkan Cevher | Sijia Liu | Mingyi Hong
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has proven vital for ensuring compliance with data regulations and promoting ethical practices in generative AI. Although there is growing interest in developing various unlearning algorithms, it remains unclear how best to formulate the unlearning problem. The most popular formulation uses a weighted sum of the forget and retain losses, but it often leads to performance degradation due to the inherent trade-off between the two. In this work, we argue that it is important to model the hierarchical structure of the unlearning problem, where the forget problem (which unlearns certain knowledge and/or capabilities) takes priority over the retain problem (which preserves model utility). This hierarchical structure naturally leads to a bi-level optimization formulation where the lower-level objective focuses on minimizing the forget loss, while the upper-level objective aims to maintain the model’s utility. Based on this new formulation, we propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which not only possesses strong theoretical guarantees but, more importantly, delivers superior performance. In particular, our extensive experiments demonstrate that BLUR consistently outperforms all the state-of-the-art algorithms across various unlearning tasks, models, and metrics.
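As a reading aid, the hierarchical formulation the abstract describes can be written as a bi-level program (our notation, which may differ from the paper's):

```latex
% Upper level: preserve utility on the retain set;
% lower level: the forget loss must already be minimized.
\min_{\theta \in \Theta^{*}} \; \mathcal{L}_{\mathrm{retain}}(\theta)
\quad \text{where} \quad
\Theta^{*} = \operatorname*{arg\,min}_{\theta} \; \mathcal{L}_{\mathrm{forget}}(\theta)
```

This contrasts with the weighted-sum baseline \(\min_{\theta} \, \lambda \, \mathcal{L}_{\mathrm{forget}}(\theta) + (1 - \lambda) \, \mathcal{L}_{\mathrm{retain}}(\theta)\), which trades the two losses against each other rather than prioritizing forgetting.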
2025
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
Haomin Zhuang | Yihua Zhang | Kehan Guo | Jinghan Jia | Gaowen Liu | Sijia Liu | Xiangliang Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advances in LLM unlearning have shown remarkable success in removing unwanted data-model influences while preserving the model’s utility for legitimate knowledge. Despite these strides, sparse Mixture-of-Experts (MoE) LLMs, a key subset of the LLM family, have remained unexplored in the context of unlearning. As MoE LLMs are celebrated for their exceptional performance, we ask: how can unlearning be performed effectively and efficiently on MoE LLMs? Our pilot study shows that the dynamic routing nature of MoE LLMs introduces unique challenges, leading to excessive forgetting, uncontrolled knowledge erasure, and substantial utility drops when existing unlearning methods are applied. To address this, we propose a novel Selected-Expert Unlearning Framework (SEUF). Through expert attribution, unlearning is concentrated on the most actively engaged experts for the specified knowledge. Concurrently, an anchor loss is applied to the router to stabilize the active state of the targeted expert, ensuring focused and controlled unlearning. SEUF is compatible with various standard unlearning algorithms. Extensive experiments demonstrate that SEUF improves forget quality by up to 5% and model utility by 35% on MoE LLMs across various benchmarks and LLM architectures (compared to standard unlearning algorithms), while unlearning only 0.06% of the model parameters.
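As a rough illustration of the two ingredients named above, expert attribution and a router anchor loss, here is a minimal PyTorch-style sketch; all names are ours rather than SEUF's actual API, and the real method operates per MoE layer:

```python
# Hedged sketch of selected-expert unlearning (names and details are ours).
import torch
import torch.nn.functional as F

def attribute_expert(router_logits: torch.Tensor) -> int:
    """Expert attribution: pick the expert most often routed to on forget data.

    router_logits: (num_tokens, num_experts) gate scores collected while
    running the forget set through one MoE layer.
    """
    top1 = router_logits.argmax(dim=-1)        # winning expert per token
    return int(torch.bincount(top1).argmax())  # most frequently selected

def seuf_style_loss(unlearn_loss: torch.Tensor,
                    router_logits: torch.Tensor,
                    target_expert: int,
                    anchor_weight: float = 1.0) -> torch.Tensor:
    """Unlearning objective plus an anchor term on the router.

    The anchor keeps the router sending forget-set tokens to the targeted
    expert, so that unlearning gradients stay concentrated there.
    """
    target = torch.full(router_logits.shape[:1], target_expert,
                        dtype=torch.long, device=router_logits.device)
    anchor = F.cross_entropy(router_logits, target)
    return unlearn_loss + anchor_weight * anchor
```

In training, one would freeze all parameters except the attributed expert's, consistent with the reported 0.06% of parameters being updated.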
Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
Changsheng Wang | Chongyu Fan | Yihua Zhang | Jinghan Jia | Dennis Wei | Parikshit Ram | Nathalie Baracaldo | Sijia Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent advances in large reasoning models (LRMs) have enabled strong multi-step reasoning capabilities. However, existing machine unlearning algorithms are tailored to standard language modeling and fail to address the unique challenges posed by LRMs. In this work, we present the first systematic study of LRM unlearning and reveal that conventional unlearning methods often overlook critical information leakage in reasoning traces, even when final answers are successfully removed. To address this, we propose Reasoning-aware Representation Misdirection for Unlearning (R2MU), a method that suppresses sensitive reasoning traces while preserving the model’s general reasoning ability. Our experiments demonstrate that R2MU significantly reduces reasoning trace leakage and achieves strong performance across both reasoning and safety benchmarks, including WMDP, StrongReject, JBB-Behaviors, and WildJailbreak, on state-of-the-art models such as DeepSeek-R1-Distill-LLaMA-8B and DeepSeek-R1-Distill-Qwen-14B. To the best of our knowledge, R2MU is the first principled approach to both expose and mitigate reasoning trace leakage in LRM unlearning while preserving reasoning ability.
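The method name points to representation misdirection (RMU); a hedged sketch of how such a loss might be restricted to reasoning-trace tokens follows. The masking, scaling, and random control vector are standard RMU ingredients; confining them to trace tokens is our reading of the abstract, not necessarily R2MU's exact recipe:

```python
# Sketch of a representation-misdirection loss on reasoning-trace tokens
# (our simplification; layer choice and scale are hypothetical).
import torch

def trace_misdirection_loss(hidden_states: torch.Tensor,
                            trace_mask: torch.Tensor,
                            control_vec: torch.Tensor,
                            scale: float = 6.0) -> torch.Tensor:
    """Push activations of trace tokens toward a fixed random direction.

    hidden_states: (batch, seq, dim) activations at a chosen layer
    trace_mask:    (batch, seq) 1 for tokens inside the reasoning trace
    control_vec:   (dim,) random unit vector, fixed throughout unlearning
    """
    target = scale * control_vec                 # misdirection target
    per_token = ((hidden_states - target) ** 2).mean(dim=-1)  # (batch, seq)
    mask = trace_mask.float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

A retain loss on general reasoning data would be optimized alongside this term to preserve the model's reasoning skill.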
2024
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia | Yihua Zhang | Yimeng Zhang | Jiancheng Liu | Bharat Runwal | James Diffenderfer | Bhavya Kailkhura | Sijia Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have highlighted the necessity of effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims at removing undesired data influences and associated model capabilities without compromising utility beyond the scope of unlearning. While interest in studying LLM unlearning is growing, the impact of the optimizer choice for LLM unlearning remains unexplored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between second-order optimization and influence unlearning (a classical approach using influence functions to update the model for data influence removal). This insight propels us to develop a second-order optimization-based LLM unlearning framework, termed Second-Order UnLearning (SOUL), which extends the static, one-shot model update using influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, indicating that second-order optimization offers an effective and broadly applicable solution for LLM unlearning.
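The influence-unlearning connection described above corresponds to one-shot updates of the form theta <- theta - H^-1 g; a minimal sketch of turning that into an iterative, diagonally preconditioned step follows (our simplification with hypothetical names, not SOUL's actual optimizer):

```python
# Sketch of an iterative second-order unlearning step with a diagonal
# Hessian estimate (our simplification; SOUL's optimizer may differ).
import torch

@torch.no_grad()
def second_order_step(param: torch.Tensor,
                      grad: torch.Tensor,
                      hess_diag: torch.Tensor,
                      lr: float = 1e-4,
                      eps: float = 1e-12,
                      clip: float = 1.0) -> None:
    """One preconditioned update: divide the unlearning gradient by a running
    diagonal-Hessian estimate, then clip to guard against tiny curvature.
    Iterating this echoes the influence-function update theta <- theta - H^-1 g,
    applied dynamically instead of in a single static shot."""
    update = (grad / (hess_diag + eps)).clamp(-clip, clip)
    param.add_(update, alpha=-lr)
```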
Leveraging LLMs for Dialogue Quality Measurement
Jinghan Jia | Abi Komma | Timothy Leffel | Xujun Peng | Ajay Nagesh | Tamer Soliman | Aram Galstyan | Anoop Kumar
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
In task-oriented conversational AI evaluation, unsupervised methods correlate poorly with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zero- and few-shot capabilities across NLP tasks. Our paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and proprietary datasets. Manipulating factors such as model size, in-context examples, and selection techniques, we examine “chain-of-thought” (CoT) reasoning and label extraction procedures. Our results show that (1) larger models yield more accurate dialogue labels; (2) algorithmic selection of in-context examples outperforms random selection; (3) CoT reasoning, where an LLM is asked to provide justifications before outputting final labels, improves performance; and (4) fine-tuned LLMs outperform out-of-the-box ones. In addition, we find that suitably tuned LLMs exhibit high accuracy in dialogue evaluation, as measured against human judgments.
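To make the CoT-plus-label-extraction setup concrete, here is a hypothetical prompt template and extractor; the paper's actual prompts, rating scale, and selection algorithm are not shown in the abstract:

```python
# Hypothetical CoT prompt and label extractor for LLM-based dialogue rating.
import re

PROMPT = """Rate the following dialogue turn for quality on a scale of 1-5.
First explain your reasoning step by step, then end with a line of the form
"Label: <score>".

{examples}

Dialogue:
{dialogue}
"""

def extract_label(llm_output: str) -> int | None:
    """Pull the final numeric label out of a chain-of-thought response."""
    match = re.search(r"Label:\s*([1-5])", llm_output)
    return int(match.group(1)) if match else None
```

The {examples} slot is where algorithmically selected in-context examples (finding 2) would be inserted.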
Co-authors
- Sijia Liu 4
- Yihua Zhang 3
- Nathalie Baracaldo 1
- Zhiqi Bu 1
- Volkan Cevher 1
- Kai-Wei Chang 1
- James Diffenderfer 1
- Chongyu Fan 1
- Aram Galstyan 1
- Kehan Guo 1
- Mingyi Hong 1
- Bhavya Kailkhura 1
- Abi Komma 1
- Anoop Kumar 1
- Timothy Leffel 1
- Jiancheng Liu 1
- Gaowen Liu 1
- Ajay Nagesh 1
- Xujun Peng 1
- Parikshit Ram 1
- Anil Ramakrishna 1
- Hadi Reisizadeh 1
- Bharat Runwal 1
- Tamer Soliman 1
- Bhanukiran Vinzamuri 1
- Changsheng Wang 1
- Dennis Wei 1
- Yimeng Zhang 1
- Xiangliang Zhang 1
- Haomin Zhuang 1