Mingyi Hong


2025

Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs
Sruthi Gorantla | Aditya Rawal | Devamanyu Hazarika | Kaixiang Lin | Mingyi Hong | Mahdi Namazifar
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce a zero-shot merging framework for large language models (LLMs) that consolidates specialized domain experts into a single model without any further training. Our core contribution lies in leveraging relative task vectors—difference representations encoding each expert’s unique traits with respect to a shared base model—to guide a principled and efficient merging process. By dissecting parameters into common dimensions (averaged across experts) and complementary dimensions (unique to each expert), we strike an optimal balance between generalization and specialization. We further devise a compression mechanism for the complementary parameters, retaining only principal components and scalar multipliers per expert, thereby minimizing overhead. A dynamic router then selects the most relevant domain at inference, ensuring that domain-specific precision is preserved. Experiments on code generation, mathematical reasoning, medical question answering, and instruction-following benchmarks confirm the versatility and effectiveness of our approach. Altogether, this framework enables truly adaptive and scalable LLMs that seamlessly integrate specialized knowledge for improved zero-shot performance.
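
The split into common and complementary parts can be illustrated with a minimal sketch. It assumes, as one possible reading of the abstract, that the common part is the mean of the experts' task vectors and the complementary part is each expert's residual from that mean. The function names and the toy flat parameter lists are hypothetical, and the compression and routing steps are omitted.

```python
# Hypothetical sketch: the names and the exact split rule are assumptions,
# not the authors' implementation.

def split_task_vectors(base, experts):
    """Split each expert's task vector (expert - base) into a shared mean
    and a per-expert residual."""
    n = len(experts)
    # Task vector: how each expert differs from the shared base model.
    task_vectors = [[e_i - b_i for e_i, b_i in zip(e, base)] for e in experts]
    # Common part: element-wise average of the task vectors.
    common = [sum(col) / n for col in zip(*task_vectors)]
    # Complementary part: what remains for each expert after removing the mean.
    complementary = [[t_i - c_i for t_i, c_i in zip(t, common)]
                     for t in task_vectors]
    return common, complementary


def reconstruct(base, common, residual):
    """Rebuild one expert's parameters from base + common + its residual."""
    return [b + c + r for b, c, r in zip(base, common, residual)]


base = [1.0, 2.0, 3.0]
experts = [[1.5, 2.0, 2.0], [0.5, 3.0, 3.0]]
common, complementary = split_task_vectors(base, experts)
```

Under this reading, the uncompressed residuals reproduce each expert exactly; any saving in memory would come from storing the residuals in compressed form, which this sketch does not attempt.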

LUME: LLM Unlearning with Multitask Evaluations
Anil Ramakrishna | Yixin Wan | Xiaomeng Jin | Kai-Wei Chang | Zhiqi Bu | Bhanukiran Vinzamuri | Volkan Cevher | Mingyi Hong | Rahul Gupta
Findings of the Association for Computational Linguistics: EMNLP 2025

Unlearning aims to remove copyrighted, sensitive, or private content from large language models (LLMs) without full retraining. In this work, we develop a multi-task unlearning benchmark, LUME, that features three tasks: (1) unlearn synthetically generated creative short novels, (2) unlearn synthetic biographies with sensitive information, and (3) unlearn a collection of public biographies. We further release two fine-tuned LLMs of 1B and 7B parameter sizes as the target models. We conduct detailed evaluations of several recently proposed algorithms and present results on carefully crafted metrics to understand their behavior and limitations.

AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science
An Luo | Xun Xian | Jin Du | Fangqiao Tian | Ganghua Wang | Ming Zhong | Shengchun Zhao | Xuan Bi | Zirui Liu | Jiawei Zhou | Jayanth Srinivasa | Ashish Kundu | Charles Fleming | Mingyi Hong | Jie Ding
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) have advanced the automation of data science workflows. Yet it remains unclear whether they can critically leverage external domain knowledge as human data scientists do in practice. To answer this question, we introduce AssistedDS (Assisted Data Science), a benchmark designed to systematically evaluate how LLMs handle domain knowledge in tabular prediction tasks. AssistedDS features both synthetic datasets with explicitly known generative mechanisms and real-world Kaggle competitions, each accompanied by curated bundles of helpful and adversarial documents. These documents provide domain-specific insights into data cleaning, feature engineering, and model selection. We assess state-of-the-art LLMs on their ability to discern and apply beneficial versus harmful domain knowledge, evaluating submission validity, information recall, and predictive performance. Our results demonstrate three key findings: (1) LLMs frequently exhibit an uncritical adoption of provided information, significantly impairing their predictive performance when adversarial content is introduced, (2) helpful guidance is often insufficient to counteract the negative influence of adversarial information, and (3) in Kaggle datasets, LLMs often make errors in handling time-series data, applying consistent feature engineering across different folds, and interpreting categorical variables correctly. These findings highlight a substantial gap in current models’ ability to critically evaluate and leverage expert knowledge, underscoring an essential research direction for developing more robust, knowledge-aware automated data science systems. Our data and code are publicly available [here](https://github.com/jeremyxianx/Assisted-DS).

Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
Xiaomeng Jin | Zhiqi Bu | Bhanukiran Vinzamuri | Anil Ramakrishna | Kai-Wei Chang | Volkan Cevher | Mingyi Hong
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference algorithm, enabling us to have better control over the trade-off between the objectives, while integrating a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of the proposed algorithm among state-of-the-art unlearning methods on the TOFU and MUSE datasets while exhibiting stable training.
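
As a rough illustration of the idea of a normalized gradient difference, the sketch below scales the retain-loss and forget-loss gradients to unit norm before taking their difference, so that neither objective dominates the update through gradient magnitude alone. The sign convention (descending the retain loss while ascending the forget loss), the function names, and the fixed learning rate are assumptions made for this example; the automatic learning rate scheduler mentioned in the abstract is not modeled.

```python
import math

# Hypothetical sketch: one reading of "normalized gradient difference",
# not the authors' implementation.

def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalized_gradient_difference(grad_retain, grad_forget, eps=1e-12):
    """Unit-normalize both gradients, then subtract the forget direction
    from the retain direction. eps guards against division by zero."""
    norm_r = l2_norm(grad_retain) + eps
    norm_f = l2_norm(grad_forget) + eps
    return [r / norm_r - f / norm_f for r, f in zip(grad_retain, grad_forget)]


def unlearning_step(params, grad_retain, grad_forget, lr):
    """One descent step along the normalized difference (fixed lr assumed)."""
    direction = normalized_gradient_difference(grad_retain, grad_forget)
    return [p - lr * d for p, d in zip(params, direction)]


grad_retain = [3.0, 4.0]   # norm 5, so the unit vector is [0.6, 0.8]
grad_forget = [0.0, 2.0]   # norm 2, so the unit vector is [0.0, 1.0]
direction = normalized_gradient_difference(grad_retain, grad_forget)
new_params = unlearning_step([1.0, 1.0], grad_retain, grad_forget, lr=0.1)
```

Here the forget gradient is smaller in magnitude than the retain gradient, yet after normalization both contribute a unit-length direction to the update.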

SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
Anil Ramakrishna | Yixin Wan | Xiaomeng Jin | Kai-Wei Chang | Zhiqi Bu | Bhanukiran Vinzamuri | Volkan Cevher | Mingyi Hong | Rahul Gupta
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We introduce SemEval-2025 Task 4: unlearning sensitive content from Large Language Models (LLMs). The task features 3 subtasks for LLM unlearning spanning different use cases: (1) unlearn long form synthetic creative documents spanning different genres; (2) unlearn short form synthetic biographies containing personally identifiable information (PII), including fake names, phone number, SSN, email and home addresses, and (3) unlearn real documents sampled from the target model’s training dataset. We received over 100 submissions from over 30 institutions and we summarize the key techniques and lessons in this paper.