Wanli Li

2025

Supervised fine-tuning (SFT) is widely adopted for tailoring large language models (LLMs) to specific downstream tasks. However, the substantial computational demands of LLMs hinder iterative exploration of fine-tuning datasets and accurate evaluation of individual sample importance. To address this challenge, we introduce Meta-LoRA, a memory-efficient method for automatic sample reweighting. Meta-LoRA learns to reweight fine-tuning samples by minimizing the loss on a small, high-quality validation set through an end-to-end bi-level optimization framework based on meta-learning. To reduce memory usage associated with computing second derivatives, we approximate the bi-level optimization using gradient similarity between training and validation datasets, replacing bi-dimensional gradient similarity with the product of one-dimensional activation states and their corresponding gradients. Further memory optimization is achieved by refining gradient computations, selectively applying them to the low-rank layers of LoRA, which results in as little as 4% additional memory usage. Comprehensive evaluations across benchmark datasets in mathematics, coding, and medical domains demonstrate Meta-LoRA’s superior efficacy and efficiency. The source code is available at https://github.com/liweicheng-ai/meta-lora.
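The gradient-similarity approximation described above rests on a standard identity for linear layers. For a layer y = W a, the per-sample weight gradient is the outer product g aᵀ, where a is the input activation and g = ∂L/∂y. The inner product of two such gradients therefore factorizes into (a₁·a₂)(g₁·g₂), so only one-dimensional vectors need to be stored. The following is a minimal Python sketch under stated assumptions, not Meta-LoRA's implementation. It assumes a single linear layer and one token per sample. The rule in the hypothetical `reweight` function (clip negative similarities to zero, then normalize) is an illustrative choice in the spirit of learning-to-reweight and is not taken from the abstract. In a LoRA adapter, the same factorization would presumably be applied to the low-rank matrices, where one of the two vectors has only r entries.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, Vector]  # (input activation a, output gradient g)


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def outer(g: Vector, a: Vector) -> List[Vector]:
    # Explicit per-sample weight gradient dL/dW = g a^T (out x in);
    # built here only to check the factorized form below.
    return [[gi * aj for aj in a] for gi in g]


def frobenius(m1: List[Vector], m2: List[Vector]) -> float:
    return sum(dot(r1, r2) for r1, r2 in zip(m1, m2))


def grad_similarity(a1: Vector, g1: Vector, a2: Vector, g2: Vector) -> float:
    # <g1 a1^T, g2 a2^T>_F = (a1 . a2) * (g1 . g2): only 1-D vectors are
    # used, never the 2-D gradient matrix.
    return dot(a1, a2) * dot(g1, g2)


def reweight(train: List[Sample], val: List[Sample]) -> List[float]:
    """Hypothetical rule: score each training sample by its gradient
    similarity to the summed validation gradient, clip negative scores
    to zero, and normalize the rest to sum to one."""
    scores = []
    for a_t, g_t in train:
        # Summing over validation samples equals the similarity to the
        # summed validation gradient, by linearity of the inner product.
        s = sum(grad_similarity(a_t, g_t, a_v, g_v) for a_v, g_v in val)
        scores.append(max(s, 0.0))
    total = sum(scores)
    if total == 0.0:
        return [0.0] * len(train)  # no sample agrees with the validation set
    return [s / total for s in scores]


a1, g1 = [1.0, 2.0, -1.0], [0.5, -0.5]
a2, g2 = [0.0, 1.0, 3.0], [2.0, 1.0]
print(grad_similarity(a1, g1, a2, g2), frobenius(outer(g1, a1), outer(g2, a2)))

train = [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [-1.0, 0.0]), ([1.0, 1.0], [0.5, 0.5])]
val = [([1.0, 0.5], [1.0, 1.0])]
print(reweight(train, val))
```

The two printed similarity values agree, which checks the factorization. In the toy reweighting, the second training sample's gradient points away from the validation gradient, so it receives zero weight. For a sequence of T tokens, the per-sample gradient is a sum of T outer products, and the inner product becomes a double sum over token pairs of the same (a·a′)(g·g′) terms.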

2022

Graph-based Model Generation for Few-Shot Relation Extraction
Wanli Li | Tieyun Qian
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Few-shot relation extraction (FSRE) is a challenging problem because each task provides only a handful of training instances. Existing models follow a ‘one-for-all’ scheme in which a single large general model performs all individual N-way-K-shot tasks in FSRE, which prevents the model from reaching the optimum on each individual task. In view of this, we propose a model generation framework that consists of one general model shared by all tasks and many tiny task-specific models, one for each individual task. The general model generates universal knowledge and passes it to the tiny models, which are further fine-tuned when performing their specific tasks. In this way, we decouple the complexity of the entire task space from that of the individual tasks while still absorbing the universal knowledge. Extensive experimental results on two public datasets demonstrate that our framework achieves new state-of-the-art performance on FSRE tasks. Our code is available at: https://github.com/NLPWM-WHU/GM_GEN.
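To make the "one general model, many tiny task-specific models" idea concrete, here is a hedged toy sketch in Python. Every name is hypothetical. The generator is a deliberately simple stand-in: it uses class-mean prototypes of the support embeddings as the tiny classifier's initial weights. It is not the graph-based generator from the paper. Task-specific fine-tuning here is plain gradient descent on softmax cross-entropy over the support set of one N-way-K-shot task, and it updates only the tiny model.

```python
import math
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (instance embedding, relation label)


def generate_tiny_model(support: List[Example], n_way: int) -> List[Vector]:
    """Stand-in for the general model: emit one weight row per relation,
    set to the mean embedding (prototype) of that relation's K shots."""
    dim = len(support[0][0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for x, y in support:
        counts[y] += 1
        for j in range(dim):
            sums[y][j] += x[j]
    return [[v / counts[c] for v in sums[c]] for c in range(n_way)]


def logits(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def support_loss(W: List[Vector], support: List[Example]) -> float:
    # Mean softmax cross-entropy of the tiny model on the support set.
    return -sum(math.log(softmax(logits(W, x))[y]) for x, y in support) / len(support)


def fine_tune(W: List[Vector], support: List[Example], lr: float = 0.1, steps: int = 20) -> List[Vector]:
    """Fine-tune only the tiny model; d CE / d W[c] = (p_c - 1[c == y]) * x."""
    W = [row[:] for row in W]  # copy so the generated weights are kept
    for _ in range(steps):
        grads = [[0.0] * len(row) for row in W]
        for x, y in support:
            p = softmax(logits(W, x))
            for c in range(len(W)):
                coef = p[c] - (1.0 if c == y else 0.0)
                for j in range(len(x)):
                    grads[c][j] += coef * x[j]
        for c in range(len(W)):
            for j in range(len(W[c])):
                W[c][j] -= lr * grads[c][j] / len(support)
    return W


def predict(W: List[Vector], x: Vector) -> int:
    z = logits(W, x)
    return z.index(max(z))


# Toy 2-way 2-shot task with 2-d embeddings.
support = [([1.0, 0.1], 0), ([0.9, 0.0], 0), ([0.0, 1.0], 1), ([0.1, 0.8], 1)]
W0 = generate_tiny_model(support, n_way=2)
W1 = fine_tune(W0, support)
print(support_loss(W0, support), support_loss(W1, support))
print([predict(W1, x) for x, _ in support])
```

The design choice being illustrated is the split of responsibilities. The shared generator is run once per task to produce a small starting point, and only that small model is adapted to the task, so per-task optimization never touches the large shared model.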

Co-authors

Lixin Zou 1

Venues

COLING 1
EMNLP 1
