Yue Zhang
2026
ELLA: Efficient Lifelong Learning for Adapters in Large Language Models
Shristi Das Biswas | Yue Zhang | Anwesan Pal | Radhika Bhargava | Kaushik Roy
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) suffer from severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited: replay-based methods are impractical and risk violating privacy, while strict orthogonality-based methods collapse at scale, because each new task is projected onto an orthogonal complement, progressively exhausting the residual degrees of freedom and eliminating forward transfer by forbidding overlap in shared representations. In this work, we introduce ELLA, a training framework built on the principle of selective subspace de-correlation. Rather than forbidding all overlap, ELLA explicitly characterizes the structure of past updates and penalizes alignment along their high-energy, task-specific directions, while preserving freedom in the low-energy residual subspaces to enable transfer. Formally, this is realized via a lightweight regularizer on a single aggregated update matrix. We prove this mechanism acts as an anisotropic shrinkage operator that bounds interference, yielding a penalty whose memory and compute costs are constant regardless of task-sequence length. ELLA requires no data replay, no architectural expansion, and negligible storage. Empirically, it achieves state-of-the-art CL performance on three popular benchmarks spanning both classification and generative tasks, with relative accuracy gains of up to 9.6% and a 35× smaller memory footprint. Furthermore, ELLA scales robustly across architectures and actively enhances the model's zero-shot generalization on unseen tasks, establishing a principled and scalable solution for constructive lifelong LLM adaptation.
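To make the mechanism concrete, here is a minimal sketch of a selective subspace de-correlation penalty of the kind the abstract describes. It is an illustration only, not the authors' implementation: the rank cutoff k, the weight lam, and the use of a single running sum of past adapter updates as the aggregated matrix are all assumptions.

```python
import torch

def decorrelation_penalty(delta_new: torch.Tensor,
                          agg_past: torch.Tensor,
                          k: int = 8,
                          lam: float = 0.1) -> torch.Tensor:
    """Penalize the energy of a new adapter update that falls inside the
    high-energy subspace of the aggregated past updates, while leaving
    the low-energy residual subspace free for forward transfer."""
    # Top-k left singular vectors span the high-energy, task-specific directions.
    U, _, _ = torch.linalg.svd(agg_past, full_matrices=False)
    U_hi = U[:, :k]                  # (d, k) high-energy basis (assumed cutoff)
    proj = U_hi.T @ delta_new        # coordinates of the new update in that basis
    return lam * proj.pow(2).sum()   # squared Frobenius norm of the projection

# Hypothetical usage inside a task's training step:
# loss = task_loss + decorrelation_penalty(lora_B @ lora_A, past_update_sum)
```

Because agg_past is a single aggregated matrix rather than one stored matrix per task, the penalty's memory and compute stay constant as the task sequence grows, consistent with the constant-cost claim in the abstract.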
2024
Prompt-Tuned Multi-Task Taxonomic Transformer (PTMTTaxoFormer)
Rajashekar Vasantha | Nhan Nguyen | Yue Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Hierarchical Text Classification (HTC) is a subclass of multi-label classification; it is challenging because the hierarchy typically contains a large number of diverse topics. Existing methods for HTC fall into two categories: local methods (a classifier for each level, node, or parent) and global methods (a single classifier for everything). Local methods are computationally expensive, whereas global methods often require complex explicit injection of the hierarchy, verbalizers, and/or prompt engineering. In this work, we propose the Prompt-Tuned Multi-Task Taxonomic Transformer (PTMTTaxoFormer), a single classifier that uses a multi-task objective to predict one or more topics. The approach learns the hierarchy during training without explicit injection, complex heads, verbalizers, or prompt engineering. PTMTTaxoFormer is a novel model architecture and training paradigm that uses differentiable prompts and labels learnt through backpropagation. It consistently achieves state-of-the-art results on several HTC benchmarks spanning a range of topics. Compared to most other HTC models, it has a simpler yet effective architecture, making it more production-friendly in terms of latency (a factor of 2-5 lower). It is also robust and label-efficient, outperforming other models while using 15%-50% less training data.
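As a rough illustration of the approach described above, the sketch below shows a single encoder with soft (differentiable) prompts and one multi-label head per taxonomy level, trained under a shared multi-task objective. The encoder choice, prompt length, pooling strategy, and per-level heads are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskTaxoClassifier(nn.Module):
    """One classifier for the whole hierarchy: soft prompts learnt by
    backpropagation plus one multi-label head per taxonomy level."""
    def __init__(self, model_name: str, prompt_len: int, labels_per_level: list):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        d = self.encoder.config.hidden_size
        # Differentiable prompt embeddings, prepended to the token embeddings.
        self.prompts = nn.Parameter(torch.randn(prompt_len, d) * 0.02)
        self.heads = nn.ModuleList([nn.Linear(d, n) for n in labels_per_level])

    def forward(self, input_ids, attention_mask):
        tok = self.encoder.get_input_embeddings()(input_ids)
        b = tok.size(0)
        x = torch.cat([self.prompts.unsqueeze(0).expand(b, -1, -1), tok], dim=1)
        pad = torch.ones(b, self.prompts.size(0),
                         dtype=attention_mask.dtype, device=attention_mask.device)
        mask = torch.cat([pad, attention_mask], dim=1)
        h = self.encoder(inputs_embeds=x, attention_mask=mask).last_hidden_state
        pooled = h[:, 0]  # first prompt position used as the sequence summary
        # One set of multi-label logits per level; sum per-level BCE losses
        # to form the multi-task objective.
        return [head(pooled) for head in self.heads]
```

Training with a binary cross-entropy loss at every level lets the shared encoder and prompts absorb the hierarchy implicitly, without explicit injection of the taxonomy or hand-written verbalizers.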