Optimizing Lifelong Fine-Tuning for Multiple Tasks via Dataless Distribution Replay

Zhenxing Wang

Abstract

Recently emerged large language models, which can be fine-tuned with minimal instruction data, have demonstrated impressive performance across a wide range of tasks. However, forgetting occurs during lifelong fine-tuning because training on new tasks interferes with previously acquired knowledge. To mitigate catastrophic forgetting, conventional data replay methods achieve high performance, but at the cost of compromising data privacy and security. This paper introduces a dataless distribution replay approach for lifelong fine-tuning. Concretely, distribution distillation is applied to replay the output distribution of the linear layers at previous task stages. The optimal solution for this distribution replay can be computed directly from the retained inner product matrix of the input data, thereby eliminating the need for previous data. Additionally, Singular Value Decomposition (SVD) and module accumulation are employed to further enhance the performance of the dataless distribution replay method. Finally, the evaluation is conducted in a lifelong fine-tuning scenario involving multiple tasks. The experimental results and analysis show that the proposed method achieves significant improvements over several strong lifelong fine-tuning methods.
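The abstract does not state the objective or its solution, so the following is only a minimal sketch of how a linear layer's output could be replayed from a retained inner product matrix, assuming a plain least-squares objective. The function names (`gram`, `merge_linear`), the `ridge` term, and the random toy data are all hypothetical and are not taken from the paper; the SVD and module accumulation steps are not modeled.

```python
import numpy as np

def gram(x):
    """Inner product matrix X^T X of a layer's inputs, shape (d_in, d_in)."""
    return x.T @ x

def merge_linear(w_old, g_old, w_new, g_new, ridge=1e-6):
    """Assumed objective (not confirmed by the source):
        ||X_old W - X_old W_old||^2 + ||X_new W - X_new W_new||^2
    Its minimizer needs only G = X^T X for each stage:
        W = (G_old + G_new)^{-1} (G_old W_old + G_new W_new)
    The small ridge term is an added assumption for numerical stability.
    """
    d_in = g_old.shape[0]
    lhs = g_old + g_new + ridge * np.eye(d_in)
    rhs = g_old @ w_old + g_new @ w_new
    return np.linalg.solve(lhs, rhs)

# Toy data standing in for layer inputs and fine-tuned weights.
rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x_old = rng.normal(size=(64, d_in))   # inputs seen at the previous stage
x_new = rng.normal(size=(64, d_in))   # inputs seen at the current stage
w_old = rng.normal(size=(d_in, d_out))
w_new = rng.normal(size=(d_in, d_out))

# Only g_old is retained from the previous stage; merge_linear never sees x_old.
g_old = gram(x_old)
w_merged = merge_linear(w_old, g_old, w_new, gram(x_new))

def replay_loss(w):
    """Evaluation only: uses x_old to check the merged weights."""
    old_term = np.linalg.norm(x_old @ (w - w_old)) ** 2
    new_term = np.linalg.norm(x_new @ (w - w_new)) ** 2
    return old_term + new_term
```

In this sketch the inner product matrix has shape `(d_in, d_in)` regardless of how many examples were seen, which is why the raw inputs from earlier stages do not have to be kept.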

Anthology ID: 2025.coling-main.746
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 11261–11273
URL: https://aclanthology.org/2025.coling-main.746/
Cite (ACL): Zhenxing Wang. 2025. Optimizing Lifelong Fine-Tuning for Multiple Tasks via Dataless Distribution Replay. In Proceedings of the 31st International Conference on Computational Linguistics, pages 11261–11273, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Optimizing Lifelong Fine-Tuning for Multiple Tasks via Dataless Distribution Replay (Wang, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.746.pdf
