Da Ma


2024

Sparsity-Accelerated Training for Large Language Models
Da Ma | Lu Chen | Pengyu Wang | Hongshen Xu | Hanqi Li | Liangtai Sun | Su Zhu | Shuai Fan | Kai Yu
Findings of the Association for Computational Linguistics: ACL 2024

Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs of this additional training, driven primarily by the models' large parameter counts, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. Observing that many neurons are inactive during forward iterations, we identify the potential for computational speed-ups by excluding these inactive neurons. We address the associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training.
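
A minimal sketch of the mechanism the abstract describes, under two stated assumptions: (a) neuron importance is scored by mean absolute activation over the current batch, and (b) the "ladder" scheduler raises the omission rate in equal discrete stages. Names such as ladder_omission_rate and SparseMLP are illustrative, not the paper's actual interface.

    import torch
    import torch.nn as nn

    def ladder_omission_rate(step: int, total_steps: int,
                             max_rate: float = 0.5,
                             num_stages: int = 4) -> float:
        """Raise the fraction of omitted neurons in equal 'ladder' stages."""
        stage = min(step * num_stages // max(total_steps, 1), num_stages - 1)
        return max_rate * (stage + 1) / num_stages

    class SparseMLP(nn.Module):
        """Feed-forward block that skips low-importance hidden neurons."""

        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.up = nn.Linear(d_model, d_hidden)
            self.down = nn.Linear(d_hidden, d_model)
            self.act = nn.SiLU()

        def forward(self, x: torch.Tensor, omission_rate: float) -> torch.Tensor:
            h = self.act(self.up(x))                   # (batch, seq, d_hidden)
            if omission_rate > 0.0:
                # Assumed importance proxy: mean |activation| per hidden neuron.
                importance = h.abs().mean(dim=(0, 1))  # (d_hidden,)
                k = int(h.size(-1) * (1.0 - omission_rate))
                keep = importance.topk(k).indices
                mask = torch.zeros_like(importance)
                mask[keep] = 1.0
                h = h * mask                           # zero out omitted neurons
            return self.down(h)

    # Usage: omission rate grows over training, shrinking the active MLP width.
    mlp = SparseMLP(d_model=512, d_hidden=2048)
    x = torch.randn(2, 16, 512)
    rate = ladder_omission_rate(step=300, total_steps=1000)
    y = mlp(x, omission_rate=rate)

For clarity the sketch masks omitted neurons to zero, which selects neurons but does not save compute; a real throughput gain of the kind the paper reports requires slicing the up/down projection weights to the kept indices so the omitted neurons are never computed at all.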

2021

Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL
Zhi Chen | Lu Chen | Hanqi Li | Ruisheng Cao | Da Ma | Mengyue Wu | Kai Yu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021