Shuai Fan
2024
Sparsity-Accelerated Training for Large Language Models
Da Ma | Lu Chen | Pengyu Wang | Hongshen Xu | Hanqi Li | Liangtai Sun | Su Zhu | Shuai Fan | Kai Yu
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training.
2022
Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis
Shuai Fan | Chen Lin | Haonan Li | Zhenghao Lin | Jinsong Su | Hang Zhang | Yeyun Gong | Jian Guo | Nan Duan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Most existing pre-trained language representation models (PLMs) are sub-optimal in sentiment analysis tasks, as they capture the sentiment information from word-level while under-considering sentence-level information. In this paper, we propose SentiWSP, a novel Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks. The word level pre-training task detects replaced sentiment words, via a generator-discriminator framework, to enhance the PLM’s knowledge about sentiment words. The sentence level pre-training task further strengthens the discriminator via a contrastive learning framework, with similar sentences as negative samples, to encode sentiments in a sentence. Extensive experimental results show that SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks. We have made our code and model publicly available at https://github.com/XMUDM/SentiWSP.
Co-authors
- Da Ma 1
- Lu Chen 1
- Pengyu Wang 1
- Hongshen Xu 1
- Hanqi Li 1