Liangtai Sun
2024
Sparsity-Accelerated Training for Large Language Models
Da Ma | Lu Chen | Pengyu Wang | Hongshen Xu | Hanqi Li | Liangtai Sun | Su Zhu | Shuai Fan | Kai Yu
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training.
2022
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun | Xingyu Chen | Lu Chen | Tianle Dai | Zichen Zhu | Kai Yu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Task-oriented dialogue (TOD) systems have been widely used by mobile phone intelligent assistants to accomplish tasks such as calendar scheduling or hotel reservation. Current TOD systems usually focus on multi-turn text/speech interaction and then call back-end APIs designed for TODs to perform the task. However, this API-based architecture greatly limits the information-searching capability of intelligent assistants and may even lead to task failure if TOD-specific APIs are not available or the task is too complicated to be executed by the provided APIs. In this paper, we propose a new TOD architecture: the GUI-based task-oriented dialogue system (GUI-TOD). A GUI-TOD system can directly perform GUI operations on real apps and execute tasks without invoking TOD-specific back-end APIs. Furthermore, we release META-GUI, a dataset for training a Multi-modal convErsaTional Agent on mobile GUI. We also propose a multi-model action prediction and response model, which shows promising results on META-GUI. The dataset, code, and leaderboard are publicly available.