Minghua Ma
Also published as: MingHua Ma
2025
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang | MingHua Ma | Zekun Wang | Jingchang Chen | Shan Liping | Qing Yang | Dongliang Xu | Ming Liu | Bing Qin
Proceedings of the 31st International Conference on Computational Linguistics
The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains a challenge to perform structured pruning efficiently and maintain performance, especially at high sparsity ratios. To this end, we introduce an efficient structured pruning framework named CFSP, which leverages both Coarse (interblock) and Fine-grained (intrablock) activation information as an importance criterion to guide pruning. The pruning is highly efficient, as it only requires one forward pass to compute feature activations. Specifically, we first allocate the sparsity budget across blocks based on their importance and then retain important weights within each block. In addition, we introduce a recovery fine-tuning strategy that adaptively allocates training overhead based on coarse-grained importance to further improve performance. Experimental results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets. Our code will be available at https://github.com/wyxscir/CFSP.
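The abstract describes a two-stage procedure: a coarse step that distributes the global sparsity budget across blocks by activation-based importance, and a fine step that keeps the most important weights within each block. Below is a minimal illustrative sketch of that coarse-to-fine idea, not the authors' released implementation; the helper names (allocate_block_sparsity, prune_block_channels, the 1e-8 smoothing term, and the 0.95 clamp) are hypothetical choices for the example.

```python
# Hypothetical sketch of coarse-to-fine, activation-guided structured pruning.
# Not the CFSP implementation; importance scores and budget allocation are simplified.
import torch


def allocate_block_sparsity(block_importance: torch.Tensor, global_sparsity: float) -> torch.Tensor:
    """Coarse step: assign less-important blocks a larger share of the sparsity budget."""
    inv = 1.0 / (block_importance + 1e-8)          # hypothetical: invert importance
    shares = inv / inv.sum()                        # normalize shares across blocks
    # Scale so the mean per-block sparsity equals the global budget, then clamp.
    ratios = (shares * global_sparsity * len(block_importance)).clamp(max=0.95)
    return ratios


def prune_block_channels(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Fine step: score output channels by |W| weighted by input-activation norms
    (collected in a single forward pass) and zero out the lowest-scoring fraction."""
    # weight: (out_features, in_features); act_norm: (in_features,)
    scores = (weight.abs() * act_norm.unsqueeze(0)).sum(dim=1)
    n_prune = int(sparsity * weight.size(0))
    if n_prune == 0:
        return weight
    prune_idx = torch.topk(scores, n_prune, largest=False).indices
    pruned = weight.clone()
    pruned[prune_idx] = 0.0                         # structured: remove whole channels
    return pruned
```

As a usage sketch, one would run a single calibration forward pass to collect per-block activation norms, call allocate_block_sparsity once to get per-block ratios, then apply prune_block_channels to each block's weight matrices with its assigned ratio.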
2024
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation
Ruomeng Ding | Chaoyun Zhang | Lu Wang | Yong Xu | Minghua Ma | Wei Zhang | Si Qin | Saravan Rajmohan | Qingwei Lin | Dongmei Zhang
Findings of the Association for Computational Linguistics: ACL 2024
This paper introduces a novel thought prompting approach called "Everything of Thoughts" (XoT) for Large Language Models (LLMs) to defy the "Penrose triangle" law of existing thought paradigms and achieve three key properties of thought generation simultaneously: performance, efficiency, and flexibility. XoT leverages pretrained reinforcement learning and Monte Carlo Tree Search (MCTS) to incorporate external domain knowledge and planning capability into thoughts, thereby enhancing LLMs' decision-making capabilities. Through the MCTS-LLM collaborative thought revision framework, XoT autonomously produces high-quality, comprehensive cognitive mappings with minimal LLM interactions. Additionally, XoT empowers LLMs to utilize flexible cognitive mappings for solving problems with multiple solutions. We evaluate XoT on several challenging problem-solving tasks, including Game of 24, 8-Puzzle, and Pocket Cube. Our results demonstrate that XoT significantly outperforms existing approaches in various dimensions, showcasing its remarkable proficiency in addressing complex problems across diverse domains. The data and code are available at https://github.com/microsoft/Everything-of-Thoughts-XoT.
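For readers unfamiliar with the search component the abstract mentions, the following is a generic, minimal MCTS loop over "thought" states. It is not the released XoT code; propose_thoughts, evaluate_state, and is_terminal are hypothetical stand-ins for the pretrained policy/value networks and task simulator the paper describes.

```python
# Hypothetical sketch: MCTS over thought states with a learned value estimate
# standing in for rollouts. Not the XoT implementation.
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0


def ucb(child, parent_visits, c=1.4):
    """Upper-confidence bound used for child selection."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(root_state, propose_thoughts, evaluate_state, is_terminal, n_simulations=100):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # Selection: descend via UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # Expansion: add candidate next thoughts unless the state is terminal.
        if not is_terminal(node.state):
            node.children = [Node(s, node) for s in propose_thoughts(node.state)]
            if node.children:
                node = random.choice(node.children)
        # Evaluation: value estimate in place of a random rollout.
        reward = evaluate_state(node.state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    if not root.children:
        return root_state
    return max(root.children, key=lambda ch: ch.visits).state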