Guangli Li


2025

Activation sparsity refers to the presence of a considerable number of weakly contributing elements among activation outputs, and it is a promising paradigm for accelerating model inference. Nevertheless, most large language models (LLMs) adopt activation functions without intrinsic activation sparsity (e.g., GELU and Swish). Some recent efforts have explored substituting ReLU or its variants as the activation function to pursue activation sparsity and acceleration, but few obtain both high activation sparsity and performance comparable to the original model. This paper introduces a simple and effective method named “ProSparse” that sparsifies LLMs while achieving both goals. Specifically, after introducing ReLU activation, ProSparse applies progressive sparsity regularization whose factor increases smoothly over multiple stages. This enhances activation sparsity and mitigates performance degradation by avoiding radical shifts in activation distributions. With ProSparse, we obtain sparsity of 89.32% for LLaMA2-7B, 88.80% for LLaMA2-13B, and 87.89% for the end-size MiniCPM-1B, with performance comparable to their original Swish-activated versions. These are the most sparsely activated models among open-source LLaMA versions and competitive end-size models. Inference acceleration experiments further demonstrate the significant practical acceleration potential of LLMs with higher activation sparsity, achieving up to 4.52x inference speedup.
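The idea of a progressively increasing sparsity regularizer can be illustrated with a small sketch. It assumes an L1 penalty on the ReLU outputs, measures activation sparsity as the fraction of exact zeros, and ramps the regularization factor along a quarter-sine curve within each stage. The stage boundaries, peak factors, ramp shape, and all function names below are hypothetical illustrations, not details taken from ProSparse itself.

```python
import math
from typing import List, Sequence


def relu(xs: Sequence[float]) -> List[float]:
    """ReLU: negative pre-activations become exact zeros."""
    return [x if x > 0.0 else 0.0 for x in xs]


def activation_sparsity(acts: Sequence[float]) -> float:
    """Fraction of activation outputs that are exactly zero."""
    if not acts:
        return 0.0
    return sum(1 for a in acts if a == 0.0) / len(acts)


def progressive_factor(
    step: int, stage_ends: Sequence[int], stage_peaks: Sequence[float]
) -> float:
    """Regularization factor at a given training step.

    Hypothetical schedule: within stage i (steps stage_ends[i-1] up to
    stage_ends[i]), the factor rises smoothly from the previous stage's
    peak to stage_peaks[i] along a quarter-sine curve. After the last
    stage it stays at the final peak.
    """
    prev_end, prev_peak = 0, 0.0
    for end, peak in zip(stage_ends, stage_peaks):
        if step < end:
            frac = (step - prev_end) / (end - prev_end)
            return prev_peak + (peak - prev_peak) * math.sin(0.5 * math.pi * frac)
        prev_end, prev_peak = end, peak
    return prev_peak


def l1_penalty(acts: Sequence[float], factor: float) -> float:
    """Mean L1 norm of the activations, scaled by the current factor."""
    if not acts:
        return 0.0
    return factor * sum(abs(a) for a in acts) / len(acts)


if __name__ == "__main__":
    acts = relu([-1.0, 0.5, -0.2, 2.0])
    print(activation_sparsity(acts))  # fraction of zeros in the ReLU outputs
    for step in (0, 50, 100, 150, 250):
        factor = progressive_factor(step, [100, 200], [0.01, 0.05])
        print(step, factor, l1_penalty(acts, factor))
```

In a training loop, the penalty would be added to the language-modeling loss at each step. Ramping the factor gradually, rather than setting it to its final value at once, is what avoids an abrupt shift in the activation distribution.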

Automatic radiology report generation has attracted considerable attention with the rise of computer-aided diagnostic systems. Because of the inherent biases in medical imaging data, generating reports with precise clinical details is challenging yet crucial for accurate diagnosis. To this end, we design a disease description graph that encapsulates comprehensive and pertinent disease information. By aligning visual features with this graph, our model improves the quality of the generated reports. Furthermore, we introduce a novel informed prompting method that increases the accuracy of short-gram predictions, acting as implicit bag-of-words planning for surface realization. Notably, this informed prompt succeeds with only a three-layer decoder, reducing reliance on conventional prompting methods that require many model parameters. Extensive experiments on two widely used datasets, IU-Xray and MIMIC-CXR, demonstrate that our method outperforms previous state-of-the-art models.