Numerical Optimizations for Weighted Low-rank Estimation on Language Models
Ting Hua | Yen-Chang Hsu | Felicity Wang | Qian Lou | Yilin Shen | Hongxia Jin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Singular value decomposition (SVD) is one of the most popular compression methods that approximate a target matrix with smaller matrices. However, standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption. The parameters of a trained neural network model may affect the task performance unevenly, which suggests non-equal importance among the parameters. Compared to SVD, the decomposition method aware of parameter importance is the more practical choice in real cases. Unlike standard SVD, weighed value decomposition is a non-convex optimization problem that lacks a closed-form solution. We systematically investigated multiple optimization strategies to tackle the problem and examined our method by compressing Transformer-based language models.Further, we designed a metric to predict when the SVD may introduce a significant performance drop, for which our method can be a rescue strategy.The extensive evaluations demonstrate that our method can perform better than current SOTA methods in compressing Transformer-based language models.
CRYPTOGRU: Low Latency Privacy-Preserving Text Analysis With GRU
Bo Feng | Qian Lou | Lei Jiang | Geoffrey Fox
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Homomorphic encryption (HE) and garbled circuit (GC) provide the protection for users’ privacy. However, simply mixing the HE and GC in RNN models suffer from long inference latency due to slow activation functions. In this paper, we present a novel hybrid structure of HE and GC gated recurrent unit (GRU) network, , for low-latency secure inferences. replaces computationally expensive GC-based tanh with fast GC-based ReLU, and then quantizes sigmoid and ReLU to smaller bit-length to accelerate activations in a GRU. We evaluate with multiple GRU models trained on 4 public datasets. Experimental results show achieves top-notch accuracy and improves the secure inference latency by up to 138× over one of the state-of-the-art secure networks on the Penn Treebank dataset.
- Bo Feng 1
- Lei Jiang 1
- Geoffrey Fox 1
- Ting Hua 1
- Yen-Chang Hsu 1
- show all...