2024
pdf
bib
abs
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
Guanchu Wang
|
Yu-Neng Chuang
|
Ruixiang Tang
|
Shaochen Zhong
|
Jiayi Yuan
|
Hongye Jin
|
Zirui Liu
|
Vipin Chaudhary
|
Shuai Xu
|
James Caverlee
|
Xia Hu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Ensuring the security of released large language models (LLMs) poses a significant dilemma, as existing mechanisms either compromise ownership rights or raise data privacy concerns. To address this dilemma, we introduce TaylorMLP to protect the ownership of released LLMs and prevent their abuse. Specifically, TaylorMLP preserves the ownership of LLMs by transforming the weights of LLMs into parameters of Taylor-series. Instead of releasing the original weights, developers can release the Taylor-series parameters with users, thereby ensuring the security of LLMs. Moreover, TaylorMLP can prevent abuse of LLMs by adjusting the generation speed. It can induce low-speed token generation for the protected LLMs by increasing the terms in the Taylor-series. This intentional delay helps LLM developers prevent potential large-scale unauthorized uses of their models. Empirical experiments across five datasets and three LLM architectures demonstrate that TaylorMLP induces over increase in latency, producing the tokens precisely matched with original LLMs. Subsequent defensive experiments further confirm that TaylorMLP effectively prevents users from reconstructing the weight values based on downstream datasets.
pdf
bib
abs
Secure Your Model: An Effective Key Prompt Protection Mechanism for Large Language Models
Ruixiang Tang
|
Yu-Neng Chuang
|
Xuanting Cai
|
Mengnan Du
|
Xia Hu
Findings of the Association for Computational Linguistics: NAACL 2024
Large language models (LLMs) have notably revolutionized many domains within natural language processing due to their exceptional performance. Their security has become increasingly vital. This study is centered on protecting LLMs against unauthorized access and potential theft. We propose a simple yet effective protective measure wherein a unique key prompt is embedded within the LLM. This mechanism enables the model to respond only when presented with the correct key prompt; otherwise, LLMs will refuse to react to any input instructions. This key prompt protection offers a robust solution to prevent the unauthorized use of LLMs, as the model becomes unusable without the correct key. We evaluated the proposed protection on multiple LLMs and NLP tasks. Results demonstrate that our method can successfully protect the LLM without significantly impacting the model’s original function. Moreover, we demonstrate potential attacks that attempt to bypass the protection mechanism will adversely affect the model’s performance, further emphasizing the effectiveness of the proposed protection method.
pdf
bib
abs
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
|
Hongyi Liu
|
Shaochen Zhong
|
Yu-Neng Chuang
|
Songchen Li
|
Guanchu Wang
|
Duy Le
|
Hongye Jin
|
Vipin Chaudhary
|
Zhaozhuo Xu
|
Zirui Liu
|
Xia Hu
Findings of the Association for Computational Linguistics: EMNLP 2024
Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the growing size of the KV cache and the intrinsic complexity of attending to extended inputs; where multiple schools of efficiency-driven approaches — such as KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures — have been proposed to produce efficient yet long context-capable models. Despite these advancements, no existing work has comprehensively benchmarked these methods in a reasonably aligned environment. In this work, we fill this gap by providing a taxonomy of current methods and evaluating 10+ state-of-the-art approaches across seven categories of long context tasks. Our work reveals numerous previously unknown phenomena and offers insights — as well as a friendly workbench — for the future development of long context-capable LLMs. The source code is available at https://github.com/henryzhongsc/longctx_bench.
pdf
bib
abs
Learning to Compress Prompt in Natural Language Formats
Yu-Neng Chuang
|
Tianwei Xing
|
Chia-Yuan Chang
|
Zirui Liu
|
Xun Chen
|
Xia Hu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large language models (LLMs) are great at processing multiple natural language processing tasks, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework compressing original prompts into NL formatted Capsule Prompt while maintaining prompt utility and transferability. Specifically, to tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics preserving loss. To address the second question, the Nano-Capsulator is optimized by a reward function featuring length constraints. Experimental results demonstrate that the Capsule Prompt can reduce 81.4% of the original length, decrease inference latency up to 4.5x, and save 80.1% of budget overheads while providing transferability across diverse LLMs and different datasets.
2023
pdf
bib
abs
Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks
Ruixiang Tang
|
Gord Lueck
|
Rodolfo Quispe
|
Huseyin Inan
|
Janardhan Kulkarni
|
Xia Hu
Findings of the Association for Computational Linguistics: EMNLP 2023
Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is a concern that these models may disclose information in the training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model’s API, it is possible to determine if the sample was part of the training data. We exploit text similarity and the model’s resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even in cases where the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
pdf
bib
abs
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du
|
Subhabrata Mukherjee
|
Yu Cheng
|
Milad Shokouhi
|
Xia Hu
|
Ahmed Hassan Awadallah
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models for out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques including knowledge distillation and pruning and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the shortcut samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty.
2021
pdf
bib
abs
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
Mengnan Du
|
Varun Manjunatha
|
Rajiv Jain
|
Ruchi Deshpande
|
Franck Dernoncourt
|
Jiuxiang Gu
|
Tong Sun
|
Xia Hu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Recent studies indicate that NLU models are prone to rely on shortcut features for prediction, without achieving true language understanding. As a result, these models fail to generalize to real-world out-of-distribution data. In this work, we show that the words in the NLU training set can be modeled as a long-tailed distribution. There are two findings: 1) NLU models have strong preference for features located at the head of the long-tailed distribution, and 2) Shortcut features are picked up during very early few iterations of the model training. These two observations are further employed to formulate a measurement which can quantify the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework LGTR, to suppress the model from making overconfident predictions for samples with large shortcut degree. Experimental results on three NLU benchmarks demonstrate that our long-tailed distribution explanation accurately reflects the shortcut learning behavior of NLU models. Experimental analysis further indicates that LGTR can improve the generalization accuracy on OOD data, while preserving the accuracy on in-distribution data.
2012
pdf
bib
A Semi-Supervised Bayesian Network Model for Microblog Topic Classification
Yan Chen
|
Zhoujun Li
|
Liqiang Nie
|
Xia Hu
|
Xiangyu Wang
|
Tat-Seng Chua
|
Xiaoming Zhang
Proceedings of COLING 2012
2009
pdf
bib
Query Segmentation Based on Eigenspace Similarity
Chao Zhang
|
Nan Sun
|
Xia Hu
|
Tingzhu Huang
|
Tat-Seng Chua
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers