Pang Wei Koh
2024
Instructional Fingerprinting of Large Language Models
Jiashu Xu | Fei Wang | Mingyu Ma | Pang Wei Koh | Chaowei Xiao | Muhao Chen
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
The exorbitant cost of training large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (e.g., restricting commercial use). We present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popularly used LLMs show that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License.
2020
ExpBERT: Representation Engineering with Natural Language Explanations
Shikhar Murty | Pang Wei Koh | Percy Liang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text. In this paper, we allow model developers to specify these types of inductive biases as natural language explanations. We use BERT fine-tuned on MultiNLI to “interpret” these explanations with respect to the input sentence, producing explanation-guided representations of the input. Across three relation extraction tasks, our method, ExpBERT, matches a BERT baseline but with 3–20x less labeled data and improves on the baseline by 3–10 F1 points with the same amount of labeled data.
Co-authors
- Shikhar Murty 1
- Percy Liang 1
- Jiashu Xu 1
- Fei Wang 1
- Mingyu Ma 1